Compatibility Matrix

The tables below list every supported model and the optimizations supported for each.

The symbols used have the following meanings:

  • ✅ = Full compatibility

  • ❌ = No compatibility

  • ⭕ = Does not apply to this model

Models x Optimization

The Hugging Face model ID can be passed directly to `from_pretrained()` methods; sglang-diffusion will use optimal default parameters when initializing the pipeline and generating videos.

Video Generation Models

| Model Name | Hugging Face Model ID | Resolutions | TeaCache | Sliding Tile Attn | Sage Attn | Video Sparse Attention (VSA) | Sparse Linear Attention (SLA) | Sage Sparse Linear Attention (SageSLA) | Sparse Video Gen 2 (SVG2) |
|---|---|---|---|---|---|---|---|---|---|
| FastWan2.1 T2V 1.3B | FastVideo/FastWan2.1-T2V-1.3B-Diffusers | 480p |  |  |  |  |  |  |  |
| FastWan2.2 TI2V 5B Full Attn | FastVideo/FastWan2.2-TI2V-5B-FullAttn-Diffusers | 720p |  |  |  |  |  |  |  |
| Wan2.2 TI2V 5B | Wan-AI/Wan2.2-TI2V-5B-Diffusers | 720p |  |  |  |  |  |  |  |
| Wan2.2 T2V A14B | Wan-AI/Wan2.2-T2V-A14B-Diffusers | 480p, 720p |  |  |  |  |  |  |  |
| Wan2.2 I2V A14B | Wan-AI/Wan2.2-I2V-A14B-Diffusers | 480p, 720p |  |  |  |  |  |  |  |
| HunyuanVideo | hunyuanvideo-community/HunyuanVideo | 720×1280, 544×960 |  |  |  |  |  |  |  |
| FastHunyuan | FastVideo/FastHunyuan-diffusers | 720×1280, 544×960 |  |  |  |  |  |  |  |
| Wan2.1 T2V 1.3B | Wan-AI/Wan2.1-T2V-1.3B-Diffusers | 480p |  |  |  |  |  |  |  |
| Wan2.1 T2V 14B | Wan-AI/Wan2.1-T2V-14B-Diffusers | 480p, 720p |  |  |  |  |  |  |  |
| Wan2.1 I2V 480P | Wan-AI/Wan2.1-I2V-14B-480P-Diffusers | 480p |  |  |  |  |  |  |  |
| Wan2.1 I2V 720P | Wan-AI/Wan2.1-I2V-14B-720P-Diffusers | 720p |  |  |  |  |  |  |  |
| TurboWan2.1 T2V 1.3B | IPostYellow/TurboWan2.1-T2V-1.3B-Diffusers | 480p |  |  |  |  |  |  |  |
| TurboWan2.1 T2V 14B | IPostYellow/TurboWan2.1-T2V-14B-Diffusers | 480p |  |  |  |  |  |  |  |
| TurboWan2.1 T2V 14B 720P | IPostYellow/TurboWan2.1-T2V-14B-720P-Diffusers | 720p |  |  |  |  |  |  |  |
| TurboWan2.2 I2V A14B | IPostYellow/TurboWan2.2-I2V-A14B-Diffusers | 720p |  |  |  |  |  |  |  |
| Wan2.1 Fun 1.3B InP | weizhou03/Wan2.1-Fun-1.3B-InP-Diffusers | 480p |  |  |  |  |  |  |  |
| Helios Base | BestWishYsh/Helios-Base | 720p |  |  |  |  |  |  |  |
| Helios Mid | BestWishYsh/Helios-Mid | 720p |  |  |  |  |  |  |  |
| Helios Distilled | BestWishYsh/Helios-Distilled | 720p |  |  |  |  |  |  |  |
| LTX-2 (one/two-stage/TI2V) | Lightricks/LTX-2 | 768×512, 1536×1024 |  |  |  |  |  |  |  |
| LTX-2.3 (one/two-stage/TI2V/HQ) | Lightricks/LTX-2.3 | 768×512, 1536×1024, 1920×1088 (HQ default) |  |  |  |  |  |  |  |

Notes:

  1. Wan2.2 TI2V 5B has some quality issues when performing I2V generation. We are working on fixing this issue.

  2. SageSLA is based on SpargeAttn. Install it first with `pip install git+https://github.com/thu-ml/SpargeAttn.git --no-build-isolation`.

  3. LTX pipeline selection:

    • One-stage: `--pipeline-class-name LTX2Pipeline`

    • Two-stage: `--pipeline-class-name LTX2TwoStagePipeline`

    • Two-stage HQ: `--pipeline-class-name LTX2TwoStageHQPipeline` (HQ defaults to 1920×1088; you can still override `--width`/`--height`)

    • LTX-2 and LTX-2.3 support both T2V and TI2V (`--image-path`) on the one-stage and two-stage pipelines (including HQ).

    • The spatial upsampler and distilled LoRA are auto-resolved from the model snapshot by default, and can still be overridden with `--spatial-upsampler-path` and `--distilled-lora-path`.

    • For LTX models, the Resolutions column uses output-video width×height semantics, matching `sglang generate --width ... --height ...`.

  4. LTX-2 / LTX-2.3 two-stage also supports `--ltx2-two-stage-device-mode {original,snapshot,resident}`:

    • `snapshot` is the default and recommended mode.

    • `resident` usually provides the best latency/throughput but uses much more VRAM.

    • `original` keeps the official two-stage semantics without the premerged stage-2 transformer path.

    • Example (one prior run): original 154.67 s, snapshot 114.05 s, resident 75.71 s; the peak-VRAM trend is original < snapshot < resident.
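As a quick sanity check on the example timings quoted above, the relative speedups work out to roughly 1.36× for snapshot and 2.04× for resident versus original:

```python
# Example single-run timings quoted above, in seconds.
timings = {"original": 154.67, "snapshot": 114.05, "resident": 75.71}

# Speedup of each device mode relative to "original".
for mode, t in timings.items():
    print(f"{mode}: {timings['original'] / t:.2f}x")
```

These are numbers from a single prior run, so treat them as a rough trend rather than a benchmark.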

Image Generation Models

| Model Name | Hugging Face Model ID |
|---|---|
| FLUX.1-dev | black-forest-labs/FLUX.1-dev |
| FLUX.2-dev | black-forest-labs/FLUX.2-dev |
| FLUX.2-dev-NVFP4 | black-forest-labs/FLUX.2-dev-NVFP4 |
| FLUX.2-Klein-4B | black-forest-labs/FLUX.2-klein-4B |
| FLUX.2-Klein-9B | black-forest-labs/FLUX.2-klein-9B |
| Z-Image | Tongyi-MAI/Z-Image |
| Z-Image-Turbo | Tongyi-MAI/Z-Image-Turbo |
| GLM-Image | zai-org/GLM-Image |
| Qwen Image | Qwen/Qwen-Image |
| Qwen Image 2512 | Qwen/Qwen-Image-2512 |
| Qwen Image Edit | Qwen/Qwen-Image-Edit |
| Qwen Image Edit 2509 | Qwen/Qwen-Image-Edit-2509 |
| Qwen Image Edit 2511 | Qwen/Qwen-Image-Edit-2511 |
| Qwen Image Layered | Qwen/Qwen-Image-Layered |
| SD3 Medium | stabilityai/stable-diffusion-3-medium-diffusers |
| SD3.5 Medium | stabilityai/stable-diffusion-3.5-medium-diffusers |
| SD3.5 Large | stabilityai/stable-diffusion-3.5-large-diffusers |
| Hunyuan3D-2 | tencent/Hunyuan3D-2 |
| SANA 1.5 1.6B | Efficient-Large-Model/SANA1.5_1.6B_1024px_diffusers |
| SANA 1.5 4.8B | Efficient-Large-Model/SANA1.5_4.8B_1024px_diffusers |
| SANA 1600M 1024px | Efficient-Large-Model/Sana_1600M_1024px_diffusers |
| SANA 600M 1024px | Efficient-Large-Model/Sana_600M_1024px_diffusers |
| SANA 1600M 512px | Efficient-Large-Model/Sana_1600M_512px_diffusers |
| SANA 600M 512px | Efficient-Large-Model/Sana_600M_512px_diffusers |
| FireRed-Image-Edit 1.0 | FireRedTeam/FireRed-Image-Edit-1.0 |
| FireRed-Image-Edit 1.1 | FireRedTeam/FireRed-Image-Edit-1.1 |
| ERNIE-Image | baidu/ERNIE-Image |
| ERNIE-Image-Turbo | baidu/ERNIE-Image-Turbo |

Supported Components

SGLang Diffusion supports overriding individual pipeline components with `--<component>-path`. The value can be either a Hugging Face repo ID or a local component directory.

The same overrides can also be provided in config files through `component_paths.<component>`.

Common Syntax

CLI:

```shell
sglang generate \
  --model-path black-forest-labs/FLUX.2-dev \
  --vae-path black-forest-labs/FLUX.2-small-decoder \
  --transformer-path /models/flux2/transformer
```

Config file:

```yaml
model_path: black-forest-labs/FLUX.2-dev
component_paths:
  vae: black-forest-labs/FLUX.2-small-decoder
  transformer: /models/flux2/transformer
```
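
The two forms are equivalent: each CLI flag name is derived from the component key by replacing underscores with hyphens and appending `-path` (e.g. `text_encoder_2` → `--text-encoder-2-path`). A minimal sketch of that mapping, using a hypothetical helper that is not part of SGLang:

```python
def component_flags(component_paths: dict) -> list[str]:
    # Illustration only: config keys such as "text_encoder_2" map to
    # CLI flags such as "--text-encoder-2-path".
    flags = []
    for name, path in component_paths.items():
        flags += [f"--{name.replace('_', '-')}-path", path]
    return flags

print(component_flags({
    "vae": "black-forest-labs/FLUX.2-small-decoder",
    "transformer": "/models/flux2/transformer",
}))
```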

Use the component name from the pipeline’s `model_index.json` or the native pipeline’s registered module name:

| Component Type | Supported Keys | Notes |
|---|---|---|
| VAE | `vae`, `video_vae`, `audio_vae` | `vae` is the common image-generation override |
| Transformer / DiT | `transformer`, `video_dit`, `audio_dit` | `transformer` is the standard override for the main denoiser |
| Text / Preprocess | `text_encoder`, `text_encoder_2`, `tokenizer`, `processor`, `image_processor` | Replacement encoders often need matching preprocessing assets |
| Auxiliary | `scheduler`, `spatial_upsampler`, `vocoder`, `connectors`, `dual_tower_bridge`, `image_encoder`, `vision_language_encoder` | Only valid for pipelines that expose these components |

Known Component Repos

The table below lists concrete Hugging Face component repos that are already used in SGLang Diffusion docs or tests. It is not an exhaustive catalog of all compatible component repos.

| Base Model | Override Key | Example Repo | Notes |
|---|---|---|---|
| black-forest-labs/FLUX.2-dev | `vae` | black-forest-labs/FLUX.2-small-decoder | Decoder-only FLUX.2 VAE override |
| black-forest-labs/FLUX.2-dev | `vae` | fal/FLUX.2-Tiny-AutoEncoder | Existing tested custom VAE path |

VAE

  • `--vae-path` is the common image-generation override.

  • `--video-vae-path` and `--audio-vae-path` are only relevant for pipelines with separate video or audio VAEs.

Transformer / DiT

  • `--transformer-path` is the standard override for the main denoising transformer.

  • For quantized transformers, prefer `--transformer-path` or `--transformer-weights-path`; see quantization.md.

  • `--video-dit-path` and `--audio-dit-path` are only for pipelines that split denoisers by modality.

Text Encoders and Preprocessors

  • `--text-encoder-path` and `--text-encoder-2-path` override the primary and secondary text encoders.

  • `--tokenizer-path`, `--processor-path`, and `--image-processor-path` are useful when the replacement encoder requires matching preprocessing assets.

Auxiliary Components

  • `--scheduler-path` is only relevant when the pipeline exposes a scheduler component.

  • `--spatial-upsampler-path` is mainly for two-stage pipelines such as `LTX2TwoStagePipeline`.

  • `--vocoder-path`, `--connectors-path`, `--dual-tower-bridge-path`, `--image-encoder-path`, and `--vision-language-encoder-path` are only valid for pipelines that expose those components.

Notes

  1. Component overrides are only valid when the target pipeline actually uses that component.

  2. The override key should match the component name in the pipeline’s `model_index.json` or the native pipeline’s registered module name.
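
Since valid override keys come from `model_index.json`, one way to discover them for a given pipeline is to read that file directly: its top-level keys name the components, while underscore-prefixed keys such as `_class_name` are metadata. A small sketch, assuming a Diffusers-style `model_index.json`:

```python
import json

def component_names(model_index_path: str) -> list[str]:
    # Top-level keys of a Diffusers-style model_index.json name the
    # pipeline components (e.g. "vae", "transformer", "scheduler");
    # underscore-prefixed keys like "_class_name" and
    # "_diffusers_version" are metadata and are skipped.
    with open(model_index_path) as f:
        index = json.load(f)
    return sorted(k for k in index if not k.startswith("_"))
```

Each returned name then corresponds to a `--<name>-path` override (underscores become hyphens) when the pipeline supports overriding that component.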

Verified LoRA Examples

This section lists example LoRAs that have been explicitly tested and verified with each base model in the SGLang Diffusion pipeline.

Important: LoRAs that are not listed here are not necessarily incompatible. In practice, most standard LoRAs are expected to work, especially those following common Diffusers or SD-style conventions. The entries below simply reflect configurations that have been manually validated by the SGLang team.

Verified LoRAs by Base Model

| Base Model | Supported LoRAs |
|---|---|
| Wan2.2 | lightx2v/Wan2.2-Distill-Loras<br>Cseti/wan2.2-14B-Arcane_Jinx-lora-v1 |
| Wan2.1 | lightx2v/Wan2.1-Distill-Loras |
| Z-Image-Turbo | tarn59/pixel_art_style_lora_z_image_turbo<br>wcde/Z-Image-Turbo-DeJPEG-Lora |
| Qwen-Image | lightx2v/Qwen-Image-Lightning<br>flymy-ai/qwen-image-realism-lora<br>prithivMLmods/Qwen-Image-HeadshotX<br>starsfriday/Qwen-Image-EVA-LoRA |
| Qwen-Image-Edit | ostris/qwen_image_edit_inpainting<br>lightx2v/Qwen-Image-Edit-2511-Lightning |
| Flux | dvyio/flux-lora-simple-illustration<br>XLabs-AI/flux-furry-lora<br>XLabs-AI/flux-RealismLora |

Special requirements

Sliding Tile Attention

  • Currently, only Hopper GPUs (e.g. H100) are supported.
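
Hopper-generation GPUs report CUDA compute capability 9.x, so the requirement can be checked at runtime. A small sketch (hypothetical helper, not part of SGLang; the capability tuple comes from `torch.cuda.get_device_capability()`):

```python
def is_hopper(major: int, minor: int = 0) -> bool:
    # Hopper GPUs (e.g. H100) report CUDA compute capability 9.x;
    # earlier (Ampere, 8.x) and later generations do not qualify.
    return major == 9

# With PyTorch on a CUDA machine:
# import torch
# assert is_hopper(*torch.cuda.get_device_capability())
```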