Environment Variables

Runtime

| Environment Variable | Default | Description |
|---|---|---|
| `SGLANG_DIFFUSION_TARGET_DEVICE` | `cuda` | Target device for inference (`cuda`, `rocm`, `xpu`, `npu`, `musa`, `mps`, `cpu`) |
| `SGLANG_DIFFUSION_ATTENTION_BACKEND` | not set | Override the attention backend (e.g., `fa`, `torch_sdpa`, `sage_attn`) |
| `SGLANG_DIFFUSION_ATTENTION_CONFIG` | not set | Path to an attention backend configuration file (JSON or YAML) |
| `SGLANG_DIFFUSION_STAGE_LOGGING` | `false` | Enable per-stage timing logs |
| `SGLANG_DIFFUSION_SERVER_DEV_MODE` | `false` | Enable dev-only HTTP endpoints for debugging |
| `SGLANG_DIFFUSION_TORCH_PROFILER_DIR` | not set | Directory for torch profiler traces (absolute path). Setting it enables profiling. |
| `SGLANG_DIFFUSION_CACHE_ROOT` | `~/.cache/sgl_diffusion` | Root directory for cache files |
| `SGLANG_DIFFUSION_CONFIG_ROOT` | `~/.config/sgl_diffusion` | Root directory for configuration files |
| `SGLANG_DIFFUSION_LOGGING_LEVEL` | `INFO` | Default logging level |
| `SGLANG_DIFFUSION_WORKER_MULTIPROC_METHOD` | `fork` | Multiprocessing start method for workers (`fork` or `spawn`) |
| `SGLANG_USE_RUNAI_MODEL_STREAMER` | `true` | Use the Run:AI model streamer for model loading |
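If you launch SGLang Diffusion from a Python script, it can help to assemble these variables in one place and sanity-check them before starting the server. The sketch below is a hypothetical helper (`build_runtime_env` is not part of SGLang); it only checks the values against the options listed in the table above and leaves all real parsing to SGLang itself.

```python
from __future__ import annotations

import os
from typing import Mapping

# Allowed values copied from the Runtime table.
ALLOWED_DEVICES = {"cuda", "rocm", "xpu", "npu", "musa", "mps", "cpu"}
ALLOWED_MP_METHODS = {"fork", "spawn"}


def build_runtime_env(
    overrides: Mapping[str, str], base: Mapping[str, str] | None = None
) -> dict[str, str]:
    """Return a copy of `base` (default: os.environ) with `overrides` applied.

    Hypothetical helper: it validates a few documented options and does not
    reproduce SGLang's own environment parsing.
    """
    env = dict(os.environ if base is None else base)
    env.update(overrides)

    device = env.get("SGLANG_DIFFUSION_TARGET_DEVICE", "cuda")
    if device not in ALLOWED_DEVICES:
        raise ValueError(f"unsupported target device: {device!r}")

    method = env.get("SGLANG_DIFFUSION_WORKER_MULTIPROC_METHOD", "fork")
    if method not in ALLOWED_MP_METHODS:
        raise ValueError(f"worker multiprocess method must be fork or spawn, got {method!r}")

    # The profiler directory is documented as an absolute path.
    profiler_dir = env.get("SGLANG_DIFFUSION_TORCH_PROFILER_DIR")
    if profiler_dir is not None and not os.path.isabs(profiler_dir):
        raise ValueError("SGLANG_DIFFUSION_TORCH_PROFILER_DIR must be an absolute path")

    return env


# Example: spawn-based workers with profiling enabled (paths are illustrative).
env = build_runtime_env(
    {
        "SGLANG_DIFFUSION_WORKER_MULTIPROC_METHOD": "spawn",
        "SGLANG_DIFFUSION_TORCH_PROFILER_DIR": "/tmp/sgl_traces",
    }
)
```

The returned dictionary can be passed as `env=` to `subprocess.run` or `subprocess.Popen` together with whatever launch command you normally use.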

Platform-Specific

Apple MPS

| Environment Variable | Default | Description |
|---|---|---|
| `SGLANG_USE_MLX` | not set | Set to `1` to enable MLX fused Metal kernels for norm ops on MPS |

ROCm (AMD GPUs)

| Environment Variable | Default | Description |
|---|---|---|
| `SGLANG_USE_ROCM_VAE` | `false` | Use AITer GroupNorm in the VAE for improved performance on ROCm |
| `SGLANG_USE_ROCM_CUDNN_BENCHMARK` | `false` | Enable MIOpen auto-tuning for VAE conv layers on ROCm |
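The platform flags above only matter for their own device. As an illustration, the hypothetical helper below (`platform_env` is not an SGLang function) returns the overrides you might set for a given target device. The MPS value `1` comes from the table; writing `true` for the ROCm flags is an assumption based on their documented default of `false`.

```python
from __future__ import annotations


def platform_env(
    device: str,
    *,
    use_mlx: bool = True,
    rocm_vae: bool = False,
    rocm_conv_tuning: bool = False,
) -> dict[str, str]:
    """Return platform-specific overrides for a target device (hypothetical helper)."""
    env: dict[str, str] = {"SGLANG_DIFFUSION_TARGET_DEVICE": device}
    if device == "mps" and use_mlx:
        # Documented: set SGLANG_USE_MLX to 1 for MLX fused Metal norm kernels.
        env["SGLANG_USE_MLX"] = "1"
    elif device == "rocm":
        # Assumption: "true" is accepted, mirroring the documented default "false".
        if rocm_vae:
            env["SGLANG_USE_ROCM_VAE"] = "true"
        if rocm_conv_tuning:
            env["SGLANG_USE_ROCM_CUDNN_BENCHMARK"] = "true"
    return env
```

Keeping the flags opt-in mirrors the table, where every platform-specific variable is off or unset by default.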

Quantization

| Environment Variable | Default | Description |
|---|---|---|
| `SGLANG_DIFFUSION_FLASHINFER_FP4_GEMM_BACKEND` | not set | FlashInfer FP4 GEMM backend for the generic NVFP4 fallback |

Caching Acceleration

These variables configure caching acceleration for Diffusion Transformer (DiT) models. SGLang supports multiple caching strategies; see the caching documentation for an overview.

Cache-DiT Configuration

See the cache-dit documentation for detailed configuration.

| Environment Variable | Default | Description |
|---|---|---|
| `SGLANG_CACHE_DIT_ENABLED` | `false` | Enable Cache-DiT acceleration |
| `SGLANG_CACHE_DIT_FN` | `1` | First N blocks to always compute |
| `SGLANG_CACHE_DIT_BN` | `0` | Last N blocks to always compute |
| `SGLANG_CACHE_DIT_WARMUP` | `4` | Warmup steps before caching |
| `SGLANG_CACHE_DIT_RDT` | `0.24` | Residual difference threshold |
| `SGLANG_CACHE_DIT_MC` | `3` | Maximum number of consecutive cached steps |
| `SGLANG_CACHE_DIT_TAYLORSEER` | `false` | Enable the TaylorSeer calibrator |
| `SGLANG_CACHE_DIT_TS_ORDER` | `1` | TaylorSeer order (`1` or `2`) |
| `SGLANG_CACHE_DIT_SCM_PRESET` | `none` | SCM preset (`none`, `slow`, `medium`, `fast`, `ultra`) |
| `SGLANG_CACHE_DIT_SCM_POLICY` | `dynamic` | SCM caching policy |
| `SGLANG_CACHE_DIT_SCM_COMPUTE_BINS` | not set | Custom SCM compute bins |
| `SGLANG_CACHE_DIT_SCM_CACHE_BINS` | not set | Custom SCM cache bins |
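To see how these settings fit together, here is a minimal sketch that reads the primary Cache-DiT variables into a typed object, applying the documented defaults and the documented ranges for `TS_ORDER` and `SCM_PRESET`. `CacheDitConfig` and `load_cache_dit_config` are hypothetical names, and the set of strings treated as "true" is an assumption. The two SCM bin variables are left out because their value format is not documented here.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Mapping

# Allowed presets copied from the table above.
SCM_PRESETS = {"none", "slow", "medium", "fast", "ultra"}


def _as_bool(value: str) -> bool:
    # Assumption: these spellings count as "true"; SGLang's own parser may differ.
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class CacheDitConfig:
    """Hypothetical container; defaults mirror the documented defaults."""
    enabled: bool = False
    fn: int = 1          # first N blocks always computed
    bn: int = 0          # last N blocks always computed
    warmup: int = 4      # warmup steps before caching starts
    rdt: float = 0.24    # residual difference threshold
    mc: int = 3          # maximum consecutive cached steps
    taylorseer: bool = False
    ts_order: int = 1
    scm_preset: str = "none"
    scm_policy: str = "dynamic"


def load_cache_dit_config(env: Mapping[str, str]) -> CacheDitConfig:
    """Read SGLANG_CACHE_DIT_* variables, falling back to the documented defaults."""

    def get(suffix: str, default: str) -> str:
        return env.get(f"SGLANG_CACHE_DIT_{suffix}", default)

    cfg = CacheDitConfig(
        enabled=_as_bool(get("ENABLED", "false")),
        fn=int(get("FN", "1")),
        bn=int(get("BN", "0")),
        warmup=int(get("WARMUP", "4")),
        rdt=float(get("RDT", "0.24")),
        mc=int(get("MC", "3")),
        taylorseer=_as_bool(get("TAYLORSEER", "false")),
        ts_order=int(get("TS_ORDER", "1")),
        scm_preset=get("SCM_PRESET", "none"),
        scm_policy=get("SCM_POLICY", "dynamic"),
    )
    if cfg.ts_order not in (1, 2):
        raise ValueError(f"SGLANG_CACHE_DIT_TS_ORDER must be 1 or 2, got {cfg.ts_order}")
    if cfg.scm_preset not in SCM_PRESETS:
        raise ValueError(f"unknown SCM preset: {cfg.scm_preset!r}")
    return cfg
```

For example, `load_cache_dit_config(os.environ)` would report what the current shell would request, with unset variables falling back to the table defaults.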

Cache-DiT Secondary Transformer

For dual-transformer models (e.g., Wan2.2 with high/low-noise experts), these variables configure caching for the secondary transformer. Each falls back to its primary counterpart if not set.

| Environment Variable | Default | Description |
|---|---|---|
| `SGLANG_CACHE_DIT_SECONDARY_FN` | (from primary) | First N blocks to always compute |
| `SGLANG_CACHE_DIT_SECONDARY_BN` | (from primary) | Last N blocks to always compute |
| `SGLANG_CACHE_DIT_SECONDARY_WARMUP` | (from primary) | Warmup steps before caching |
| `SGLANG_CACHE_DIT_SECONDARY_RDT` | (from primary) | Residual difference threshold |
| `SGLANG_CACHE_DIT_SECONDARY_MC` | (from primary) | Maximum number of consecutive cached steps |
| `SGLANG_CACHE_DIT_SECONDARY_TAYLORSEER` | (from primary) | Enable the TaylorSeer calibrator |
| `SGLANG_CACHE_DIT_SECONDARY_TS_ORDER` | (from primary) | TaylorSeer order (`1` or `2`) |
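The fallback rule means a secondary setting resolves in three steps: the `SECONDARY` variable if set, otherwise the primary variable, otherwise the primary's documented default. The sketch below spells out that lookup order; `resolve_secondary` is a hypothetical name, and treating only a missing variable (not an empty string) as "unset" is an assumption.

```python
from __future__ import annotations

from typing import Mapping

# Documented primary defaults for the settings that have a SECONDARY counterpart.
PRIMARY_DEFAULTS = {
    "FN": "1",
    "BN": "0",
    "WARMUP": "4",
    "RDT": "0.24",
    "MC": "3",
    "TAYLORSEER": "false",
    "TS_ORDER": "1",
}


def resolve_secondary(env: Mapping[str, str]) -> dict[str, str]:
    """Return the raw value each secondary-transformer setting resolves to.

    Lookup order: SGLANG_CACHE_DIT_SECONDARY_<X>, then SGLANG_CACHE_DIT_<X>,
    then the documented primary default.
    """
    resolved: dict[str, str] = {}
    for suffix, default in PRIMARY_DEFAULTS.items():
        primary = env.get(f"SGLANG_CACHE_DIT_{suffix}", default)
        resolved[suffix] = env.get(f"SGLANG_CACHE_DIT_SECONDARY_{suffix}", primary)
    return resolved
```

With `SGLANG_CACHE_DIT_RDT=0.3` and `SGLANG_CACHE_DIT_SECONDARY_WARMUP=2`, the secondary transformer would inherit the 0.3 threshold from the primary but use its own warmup of 2 steps, while every other setting keeps the primary default.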

Cloud Storage

These variables configure S3-compatible cloud storage for automatically uploading generated images and videos.

| Environment Variable | Default | Description |
|---|---|---|
| `SGLANG_CLOUD_STORAGE_TYPE` | not set | Set to `s3` to enable cloud storage |
| `SGLANG_S3_BUCKET_NAME` | not set | Name of the S3 bucket |
| `SGLANG_S3_ENDPOINT_URL` | not set | Custom endpoint URL (for MinIO, OSS, etc.) |
| `SGLANG_S3_REGION_NAME` | `us-east-1` | AWS region name |
| `SGLANG_S3_ACCESS_KEY_ID` | not set | AWS access key ID |
| `SGLANG_S3_SECRET_ACCESS_KEY` | not set | AWS secret access key |
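A misconfigured bucket usually only surfaces when the first upload fails, so a quick pre-flight check can save a generation run. The sketch below is a hypothetical check (`check_s3_env` is not part of SGLang). Two rules in it are assumptions rather than documented behavior: that a bucket name is required once `SGLANG_CLOUD_STORAGE_TYPE=s3`, and that the access key ID and secret should be set together or not at all (leaving both unset in case credentials come from elsewhere).

```python
from __future__ import annotations

from typing import Mapping


def check_s3_env(env: Mapping[str, str]) -> list[str]:
    """Return a list of likely problems with the S3 settings (empty if none found).

    Hypothetical pre-flight check; the rules below are assumptions, not SGLang logic.
    """
    if env.get("SGLANG_CLOUD_STORAGE_TYPE") != "s3":
        return []  # cloud storage is disabled, nothing to check

    problems: list[str] = []
    if not env.get("SGLANG_S3_BUCKET_NAME"):
        problems.append("SGLANG_S3_BUCKET_NAME is required")

    has_key_id = bool(env.get("SGLANG_S3_ACCESS_KEY_ID"))
    has_secret = bool(env.get("SGLANG_S3_SECRET_ACCESS_KEY"))
    if has_key_id != has_secret:
        problems.append(
            "set both SGLANG_S3_ACCESS_KEY_ID and SGLANG_S3_SECRET_ACCESS_KEY, or neither"
        )
    return problems


# Example: a local MinIO setup (bucket name, endpoint, and keys are placeholders).
minio_env = {
    "SGLANG_CLOUD_STORAGE_TYPE": "s3",
    "SGLANG_S3_BUCKET_NAME": "generated-media",
    "SGLANG_S3_ENDPOINT_URL": "http://localhost:9000",
    "SGLANG_S3_ACCESS_KEY_ID": "minio-user",
    "SGLANG_S3_SECRET_ACCESS_KEY": "minio-password",
}
print(check_s3_env(minio_env))  # prints []
```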

CUDA Crash Debugging

These variables enable kernel API logging and optional input/output dumps around diffusion CUDA kernel call boundaries. They are useful when tracking down CUDA crashes such as illegal memory accesses, device-side asserts, or shape mismatches in custom kernels.

| Environment Variable | Default | Description |
|---|---|---|
| `SGLANG_KERNEL_API_LOGLEVEL` | `0` | Controls crash-debug kernel API logging: `1` logs API names, `3` logs tensor metadata, `5` adds tensor statistics, and `10` also writes dump snapshots. |
| `SGLANG_KERNEL_API_LOGDEST` | `stdout` | Destination for crash-debug kernel API logs: `stdout`, `stderr`, or a file path. `%i` is replaced with the process PID. |
| `SGLANG_KERNEL_API_DUMP_DIR` | `sglang_kernel_api_dumps` | Output directory for level-10 kernel API dumps. `%i` is replaced with the process PID. |
| `SGLANG_KERNEL_API_DUMP_INCLUDE` | not set | Comma-separated wildcard patterns for kernel API names to include in level-10 dumps |
| `SGLANG_KERNEL_API_DUMP_EXCLUDE` | not set | Comma-separated wildcard patterns for kernel API names to exclude from level-10 dumps |
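The `%i` substitution and the include/exclude filters are easier to reason about with a concrete model. The sketch below is a hypothetical re-implementation for illustration only (`expand_pid` and `should_dump` are not SGLang functions). It uses shell-style wildcards via Python's `fnmatch`, and it assumes that an empty include list means "include everything" and that exclude patterns take precedence; SGLang's actual matching rules may differ. The kernel API names in the example are also made up.

```python
from __future__ import annotations

import fnmatch
import os
from typing import Mapping


def expand_pid(template: str, pid: int | None = None) -> str:
    """Replace %i with a PID, as documented for LOGDEST and DUMP_DIR."""
    return template.replace("%i", str(os.getpid() if pid is None else pid))


def _patterns(value: str | None) -> list[str]:
    # Comma-separated list; blank entries are ignored.
    return [p.strip() for p in (value or "").split(",") if p.strip()]


def should_dump(api_name: str, env: Mapping[str, str]) -> bool:
    """Hypothetical level-10 dump filter built from the documented variables."""
    if int(env.get("SGLANG_KERNEL_API_LOGLEVEL", "0")) < 10:
        return False  # dump snapshots are only written at level 10
    include = _patterns(env.get("SGLANG_KERNEL_API_DUMP_INCLUDE"))
    exclude = _patterns(env.get("SGLANG_KERNEL_API_DUMP_EXCLUDE"))
    # Assumption: no include patterns means "include everything".
    if include and not any(fnmatch.fnmatchcase(api_name, p) for p in include):
        return False
    # Assumption: exclude patterns win over include patterns.
    return not any(fnmatch.fnmatchcase(api_name, p) for p in exclude)


# Example: dump every norm kernel except layer norm (API names are hypothetical).
debug_env = {
    "SGLANG_KERNEL_API_LOGLEVEL": "10",
    "SGLANG_KERNEL_API_DUMP_INCLUDE": "*norm*",
    "SGLANG_KERNEL_API_DUMP_EXCLUDE": "layer_norm*",
}
for name in ("fused_rms_norm", "layer_norm_fwd", "rope_forward"):
    print(name, should_dump(name, debug_env))
```

Using a per-process `%i` in `SGLANG_KERNEL_API_LOGDEST` and `SGLANG_KERNEL_API_DUMP_DIR` keeps logs and dumps from multiple worker processes from overwriting each other.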