Environment Variables#
Runtime#
| Environment Variable | Default | Description |
|---|---|---|
| | | Target device for inference (…) |
| | not set | Override attention backend via env var (e.g. …) |
| | not set | Path to attention backend configuration file (JSON/YAML) |
| | false | Enable per-stage timing logs |
| | false | Enable dev-only HTTP endpoints for debugging |
| | not set | Directory for torch profiler traces (absolute path). Enables profiling when set |
| | | Root directory for cache files |
| | | Root directory for configuration files |
| | | Default logging level |
| | | Multiprocess context for workers (…) |
| | true | Use Run:AI model streamer for model loading |
Platform-Specific#
Apple MPS#
| Environment Variable | Default | Description |
|---|---|---|
| | not set | Set to … |
ROCm (AMD GPUs)#
| Environment Variable | Default | Description |
|---|---|---|
| | false | Use AITer GroupNorm in VAE for improved performance on ROCm |
| | false | Enable MIOpen auto-tuning for VAE conv layers on ROCm |
Quantization#
| Environment Variable | Default | Description |
|---|---|---|
| | not set | FlashInfer FP4 GEMM backend for generic NVFP4 fallback |
Caching Acceleration#
These variables configure caching acceleration for Diffusion Transformer (DiT) models. SGLang supports multiple caching strategies; see the caching documentation for an overview.
Cache-DiT Configuration#
See cache-dit documentation for detailed configuration.
| Environment Variable | Default | Description |
|---|---|---|
| | false | Enable Cache-DiT acceleration |
| | 1 | First N blocks to always compute |
| | 0 | Last N blocks to always compute |
| | 4 | Warmup steps before caching |
| | 0.24 | Residual difference threshold |
| | 3 | Max continuous cached steps |
| | false | Enable TaylorSeer calibrator |
| | 1 | TaylorSeer order (1 or 2) |
| | none | SCM preset (none/slow/medium/fast/ultra) |
| | dynamic | SCM caching policy |
| | not set | Custom SCM compute bins |
| | not set | Custom SCM cache bins |
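To make the block and step settings in the table more concrete, here is a minimal Python sketch of one plausible reading of them. It is an illustration only, not SGLang's or Cache-DiT's implementation: the function names and parameters are hypothetical, steps and blocks are assumed to be 0-indexed, and the rule "reuse the cache only when the residual difference is below the threshold" is an assumption about how the threshold is applied.

```python
def always_computed_blocks(num_blocks: int, first_n: int = 1, last_n: int = 0) -> list:
    """Indices of blocks that are always computed (never served from cache).

    Hypothetical helper: defaults mirror the documented defaults (1 and 0).
    """
    first_n = max(0, min(first_n, num_blocks))
    # Clamp the tail so it never overlaps the head.
    last_n = max(0, min(last_n, num_blocks - first_n))
    head = list(range(first_n))
    tail = list(range(num_blocks - last_n, num_blocks))
    return head + tail


def step_may_use_cache(
    step: int,
    residual_diff: float,
    consecutive_cached: int,
    warmup_steps: int = 4,
    threshold: float = 0.24,
    max_continuous_cached: int = 3,
) -> bool:
    """Assumed decision rule for whether a denoising step may reuse the cache."""
    if step < warmup_steps:
        return False  # assumed: warmup steps are always fully computed
    if consecutive_cached >= max_continuous_cached:
        return False  # assumed: force a full compute after too many cached steps
    return residual_diff < threshold  # assumed: small change means reuse is safe


print(always_computed_blocks(6, first_n=2, last_n=1))
print(step_may_use_cache(step=10, residual_diff=0.1, consecutive_cached=0))
```

Under this reading, a lower threshold or a lower cap on continuous cached steps trades speed for fidelity, while more warmup steps delay caching until the early, fast-changing part of sampling has passed.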
Cache-DiT Secondary Transformer#
For dual-transformer models (e.g., Wan2.2 with high/low-noise experts), these variables configure caching for the secondary transformer. Each falls back to its primary counterpart if not set.
| Environment Variable | Default | Description |
|---|---|---|
| | (from primary) | First N blocks to always compute |
| | (from primary) | Last N blocks to always compute |
| | (from primary) | Warmup steps before caching |
| | (from primary) | Residual difference threshold |
| | (from primary) | Max continuous cached steps |
| | (from primary) | Enable TaylorSeer calibrator |
| | (from primary) | TaylorSeer order (1 or 2) |
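The fallback rule for the secondary transformer (use the secondary value if set, otherwise the primary value, otherwise the built-in default) can be sketched as follows. This is a hedged illustration, not SGLang's code: `resolve_secondary` and the `EXAMPLE_*` variable names are hypothetical placeholders, and treating an empty string as "not set" is an assumption.

```python
from typing import Mapping


def resolve_secondary(
    env: Mapping[str, str],
    secondary_key: str,
    primary_key: str,
    default: str,
) -> str:
    """Return the secondary value, else the primary value, else the default."""
    for key in (secondary_key, primary_key):
        value = env.get(key)
        if value:  # assumed: empty strings count as "not set"
            return value
    return default


# Hypothetical variable names, used only for this example.
env = {"EXAMPLE_CACHE_WARMUP": "8"}
warmup = resolve_secondary(
    env, "EXAMPLE_SECONDARY_CACHE_WARMUP", "EXAMPLE_CACHE_WARMUP", default="4"
)
print(warmup)
```

In practice the mapping would be `os.environ`; a plain dictionary is used here so the sketch does not depend on the process environment.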
Cloud Storage#
These variables configure S3-compatible cloud storage for automatically uploading generated images and videos.
| Environment Variable | Default | Description |
|---|---|---|
| | not set | Set to … |
| | not set | The name of the S3 bucket |
| | not set | Custom endpoint URL (for MinIO, OSS, etc.) |
| | us-east-1 | AWS region name |
| | not set | AWS Access Key ID |
| | not set | AWS Secret Access Key |
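As a rough illustration of how the settings in the table fit together, the sketch below collects them into one object and applies the documented `us-east-1` default region. Everything named here is an assumption for illustration: the `EXAMPLE_S3_` prefix, the key suffixes, the `S3Settings` class, and the choice to return `None` when no bucket is configured are hypothetical and do not reflect SGLang's actual variable names or loader. The enable flag in the first row is omitted because its accepted value is not shown above.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class S3Settings:
    bucket: str
    endpoint_url: Optional[str]  # custom endpoint, e.g. for MinIO or OSS
    region: str
    access_key_id: Optional[str]
    secret_access_key: Optional[str]


def load_s3_settings(
    env: Mapping[str, str], prefix: str = "EXAMPLE_S3_"
) -> Optional[S3Settings]:
    """Build settings from an environment mapping (hypothetical key names)."""
    bucket = env.get(prefix + "BUCKET")
    if not bucket:
        return None  # assumed: uploads are impossible without a bucket name
    return S3Settings(
        bucket=bucket,
        endpoint_url=env.get(prefix + "ENDPOINT_URL") or None,
        region=env.get(prefix + "REGION") or "us-east-1",  # documented default
        access_key_id=env.get(prefix + "ACCESS_KEY_ID") or None,
        secret_access_key=env.get(prefix + "SECRET_ACCESS_KEY") or None,
    )


settings = load_s3_settings({"EXAMPLE_S3_BUCKET": "generated-media"})
print(settings)
```

Keeping the endpoint optional matches the table: it is only needed for S3-compatible services other than AWS itself.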
CUDA Crash Debugging#
These variables enable kernel API logging and optional input/output dumps around diffusion CUDA kernel call boundaries. They are useful when tracking down CUDA crashes such as illegal memory accesses, device-side asserts, or shape mismatches in custom kernels.
| Environment Variable | Default | Description |
|---|---|---|
| | | Controls crash-debug kernel API logging. |
| | | Destination for crash-debug kernel API logs. Use … |
| | | Output directory for level-10 kernel API dumps. |
| | not set | Comma-separated wildcard patterns for kernel API names to include in level-10 dumps. |
| | not set | Comma-separated wildcard patterns for kernel API names to exclude from level-10 dumps. |
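The include and exclude settings take comma-separated wildcard patterns. The sketch below shows one way such filters could be evaluated, using Python's standard `fnmatch` module. It is not SGLang's implementation: the helper names and the example kernel names are hypothetical, and two behaviours are assumed rather than documented, namely that an unset include list matches every kernel and that exclude patterns take precedence over include patterns.

```python
from fnmatch import fnmatchcase
from typing import List, Optional


def parse_patterns(raw: Optional[str]) -> List[str]:
    """Split a comma-separated pattern string, dropping blanks and whitespace."""
    if not raw:
        return []
    return [p.strip() for p in raw.split(",") if p.strip()]


def should_dump(
    api_name: str, include: Optional[str] = None, exclude: Optional[str] = None
) -> bool:
    """Decide whether a kernel API call would be dumped under the assumed rules."""
    include_patterns = parse_patterns(include)
    exclude_patterns = parse_patterns(exclude)
    # Assumed: with no include patterns, every kernel is a candidate.
    if include_patterns and not any(
        fnmatchcase(api_name, p) for p in include_patterns
    ):
        return False
    # Assumed: exclude wins over include.
    return not any(fnmatchcase(api_name, p) for p in exclude_patterns)


# Hypothetical kernel names, used only for this example.
print(should_dump("attention_fwd", include="attention_*, norm_*"))
print(should_dump("attention_bwd", include="attention_*", exclude="*_bwd"))
```

Narrowing the include list is usually the cheaper option when dumps are large, since it limits the tensors written to disk to the kernels under suspicion.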