# Performance
This section covers the main performance levers for SGLang Diffusion: attention backends, deployment strategies, caching acceleration, and profiling.
## Overview

| Optimization | Type | Description |
|---|---|---|
| Cache-DiT | Caching | Block-level caching with DBCache, TaylorSeer, and SCM |
| TeaCache | Caching | Timestep-level caching based on temporal similarity |
| Attention Backends | Kernel | Optimized attention implementations (FlashAttention, SageAttention, etc.) |
| Profiling | Diagnostics | PyTorch Profiler and Nsight Systems guidance |
## Start Here

- Use Attention Backends to choose the best backend for your model and hardware.
- Use the Deployment Cookbook to choose among CPU offload, FSDP, CFG parallelism, sequence parallelism (SP), and tensor parallelism (TP).
- Use Caching Acceleration to reduce denoising cost with Cache-DiT or TeaCache.
- Use Profiling when you need to diagnose a bottleneck rather than guess.
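Backend choice usually comes down to what is installed and what the hardware supports. The following is a minimal, illustrative sketch of availability-based selection; the package names and preference order are assumptions for illustration, not SGLang Diffusion's actual selection logic.

```python
# Illustrative sketch: pick an attention backend by availability.
# Package names and ordering are assumptions, not the real SGLang
# Diffusion API. PyTorch's built-in SDPA acts as the universal fallback.
import importlib.util

# Ordered by preference: specialized kernels first, SDPA last.
PREFERENCE = [
    "flash_attn",     # FlashAttention kernels (hypothetical check)
    "sageattention",  # SageAttention quantized kernels (hypothetical check)
    "torch",          # torch.nn.functional.scaled_dot_product_attention
]

def pick_attention_backend() -> str:
    """Return the first backend whose package can be imported."""
    for module_name in PREFERENCE:
        if importlib.util.find_spec(module_name) is not None:
            return module_name
    return "torch"  # assume a PyTorch SDPA path always exists
```

In practice you would also gate on GPU architecture (for example, some kernels require specific compute capabilities), but a simple import probe like this is a reasonable first filter.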
## Caching at a Glance
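The core idea behind timestep-level caching (as in TeaCache) is that consecutive denoising steps often produce very similar intermediate activations, so the expensive transformer forward pass can sometimes be skipped and its previous output reused. The sketch below illustrates that idea in plain Python; the function names, the relative-L1 criterion, and the threshold value are simplified assumptions, not the actual TeaCache algorithm or SGLang Diffusion API.

```python
# Simplified sketch of timestep-level caching: if the input to the
# transformer barely changed since the previous step (relative L1
# distance below a threshold), reuse the cached output instead of
# recomputing. Names and threshold are illustrative assumptions.

def run_denoising(inputs, transformer, threshold=0.1):
    """Run `transformer` over a sequence of per-timestep inputs,
    skipping steps whose input is close to the previous one."""
    prev_input, cached_output = None, None
    outputs, skipped = [], 0
    for x in inputs:
        if prev_input is not None:
            # Relative L1 distance between consecutive timestep inputs.
            num = sum(abs(a - b) for a, b in zip(x, prev_input))
            den = sum(abs(b) for b in prev_input) or 1.0
            if num / den < threshold:
                outputs.append(cached_output)  # cache hit: reuse output
                skipped += 1
                prev_input = x
                continue
        cached_output = transformer(x)  # cache miss: full forward pass
        outputs.append(cached_output)
        prev_input = x
    return outputs, skipped
```

For example, with `run_denoising([[1.0, 1.0], [1.01, 1.0], [2.0, 2.0]], lambda x: [v * 2 for v in x])`, the second step is skipped (its input is within 10% of the first) while the third is recomputed. Real implementations accumulate the distance across steps and calibrate the threshold per model, trading a small quality loss for fewer forward passes.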
## Current Baseline Snapshot
For Ring SP benchmark details, see: