# Performance
This section covers the main performance levers for SGLang Diffusion: attention backends, deployment strategies, caching acceleration, and profiling.
## Overview

| Optimization | Type | Description |
|---|---|---|
| Cache-DiT | Caching | Block-level caching with DBCache, TaylorSeer, and SCM |
| TeaCache | Caching | Timestep-level caching based on temporal similarity |
| Attention Backends | Kernel | Optimized attention implementations (FlashAttention, SageAttention, etc.) |
| Profiling | Diagnostics | PyTorch Profiler and Nsight Systems guidance |
## Start Here

- Use Attention Backends to choose the best backend for your model and hardware.
- Use the Deployment Cookbook to choose among CPU offload, FSDP, CFG parallelism, sequence parallelism (SP), and tensor parallelism (TP).
- Use Caching Acceleration to reduce denoising cost with Cache-DiT or TeaCache.
- Use Profiling when you need to diagnose a bottleneck rather than guess.
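Backend choice usually comes down to what is installed and what the hardware supports. The following is a minimal, illustrative sketch of availability-based selection; the package names and preference order are assumptions for illustration, not SGLang Diffusion's actual selection logic.

```python
# Illustrative sketch: pick an attention backend by availability.
# Package names and ordering are assumptions, not the real SGLang
# Diffusion API. PyTorch's built-in SDPA acts as the universal fallback.
import importlib.util

# Ordered by preference: specialized kernels first, SDPA last.
PREFERENCE = [
    "flash_attn",     # FlashAttention kernels (hypothetical check)
    "sageattention",  # SageAttention quantized kernels (hypothetical check)
    "torch",          # torch.nn.functional.scaled_dot_product_attention
]

def pick_attention_backend() -> str:
    """Return the first backend whose package can be imported."""
    for module_name in PREFERENCE:
        if importlib.util.find_spec(module_name) is not None:
            return module_name
    return "torch"  # assume a PyTorch SDPA path always exists
```

In practice you would also gate on GPU architecture (for example, some kernels require specific compute capabilities), but a simple import probe like this is a reasonable first filter.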
## Caching at a Glance
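The core idea behind timestep-level caching (as in TeaCache) is that consecutive denoising steps often produce very similar intermediate activations, so the expensive transformer forward pass can sometimes be skipped and its previous output reused. The sketch below illustrates that idea in plain Python; the function names, the relative-L1 criterion, and the threshold value are simplified assumptions, not the actual TeaCache algorithm or SGLang Diffusion API.

```python
# Simplified sketch of timestep-level caching: if the input to the
# transformer barely changed since the previous step (relative L1
# distance below a threshold), reuse the cached output instead of
# recomputing. Names and threshold are illustrative assumptions.

def run_denoising(inputs, transformer, threshold=0.1):
    """Run `transformer` over a sequence of per-timestep inputs,
    skipping steps whose input is close to the previous one."""
    prev_input, cached_output = None, None
    outputs, skipped = [], 0
    for x in inputs:
        if prev_input is not None:
            # Relative L1 distance between consecutive timestep inputs.
            num = sum(abs(a - b) for a, b in zip(x, prev_input))
            den = sum(abs(b) for b in prev_input) or 1.0
            if num / den < threshold:
                outputs.append(cached_output)  # cache hit: reuse output
                skipped += 1
                prev_input = x
                continue
        cached_output = transformer(x)  # cache miss: full forward pass
        outputs.append(cached_output)
        prev_input = x
    return outputs, skipped
```

For example, with `run_denoising([[1.0, 1.0], [1.01, 1.0], [2.0, 2.0]], lambda x: [v * 2 for v in x])`, the second step is skipped (its input is within 10% of the first) while the third is recomputed. Real implementations accumulate the distance across steps and calibrate the threshold per model, trading a small quality loss for fewer forward passes.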
## Current Baseline Snapshot
For Ring SP benchmark details, see: