Architecture#
SGLang-Omni is the multi-stage runtime for omni models: models that accept mixed text, image, audio, and video inputs and may emit text, audio, or other modalities.
System Overview#
HTTP API -> Client -> Coordinator -> Stage -> Scheduler -> ModelRunner -> model forward
Layer |
Duty |
|---|---|
OpenAI-compatible request and response schemas, SSE framing, HTTP errors |
|
|
|
Request lifecycle, entry-stage submission, terminal result collection, abort broadcast |
|
Control-plane IO, relay IO, fan-in, stream routing, scheduler inbox/outbox bridging |
|
Per-stage execution loop and failure propagation to stage outbox |
|
AR forward preparation, model forward dispatch, output extraction |
|
Control-plane messages and relay data transfer between stages |
|
Checklist and lifecycle rules for adding TTS model families |
Refer to the layer-specific document for specific design details.
Directory Layout#
sglang_omni/
|-- pipeline/ # Inter-stage orchestration, stages, coordinator, processes
|-- scheduling/ # Scheduler loops and inbox/outbox message types
|-- model_runner/ # Shared model runner abstractions for AR stages
|-- models/ # Model-specific configs, stages, request builders, modules
|-- config/ # PipelineConfig, StageConfig, config manager, topology
|-- relay/ # Data transfer backends
|-- serve/ # HTTP server and OpenAI-compatible API adapter
|-- client/ # Internal client used by API adapters
`-- proto/ # Request, payload, stage, and control-plane message types
Model Directory Convention#
Model-specific code should stay under sglang_omni/models/<model>/.
Recommended layout:
models/<model>/
|-- config.py # PipelineConfig subclass and StageConfig list
|-- stages.py # stage factories
|-- routing.py # optional data-driven routing helpers
|-- request_builders.py # inter-stage payload transforms
|-- payload_types.py # typed model-specific payload state
|-- callbacks.py # feedback callbacks or strategy, when needed
`-- components/ # model modules, processors, vocoders, adapters
Only model-local behavior belongs here. The framework-owned layers are still
Stage, Coordinator, schedulers, model-runner bases, relay, runtime prep, and
runners.