Architecture

SGLang-Omni is the multi-stage runtime for omni models: models that accept mixed text, image, audio, and video inputs and may emit text, audio, or other modalities.

System Overview

HTTP API -> Client -> Coordinator -> Stage -> Scheduler -> ModelRunner -> model forward

| Layer | Duty |
|---|---|
| HTTP API | OpenAI-compatible request and response schemas, SSE framing, HTTP errors |
| Client | GenerateRequest to OmniRequest, result aggregation, audio encoding |
| Coordinator | Request lifecycle, entry-stage submission, terminal result collection, abort broadcast |
| Stage | Control-plane IO, relay IO, fan-in, stream routing, scheduler inbox/outbox bridging |
| Scheduler | Per-stage execution loop and failure propagation to stage outbox |
| ModelRunner | AR forward preparation, model forward dispatch, output extraction |
| Communication | Control-plane messages and relay data transfer between stages |
| TTS Integration | Checklist and lifecycle rules for adding TTS model families |

Refer to each layer's own document for design details.
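
As a rough illustration of the Scheduler duty listed above, the per-stage loop can be pictured as draining an inbox, running one step of work, and pushing either a result or a failure to the outbox so the Stage can forward it. This is a minimal sketch under stated assumptions, not SGLang-Omni's implementation: `InboxItem`, `OutboxItem`, `run_scheduler_loop`, the `step` callable, and the `None` shutdown sentinel are all hypothetical names chosen for the example.

```python
import queue
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class InboxItem:
    # Hypothetical message shape; the real types live under scheduling/ and proto/.
    request_id: str
    payload: Any


@dataclass
class OutboxItem:
    # Hypothetical message shape: exactly one of result/error is set.
    request_id: str
    result: Any = None
    error: Optional[str] = None


def run_scheduler_loop(
    inbox: "queue.Queue[Optional[InboxItem]]",
    outbox: "queue.Queue[OutboxItem]",
    step: Callable[[Any], Any],
) -> None:
    """Drain the inbox until a None sentinel arrives (assumed shutdown signal)."""
    while True:
        item = inbox.get()
        if item is None:
            break
        try:
            outbox.put(OutboxItem(item.request_id, result=step(item.payload)))
        except Exception as exc:
            # Failure propagation: report the error on the outbox
            # instead of letting it kill the loop.
            outbox.put(OutboxItem(item.request_id, error=repr(exc)))


def demo() -> List[OutboxItem]:
    inbox: "queue.Queue[Optional[InboxItem]]" = queue.Queue()
    outbox: "queue.Queue[OutboxItem]" = queue.Queue()
    inbox.put(InboxItem("req-1", 2))
    inbox.put(InboxItem("req-2", "oops"))  # str + int raises TypeError in step
    inbox.put(None)
    run_scheduler_loop(inbox, outbox, step=lambda x: x + 1)
    return [outbox.get_nowait() for _ in range(outbox.qsize())]
```

In this sketch a failing request produces an error item while later requests keep flowing, which is one plausible reading of "failure propagation to stage outbox".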

Directory Layout

sglang_omni/
|-- pipeline/       # Inter-stage orchestration, stages, coordinator, processes
|-- scheduling/     # Scheduler loops and inbox/outbox message types
|-- model_runner/   # Shared model runner abstractions for AR stages
|-- models/         # Model-specific configs, stages, request builders, modules
|-- config/         # PipelineConfig, StageConfig, config manager, topology
|-- relay/          # Data transfer backends
|-- serve/          # HTTP server and OpenAI-compatible API adapter
|-- client/         # Internal client used by API adapters
`-- proto/          # Request, payload, stage, and control-plane message types

Model Directory Convention

Model-specific code should stay under sglang_omni/models/<model>/.

Recommended layout:

models/<model>/
|-- config.py             # PipelineConfig subclass and StageConfig list
|-- stages.py             # stage factories
|-- routing.py            # optional data-driven routing helpers
|-- request_builders.py   # inter-stage payload transforms
|-- payload_types.py      # typed model-specific payload state
|-- callbacks.py          # feedback callbacks or strategy, when needed
`-- components/           # model modules, processors, vocoders, adapters

Only model-local behavior belongs here. Stage, Coordinator, schedulers, model-runner bases, relay, runtime prep, and runners remain framework-owned.
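
To make the split concrete, a model-local config.py might look roughly like the following. The real PipelineConfig and StageConfig live under sglang_omni/config/ and their fields are not described in this document, so the sketch defines stand-in dataclasses with the same names. Every field name, the `entry_stage` helper, `MyModelPipelineConfig`, and the factory path strings are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StageConfig:
    # Stand-in for the framework class; field names are assumptions.
    name: str
    factory: str  # dotted path to a stage factory in the model's stages.py
    next_stages: List[str] = field(default_factory=list)


@dataclass
class PipelineConfig:
    # Stand-in for the framework class; field names are assumptions.
    model_name: str
    stages: List[StageConfig] = field(default_factory=list)

    def entry_stage(self) -> str:
        """Return the single stage that no other stage routes to."""
        targets = {t for s in self.stages for t in s.next_stages}
        roots = [s.name for s in self.stages if s.name not in targets]
        if len(roots) != 1:
            raise ValueError(f"expected one entry stage, found {roots}")
        return roots[0]


def _default_stages() -> List[StageConfig]:
    # Hypothetical three-stage topology: encoder -> thinker -> vocoder.
    base = "sglang_omni.models.my_model.stages"
    return [
        StageConfig("encoder", f"{base}.create_encoder", ["thinker"]),
        StageConfig("thinker", f"{base}.create_thinker", ["vocoder"]),
        StageConfig("vocoder", f"{base}.create_vocoder"),
    ]


@dataclass
class MyModelPipelineConfig(PipelineConfig):
    # Model-local subclass: only declares this model's name and stage list.
    model_name: str = "my_model"
    stages: List[StageConfig] = field(default_factory=_default_stages)
```

The subclass only declares which stages exist and how they connect. How stages are launched and wired together stays with the framework-owned layers.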