Whisper ASR

Whisper ASR checkpoints can be served through the OpenAI-compatible `/v1/audio/transcriptions` endpoint, but this path is experimental in the current SGLang-Omni tree. Prefer Qwen3-ASR for validated ASR serving.

Prerequisites

Install sglang-omni by following Installation, then download a Whisper checkpoint:

hf download openai/whisper-large-v3

Server Configuration

Whisper ASR runs a single ASR stage on one GPU.

sgl-omni serve \
  --model-path openai/whisper-large-v3 \
  --port 8000
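
Because first startup can take several minutes, a script that launches the server and immediately sends a request is likely to fail with a connection error. The sketch below waits until the port accepts TCP connections before continuing. It uses only the Python standard library; `wait_for_port` and its parameters are hypothetical names chosen for illustration, not part of SGLang-Omni. An open port only shows that the HTTP server is listening, so treat it as a rough readiness signal rather than proof that the model has finished loading.

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 8000,
                  timeout_s: float = 600.0, interval_s: float = 5.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: the host, port, and timing defaults are assumptions
    that mirror the serve command above.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=2.0):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)


# Example usage (blocks for up to timeout_s seconds):
#     if not wait_for_port("localhost", 8000):
#         raise SystemExit("server did not open port 8000 in time")
```

Polling a plain TCP port avoids assuming any particular health-check route on the server.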

Transcribe Audio

curl -X POST http://localhost:8000/v1/audio/transcriptions \
  -F model=openai/whisper-large-v3 \
  -F file=@tests/data/query_to_cars.wav \
  -F response_format=json

import requests

# Upload the audio file as multipart form data; the other fields are plain form values.
with open("tests/data/query_to_cars.wav", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/v1/audio/transcriptions",
        data={
            "model": "openai/whisper-large-v3",
            "response_format": "json",
        },
        files={"file": ("query_to_cars.wav", f, "audio/wav")},
        timeout=300,
    )

# Fail loudly on HTTP errors, then print the transcript from the JSON body.
resp.raise_for_status()
print(resp.json()["text"])

Request Parameters

Parameter	Type	Default	Description
`file`	file	required	Audio file uploaded as multipart form data
`model`	string	server default	Model identifier
`language`	string	unset	Optional language hint
`response_format`	string	`json`	Use `json` for the current Whisper path
`temperature`	float	`0.0`	Sampling temperature; defaults to greedy decoding

The request builder also supports `task` (`transcribe` by default) and `max_new_tokens`, but the public transcription endpoint currently exposes only the fields above. The route uses the ASR stage default unless the pipeline is configured another way. For smoke tests, keep the request minimal and use `response_format=json`.
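
To illustrate the optional fields in the table, here is a minimal sketch of a helper that assembles the form values for a request. `build_transcription_form` is a hypothetical name, and the language value `"en"` is an assumption, since the accepted format of the language hint is not specified here. Fields left unset are omitted so that the server defaults listed in the table apply.

```python
from typing import Dict, Optional


def build_transcription_form(
    model: Optional[str] = None,
    language: Optional[str] = None,
    temperature: Optional[float] = None,
) -> Dict[str, str]:
    """Build form fields for /v1/audio/transcriptions (hypothetical helper).

    Only fields listed in the parameter table are emitted; anything left as
    None is omitted so the server-side default is used.
    """
    # response_format=json is the only format recommended for this path.
    form = {"response_format": "json"}
    if model is not None:
        form["model"] = model
    if language is not None:
        form["language"] = language
    if temperature is not None:
        # Multipart form values are sent as strings.
        form["temperature"] = str(temperature)
    return form


# "en" is an assumed value for the language hint.
form = build_transcription_form("openai/whisper-large-v3", language="en")
print(form)
```

The returned dictionary can be passed as the `data=` argument of `requests.post`, alongside the `files=` upload shown in the Python example above.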

Known Limitations

This path is experimental and not yet correctness-validated. Prefer Qwen3-ASR for validated ASR serving.
Keep Whisper ASR at encoder batch size 1.
Use `response_format=json`; other response formats are not validated for this experimental path.
First startup can take several minutes.
The endpoint accepts one uploaded file per request.
Audio is resampled to 16 kHz before transcription.
`prompt` is accepted by the HTTP endpoint for OpenAI compatibility, but Whisper ASR currently does not pass it into decoding.
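
Since the endpoint accepts one uploaded file per request, transcribing several files means sending several requests. The following is a minimal sketch of one way to do that sequentially, assuming the server from the examples above is running on `localhost:8000`. `transcribe_one` and `transcribe_each` are hypothetical helper names; `requests` is imported lazily so the looping logic can be exercised without it.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, Tuple, Union

URL = "http://localhost:8000/v1/audio/transcriptions"  # assumed local server
MODEL = "openai/whisper-large-v3"


def transcribe_one(path: Path) -> str:
    """Send a single file to the endpoint and return the transcript text."""
    import requests  # third-party dependency, imported only when needed

    with open(path, "rb") as f:
        resp = requests.post(
            URL,
            data={"model": MODEL, "response_format": "json"},
            files={"file": (path.name, f, "audio/wav")},
            timeout=300,
        )
    resp.raise_for_status()
    return resp.json()["text"]


def transcribe_each(
    paths: Iterable[Union[str, Path]],
    transcribe: Callable[[Path], str] = transcribe_one,
) -> Dict[str, Tuple[str, str]]:
    """Transcribe files one request at a time, in order.

    Returns a mapping from path to ("ok", text) or ("error", message), so one
    failing file does not abort the rest of the run.
    """
    results: Dict[str, Tuple[str, str]] = {}
    for raw in paths:
        path = Path(raw)
        try:
            results[str(path)] = ("ok", transcribe(path))
        except Exception as exc:
            results[str(path)] = ("error", str(exc))
    return results


# Example usage:
#     for name, (status, value) in transcribe_each(["a.wav", "b.wav"]).items():
#         print(name, status, value)
```

Requests are issued one after another rather than concurrently, which is a conservative choice for an experimental path that should stay at encoder batch size 1.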
