Whisper ASR#

Whisper ASR checkpoints can be started through the OpenAI-compatible /v1/audio/transcriptions endpoint, but this path is experimental in the current SGLang-Omni tree. Prefer Qwen3-ASR for validated ASR serving.

Prerequisites#

Install sglang-omni by following Installation, then download a Whisper checkpoint:

hf download openai/whisper-large-v3

Server Configuration#

Whisper ASR runs a single ASR stage on one GPU.

sgl-omni serve \
  --model-path openai/whisper-large-v3 \
  --port 8000

Transcribe Audio#

curl -X POST http://localhost:8000/v1/audio/transcriptions \
  -F model=openai/whisper-large-v3 \
  -F file=@tests/data/query_to_cars.wav \
  -F response_format=json
import requests

with open("tests/data/query_to_cars.wav", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/v1/audio/transcriptions",
        data={
            "model": "openai/whisper-large-v3",
            "response_format": "json",
        },
        files={"file": ("query_to_cars.wav", f, "audio/wav")},
        timeout=300,
    )

resp.raise_for_status()
print(resp.json()["text"])

Request Parameters#

Parameter

Type

Default

Description

file

file

required

Audio file uploaded as multipart form data

model

string

server default

Model identifier

language

string

unset

Optional language hint

response_format

string

json

Use json for the current Whisper path

temperature

float

unset

Optional sampling temperature

The request builder also supports task (transcribe by default) and max_new_tokens, but the public transcription endpoint currently exposes only the fields above. The route uses the ASR stage default unless the pipeline is configured another way. For smoke tests, keep the request minimal and use response_format=json.

Known Limitations#

  • This path is experimental and not yet correctness-validated. Prefer Qwen3-ASR for validated ASR serving.

  • Keep Whisper ASR at encoder batch size 1.

  • Use response_format=json; other response formats are not validated for this experimental path.

  • First startup can take several minutes.

  • The endpoint accepts one uploaded file per request.

  • Audio is resampled to 16 kHz before transcription.

  • prompt is accepted by the HTTP endpoint for OpenAI compatibility, but Whisper ASR currently does not pass it into decoding.