Qwen3-ASR#
Qwen3-ASR is an audio transcription model served through the OpenAI-compatible /v1/audio/transcriptions endpoint. It accepts one uploaded audio file per request and returns text.
Prerequisites#
Install sglang-omni by following Installation, then download the model:
hf download Qwen/Qwen3-ASR-1.7B
Server Configuration#
Qwen3-ASR runs a single ASR stage on one GPU.
sgl-omni serve \
--model-path Qwen/Qwen3-ASR-1.7B \
--port 8000
Transcribe Audio#
curl -X POST http://localhost:8000/v1/audio/transcriptions \
-F model=Qwen/Qwen3-ASR-1.7B \
-F file=@tests/data/query_to_cars.wav \
-F language=en \
-F response_format=json
import requests
with open("tests/data/query_to_cars.wav", "rb") as f:
resp = requests.post(
"http://localhost:8000/v1/audio/transcriptions",
data={
"model": "Qwen/Qwen3-ASR-1.7B",
"language": "en",
"response_format": "json",
},
files={"file": ("query_to_cars.wav", f, "audio/wav")},
timeout=300,
)
resp.raise_for_status()
print(resp.json()["text"])
Request Parameters#
Parameter |
Type |
Default |
Description |
|---|---|---|---|
|
file |
required |
Audio file uploaded as multipart form data |
|
string |
server default |
Model identifier |
|
string |
|
Language hint; |
|
string |
|
|
|
float |
|
Sampling temperature; |
verbose_json is accepted, but currently returns the same minimal JSON shape as json:
{"text": "..."}.
max_new_tokens is supported inside the model request builder, but the public transcription endpoint does not currently expose it as a form field. The route uses the ASR stage default unless the pipeline is configured another way.
Benchmarking#
The current repository does not contain benchmarks/eval/benchmark_qwen3_asr_concurrency.py. Qwen3-ASR correctness and concurrency coverage is in tests/test_model/test_qwen3_asr_ci.py, which transcribes SeedTTS samples through /v1/audio/transcriptions and checks WER, throughput, latency, and RTF.
QWEN3_ASR_CI_CONCURRENCY=32 pytest tests/test_model/test_qwen3_asr_ci.py -s
Known Limitations#
The endpoint accepts one uploaded file per request.
promptis accepted by the HTTP endpoint for OpenAI compatibility, but Qwen3-ASR currently ignores it.Audio is resampled to 16 kHz before transcription.