Embedding Model
Launch a Server
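import subprocess
import time

import requests

# Equivalent to running this in the shell:
# python -m sglang.launch_server --model-path Alibaba-NLP/gte-Qwen2-7B-instruct --port 30010 --host 0.0.0.0 --is-embedding --log-level error
embedding_process = subprocess.Popen(
    [
        "python",
        "-m",
        "sglang.launch_server",
        "--model-path",
        "Alibaba-NLP/gte-Qwen2-7B-instruct",
        "--port",
        "30010",
        "--host",
        "0.0.0.0",
        "--is-embedding",
        "--log-level",
        "error",
    ],
    text=True,
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
)

# Poll the OpenAI-compatible /v1/models endpoint until the server answers.
while True:
    # Stop waiting if the server process has already exited (e.g. a launch error).
    if embedding_process.poll() is not None:
        raise RuntimeError("Embedding server exited before becoming ready.")
    try:
        response = requests.get(
            "http://localhost:30010/v1/models",
            headers={"Authorization": "Bearer None"},
            timeout=5,
        )
        if response.status_code == 200:
            break
    except requests.exceptions.RequestException:
        pass  # Server is not accepting connections yet.
    # Wait before retrying, whether the request failed or returned a non-200 status.
    time.sleep(1)

print("Embedding server is ready. Proceeding with the next steps.")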
Embedding server is ready. Proceeding with the next steps.
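The readiness loop above has no upper bound on how long it waits. A minimal sketch of a bounded check, assuming you want to give up after a fixed deadline; `wait_for_server` is a hypothetical helper (not part of SGLang), and it uses the standard library's `urllib.request` instead of `requests` so it has no extra dependencies:

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url: str, timeout: float = 300.0, interval: float = 1.0) -> bool:
    """Poll `url` until it returns HTTP 200 or `timeout` seconds elapse.

    Hypothetical helper: returns True once the server answers, False on timeout.
    """
    deadline = time.monotonic() + timeout
    request = urllib.request.Request(url, headers={"Authorization": "Bearer None"})
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(request, timeout=interval) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Not reachable yet: connection refused, socket timeout, or a non-2xx status.
            pass
        time.sleep(interval)
    return False


# Usage against the server launched above (a 7B model can take minutes to load):
# ready = wait_for_server("http://localhost:30010/v1/models", timeout=600)
```

Returning a boolean instead of raising leaves the caller free to decide whether a timeout should stop the notebook or trigger a retry.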
Use curl
# Get the first 10 elements of the embedding
! curl -s http://localhost:30010/v1/embeddings \
-H "Content-Type: application/json" \
-H "Authorization: Bearer None" \
-d '{"model": "Alibaba-NLP/gte-Qwen2-7B-instruct", "input": "Once upon a time"}' \
| python3 -c "import sys, json; print(json.load(sys.stdin)['data'][0]['embedding'][:10])"
[0.0083160400390625, 0.0006804466247558594, -0.00809478759765625, -0.0006995201110839844, 0.0143890380859375, -0.0090179443359375, 0.01238250732421875, 0.00209808349609375, 0.0062103271484375, -0.003047943115234375]
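The curl call sends a JSON body with `model` and `input` to `/v1/embeddings`, and the Python one-liner reads the vector from `data[0]["embedding"]`. As a sketch of the same round trip using only the standard library, assuming the response follows the OpenAI embeddings layout; `build_embedding_request` and `first_values` are hypothetical helpers, and any sample payload you test them with is illustrative, not real model output:

```python
import json
import urllib.request


def build_embedding_request(base_url: str, model: str, text: str) -> urllib.request.Request:
    """Build a POST request equivalent to the curl command above (hypothetical helper)."""
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=body,
        headers={"Content-Type": "application/json", "Authorization": "Bearer None"},
        method="POST",
    )


def first_values(response_body: str, n: int = 10) -> list[float]:
    """Return the first `n` values of the first embedding in an OpenAI-style response."""
    payload = json.loads(response_body)
    return payload["data"][0]["embedding"][:n]


# Against the running server:
# req = build_embedding_request("http://localhost:30010", "Alibaba-NLP/gte-Qwen2-7B-instruct", "Once upon a time")
# with urllib.request.urlopen(req) as resp:
#     print(first_values(resp.read().decode("utf-8")))
```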
Use the OpenAI-Compatible API
import openai

client = openai.Client(base_url="http://127.0.0.1:30010/v1", api_key="None")

# Text embedding example
response = client.embeddings.create(
    model="Alibaba-NLP/gte-Qwen2-7B-instruct",
    input="How are you today",
)
# Print the first 10 dimensions of the returned vector
embedding = response.data[0].embedding[:10]
print(embedding)
[0.00603485107421875, -0.0190582275390625, -0.01273345947265625, 0.01552581787109375, 0.0066680908203125, -0.0135955810546875, 0.01131439208984375, 0.0013713836669921875, -0.0089874267578125, 0.021759033203125]
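Embedding vectors like the one above are commonly compared with cosine similarity. A minimal sketch, assuming you already hold two full vectors as Python lists (for example, `response.data[0].embedding` from two separate requests); `cosine_similarity` is a hypothetical helper, not part of the `openai` package or SGLang:

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between vectors `a` and `b` (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Usage with two embeddings from the client above (full vectors, not 10-value slices):
# a = client.embeddings.create(model="Alibaba-NLP/gte-Qwen2-7B-instruct", input="How are you today").data[0].embedding
# b = client.embeddings.create(model="Alibaba-NLP/gte-Qwen2-7B-instruct", input="How is your day going").data[0].embedding
# print(cosine_similarity(a, b))
```

Comparing only the first 10 values, as printed above, would not give a meaningful similarity; use the complete vectors.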