Embedding Model

Launch A Server

[1]:

import subprocess
import time
import requests

# Equivalent to running this in the shell:
# python -m sglang.launch_server --model-path Alibaba-NLP/gte-Qwen2-7B-instruct --port 30010 --host 0.0.0.0 --is-embedding --log-level error
embedding_process = subprocess.Popen(
    [
        "python",
        "-m",
        "sglang.launch_server",
        "--model-path",
        "Alibaba-NLP/gte-Qwen2-7B-instruct",
        "--port",
        "30010",
        "--host",
        "0.0.0.0",
        "--is-embedding",
        "--log-level",
        "error",
    ],
    text=True,
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
)

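# Poll until the server answers. Sleep after every attempt so that a
# non-200 response does not turn the loop into a busy-wait.
while True:
    # Server output is discarded above, so check for an early exit explicitly.
    if embedding_process.poll() is not None:
        raise RuntimeError("Embedding server exited before becoming ready.")
    try:
        response = requests.get(
            "http://localhost:30010/v1/models",
            headers={"Authorization": "Bearer None"},
            timeout=5,
        )
        if response.status_code == 200:
            break
    except requests.exceptions.RequestException:
        pass
    time.sleep(1)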

print("Embedding server is ready. Proceeding with the next steps.")

Embedding server is ready. Proceeding with the next steps.

Use Curl

[2]:

# Get the first 10 elements of the embedding

! curl -s http://localhost:30010/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer None" \
  -d '{"model": "Alibaba-NLP/gte-Qwen2-7B-instruct", "input": "Once upon a time"}' \
  | python3 -c "import sys, json; print(json.load(sys.stdin)['data'][0]['embedding'][:10])"

[0.0083160400390625, 0.0006804466247558594, -0.00809478759765625, -0.0006995201110839844, 0.0143890380859375, -0.0090179443359375, 0.01238250732421875, 0.00209808349609375, 0.0062103271484375, -0.003047943115234375]
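The curl pipeline above reads the vector from `['data'][0]['embedding']` in the JSON response. The same parsing can be sketched as a small helper. This is a minimal illustration, not part of SGLang: `extract_embeddings` is a hypothetical name, the per-item `index` field is assumed from the OpenAI embeddings response schema, and the sample body is hand-written rather than real server output.

```python
import json


def extract_embeddings(response_body: str) -> list[list[float]]:
    """Return the embedding vectors from an OpenAI-style /v1/embeddings body."""
    payload = json.loads(response_body)
    # Assumption: each item may carry an "index" field (as in the OpenAI
    # schema). Sorting by it keeps the vectors in input order; items
    # without the field keep their original relative order.
    items = sorted(payload["data"], key=lambda item: item.get("index", 0))
    return [item["embedding"] for item in items]


# Hypothetical response body used only for illustration.
sample_body = json.dumps(
    {
        "data": [
            {"index": 1, "embedding": [0.3, 0.4]},
            {"index": 0, "embedding": [0.1, 0.2]},
        ]
    }
)

vectors = extract_embeddings(sample_body)
print(len(vectors), vectors[0][:10])
```

With a live server, `response_body` would be the raw text returned by the `/v1/embeddings` request shown above.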

Using OpenAI Compatible API

[3]:

import openai

client = openai.Client(
    base_url="http://127.0.0.1:30010/v1", api_key="None"
)

# Text embedding example
response = client.embeddings.create(
    model="Alibaba-NLP/gte-Qwen2-7B-instruct",
    input="How are you today",
)

embedding = response.data[0].embedding[:10]
print(embedding)

[0.00603485107421875, -0.0190582275390625, -0.01273345947265625, 0.01552581787109375, 0.0066680908203125, -0.0135955810546875, 0.01131439208984375, 0.0013713836669921875, -0.0089874267578125, 0.021759033203125]
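Embedding vectors such as the one printed above are commonly compared with cosine similarity. The sketch below is an illustration under stated assumptions, not something the server provides: `cosine_similarity` is a hypothetical helper, and the short toy vectors stand in for full embeddings taken from `response.data[i].embedding`.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length.")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("Cosine similarity is undefined for a zero vector.")
    return dot / (norm_a * norm_b)


# Toy vectors for illustration only; real embeddings are much longer.
query = [0.1, 0.3, -0.2]
close_match = [0.2, 0.6, -0.4]  # same direction as query
opposite = [-0.1, -0.3, 0.2]    # opposite direction

print(cosine_similarity(query, close_match))
print(cosine_similarity(query, opposite))
```

Compare full vectors rather than the 10-element slices printed in this notebook, since a truncated slice does not preserve the similarity of the original embeddings.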
