Models power your AI applications. Attach a model to a use case to deploy it and make it available for inference.

Deploy a model

adaptive.models.attach(
    model="llama-3.1-8b-instruct",
    wait=True,
)
Parameter     Type  Required  Description
model         str   Yes       Model key from the registry
wait          bool  No        Block until the model is online (default: False)
make_default  bool  No        Set as the default model for the use case
The model becomes available within a few minutes. Adaptive supports most transformer-based models including Llama, Qwen, Gemma, Mistral, and DeepSeek. See Integrations for proprietary models.
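If a use case has several models attached, you can mark one as the default at attach time with make_default (a minimal sketch using only the parameters documented above; the model key is illustrative):

adaptive.models.attach(
    model="qwen-2.5-7b-instruct",  # illustrative model key
    wait=True,
    make_default=True,  # requests that omit model route to this one
)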

Run inference

response = adaptive.chat.create(
    model="llama-3.1-8b-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ],
    labels={"project": "my-app"},
)
print(response.choices[0].message.content)

# Get completion_id for feedback
completion_id = response.choices[0].completion_id
Requests are logged automatically. Use labels to organize and filter interactions; see Interactions for details. If you omit model, requests route to the use case's default model.
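For example, a request that relies on the default model and tags the interaction could look like this (a short sketch; the label keys are illustrative):

# No model specified: the request goes to the use case's default model
response = adaptive.chat.create(
    messages=[{"role": "user", "content": "Summarize this ticket."}],
    labels={"project": "my-app", "env": "staging"},
)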
# Streaming
stream = adaptive.chat.create(
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices:
        print(chunk.choices[0].delta.content, end="", flush=True)
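If you also need the full reply after streaming, a small variation on the loop above collects the deltas as they arrive:

# Print each delta and keep it so the complete reply is available afterwards
parts = []
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
        parts.append(chunk.choices[0].delta.content)
full_reply = "".join(parts)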
Use the OpenAI Python library with your Adaptive deployment:
import os

from openai import OpenAI

# Read the Adaptive endpoint and API key from the environment
ADAPTIVE_URL = os.environ["ADAPTIVE_URL"]
ADAPTIVE_API_KEY = os.environ["ADAPTIVE_API_KEY"]

client = OpenAI(
    base_url=f"{ADAPTIVE_URL}/api/v1",
    api_key=ADAPTIVE_API_KEY,
)

response = client.chat.completions.create(
    model="use_case_key/model_key",
    messages=[{"role": "user", "content": "Hello!"}],
)
When calling through the OpenAI client, set model to use_case_key/model_key and pass metadata instead of labels.
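For example, the labeled request from Run inference looks like this through the OpenAI client (a sketch; the metadata keys are illustrative):

response = client.chat.completions.create(
    model="use_case_key/model_key",
    messages=[{"role": "user", "content": "Hello!"}],
    metadata={"project": "my-app"},  # takes the place of labels
)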
See SDK Reference for all model and chat methods.