Models power your AI applications. Attach a model to a use case to deploy it and make it available for inference.

Deploy a model

adaptive.models.attach(
    model="llama-3.1-8b-instruct",
    wait=True,
)
Parameter     Type  Required  Description
model         str   Yes       Model key from the registry
wait          bool  No        Block until the model is online (default: False)
make_default  bool  No        Set as the default model for the use case
The model becomes available within a few minutes. Adaptive supports most transformer-based models including Llama, Qwen, Gemma, Mistral, and DeepSeek. See Integrations for proprietary models.
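If a use case has several models attached, you can mark one as the default at attach time with make_default (a minimal sketch using only the parameters documented above; the model key is illustrative):

adaptive.models.attach(
    model="qwen-2.5-7b-instruct",  # illustrative model key
    wait=True,
    make_default=True,  # requests that omit model route to this one
)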

Run inference

response = adaptive.chat.create(
    model="llama-3.1-8b-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ],
    labels={"project": "my-app"},
)
print(response.choices[0].message.content)

# Get completion_id for feedback
completion_id = response.choices[0].completion_id
Requests are logged automatically. Use labels to organize and filter interactions; see Interactions for details. If you omit model, requests route to the use case's default model.
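For example, a request that relies on the default model and tags the interaction could look like this (a short sketch; the label keys are illustrative):

# No model specified: the request goes to the use case's default model
response = adaptive.chat.create(
    messages=[{"role": "user", "content": "Summarize this ticket."}],
    labels={"project": "my-app", "env": "staging"},
)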
# Streaming
stream = adaptive.chat.create(
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices:
        print(chunk.choices[0].delta.content, end="", flush=True)
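If you also need the full reply after streaming, a small variation on the loop above collects the deltas as they arrive:

# Print each delta and keep it so the complete reply is available afterwards
parts = []
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
        parts.append(chunk.choices[0].delta.content)
full_reply = "".join(parts)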
Use the OpenAI Python library with your Adaptive deployment:
import os

from openai import OpenAI

# Read the Adaptive endpoint and API key from the environment
ADAPTIVE_URL = os.environ["ADAPTIVE_URL"]
ADAPTIVE_API_KEY = os.environ["ADAPTIVE_API_KEY"]

client = OpenAI(
    base_url=f"{ADAPTIVE_URL}/api/v1",
    api_key=ADAPTIVE_API_KEY,
)

response = client.chat.completions.create(
    model="use_case_key/model_key",
    messages=[{"role": "user", "content": "Hello!"}],
)
When calling through the OpenAI client, set model to use_case_key/model_key and pass metadata instead of labels.
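For example, the labeled request from Run inference looks like this through the OpenAI client (a sketch; the metadata keys are illustrative):

response = client.chat.completions.create(
    model="use_case_key/model_key",
    messages=[{"role": "user", "content": "Hello!"}],
    metadata={"project": "my-app"},  # takes the place of labels
)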
See SDK Reference for all model and chat methods.