Send chat completions to deployed models using the Adaptive SDK, OpenAI Python library, or any HTTP client. If you omit model, requests route to the project’s default model, or to a model in an active A/B test. Interactions (prompt + completion pairs) are logged automatically. See Interactions for details.

Chat completions

response = adaptive.chat.create(
    model="llama-3.1-8b-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ],
    labels={"project": "support-bot"},
)
print(response.choices[0].message.content)
Parameter | Type | Description
model | str | Model key. Omit to use the project default.
messages | list | Chat messages with role and content.
labels | dict | Key-value pairs for filtering interactions.
stream | bool | Enable streaming (default: False).
temperature | float | Sampling temperature.
max_tokens | int | Maximum tokens to generate.
stop | list | Stop sequences.
top_p | float | Top-p sampling threshold.
session_id | str or UUID | Session ID for KV-cache reuse across turns.
store | bool | Whether to log the interaction (default: True).
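Passing the same session_id on every turn of a conversation lets the server reuse its KV cache. A minimal sketch of multi-turn bookkeeping, assuming the adaptive client from the example above; the Conversation helper is hypothetical, not part of the SDK:

```python
import uuid

class Conversation:
    """Hypothetical helper: one conversation, one session_id, growing history."""

    def __init__(self, system_prompt):
        self.session_id = str(uuid.uuid4())  # generated client-side, reused every turn
        self.messages = [{"role": "system", "content": system_prompt}]

    def request_kwargs(self, user_text):
        """Append the user turn and build kwargs for adaptive.chat.create."""
        self.messages.append({"role": "user", "content": user_text})
        return {"messages": self.messages, "session_id": self.session_id}

    def record_reply(self, assistant_text):
        """Append the assistant turn so the next request carries full history."""
        self.messages.append({"role": "assistant", "content": assistant_text})

convo = Conversation("You are a helpful assistant.")
kwargs = convo.request_kwargs("Hello!")
# response = adaptive.chat.create(**kwargs)
# convo.record_reply(response.choices[0].message.content)
```

Each call sends the accumulated messages plus the same session_id, so later turns can hit the warm cache.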

Streaming

stream = adaptive.chat.create(
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
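To accumulate the streamed deltas into a single string rather than printing them, the loop above can be folded into a small helper. A sketch, assuming chunks shaped like the SDK's (a choices list whose entries carry delta.content); the stand-in chunks below are for illustration only:

```python
from types import SimpleNamespace as NS

def collect_stream(stream):
    """Join streamed delta contents, skipping empty-choice and None-delta chunks."""
    parts = []
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
    return "".join(parts)

# Stand-in chunks mimicking the stream's shape (not real SDK objects):
fake_stream = [
    NS(choices=[NS(delta=NS(content="Hel"))]),
    NS(choices=[NS(delta=NS(content="lo!"))]),
    NS(choices=[]),  # e.g. a final chunk with no choices
]
text = collect_stream(fake_stream)  # "Hello!"
```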

Get the completion ID

Use completion_id to log Metrics against the response:
completion_id = response.choices[0].completion_id

Vision requests

Models with the Multimodal tag accept images alongside text. Images must be base64-encoded data URIs (JPEG, PNG, WebP, or GIF, up to 10 MB each).
import base64

with open("photo.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode()

response = adaptive.chat.create(
    model="your-vlm-key",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {"type": "image_url", "image_url": f"data:image/png;base64,{image_data}"},
            ],
        }
    ],
)
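The constraints above (base64 data URI, four accepted formats, 10 MB per image) can be enforced client-side before sending. A minimal sketch; the image_part helper is hypothetical, not part of the SDK:

```python
import base64

SUPPORTED_FORMATS = {"jpeg", "png", "webp", "gif"}
MAX_IMAGE_BYTES = 10 * 1024 * 1024  # 10 MB limit stated above

def image_part(data: bytes, fmt: str) -> dict:
    """Build an Adaptive-style image content part from raw image bytes."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported image format: {fmt}")
    if len(data) > MAX_IMAGE_BYTES:
        raise ValueError("image exceeds the 10 MB limit")
    encoded = base64.b64encode(data).decode()
    return {"type": "image_url", "image_url": f"data:image/{fmt};base64,{encoded}"}
```

The returned dict drops straight into a message's content list alongside a text part, as in the example above.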
See SDK Reference for all chat methods.

OpenAI compatibility

Use the OpenAI Python library with your Adaptive deployment:
from openai import OpenAI

client = OpenAI(
    base_url=f"{ADAPTIVE_URL}/api/v1",
    api_key=ADAPTIVE_API_KEY,
)

response = client.chat.completions.create(
    model="project_key/model_key",
    messages=[{"role": "user", "content": "Hello!"}],
)
Set model to project_key/model_key. Use metadata instead of labels.
The multimodal image format differs between the Adaptive SDK and the OpenAI library:
# Adaptive format (flat string)
{"type": "image_url", "image_url": "data:image/png;base64,..."}

# OpenAI format (nested object)
{"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}

HTTP requests

Use any HTTP client to call the chat completions endpoint directly.
import requests

headers = {"Authorization": f"Bearer {ADAPTIVE_API_KEY}"}
payload = {
    "model": "project_key/model_key",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ],
    "labels": {"project": "support-bot"},
}

response = requests.post(
    url=f"{ADAPTIVE_URL}/api/v1/chat/completions",
    json=payload,
    headers=headers,
)
response.raise_for_status()
completion_text = response.json()["choices"][0]["message"]["content"]
See API Reference for the full endpoint specification.