

Send chat completions to deployed models using the Adaptive SDK, OpenAI Python library, or any HTTP client. If you omit model, requests route to the project’s default model, or to a model in an active A/B test. Interactions (prompt + completion pairs) are logged automatically. See Interactions for details.

Chat completions

response = adaptive.chat.create(
    model="llama-3.1-8b-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ],
    labels={"project": "support-bot"},
)
print(response.choices[0].message.content)

Parameter     Type         Description
model         str          Model key. Omit to use the project default.
messages      list         Chat messages with role and content.
labels        dict         Key-value pairs for filtering interactions.
stream        bool         Enable streaming (default: False).
temperature   float        Sampling temperature.
max_tokens    int          Maximum tokens to generate.
stop          list         Stop sequences.
top_p         float        Top-p sampling threshold.
session_id    str or UUID  Session ID for KV-cache reuse across turns.
store         bool         Whether to log the interaction (default: True).
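The sampling and session parameters can be combined across the turns of one conversation. A minimal sketch, assuming a configured `adaptive` client as in the examples on this page (the parameter values are illustrative):

```python
import uuid

# Reuse one session_id on every turn so the server can reuse its KV cache.
session_id = str(uuid.uuid4())

turn_params = {
    "temperature": 0.2,        # lower temperature for more deterministic replies
    "max_tokens": 256,         # cap the completion length
    "stop": ["\n\n"],          # stop sequences
    "session_id": session_id,  # same ID on every turn of one conversation
}

# Each turn then passes the same parameters:
# response = adaptive.chat.create(
#     messages=[{"role": "user", "content": "Hello!"}], **turn_params
# )
```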

Streaming

stream = adaptive.chat.create(
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices:
        print(chunk.choices[0].delta.content, end="", flush=True)

Get the completion ID

Use completion_id to log Metrics against the response:
completion_id = response.choices[0].completion_id

Vision requests

Models with the Multimodal tag accept images alongside text. Images must be base64-encoded data URIs (JPEG, PNG, WebP, or GIF, up to 10 MB each).
import base64

with open("photo.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode()

response = adaptive.chat.create(
    model="your-vlm-key",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {"type": "image_url", "image_url": f"data:image/png;base64,{image_data}"},
            ],
        }
    ],
)
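Since images must be one of four formats and under 10 MB, it can help to validate before encoding. A small helper sketch (the function name and checks are our own, not part of the SDK):

```python
import base64
import mimetypes
import os

MAX_IMAGE_BYTES = 10 * 1024 * 1024  # 10 MB limit noted above
ALLOWED_TYPES = {"image/jpeg", "image/png", "image/webp", "image/gif"}

def to_data_uri(path: str) -> str:
    """Validate an image file and base64-encode it as a data URI."""
    mime, _ = mimetypes.guess_type(path)
    if mime not in ALLOWED_TYPES:
        raise ValueError(f"unsupported image type: {mime}")
    if os.path.getsize(path) > MAX_IMAGE_BYTES:
        raise ValueError("image exceeds the 10 MB limit")
    with open(path, "rb") as f:
        data = base64.b64encode(f.read()).decode()
    return f"data:{mime};base64,{data}"
```

The returned string can be passed directly as the `image_url` value in a message content part.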

Structured output

Pass response_format to constrain a completion to a JSON Schema or a Pydantic model. For internal models, invalid tokens are masked at each generation step — the response is structurally guaranteed to parse. For external providers, the schema is forwarded to the provider’s native structured-output API.
from pydantic import BaseModel

class Classification(BaseModel):
    label: str
    confidence: float

response = adaptive.chat.create(
    model="llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Classify: 'great product, fast shipping'"}],
    response_format=Classification,
)
parsed = Classification.model_validate_json(response.choices[0].message.content)

response_format accepts a Pydantic BaseModel class, a raw JSON Schema envelope ({"type": "json_schema", "json_schema": {"name": ..., "schema": ...}}), or None (default). Pydantic models are auto-converted via model_json_schema() and patched for strict-mode compatibility (refs inlined, additionalProperties: false added).

The response is a JSON string in response.choices[0].message.content; the SDK does not auto-deserialize. Call Model.model_validate_json(...) to get a typed instance.
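For reference, a raw JSON Schema envelope equivalent to the Classification model above might look like this (the name field is our choice):

```python
# Hand-written envelope matching {"type": "json_schema", "json_schema": {...}}.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "classification",
        "schema": {
            "type": "object",
            "properties": {
                "label": {"type": "string"},
                "confidence": {"type": "number"},
            },
            "required": ["label", "confidence"],
            "additionalProperties": False,
        },
    },
}
```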
Constrained decoding compiles the schema to a token-mask grammar. The compiler supports:
  • Types: string, integer, number, boolean, null, object, array, and union types via ["string", "null"] syntax
  • Composition: oneOf, anyOf, allOf (object merge only), $ref and $defs (inlined during SDK prep)
  • Strings: minLength, maxLength, pattern (regex), format, enum, const
  • Numbers: minimum, maximum, exclusiveMinimum, exclusiveMaximum. multipleOf requires explicit bounds — without bounds, it’s silently ignored.
  • Arrays: items, minItems, maxItems. Arrays must declare items.
  • Objects: properties, required, additionalProperties (false / true / schema)
Recursive schemas with cyclic $ref are unrolled to depth 4; deeper nesting is truncated. Format keywords without a regex equivalent are dropped.
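Putting a few of these rules together, here is a schema sketch that stays within the supported subset: a nullable string via the union syntax, a bounded number so multipleOf is honored, and an array that declares items (field names are invented for the example):

```python
schema = {
    "type": "object",
    "properties": {
        "note": {"type": ["string", "null"]},  # union type via list syntax
        "score": {
            "type": "number",
            "minimum": 0,
            "maximum": 100,
            "multipleOf": 5,  # honored only because explicit bounds are present
        },
        "tags": {
            "type": "array",
            "items": {"type": "string"},  # arrays must declare items
            "maxItems": 10,
        },
    },
    "required": ["score"],
    "additionalProperties": False,
}
```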

Streaming with structured output

stream=True and response_format work together. Each chunk delivers partial JSON; buffer until the stream closes, then parse:
buffer = ""
stream = adaptive.chat.create(
    messages=[{"role": "user", "content": "Classify: 'late and damaged'"}],
    response_format=Classification,
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        buffer += chunk.choices[0].delta.content
parsed = Classification.model_validate_json(buffer)

Failure modes

  • Schema fails to compile: The constraint is dropped with a warning in server logs; the model generates unconstrained. Validate your schema during development.
  • max_tokens exhausted before completion: finish_reason is "length". The response is truncated JSON; model_validate_json will raise. Check finish_reason before parsing.
  • External provider doesn’t support structured output for the model: The constraint is dropped silently. Stick to providers and models that support structured output natively.
  • Recursive schema beyond depth 4: Deeper levels are truncated at compile time.
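The truncation case can be guarded in code. A defensive-parsing sketch (the helper is our own; it assumes the OpenAI-style choices/message response shape used throughout this page):

```python
def parse_structured(response, model_cls):
    """Check finish_reason before parsing a structured completion.

    `model_cls` is any class exposing model_validate_json (e.g. a
    Pydantic model).
    """
    choice = response.choices[0]
    if choice.finish_reason == "length":
        # Truncated JSON: model_validate_json would raise a parse error.
        raise ValueError("completion truncated; increase max_tokens and retry")
    return model_cls.model_validate_json(choice.message.content)
```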
See SDK Reference for all chat methods.

OpenAI compatibility

Use the OpenAI Python library with your Adaptive deployment:
from openai import OpenAI

client = OpenAI(
    base_url=f"{ADAPTIVE_URL}/api/v1",
    api_key=ADAPTIVE_API_KEY,
)

response = client.chat.completions.create(
    model="project_key/model_key",
    messages=[{"role": "user", "content": "Hello!"}],
)
Set model to project_key/model_key. Use metadata instead of labels.
Multimodal image format differs between Adaptive and OpenAI:
# Adaptive format (flat string)
{"type": "image_url", "image_url": "data:image/png;base64,..."}

# OpenAI format (nested object)
{"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}

HTTP requests

Use any HTTP client to call the chat completions endpoint directly.
import requests

headers = {"Authorization": f"Bearer {ADAPTIVE_API_KEY}"}
payload = {
    "model": "project_key/model_key",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ],
    "labels": {"project": "support-bot"},
}

response = requests.post(
    url=f"{ADAPTIVE_URL}/api/v1/chat/completions",
    json=payload,
    headers=headers,
)
completion_text = response.json()["choices"][0]["message"]["content"]
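Streaming over raw HTTP delivers server-sent events. Assuming the common OpenAI-style `data: {...}` framing terminated by `data: [DONE]` (verify against the API Reference), a parser sketch:

```python
import json

def sse_deltas(lines):
    """Yield content deltas from OpenAI-style SSE lines.

    Pass e.g. response.iter_lines(decode_unicode=True) from a
    requests.post(..., stream=True) call with "stream": True in the payload.
    """
    for line in lines:
        if not line or not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":
            return
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {}).get("content")
            if delta:
                yield delta
```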
See API Reference for the full endpoint specification.