

Send chat completions to deployed models using the Adaptive SDK, OpenAI Python library, or any HTTP client. If you omit model, requests route to the project’s default model, or to a model in an active A/B test. Interactions (prompt + completion pairs) are logged automatically. See Interactions for details.

Chat completions

response = adaptive.chat.create(
    model="llama-3.1-8b-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ],
    labels={"project": "support-bot"},
)
print(response.choices[0].message.content)

Parameter     Type         Description
model         str          Model key. Omit to use the project default.
messages      list         Chat messages with role and content.
labels        dict         Key-value pairs for filtering interactions.
stream        bool         Enable streaming (default: False).
temperature   float        Sampling temperature.
max_tokens    int          Maximum tokens to generate.
stop          list         Stop sequences.
top_p         float        Top-p sampling threshold.
session_id    str or UUID  Session ID for KV-cache reuse across turns.
store         bool         Whether to log the interaction (default: True).
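The sampling and session parameters can be combined across the turns of one conversation. A minimal sketch, assuming a configured `adaptive` client as in the examples on this page (the parameter values are illustrative):

```python
import uuid

# Reuse one session_id on every turn so the server can reuse its KV cache.
session_id = str(uuid.uuid4())

turn_params = {
    "temperature": 0.2,        # lower temperature for more deterministic replies
    "max_tokens": 256,         # cap the completion length
    "stop": ["\n\n"],          # stop sequences
    "session_id": session_id,  # same ID on every turn of one conversation
}

# Each turn then passes the same parameters:
# response = adaptive.chat.create(
#     messages=[{"role": "user", "content": "Hello!"}], **turn_params
# )
```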

Streaming

stream = adaptive.chat.create(
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices:
        print(chunk.choices[0].delta.content, end="", flush=True)

Get the completion ID

Use completion_id to log Metrics against the response:
completion_id = response.choices[0].completion_id

Vision requests

Models with the Multimodal tag accept images alongside text. Images must be base64-encoded data URIs (JPEG, PNG, WebP, or GIF, up to 10 MB each).
import base64

with open("photo.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode()

response = adaptive.chat.create(
    model="your-vlm-key",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {"type": "image_url", "image_url": f"data:image/png;base64,{image_data}"},
            ],
        }
    ],
)
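Since images must be one of four formats and under 10 MB, it can help to validate before encoding. A small helper sketch (the function name and checks are our own, not part of the SDK):

```python
import base64
import mimetypes
import os

MAX_IMAGE_BYTES = 10 * 1024 * 1024  # 10 MB limit noted above
ALLOWED_TYPES = {"image/jpeg", "image/png", "image/webp", "image/gif"}

def to_data_uri(path: str) -> str:
    """Validate an image file and base64-encode it as a data URI."""
    mime, _ = mimetypes.guess_type(path)
    if mime not in ALLOWED_TYPES:
        raise ValueError(f"unsupported image type: {mime}")
    if os.path.getsize(path) > MAX_IMAGE_BYTES:
        raise ValueError("image exceeds the 10 MB limit")
    with open(path, "rb") as f:
        data = base64.b64encode(f.read()).decode()
    return f"data:{mime};base64,{data}"
```

The returned string can be passed directly as the `image_url` value in a message content part.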

Structured output

Pass response_format to constrain a completion to a JSON Schema or a Pydantic model. For internal models, invalid tokens are masked at each generation step — the response is structurally guaranteed to parse. For external providers, the schema is forwarded to the provider’s native structured-output API.
from pydantic import BaseModel

class Classification(BaseModel):
    label: str
    confidence: float

response = adaptive.chat.create(
    model="llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Classify: 'great product, fast shipping'"}],
    response_format=Classification,
)
parsed = Classification.model_validate_json(response.choices[0].message.content)

response_format accepts a Pydantic BaseModel class, a raw JSON Schema envelope ({"type": "json_schema", "json_schema": {"name": ..., "schema": ...}}), or None (default). Pydantic models are auto-converted via model_json_schema() and patched for strict-mode compatibility (refs inlined, additionalProperties: false added).

The response is a JSON string in response.choices[0].message.content; the SDK does not auto-deserialize. Call Model.model_validate_json(...) to get a typed instance.
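For reference, a raw JSON Schema envelope equivalent to the Classification model above might look like this (the name field is our choice):

```python
# Hand-written envelope matching {"type": "json_schema", "json_schema": {...}}.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "classification",
        "schema": {
            "type": "object",
            "properties": {
                "label": {"type": "string"},
                "confidence": {"type": "number"},
            },
            "required": ["label", "confidence"],
            "additionalProperties": False,
        },
    },
}
```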
Constrained decoding compiles the schema to a token-mask grammar. The compiler supports:
  • Types: string, integer, number, boolean, null, object, array, and union types via ["string", "null"] syntax
  • Composition: oneOf, anyOf, allOf (object merge only), $ref and $defs (inlined during SDK prep)
  • Strings: minLength, maxLength, pattern (regex), format, enum, const
  • Numbers: minimum, maximum, exclusiveMinimum, exclusiveMaximum. multipleOf requires explicit bounds — without bounds, it’s silently ignored.
  • Arrays: items, minItems, maxItems. Arrays must declare items.
  • Objects: properties, required, additionalProperties (false / true / schema)
Recursive schemas with cyclic $ref are unrolled to depth 4; deeper nesting is truncated. Format keywords without a regex equivalent are dropped.
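Putting a few of these rules together, here is a schema sketch that stays within the supported subset: a nullable string via the union syntax, a bounded number so multipleOf is honored, and an array that declares items (field names are invented for the example):

```python
schema = {
    "type": "object",
    "properties": {
        "note": {"type": ["string", "null"]},  # union type via list syntax
        "score": {
            "type": "number",
            "minimum": 0,
            "maximum": 100,
            "multipleOf": 5,  # honored only because explicit bounds are present
        },
        "tags": {
            "type": "array",
            "items": {"type": "string"},  # arrays must declare items
            "maxItems": 10,
        },
    },
    "required": ["score"],
    "additionalProperties": False,
}
```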

Streaming with structured output

stream=True and response_format work together. Each chunk delivers partial JSON; buffer until the stream closes, then parse:
buffer = ""
stream = adaptive.chat.create(
    messages=[{"role": "user", "content": "Classify: 'late and damaged'"}],
    response_format=Classification,
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        buffer += chunk.choices[0].delta.content
parsed = Classification.model_validate_json(buffer)

Failure modes

  • Schema fails to compile: The constraint is dropped with a warning in server logs; the model generates unconstrained. Validate your schema during development.
  • max_tokens exhausted before completion: finish_reason is "length". The response is truncated JSON; model_validate_json will raise. Check finish_reason before parsing.
  • External provider doesn’t support structured output for the model: The constraint is dropped silently. Stick to providers and models that support structured output natively.
  • Recursive schema beyond depth 4: Deeper levels are truncated at compile time.
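The truncation case can be guarded in code. A defensive-parsing sketch (the helper is our own; it assumes the OpenAI-style choices/message response shape used throughout this page):

```python
def parse_structured(response, model_cls):
    """Check finish_reason before parsing a structured completion.

    `model_cls` is any class exposing model_validate_json (e.g. a
    Pydantic model).
    """
    choice = response.choices[0]
    if choice.finish_reason == "length":
        # Truncated JSON: model_validate_json would raise a parse error.
        raise ValueError("completion truncated; increase max_tokens and retry")
    return model_cls.model_validate_json(choice.message.content)
```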
See SDK Reference for all chat methods.

OpenAI compatibility

Use the OpenAI Python library with your Adaptive deployment:
from openai import OpenAI

client = OpenAI(
    base_url=f"{ADAPTIVE_URL}/api/v1",
    api_key=ADAPTIVE_API_KEY,
)

response = client.chat.completions.create(
    model="project_key/model_key",
    messages=[{"role": "user", "content": "Hello!"}],
)
Set model to project_key/model_key. Use metadata instead of labels.
Multimodal image format differs between Adaptive and OpenAI:
# Adaptive format (flat string)
{"type": "image_url", "image_url": "data:image/png;base64,..."}

# OpenAI format (nested object)
{"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}

HTTP requests

Use any HTTP client to call the chat completions endpoint directly.
import requests

headers = {"Authorization": f"Bearer {ADAPTIVE_API_KEY}"}
payload = {
    "model": "project_key/model_key",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ],
    "labels": {"project": "support-bot"},
}

response = requests.post(
    url=f"{ADAPTIVE_URL}/api/v1/chat/completions",
    json=payload,
    headers=headers,
)
completion_text = response.json()["choices"][0]["message"]["content"]
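Streaming over raw HTTP delivers server-sent events. Assuming the common OpenAI-style `data: {...}` framing terminated by `data: [DONE]` (verify against the API Reference), a parser sketch:

```python
import json

def sse_deltas(lines):
    """Yield content deltas from OpenAI-style SSE lines.

    Pass e.g. response.iter_lines(decode_unicode=True) from a
    requests.post(..., stream=True) call with "stream": True in the payload.
    """
    for line in lines:
        if not line or not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":
            return
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {}).get("content")
            if delta:
                yield delta
```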
See API Reference for the full endpoint specification.