# Models

> Deploy and run inference on models in Adaptive

Models power your AI applications. Add a model to a use case to deploy it and make it available for inference.

<Tabs>
  <Tab title="SDK" icon="code">
    ## Deploy a model

    Add a model to your use case, then deploy it:

    ```python theme={null}
    adaptive.models.add_to_use_case(model="llama-3.1-8b-instruct")
    adaptive.models.deploy(model="llama-3.1-8b-instruct", wait=True)

    # Or use attach() to do both in one call
    adaptive.models.attach(model="llama-3.1-8b-instruct", wait=True)
    ```

    | Parameter      | Type | Required | Description                                  |
    | -------------- | ---- | -------- | -------------------------------------------- |
    | `model`        | str  | Yes      | Model key from the registry                  |
    | `wait`         | bool | No       | Block until model is online (default: False) |
    | `make_default` | bool | No       | Set as default model for the use case        |

    After deployment, the model typically comes online within a few minutes. Adaptive supports most transformer-based models, including Llama, Qwen, Gemma, Mistral, and DeepSeek. For proprietary models, see [Integrations](/v0.12/advanced/integrations).
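
    When a use case needs several models, the `attach()` call shown earlier can be wrapped in a small helper. This is a minimal sketch, not part of the SDK: `attach_models` is a hypothetical name, `client` stands in for the `adaptive` client object, and passing `make_default` to `attach()` is an assumption based on the parameter table above.

    ```python
    from typing import Any, Iterable, List, Optional


    def attach_models(
        client: Any,
        model_keys: Iterable[str],
        default: Optional[str] = None,
    ) -> List[str]:
        """Attach each model key to the use case and block until it is online.

        Hypothetical helper. Assumes client.models.attach() accepts `model`,
        `wait`, and `make_default`, as the parameter table suggests.
        """
        attached = []
        for key in model_keys:
            # Only the key matching `default` is marked as the default model.
            client.models.attach(model=key, wait=True, make_default=(key == default))
            attached.append(key)
        return attached
    ```

    Under those assumptions, `attach_models(adaptive, ["llama-3.1-8b-instruct"], default="llama-3.1-8b-instruct")` would attach one model and mark it as the default.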

    ## Run inference

    ```python theme={null}
    response = adaptive.chat.create(
        model="llama-3.1-8b-instruct",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Hello!"}
        ],
        labels={"project": "my-app"},
    )
    print(response.choices[0].message.content)

    # Get completion_id for feedback
    completion_id = response.choices[0].completion_id
    ```

    Requests are logged automatically. Use `labels` to organize and filter interactions. See [Interactions](/v0.12/core/interactions) for details.

    If you omit `model`, requests route to the use case's default model.

    ```python theme={null}
    # Streaming
    stream = adaptive.chat.create(
        messages=[{"role": "user", "content": "Hello!"}],
        stream=True,
    )
    for chunk in stream:
        # Skip chunks that carry no choices or an empty delta
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
    ```
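
    To keep the full reply instead of printing it piece by piece, the streamed deltas can be joined into one string. The sketch below assumes only the chunk shape used in the streaming loop above (`chunk.choices[0].delta.content`); `collect_stream` is a hypothetical helper name, not an SDK function.

    ```python
    from typing import Any, Iterable


    def collect_stream(stream: Iterable[Any]) -> str:
        """Join the text deltas of a streamed chat response into one string."""
        parts = []
        for chunk in stream:
            # Some chunks may carry no choices; skip them.
            if not chunk.choices:
                continue
            content = chunk.choices[0].delta.content
            # A delta without text (None or "") adds nothing to the reply.
            if content:
                parts.append(content)
        return "".join(parts)
    ```

    Passing the `stream` object from the example above to `collect_stream(stream)` would return the whole reply once the stream ends.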

    <Accordion title="OpenAI compatibility">
      Use the OpenAI Python library with your Adaptive deployment:

      ```python theme={null}
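      import os

      from openai import OpenAI

      # Deployment URL and API key; these environment variable names are placeholders
      ADAPTIVE_URL = os.environ["ADAPTIVE_URL"]
      ADAPTIVE_API_KEY = os.environ["ADAPTIVE_API_KEY"]

      client = OpenAI(
          base_url=f"{ADAPTIVE_URL}/api/v1",
          api_key=ADAPTIVE_API_KEY,
      )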

      response = client.chat.completions.create(
          model="use_case_key/model_key",
          messages=[{"role": "user", "content": "Hello!"}],
      )
      ```

      Set `model` to `use_case_key/model_key`, substituting your own use case key and model key. To tag requests, pass `metadata` instead of `labels`.
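
      The two differences from the SDK call can be captured in one place. The following is a sketch under stated assumptions: `openai_chat_kwargs` is a hypothetical helper, and it assumes that a `labels`-style dictionary can be passed unchanged as `metadata`.

      ```python
      from typing import Any, Dict, List, Optional


      def openai_chat_kwargs(
          use_case_key: str,
          model_key: str,
          messages: List[Dict[str, str]],
          labels: Optional[Dict[str, str]] = None,
      ) -> Dict[str, Any]:
          """Build keyword arguments for client.chat.completions.create()."""
          kwargs: Dict[str, Any] = {
              # The OpenAI-compatible endpoint expects "use_case_key/model_key".
              "model": f"{use_case_key}/{model_key}",
              "messages": messages,
          }
          if labels:
              # Assumption: labels map directly onto the `metadata` field.
              kwargs["metadata"] = labels
          return kwargs
      ```

      For example, `client.chat.completions.create(**openai_chat_kwargs("my-use-case", "llama-3.1-8b-instruct", messages, {"project": "my-app"}))`, where `my-use-case` is a placeholder key.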
    </Accordion>

    See [SDK Reference](/v0.12/reference/sdk) for all model and chat methods.
  </Tab>

  <Tab title="UI" icon="mouse-pointer">
    ## Deploy a model

    <Frame caption="The Models page lists all models in your registry">
      <img src="https://mintcdn.com/adaptiveml/R5QotOduSKbjj2fS/static/models-0-11.png?fit=max&auto=format&n=R5QotOduSKbjj2fS&q=85&s=b4f4be7b06f9d45f1c117c1f6155c8f4" width="3212" height="1418" data-path="static/models-0-11.png" />
    </Frame>

    Navigate to your use case and open the **Models** tab. Click **Add Model** and select from the registry.

    <Frame caption="Adapters appear indented under their backbone model">
      <img src="https://mintcdn.com/adaptiveml/R5QotOduSKbjj2fS/static/lora_indents-0-11.png?fit=max&auto=format&n=R5QotOduSKbjj2fS&q=85&s=949e681d4feb4fa060150e9a754fdb5b" width="3220" height="1406" data-path="static/lora_indents-0-11.png" />
    </Frame>

    Fine-tuned adapter models appear indented under their base model.

    ## Run inference

    Open your use case and click **Chat** to interact with deployed models directly in the browser.
  </Tab>
</Tabs>
