# Load models

> Loading models in your recipes

## Loading a model

To load a model in a custom recipe, use the `Model` class from `adaptive_harmony.parameters`. Call `await model.to_builder(ctx)` to get a `ModelBuilder` object, on which you can call several methods to configure how your model will be spawned.

For example, use `builder.with_adapter()` to enable lightweight adapter training instead of full parameter fine-tuning (only use this if you are training the model).

```python theme={null}
from adaptive_harmony.runtime import recipe_main, RecipeContext
from adaptive_harmony.parameters import Model

@recipe_main
async def main(ctx: RecipeContext):
    # Full parameter Llama 3.1 8B
    policy_builder = await Model(model_key="llama-3.1-8b-instruct").to_builder(
        ctx,
        tp=1,
        kv_cache_len=100_000,
        tokens_to_generate=2048,
    )

    # Adapter-based Qwen 3 32B
    reward_builder = await Model(model_key="qwen3-32b").to_builder(
        ctx,
        tp=1,
        kv_cache_len=100_000,
        tokens_to_generate=2048,
    )
    reward_builder = reward_builder.with_adapter()
```
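
To see why adapter training is lighter than full fine-tuning, the sketch below counts trainable parameters for a single weight matrix. It assumes a LoRA-style low-rank adapter (this page does not specify which adapter type `with_adapter()` uses), and the rank of 8 is an illustrative choice. `full_params` and `lora_params` are hypothetical helpers, not part of `adaptive_harmony`.

```python theme={null}
# Sketch assuming a LoRA-style adapter: the dense weight W (d_out x d_in) is
# frozen and only two low-rank factors, A (rank x d_in) and B (d_out x rank),
# are trained.

def full_params(d_in: int, d_out: int) -> int:
    """Trainable parameters when fine-tuning the dense weight directly."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters of a rank-`rank` adapter on the same weight."""
    return rank * (d_in + d_out)


if __name__ == "__main__":
    d = 4096  # hidden size of Llama 3.1 8B
    full = full_params(d, d)
    adapter = lora_params(d, d, rank=8)
    print(f"full: {full:,}  adapter: {adapter:,}  ratio: {adapter / full:.2%}")
```

Because only the small factors are trained, the adapter needs far less gradient and optimizer-state memory than full-parameter fine-tuning of the same model.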

### Spawn methods

`ModelBuilder` also exposes `spawn_train` and `spawn_inference` methods.

Adaptive Engine unifies training and inference. Instead of requiring different frameworks or runtimes for each, it lets you spawn a model meant for training with `spawn_train` or for inference with `spawn_inference`. A model spawned with `spawn_train` requires more GPU memory upfront, because Adaptive ensures at spawn time that enough memory is available to fit the requested `max_batch_size` during training (model activations, optimizer state, etc.). If you are not training a given model in your recipe (for example, a judge model), always spawn it with `spawn_inference` to reduce GPU memory pressure.

The `max_batch_size` parameter defines the maximum number of tokens that can be allocated in a single training batch, i.e. a mini-batch that the model processes in parallel. The batch size of each optimization step is user-defined and independent of this parameter. `max_batch_size` also caps the maximum sequence length the model can train on: in the worst case, for a dataset whose samples are all roughly `max_batch_size` tokens long, the model trains on a single sample at a time. Sequences longer than `max_batch_size` are dropped by the training classes, which also reconcile the mini-batches with the desired optimization-step batch size (measured in number of samples).
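
As a rough illustration of how a token budget splits one optimization step into mini-batches, the following toy sketch drops samples longer than `max_batch_size` and greedily packs the rest, in order, into mini-batches of at most `max_batch_size` tokens. `plan_step` is a hypothetical helper; the actual packing and step reconciliation performed by the training classes may differ.

```python theme={null}
from typing import List


def plan_step(seq_lens: List[int], max_batch_size: int) -> List[List[int]]:
    """Split one optimization step's samples into token-bounded mini-batches.

    Toy model only: samples longer than max_batch_size are dropped, and the
    remaining samples are packed greedily, in order, so that no mini-batch
    exceeds max_batch_size tokens. Gradients from all mini-batches would be
    accumulated before a single optimizer update.
    """
    mini_batches: List[List[int]] = []
    current: List[int] = []
    current_tokens = 0
    for length in seq_lens:
        if length > max_batch_size:
            continue  # cannot fit in any mini-batch, so it is dropped
        if current and current_tokens + length > max_batch_size:
            mini_batches.append(current)  # close the full mini-batch
            current, current_tokens = [], 0
        current.append(length)
        current_tokens += length
    if current:
        mini_batches.append(current)
    return mini_batches


if __name__ == "__main__":
    print(plan_step([1000, 3000, 5000, 2000, 4096], max_batch_size=4096))
```

With `max_batch_size=4096`, the step `[1000, 3000, 5000, 2000, 4096]` becomes `[[1000, 3000], [2000], [4096]]`: the 5000-token sample is dropped, and the 4096-token sample occupies a mini-batch on its own, which is the worst case described above.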

`spawn_train` returns a `TrainingModel`, and `spawn_inference` returns an `InferenceModel`; they are both async methods.

```python theme={null}
from adaptive_harmony.runtime import recipe_main, RecipeContext
from adaptive_harmony.parameters import Model

@recipe_main
async def main(ctx: RecipeContext):
    # Spawn for training
    train_builder = await Model(model_key="llama-3.1-8b-instruct").to_builder(
        ctx,
        tp=1,
        kv_cache_len=100_000,
        tokens_to_generate=2048,
    )
    train_model = await train_builder.spawn_train(name="train_model", max_batch_size=4096)

    # Spawn for inference
    inference_builder = await Model(model_key="llama-3.1-8b-instruct").to_builder(
        ctx,
        tp=1,
        kv_cache_len=100_000,
        tokens_to_generate=2048,
    )
    inference_model = await inference_builder.spawn_inference(name="inf_model")
```

### Tensor parallelism (tp)

Tensor parallelism (`tp`) determines how many GPUs a model is split across during execution.
Choosing the right `tp` value depends on your model size and available hardware: larger models typically require a higher `tp` to fit in memory, while smaller models may run efficiently with `tp=1`. Also, as explained above, a model that fits on 2 GPUs with `tp=2` when spawned with `spawn_inference` might not fit when spawned with `spawn_train`.
Always ensure that the number you set for `tp` matches the number of devices you want to use and is supported by your infrastructure.
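
To get a first estimate of `tp`, you can compare a model's static memory per GPU with the memory you are willing to budget. The sketch below assumes bf16 weights for inference (2 bytes per parameter) and mixed-precision Adam for full-parameter training (about 16 bytes per parameter: bf16 weights and gradients plus fp32 master weights and two fp32 optimizer moments). It ignores the KV cache, activations and runtime overhead, and assumes `tp` is a power of two. `min_tp` is a hypothetical helper, not an Adaptive API, and Adaptive's actual memory accounting will differ.

```python theme={null}
from typing import Optional

# Rough bytes-per-parameter assumptions (static memory only).
INFERENCE_BF16 = 2    # bf16 weights
FULL_TRAIN_ADAM = 16  # bf16 weights + bf16 grads + fp32 master weights + 2 fp32 Adam moments


def min_tp(num_params: float, bytes_per_param: float, usable_gb: float,
           max_tp: int = 8) -> Optional[int]:
    """Smallest power-of-two tp whose per-GPU share of static memory fits.

    Ignores KV cache, activations and framework overhead, so treat the
    result as a lower bound. Returns None if even max_tp is not enough.
    """
    total_gb = num_params * bytes_per_param / 1e9
    tp = 1
    while tp <= max_tp:
        if total_gb / tp <= usable_gb:
            return tp
        tp *= 2
    return None


if __name__ == "__main__":
    for label, bpp in [("inference", INFERENCE_BF16), ("full training", FULL_TRAIN_ADAM)]:
        print(label, min_tp(8e9, bpp, usable_gb=72))
```

With a 72 GB per-GPU budget, an 8B model fits with `tp=1` for inference but needs `tp=2` for full-parameter training under these assumptions, before activations are even counted.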

## Passing model parameters as config input

In custom recipes, you can pass models in the recipe config using the `Model` class from `adaptive_harmony.parameters`. Adaptive will validate that the user-configured parameter for `model_to_train` below is a valid model key in Adaptive Engine.

### Using default deployment parameters

When you call `await model.to_builder(ctx)`, the model is configured with the **default deployment parameters** set in the Adaptive platform (KV cache length, tensor parallelism, tokens to generate). To change these parameters globally for a model, open its model details page (click the model on the organization's model registry page) and edit the "Inference Configuration" setting in the right-hand menu.
Inference defaults often do not suit the more memory-intensive training regime, so you can override any of them by passing parameters directly to `to_builder()`:

**Overridable parameters:**

* `tp` - Tensor parallelism (number of GPUs to split the model across)
* `kv_cache_len` - KV cache length for the model
* `tokens_to_generate` - Maximum tokens to generate per completion

```python theme={null}
from adaptive_harmony.runtime import InputConfig, recipe_main, RecipeContext
from adaptive_harmony.parameters import Model

class MyConfig(InputConfig):
    model_to_train: Model  # validated as a model key in Adaptive Engine
    tp: int
    max_seq_len: int
    train_adapter: bool

@recipe_main
async def main(config: MyConfig, ctx: RecipeContext):
    # Override default deployment parameters by passing them to to_builder()
    model_builder = await config.model_to_train.to_builder(
        ctx,
        tp=config.tp,                      # Override tensor parallelism
        kv_cache_len=100_000,              # Override KV cache length
        tokens_to_generate=2048,           # Override max tokens to generate
    )

    # Optionally train a lightweight adapter instead of all parameters
    if config.train_adapter:
        model_builder = model_builder.with_adapter()

    # max_batch_size also caps the longest sequence the model can train on
    trained_model = await model_builder.spawn_train(
        name="model_trained", max_batch_size=config.max_seq_len
    )
```
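
The `kv_cache_len` override has a direct memory cost. A common estimate stores one key vector and one value vector per layer, per KV head, per cached token. The sketch below assumes a bf16 cache and that `kv_cache_len` counts cached tokens; the Llama 3.1 8B shape (32 layers, 8 KV heads, head dimension 128) comes from its public model configuration. `kv_cache_bytes` is a hypothetical helper, not an Adaptive API.

```python theme={null}
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   num_tokens: int, bytes_per_elem: int = 2) -> int:
    """Estimate KV cache size in bytes.

    The leading factor of 2 accounts for one key and one value vector per
    layer and KV head; bytes_per_elem=2 assumes a bf16 cache.
    """
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem
    return per_token * num_tokens


if __name__ == "__main__":
    # Llama 3.1 8B: 32 layers, 8 KV heads (grouped-query attention), head_dim 128
    size = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, num_tokens=100_000)
    print(f"{size / 1e9:.1f} GB")
```

Under these assumptions, `kv_cache_len=100_000` for Llama 3.1 8B reserves about 13.1 GB (131,072 bytes per token), on top of roughly 16 GB of bf16 weights.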

## Spawning in Jupyter notebooks

When developing interactively in Jupyter notebooks, you can spawn models directly using `client.model()` without creating a full `RecipeContext`. This is convenient for quick experimentation and prototyping.

First, create a client using `get_client`:

```python theme={null}
from adaptive_harmony import get_client

# Create a client directly
client = await get_client(
    addr="wss://YOUR_ADAPTIVE_DEPLOYMENT_URL",
    num_gpus=2,
    api_key="YOUR_API_KEY",
    use_case="your-use-case",
)

# Spawn a model using client.model()
policy_builder = client.model(
    path="llama-3.1-8b-instruct",
    kv_cache_len=100_000,
    tokens_to_generate=2048,
).tp(1)

policy_model = await policy_builder.spawn_train(name="policy", max_batch_size=4096)
```

<Info>
  Note that in recipe scripts, `ctx.client` gives you access to the same client object. The `Model().to_builder()` approach shown above is the recommended pattern for recipe scripts, as it integrates with the platform's inference parameter configuration system.
</Info>
