Loading a model
To load a model in a custom recipe, use the harmony client and pass the model key, prefixed with model_registry://, to the asynchronous model method. This method returns a ModelBuilder object, on which you can call several methods to configure how your model will be spawned (see the sketch after this list):
- .tp() to set the tensor parallelism (TP) of the model (explained in a later section of this page).
- with_adapter() to enable lightweight adapter training instead of full-parameter fine-tuning (only use this if you are training the model).
- into_scoring_model() to convert the model into a scoring model (only use this if you are training the model to predict a scalar value, e.g. when training a value model for PPO or a reward model).
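Putting these together, here is a minimal sketch of loading and configuring a model. The recipe function scaffolding, the client variable name (harmony), the model key, and the assumption that builder methods are chainable are illustrative, not confirmed API details; only model(), tp(), with_adapter(), and into_scoring_model() are named on this page.

```python
# Hypothetical recipe scaffolding around the documented ModelBuilder calls.
async def my_recipe(harmony):
    # model() is async and returns a ModelBuilder
    builder = await harmony.model("model_registry://my-base-model")
    builder = builder.tp(2)           # split the model across 2 GPUs
    builder = builder.with_adapter()  # adapter training instead of full fine-tuning
```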
Spawn methods
ModelBuilder also exposes spawn_train and spawn_inference methods.
Adaptive Engine unifies training and inference. Instead of requiring different frameworks or runtimes for training and inference, you can simply spawn models meant for training or inference with spawn_train and spawn_inference. A model spawned with spawn_train requires more GPU memory upfront, since Adaptive makes sure that enough memory is available at spawn time to fit the required max_batch_size during training (model activations, optimizer state, etc.). If you are not training a given model in your recipe (for example, you are spawning a judge model), always spawn it with spawn_inference to reduce GPU memory pressure.
The max_batch_size parameter defines the maximum number of tokens that can be allocated in a single training batch, i.e. a mini-batch that the model processes in parallel; the batch size corresponding to each optimization step is user-defined and independent from this parameter. It also limits the maximum sequence length the model can train on. In the worst case, for a dataset of samples whose length is close to max_batch_size, the model will train on a single sample at a time. Any sequences longer than max_batch_size are simply dropped by the training classes, which also reconcile the desired optimization-step batch size in number of samples. For example, with max_batch_size = 4096, a 5,000-token sequence is dropped, while shorter sequences are grouped so that no mini-batch exceeds 4,096 tokens.
spawn_train returns a TrainingModel, and spawn_inference returns an InferenceModel; they are both async methods.
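As an illustration, here is a hedged sketch of spawning one model for training and another for inference. Whether max_batch_size is passed as a keyword to spawn_train is an assumption, as are the variable names and model keys; the spawn methods themselves and the memory behavior are from the paragraph above.

```python
async def my_recipe(harmony):
    # Model being trained: training memory (activations, optimizer state, ...)
    # is reserved upfront. Passing max_batch_size here is an assumption.
    trainable = await harmony.model("model_registry://my-base-model")
    policy = await trainable.tp(2).spawn_train(max_batch_size=4096)  # TrainingModel

    # Judge model used only for inference: spawn it with spawn_inference
    # to reduce GPU memory pressure.
    judge_builder = await harmony.model("model_registry://my-judge-model")
    judge = await judge_builder.spawn_inference()  # InferenceModel
```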
Tensor parallelism (tp)
Tensor parallelism (tp) determines how many GPUs a model is split across during execution.
Choosing the right tp value depends on your model size and available hardware: larger models typically require a higher tp to fit into memory, while smaller models may run efficiently with tp=1. Also, as explained above, a model that fits on 2 GPUs with tp=2 when spawned with spawn_inference might not fit if it is spawned with spawn_train.
Always ensure that the value you set for tp matches the number of devices you want to use and is supported by your infrastructure.
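As a rough illustration, you might derive tp from the number of visible GPUs. This heuristic and the torch-based device count are assumptions for the sketch, not Adaptive's policy:

```python
import torch

async def my_recipe(harmony):
    # Use every visible GPU for a large model; a small model may only need tp=1.
    tp = max(torch.cuda.device_count(), 1)
    builder = await harmony.model("model_registry://my-base-model")
    model = await builder.tp(tp).spawn_inference()
```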
Passing model parameters as config input
In custom recipes, you can pass models in the recipe config using the magic class AdaptiveModel. Adaptive will validate that the user-configured parameter model_to_train below is a valid model key in Adaptive Engine.
You can access the path used to deploy the model with AdaptiveModel().path.
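A minimal sketch of what such a config might look like. The import paths, the pydantic-style config base class, and the recipe signature are assumptions for illustration; AdaptiveModel, model_to_train, and .path come from this page.

```python
from pydantic import BaseModel

# Import path for AdaptiveModel is an assumption for this sketch.
from adaptive_harmony import AdaptiveModel

class RecipeConfig(BaseModel):
    # Adaptive validates that the configured value is a valid model key.
    model_to_train: AdaptiveModel

async def my_recipe(harmony, config: RecipeConfig):
    # .path resolves the configured model so it can be loaded and spawned.
    builder = await harmony.model(config.model_to_train.path)
    model = await builder.spawn_train(max_batch_size=4096)
```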