Loading a model
To load a model in a custom recipe, use the Model class from adaptive_harmony.parameters. Call await model.to_builder(ctx) to get a ModelBuilder object, on which you can call several methods to configure how your model will be spawned.
For example, use builder.with_adapter() to enable lightweight adapter training instead of full parameter fine-tuning (only use this if you are training the model).
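A minimal sketch of this flow is shown below. The recipe entry point signature and the way the Model instance arrives from the config are assumptions for illustration, and with_adapter() is shown as a chainable call:

```python
from adaptive_harmony.parameters import Model

# Hypothetical recipe entry point: assume `model_to_train` arrives as a Model
# parameter from the recipe config and `ctx` is the recipe's RecipeContext.
async def run(ctx, model_to_train: Model):
    # Get a ModelBuilder configured with the model's default deployment parameters.
    builder = await model_to_train.to_builder(ctx)

    # Enable lightweight adapter training instead of full parameter fine-tuning
    # (only do this for a model you intend to train); a chainable builder is assumed.
    builder = builder.with_adapter()
```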
Spawn methods
ModelBuilder also exposes spawn_train and spawn_inference methods.
Adaptive Engine unifies training and inference. Instead of requiring different frameworks or runtimes for training and inference, you can simply spawn models for either purpose with spawn_train and spawn_inference. A model spawned with spawn_train requires more GPU memory upfront, since Adaptive ensures at spawn time that enough memory is available to fit the required max_batch_size during training (model activations, optimizer state, etc.). If you are not training a given model in your recipe (for example, a judge model), always spawn it with spawn_inference to reduce GPU memory pressure.
The max_batch_size parameter defines the maximum number of tokens that can be allocated in a single training batch, i.e. a mini-batch that the model processes in parallel; the batch size for each optimization step is user-defined and independent of this parameter. It also limits the maximum sequence length the model can train on: in the worst case, for a dataset of samples whose length is close to max_batch_size, the model will train on a single sample at a time. Sequences longer than max_batch_size are simply dropped by the training classes, which also reconcile this token-based limit with the desired optimization-step batch size in number of samples.
spawn_train returns a TrainingModel, and spawn_inference returns an InferenceModel; they are both async methods.
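For example, a sketch of spawning a trainable policy and an inference-only judge from builders obtained as above; where max_batch_size is supplied (shown here as a spawn_train argument) and the exact signatures are assumptions:

```python
# Training spawn: reserves enough GPU memory upfront to fit max_batch_size tokens
# per mini-batch (activations, optimizer state, ...). The value is illustrative.
policy = await policy_builder.spawn_train(max_batch_size=8192)   # -> TrainingModel

# Inference spawn: a judge model that is never trained should always be spawned
# this way to keep GPU memory pressure low.
judge = await judge_builder.spawn_inference()                    # -> InferenceModel
```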
Tensor parallelism (tp)
Tensor parallelism (tp) determines how many GPUs a model is split across during execution.
Choosing the right tp value depends on your model size and available hardware: larger models typically require a higher tp to fit into memory, while smaller models may run efficiently with tp=1. Also, as explained above, a model that fits on 2 GPUs with tp=2 when spawned with spawn_inference might not fit when spawned with spawn_train.
Always ensure that the number you set for tp matches the number of devices you want to use and is supported by your infrastructure.
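As an illustration of how tp interacts with the spawn mode (values and call shapes are assumptions, using the override mechanism described in the section below):

```python
# The same model split across 2 GPUs may fit for inference...
inference_builder = await model.to_builder(ctx, tp=2)
judge = await inference_builder.spawn_inference()

# ...but might not fit for training, which needs extra memory for activations and
# optimizer state; a higher tp (more devices) may be required. Values are illustrative.
train_builder = await model.to_builder(ctx, tp=4)
trainer = await train_builder.spawn_train(max_batch_size=8192)
```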
Passing model parameters as config input
In custom recipes, you can pass models in the recipe config using the Model class from adaptive_harmony.parameters. Adaptive will validate that the user-configured parameter for model_to_train below is a valid model key in Adaptive Engine.
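A sketch of what this might look like; the entry point shape and parameter names are hypothetical, but the pattern of typing config inputs as Model is what enables the validation described above:

```python
from adaptive_harmony.parameters import Model

# Hypothetical recipe signature: `model_to_train` and `judge_model` are config
# inputs. Adaptive validates that each user-supplied value is a valid model key
# in Adaptive Engine before the recipe runs.
async def run(ctx, model_to_train: Model, judge_model: Model):
    train_builder = await model_to_train.to_builder(ctx)
    judge_builder = await judge_model.to_builder(ctx)
```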
Using default deployment parameters
When you call await model.to_builder(ctx), the model is configured with the default deployment parameters as set in the Adaptive platform (KV cache length, tensor parallelism, tokens to generate). You can change these parameters globally for a model by visiting its model details page (click the model in the organizational model registry page), and editing the “Inference Configuration” setting on the right-hand menu.
Often, inference defaults will not make sense for the more memory-intensive training regime, so any default can be overridden by passing parameters directly to to_builder():
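For example, a sketch of overriding the defaults for a training spawn (parameter values are illustrative, and the placement of max_batch_size on spawn_train is an assumption):

```python
# Override the platform's default deployment parameters for this spawn only.
builder = await model_to_train.to_builder(
    ctx,
    tp=4,                     # split the model across 4 GPUs
    kv_cache_len=16384,       # KV cache length
    tokens_to_generate=1024,  # max tokens to generate per completion
)
trainer = await builder.spawn_train(max_batch_size=8192)
```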
Overridable parameters:
tp - Tensor parallelism (number of GPUs to split the model across)
kv_cache_len - KV cache length for the model
tokens_to_generate - Maximum tokens to generate per completion
Spawning in Jupyter notebooks
When developing interactively in Jupyter notebooks, you can spawn models directly using client.model() without creating a full RecipeContext. This is convenient for quick experimentation and prototyping.
First, create a client using get_client:
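A sketch of the notebook flow; the import path of get_client and the exact shape of client.model() are assumptions, and "my-judge-model" is a placeholder model key (Jupyter supports top-level await):

```python
from adaptive_harmony import get_client  # assumed import path

client = get_client()

# Spawn a model directly from the client, without a full RecipeContext.
# client.model(...) is shown here returning a builder-like object; the exact
# call shape is an assumption based on the builder pattern described above.
judge = await client.model("my-judge-model").spawn_inference()
```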
Notice that, in recipe scripts, ctx.client provides access to the same client object. The Model().to_builder() approach shown above is the recommended pattern for recipe scripts, as it integrates with the platform’s inference parameters configuration system.
