On this page, you will learn how to write a simple custom recipe, from model loading to training. Set up your development environment by installing adaptive_harmony first:
pip install adaptive-harmony

Step-by-step guide

In this recipe, we use GRPO to train a model on completion safety feedback, with Llama 3.3 70B acting as the grader that judges each completion.
1. Create a new Python file

Custom recipes are written as single Python files. You can store them anywhere you want in your codebase. Let's create a recipe file, my_custom_recipe.py:
touch my_custom_recipe.py
Fill it with this recipe skeleton:
my_custom_recipe.py
from adaptive_harmony.runtime import recipe_main, RecipeContext

@recipe_main
async def main(ctx: RecipeContext):
    print("Hello, world!")
The @recipe_main decorator marks a single async function in the file as the main entrypoint that Adaptive Engine runs when the recipe is launched. The decorator is required to upload a recipe to Adaptive.
When you first start writing a recipe, you can manually create a RecipeContext object to run and debug it locally more easily. When you upload your recipe to Adaptive, the RecipeContext is injected automatically by the platform recipe runner, with the correct permissions and use case-related configuration.
my_custom_recipe.py
from adaptive_harmony.runtime import recipe_main, RecipeContext
from adaptive_harmony.runtime.context import RecipeConfig

@recipe_main
async def main(ctx: RecipeContext):
    print("Hello, world!")

if __name__ == "__main__":
    import asyncio

    async def run():
        # Create RecipeContext manually for local testing
        config = RecipeConfig(
            harmony_url="wss://YOUR_ADAPTIVE_DEPLOYMENT_URL",
            num_gpus=2,
            api_key="YOUR_ADAPTIVE_API_KEY",
            use_case="your-use-case",  # must exist in your Adaptive deployment
        )
        ctx = await RecipeContext.from_config(config)
        await main(ctx)

    asyncio.run(run())
Once you upload and run your recipe on the platform, your main() entrypoint is executed directly, so the final if __name__ == "__main__" block used for local testing will not run.
2. Load the model

We begin by spawning the policy model using the Model parameter type.
from adaptive_harmony.parameters import Model

# Spawn model for training
policy_builder = await Model(model_key="llama-3.2-1b").to_builder(ctx, tp=2)
policy_model = await policy_builder.spawn_train(name="policy", max_batch_size=4096)
You can specify deployment parameters like tensor parallelism directly in the to_builder() call.
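For instance, a larger model can be sharded across more GPUs by raising the tensor-parallel degree. A minimal sketch, assuming the recipe has enough GPUs available (the model key and tp value here are illustrative):
# Illustrative: shard a larger model across 4 GPUs via tensor parallelism
large_builder = await Model(model_key="llama-3.3-70b").to_builder(ctx, tp=4)
large_model = await large_builder.spawn_inference(name="large_model")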
3. Load the dataset

As an example, we load a dataset from the Hugging Face Hub. The helper functions below convert the Hugging Face dataset into a list of StringThread, the format for chat messages plus metadata used throughout Adaptive recipes. See Loading datasets and StringThread to find out how to load a dataset that has been uploaded to Adaptive.
from adaptive_harmony import StringThread
from adaptive_harmony.core.dataset import load_from_hf, convert_sample_dict

# Helper function to convert the HF dataset to an Adaptive StringThread
convert_sample_fn = convert_sample_dict(turns_key="messages", role_key="role", content_key="content")

# Load the dataset
dataset = load_from_hf("HuggingFaceH4/ultrachat_200k", "train_sft", convert_sample_fn)

# Split into training and validation sets
train_dataset = dataset[:900]
validation_dataset = dataset[900:1000]
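For reference, each raw record in HuggingFaceH4/ultrachat_200k carries its conversation under a messages key, as a list of role/content dictionaries; convert_sample_dict reads the keys named above to build each StringThread. An abbreviated example of one raw record (values shortened for illustration):
# Approximate shape of one raw ultrachat_200k record (abbreviated)
raw_sample = {
    "prompt": "How can I improve my time management skills?",
    "messages": [
        {"role": "user", "content": "How can I improve my time management skills?"},
        {"role": "assistant", "content": "Here are a few practical tips: ..."},
    ],
}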
4. Define a grader

We then define the grader that will be used for feedback during training. The grader requires a spawned inference model.
from adaptive_harmony.graders.binary_judge import BinaryJudgeGrader

# Spawn a judge model for the grader
judge_builder = await Model(model_key="llama-3.3-70b").to_builder(ctx, tp=2)
judge_model = await judge_builder.spawn_inference(name="safety_judge")

# Define a grader with the spawned model
criteria = "The answer must not be harmful, offensive, or use inappropriate language"
safety_grader = BinaryJudgeGrader(
    grader_key="safety",
    model=judge_model,
    criteria=criteria,
)
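Conceptually, a binary judge prompts the judge model with the criteria and a completion, then maps the resulting PASS/FAIL verdict to a 0/1 reward that GRPO can optimize. The sketch below illustrates the idea with a generic async judge_fn callable; it is not the actual BinaryJudgeGrader implementation:
from typing import Awaitable, Callable

# Illustrative only: the general idea behind a binary LLM judge
async def binary_reward(
    judge_fn: Callable[[str], Awaitable[str]],  # any async text-in, text-out judge
    criteria: str,
    completion: str,
) -> float:
    prompt = (
        f"Criteria: {criteria}\n"
        f"Completion: {completion}\n"
        "Does the completion satisfy the criteria? Answer PASS or FAIL."
    )
    verdict = await judge_fn(prompt)
    # Map the judge's verdict to a binary reward
    return 1.0 if "PASS" in verdict.upper() else 0.0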
5. Train the model

Finally, we pass our model, grader, and parameters to the GRPO trainer. We also add a GraderEvalCallback to monitor performance on the validation set during training.
from adaptive_harmony.metric_logger import WandbLogger
from adaptive_harmony.common.grpo import GRPO
from adaptive_harmony.common.callbacks import GraderEvalCallback

# Define a logger
logger = WandbLogger("safety_grpo", "my_first_custom_recipe", "adaptive-ml")

# Create a validation callback to monitor training progress
validation_callback = GraderEvalCallback(
    validation_set=validation_dataset,
    model=policy_model,
    grader=safety_grader,
    frequency=0.1  # Evaluate every 10% of training
)

# Run GRPO training
await GRPO(
    dataset=train_dataset,
    model=policy_model,
    grader=safety_grader,
    logger=logger,
    callbacks=[validation_callback],
    max_num_grpo_steps=100,
    samples_per_batch=512,
    samples_per_mini_batch=64,
    completions_per_sample=4,
    kl_beta=0.01,
).run()
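A note on the hyperparameters: with completions_per_sample=4, GRPO generates four completions per prompt and grades each one; advantages are then computed by normalizing rewards within each group of completions, and kl_beta weights a KL penalty that discourages the policy from drifting too far from its starting point. A minimal sketch of the group-relative advantage computation (illustrative, not Adaptive's implementation):
import statistics

def group_relative_advantages(rewards: list[float]) -> list[float]:
    # Normalize each completion's reward against its own group
    # (the completions sampled for the same prompt)
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against all-equal rewards
    return [(r - mean) / std for r in rewards]

# Example: four completions for one prompt, graded 1/0 by the safety grader
print(group_relative_advantages([1.0, 0.0, 1.0, 1.0]))  # ≈ [0.58, -1.73, 0.58, 0.58]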
See Training Callbacks for more information about running arbitrary code during training, including checkpoint saving, sample generation, and validation loss tracking.

Full recipe

my_custom_recipe.py
from adaptive_harmony.runtime import recipe_main, RecipeContext
from adaptive_harmony.parameters import Model
from adaptive_harmony.core.dataset import load_from_hf, convert_sample_dict
from adaptive_harmony.graders.binary_judge import BinaryJudgeGrader
from adaptive_harmony.metric_logger import WandbLogger
from adaptive_harmony.common.grpo import GRPO
from adaptive_harmony.common.callbacks import GraderEvalCallback

@recipe_main
async def main(ctx: RecipeContext):
    # Spawn policy model
    policy_builder = await Model(model_key="llama-3.2-1b").to_builder(ctx, tp=2)
    policy_model = await policy_builder.spawn_train(name="policy", max_batch_size=4096)

    # Helper function to convert the HF dataset to an Adaptive StringThread
    convert_sample_fn = convert_sample_dict(turns_key="messages", role_key="role", content_key="content")

    # Load the dataset and split into train/validation
    dataset = load_from_hf("HuggingFaceH4/ultrachat_200k", "train_sft", convert_sample_fn)
    train_dataset = dataset[:900]
    validation_dataset = dataset[900:1000]

    # Spawn judge model for grader
    judge_builder = await Model(model_key="llama-3.3-70b").to_builder(ctx, tp=2)
    judge_model = await judge_builder.spawn_inference(name="safety_judge")

    # Define a grader with the spawned model
    criteria = "Evaluate if the answer is harmful, offensive, or use inappropriate language"
    safety_grader = BinaryJudgeGrader(
        grader_key="safety",
        model=judge_model,
        criteria=criteria,
    )

    # Define a logger
    logger = WandbLogger("safety_grpo", "my_first_custom_recipe", "adaptive-ml")

    # Create a validation callback
    validation_callback = GraderEvalCallback(
        validation_set=validation_dataset,
        model=policy_model,
        grader=safety_grader,
        frequency=0.1
    )

    # Run GRPO training
    await GRPO(
        dataset=train_dataset,
        model=policy_model,
        grader=safety_grader,
        logger=logger,
        callbacks=[validation_callback],
        max_num_grpo_steps=100,
        samples_per_batch=512,
        samples_per_mini_batch=64,
        completions_per_sample=4,
        kl_beta=0.01,
    ).run()