On this page, you will learn how to write a simple custom recipe, from model loading to training. First, set up your development environment by installing adaptive_harmony:
pip install adaptive-harmony

Step-by-step guide

In this recipe, we use PPO to train a model on completion safety feedback, with Llama 3.3 70B acting as the grader.
1

Create a new Python file

Custom recipes are written as single Python files, which you can store anywhere in your codebase. Let's create a recipe called my_custom_recipe.py:
touch my_custom_recipe.py
Fill it with this recipe skeleton:
my_custom_recipe.py
from adaptive_harmony.runtime import recipe_main, RecipeContext

@recipe_main
async def main(ctx: RecipeContext):
    client = ctx.client
    model_to_train = "model_registry://llama-3.2-1b"    # this is the key of the model we want to train
    print("Hello, world!")
The @recipe_main decorator marks a single async function in the file as the main entrypoint that Adaptive Engine runs when the recipe is launched. The decorator is required in order to upload a recipe to Adaptive.
When you first start writing a recipe, to more easily run and debug it, we suggest instantiating a harmony client directly with get_client, as explained in Harmony client and local testing. You can then run python my_custom_recipe.py without concerning yourself with RecipeContext. When you upload your recipe to Adaptive, however, RecipeContext is a mandatory input argument for your @recipe_main function.
my_custom_recipe.py
from adaptive_harmony.runtime import recipe_main
from adaptive_harmony import get_client  # see Harmony client and local testing for the exact import path

@recipe_main
async def main():
    client = get_client(
        addr="wss://YOUR_ADAPTIVE_DEPLOYMENT_URL.com",
        num_gpus=2,
        api_key="YOUR_ADAPTIVE_API_KEY",
        use_case="my-use-case"  # must exist in your Adaptive deployment
    )
    model_to_train = "model_registry://llama-3.2-1b"    # this is the key of the model we want to train
    print("Hello, world!")

if __name__ == "__main__":
    import asyncio

    asyncio.run(main())
Once you are done debugging, you can remove this direct connection and retrieve the client from the RecipeContext instead.
2

Load models

We begin by spawning the policy model and the value model.
# Spawn models
policy_model = await client.model(model_to_train).tp(2).spawn_train(name="policy", max_batch_size=4096)
value_model = await client.model(model_to_train).into_scoring_model().tp(2).spawn_train(name="value", max_batch_size=4096)
You can specify the tensor parallel degree for each spawned model using .tp().
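For example, if you were training a model small enough to fit on a single GPU, you could lower the degree. The sketch below reuses the same spawn call from above; the right tp value is an assumption here and depends on your model size and available GPU memory:
# Hypothetical single-GPU spawn; pick the tp degree to match your hardware
small_policy = await client.model(model_to_train).tp(1).spawn_train(
    name="policy",
    max_batch_size=4096,
)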
3

Load dataset

We load a dataset from the Hugging Face Dataset Hub as an example. The helper functions below convert a Hugging Face dataset into a list of StringThread, the chat messages + metadata format used throughout Adaptive recipes. See Loading datasets and StringThread to find out how to load a dataset that has been uploaded to Adaptive.
from adaptive_harmony import StringThread
from adaptive_harmony.core.dataset import load_from_hf, convert_sample_dict

# Helper function to convert the HF dataset to an Adaptive StringThread
convert_sample_fn = convert_sample_dict(turns_key="messages", role_key="role", content_key="content")

# Load the dataset
dataset = load_from_hf("HuggingFaceH4/ultrachat_200k", "train_sft", convert_sample_fn)
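Each sample in HuggingFaceH4/ultrachat_200k stores its conversation under the messages key as a list of {"role": ..., "content": ...} dicts, which is why those key names are passed to convert_sample_dict. Since the loaded dataset is a plain Python list, you can sanity-check the conversion before training (a minimal check; the exact repr of a StringThread may differ):
# Quick sanity check on the converted dataset
print(f"Loaded {len(dataset)} samples")
print(dataset[0])  # inspect the first converted StringThread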
4

Define a grader

We then define the grader that will be used for feedback during training.
from adaptive_harmony.graders.binary_judge import BinaryJudgeGrader

# Define a grader
criteria = "Evaluate whether the answer is harmful, offensive, or uses inappropriate language"
safety_grader = BinaryJudgeGrader(
    grader_key="safety",
    model_key="model_registry://llama-3.3-70b",
    client=client,
    criteria=criteria,
    tp=2,
)
await safety_grader.setup()
safety_grader.setup() prepares the grader for training or evaluation; in this case, it spawns the judge model.
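Note that the tp=2 argument passed to BinaryJudgeGrader plays the same role as .tp() in step 2: it sets the tensor parallel degree used when the judge model is spawned.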
5

Adapt the model

Finally, we pass our models, the grader, and the training parameters to the PPO trainer.
from adaptive_harmony.metric_logger import WandbLogger
from adaptive_harmony.common import PPO

# Define a logger
logger = WandbLogger("safety_ppo", "my_first_custom_recipe", "adaptive-ml")

# Run PPO training
await PPO(
    dataset=dataset,
    model=policy_model,
    value_model=value_model,
    grader=safety_grader,
    logger=logger,
    max_num_ppo_steps=100,
    samples_per_batch=256,
    samples_per_mini_batch=128,
    mini_epochs_per_batch=2,
    kl_beta=0.01,
).run()
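With samples_per_batch=256 and samples_per_mini_batch=128, each PPO step collects a batch of 256 samples and splits it into two mini-batches; mini_epochs_per_batch=2 means each batch is iterated over twice before new samples are collected. kl_beta=0.01 weights the KL penalty that discourages the policy from drifting too far from its starting point.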

Full recipe

my_custom_recipe.py
from adaptive_harmony.runtime import recipe_main, RecipeContext
from adaptive_harmony.core.dataset import load_from_hf, convert_sample_dict
from adaptive_harmony.graders.binary_judge import BinaryJudgeGrader
from adaptive_harmony.metric_logger import WandbLogger
from adaptive_harmony.common import PPO

@recipe_main
async def main(ctx: RecipeContext):
    client = ctx.client
    model_to_train = "model_registry://llama-3.2-1b"    # this is the key of the model we want to train

    # Spawn models
    policy_model = await client.model(model_to_train).tp(2).spawn_train(name="policy", max_batch_size=4096)
    value_model = await client.model(model_to_train).into_scoring_model().tp(2).spawn_train(name="value", max_batch_size=4096)

    # Helper function to convert the HF dataset to an Adaptive StringThread
    convert_sample_fn = convert_sample_dict(turns_key="messages", role_key="role", content_key="content")

    # Load the dataset
    dataset = load_from_hf("HuggingFaceH4/ultrachat_200k", "train_sft", convert_sample_fn)

    # Define a grader
    criteria = "Evaluate whether the answer is harmful, offensive, or uses inappropriate language"
    safety_grader = BinaryJudgeGrader(
        grader_key="safety",
        model_key="model_registry://llama-3.3-70b",
        client=client,
        criteria=criteria,
        tp=2,
    )
    await safety_grader.setup()

    # Define a logger
    logger = WandbLogger("safety_ppo", "my_first_custom_recipe", "adaptive-ml")

    # Run PPO training
    await PPO(
        dataset=dataset,
        model=policy_model,
        value_model=value_model,
        grader=safety_grader,
        logger=logger,
        max_num_ppo_steps=100,
        samples_per_batch=256,
        samples_per_mini_batch=128,
        mini_epochs_per_batch=2,
        kl_beta=0.01,
    ).run()