Recipe syntax

In this page, you will learn how to write a simple custom recipe, from model loading to training. You can set up your development environment by installing adaptive_harmony first:

pip install adaptive-harmony

Step by step guide

In this recipe, we train a model on completion safety feedback judged by Llama 3.3 70B as a grader, using PPO.

Create a new python file

Custom recipes are written as single python files. You can store it anywhere you want in your codebase. Let’s create a recipe my_custom_recipe.py

touch my_custom_recipe.py

Fill it with this recipe skeleton:

my_custom_recipe.py

from adaptive_harmony.runtime import recipe_main, RecipeContext

@recipe_main
async def main(ctx: RecipeContext):
    client = ctx.client
    model_to_train = "model_registry://llama-3.2-1b"    # this is the key of the model we want to train
    print("Hello, world!")

The decorator @recipe_main defines a single async function in the file as the main entrypoint that Adaptive Engine should run when the recipe is launched. This decorator is required in order to upload a recipe to Adaptive.

When you first start writing a recipe, in order to more easily run and debug it, we suggest you instantiate a harmony client using get_client directly as explained in Harmony client and local testing . You can then just run python my_custom_recipe.py without concerning yourself with RecipeContext. When you upload your recipe to Adaptive however, RecipeContext is a mandatory input argument for your @recipe_main function.

my_custom_recipe.py

from adaptive_harmony.runtime import recipe_main
from adaptive_harmony import get_client

@recipe_main
async def main():
    get_client(
        addr="wss://YOUR_ADAPTIVE_DEPLOYMENT_URL.com",
        num_gpus=2,
        api_key="YOUR_ADAPTIVE_API_KEY",
        use_case="my-use-case"  # must exist in your Adaptive deployment
    )
    model_to_train = "model_registry://llama-3.2-1b"    # this is the key of the model we want to train
    print("Hello, world!")

if __name__=="__main__":
    import asyncio

    asyncio.run(main())

At the end of the debugging, you can remove this connection and retrieve the client from the context.

Load models

We begin by spawning the policy model and the value model.

# Spawn models
policy_model = await client.model(model_to_train).tp(2).spawn_train(name="policy",max_batch_size=4096)
value_model = await client.model(model_to_train).into_scoring_model().tp(2).spawn_train(name="value",max_batch_size=4096)

You can specify the tensor parallel degree for each spawned model using .tp().

Load Dataset

We load a dataset from the Hugging Face Dataset Hub as an example. The helper functions facilitate converting a Hugging Face dataset to a list of StringThread, the format for chat messages + metadata used throughout Adaptive recipes. See Loading datasets and StringThread to find out how to load a dataset that has been uploaded to Adaptive.

from adaptive_harmony import StringThread
from adaptive_harmony.core.dataset import load_from_hf, convert_sample_dict

# Helper function to convert the HF dataset to an Adaptive StringThread
convert_sample_fn = convert_sample_dict(turns_key="messages", role_key="role", content_key="content")

# Load the dataset
dataset = load_from_hf("HuggingFaceH4/ultrachat_200k", "train_sft", convert_sample_fn)

Define a Grader

We then define the grader that will be used for feedback during training.

from adaptive_harmony.graders.binary_judge import BinaryJudgeGrader

# Define a grader
criteria = "Evaluate if the answer is harmful, offensive, or use inappropriate language"
safety_grader = BinaryJudgeGrader(
    grader_key="safety",
    model_key="model_registry://llama-3.3-70b",
    client=client,
    criteria=criteria,
    tp=2,
)
await safety_grader.setup()

await safety_grader.setup() handles preparing the grader for training or evaluation, which in this case actually spawns the judge model.

Adapt the model

Finally, we pass all of our models, grader, and parameters to the PPO trainer.

from adaptive_harmony.metric_logger import WandbLogger
from adaptive_harmony.common import PPO

# Define a logger
logger = WandbLogger("safety_ppo", "my_first_custom_recipe", "adaptive-ml")

# Run PPO training
await PPO(
    dataset=dataset,
    model=policy_model,
    value_model=value_model,
    grader=safety_grader,
    logger=logger,
    max_num_ppo_steps=100,
    samples_per_batch=256,
    samples_per_mini_batch=128,
    mini_epochs_per_batch=2,
    kl_beta=0.01,
).run()

Full recipe

my_custom_recipe.py

from adaptive_harmony.runtime import recipe_main, RecipeContext
from adaptive_harmony.core.dataset import load_from_hf, convert_sample_dict
from adaptive_harmony.graders.binary_judge import BinaryJudgeGrader
from adaptive_harmony.metric_logger import WandbLogger
from adaptive_harmony.common import PPO

@recipe_main
async def main(ctx: RecipeContext):
    client = ctx.client
    model_to_train = "model_registry://llama-3.2-1b"    # this is the key of the model we want to train

    # Spawn models
    policy_model = await client.model(model_to_train).tp(2).spawn_train(name="policy", max_batch_size=4096)
    value_model = await client.model(model_to_train).into_scoring_model().tp(2).spawn_train(name="value", max_batch_size=4096)

    # Helper function to convert the HF dataset to an Adaptive StringThread
    convert_sample_fn = convert_sample_dict(turns_key="messages", role_key="role", content_key="content")

    # Load the dataset
    dataset = load_from_hf("HuggingFaceH4/ultrachat_200k", "train_sft", convert_sample_fn)

    # Define a grader
    criteria = "Evaluate if the answer is harmful, offensive, or use inappropriate language"
    safety_grader = BinaryJudgeGrader(
        grader_key="safety",
        model_key="model_registry://llama-3.3-70b",
        client=client,
        criteria=criteria,
        tp=2,
    )
    await safety_grader.setup()

    # Define a logger
    logger = WandbLogger("safety_ppo", "my_first_custom_recipe", "adaptive-ml")

    # Run PPO training
    await PPO(
        dataset=dataset,
        model=policy_model,
        value_model=value_model,
        grader=safety_grader,
        logger=logger,
        max_num_ppo_steps=100,
        samples_per_batch=256,
        samples_per_mini_batch=128,
        mini_epochs_per_batch=2,
        kl_beta=0.01,
    ).run()

Start

Core

Advanced

Deploy

Updates

Step by step guide

Full recipe

Start

Core

Advanced

Deploy

Updates

​Step by step guide

​Full recipe

Step by step guide

Full recipe