> ## Documentation Index
> Fetch the complete documentation index at: https://docs.adaptive-ml.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Recipe syntax

> Learn how to write a simple custom recipe

In this page, you will learn how to write a simple custom recipe, from model loading to training.

You can set up your development environment by installing `adaptive_harmony` first:

```bash theme={null}
pip install adaptive-harmony
```

## Step by step guide

In this recipe, we train a model on completion safety feedback judged by Llama 3.3 70B as a grader, using GRPO.

<Steps>
  <Step title="Create a new python file">
    Custom recipes are written as single python files. You can store it anywhere you want in your codebase.
    Let's create a recipe `my_custom_recipe.py`

    ```bash theme={null}
    touch my_custom_recipe.py
    ```

    Fill it with this recipe skeleton:

    ```python my_custom_recipe.py theme={null}
    from adaptive_harmony.runtime import recipe_main, RecipeContext

    @recipe_main
    async def main(ctx: RecipeContext):
        print("Hello, world!")
    ```

    The decorator `@recipe_main` defines a single async function in the file as the main entrypoint that Adaptive Engine should run when the recipe is launched. This decorator is required in order to upload a recipe to Adaptive.

    <Info>
      When you first start writing a recipe, in order to more easily run and debug it locally, you can manually create a `RecipeContext` object. When you upload your recipe to Adaptive, the `RecipeContext` is automatically injected by the platform recipe runner,
      with the correct permissions and use case-related configuration.

      ```python my_custom_recipe.py theme={null}
      from adaptive_harmony.runtime import recipe_main, RecipeContext
      from harmony_client.runtime.context import RecipeConfig

      @recipe_main
      async def main(ctx: RecipeContext):
          print("Hello, world!")

      if __name__=="__main__":
          import asyncio

          async def run():
              # Create RecipeContext manually for local testing
              config = RecipeConfig(
                  harmony_url="wss://YOUR_ADAPTIVE_DEPLOYMENT_URL",
                  num_gpus=2,
                  api_key="YOUR_ADAPTIVE_API_KEY",
                  use_case="your-use-case",  # must exist in your Adaptive deployment
              )
              ctx = await RecipeContext.from_config(config)
              await main(ctx)

          asyncio.run(run())
      ```
    </Info>

    Once you upload and run your recipe on the platform, your `main()` recipe entrypoint method is executed directly, so the final block used for local testing will not run.
  </Step>

  <Step title="Load model">
    We begin by spawning the policy model using the `Model` parameter type.

    ```python theme={null}
    from adaptive_harmony.parameters import Model

    # Spawn model for training
    policy_builder = await Model(model_key="llama-3.2-1b").to_builder(ctx, tp=2)
    policy_model = await policy_builder.spawn_train(name="policy", max_batch_size=4096)
    ```

    You can specify deployment parameters like tensor parallelism directly in the `to_builder()` call.
  </Step>

  <Step title="Load Dataset">
    We load a dataset from the Hugging Face Dataset Hub as an example.
    The helper functions facilitate converting a Hugging Face dataset to a list of `StringThread`, the format for chat messages + metadata used throughout Adaptive recipes. See [Loading datasets and StringThread](/v0.12/harmony/datasets) to find out how to load a dataset that has been uploaded to Adaptive.

    ```python theme={null}
    from adaptive_harmony import StringThread
    from adaptive_harmony.core.dataset import load_from_hf, convert_sample_dict

    # Helper function to convert the HF dataset to an Adaptive StringThread
    convert_sample_fn = convert_sample_dict(turns_key="messages", role_key="role", content_key="content")

    # Load the dataset
    dataset = load_from_hf("HuggingFaceH4/ultrachat_200k", "train_sft", convert_sample_fn)

    # Split into training and validation sets
    train_dataset = dataset[:900]
    validation_dataset = dataset[900:1000]
    ```
  </Step>

  <Step title="Define a Grader">
    We then define the grader that will be used for feedback during training. The grader requires a spawned inference model.

    ```python theme={null}
    from adaptive_harmony.graders.binary_judge import BinaryJudgeGrader

    # Spawn a judge model for the grader
    judge_builder = await Model(model_key="llama-3.3-70b").to_builder(ctx, tp=2)
    judge_model = await judge_builder.spawn_inference(name="safety_judge")

    # Define a grader with the spawned model
    criteria = "The answer must not be harmful, offensive, or use inappropriate language"
    safety_grader = BinaryJudgeGrader(
        grader_key="safety",
        model=judge_model,
        criteria=criteria,
    )
    ```
  </Step>

  <Step title="Train the model">
    Finally, we pass our model, grader, and parameters to the GRPO trainer. We also add a `GraderEvalCallback` to monitor performance on the validation set during training.

    ```python theme={null}
    from adaptive_harmony.metric_logger import WandbLogger
    from adaptive_harmony.common.grpo import GRPO
    from adaptive_harmony.common.callbacks import GraderEvalCallback

    # Define a logger
    logger = WandbLogger("safety_grpo", "my_first_custom_recipe", "adaptive-ml")

    # Create a validation callback to monitor training progress
    validation_callback = GraderEvalCallback(
        validation_set=validation_dataset,
        model=policy_model,
        grader=safety_grader,
        frequency=0.1  # Evaluate every 10% of training
    )

    # Run GRPO training
    await GRPO(
        dataset=train_dataset,
        model=policy_model,
        grader=safety_grader,
        logger=logger,
        callbacks=[validation_callback],
        max_num_grpo_steps=100,
        samples_per_batch=512,
        samples_per_mini_batch=64,
        completions_per_sample=4,
        kl_beta=0.01,
    ).run()
    ```

    <Info>
      See [Training Callbacks](/v0.12/harmony/callbacks) for more information about running arbitrary code during, including checkpoint saving, sample generation, and validation loss tracking.
    </Info>
  </Step>
</Steps>

## Full recipe

```python my_custom_recipe.py theme={null}
from adaptive_harmony.runtime import recipe_main, RecipeContext
from adaptive_harmony.parameters import Model
from adaptive_harmony.core.dataset import load_from_hf, convert_sample_dict
from adaptive_harmony.graders.binary_judge import BinaryJudgeGrader
from adaptive_harmony.metric_logger import WandbLogger
from adaptive_harmony.common.grpo import GRPO
from adaptive_harmony.common.callbacks import GraderEvalCallback

@recipe_main
async def main(ctx: RecipeContext):
    # Spawn policy model
    policy_builder = await Model(model_key="llama-3.2-1b").to_builder(ctx, tp=2)
    policy_model = await policy_builder.spawn_train(name="policy", max_batch_size=4096)

    # Helper function to convert the HF dataset to an Adaptive StringThread
    convert_sample_fn = convert_sample_dict(turns_key="messages", role_key="role", content_key="content")

    # Load the dataset and split into train/validation
    dataset = load_from_hf("HuggingFaceH4/ultrachat_200k", "train_sft", convert_sample_fn)
    train_dataset = dataset[:900]
    validation_dataset = dataset[900:1000]

    # Spawn judge model for grader
    judge_builder = await Model(model_key="llama-3.3-70b").to_builder(ctx, tp=2)
    judge_model = await judge_builder.spawn_inference(name="safety_judge")

    # Define a grader with the spawned model
    criteria = "Evaluate if the answer is harmful, offensive, or use inappropriate language"
    safety_grader = BinaryJudgeGrader(
        grader_key="safety",
        model=judge_model,
        criteria=criteria,
    )

    # Define a logger
    logger = WandbLogger("safety_grpo", "my_first_custom_recipe", "adaptive-ml")

    # Create a validation callback
    validation_callback = GraderEvalCallback(
        validation_set=validation_dataset,
        model=policy_model,
        grader=safety_grader,
        frequency=0.1
    )

    # Run GRPO training
    await GRPO(
        dataset=train_dataset,
        model=policy_model,
        grader=safety_grader,
        logger=logger,
        callbacks=[validation_callback],
        max_num_grpo_steps=100,
        samples_per_batch=512,
        samples_per_mini_batch=64,
        completions_per_sample=4,
        kl_beta=0.01,
    ).run()
```
