adaptive_harmony first:
Step by step guide
In this recipe, we use GRPO to train a model on completion safety feedback, with Llama 3.3 70B acting as the grader.
Create a new Python file
Custom recipes are written as single Python files. You can store a recipe file anywhere you want in your codebase.
Let’s create a recipe file named my_custom_recipe.py. Fill it with this recipe skeleton:
my_custom_recipe.py
The decorator @recipe_main defines a single async function in the file as the main entrypoint that Adaptive Engine should run when the recipe is launched. This decorator is required in order to upload a recipe to Adaptive.
When you first start writing a recipe, you can manually create a RecipeContext object in order to more easily run and debug it locally. When you upload your recipe to Adaptive, the RecipeContext is automatically injected by the platform recipe runner, with the correct permissions and use case-related configuration.
my_custom_recipe.py
Once you upload and run your recipe on the platform, your main() recipe entrypoint method is executed directly, so the final block used for local testing will not run.
Load model
We begin by spawning the policy model using the Model parameter type. You can specify deployment parameters like tensor parallelism directly in the .to_builder() call.
Load Dataset
We load a dataset from the Hugging Face Dataset Hub as an example.
The helper functions facilitate converting a Hugging Face dataset to a list of
StringThread, the format for chat messages + metadata used throughout Adaptive recipes. See Loading datasets and StringThread to find out how to load a dataset that has been uploaded to Adaptive.
Define a Grader
We then define the grader that will be used for feedback during training. The grader requires a spawned inference model.
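To make the grader's role concrete, here is a minimal, library-independent sketch of an LLM-judged safety grader. It does not use the adaptive_harmony grader classes. SafetyGrader, Judge, JUDGE_TEMPLATE, fake_judge, and the SAFE/UNSAFE verdict format are all hypothetical names chosen for illustration, and the judge callable merely stands in for the spawned inference model.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable

# Hypothetical stand-in for a spawned inference model: prompt in, text out.
Judge = Callable[[str], Awaitable[str]]

# Hypothetical judge prompt; the real grader prompt is not shown in this guide.
JUDGE_TEMPLATE = (
    "You are a safety reviewer. Reply with exactly SAFE or UNSAFE.\n\n"
    "Completion to review:\n{completion}"
)


@dataclass
class SafetyGrader:
    """Illustrative grader: maps a judge verdict to a scalar reward."""

    judge: Judge

    async def grade(self, completion: str) -> float:
        verdict = await self.judge(JUDGE_TEMPLATE.format(completion=completion))
        # Anything other than a clear SAFE verdict earns no reward.
        return 1.0 if verdict.strip().upper() == "SAFE" else 0.0


async def fake_judge(prompt: str) -> str:
    """Toy judge for local testing; a real one would query the grader model."""
    reviewed = prompt.split("Completion to review:\n", 1)[1]
    return "UNSAFE" if "explosive" in reviewed.lower() else "SAFE"


async def demo() -> list[float]:
    grader = SafetyGrader(judge=fake_judge)
    completions = [
        "Here is a pancake recipe.",
        "Here is how to build an explosive.",
    ]
    return [await grader.grade(c) for c in completions]


scores = asyncio.run(demo())
```

Injecting the judge as a plain async callable keeps the scoring logic testable without a deployed model. In a real recipe the platform's own grader type would take that place.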
Train the model
Finally, we pass our model, grader, and parameters to the GRPO trainer. We also add a GraderEvalCallback to monitor performance on the validation set during training.
See Training Callbacks for more information about running arbitrary code during training, including checkpoint saving, sample generation, and validation loss tracking.
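For intuition about what GRPO does with the grader's scores, the sketch below shows the group-relative advantage commonly used in GRPO. Several completions are sampled for the same prompt, and each reward is normalised against the group's mean and standard deviation. This is a generic illustration, not the platform trainer's code. The function name, the eps value, and the choice of population standard deviation are assumptions, and implementations differ on these details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalise the rewards of one prompt's sampled completions.

    Completions that score above the group mean get a positive advantage,
    those below get a negative one. eps avoids division by zero when every
    completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions sampled for the same prompt, each scored by a grader.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 1.0])
```

Because the baseline is the group mean, a group in which every completion gets the same score yields zero advantages and therefore no learning signal for that prompt.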
Full recipe
my_custom_recipe.py

