Install adaptive_harmony first.
Step by step guide
In this recipe, we use PPO to train a model on safety feedback for its completions, with Llama 3.3 70B acting as the grader.
Create a new Python file
Each custom recipe is written as a single Python file, which you can store anywhere in your codebase.
Let’s create a recipe file, my_custom_recipe.py, and fill it with the recipe skeleton.
The decorator @recipe_main marks a single async function in the file as the main entrypoint that Adaptive Engine runs when the recipe is launched. This decorator is required to upload a recipe to Adaptive.
When you first start writing a recipe, we suggest instantiating a harmony client directly with get_client, as explained in Harmony client and local testing, so that the recipe is easier to run and debug. You can then simply run python my_custom_recipe.py without concerning yourself with RecipeContext. However, when you upload your recipe to Adaptive, RecipeContext is a mandatory input argument for your @recipe_main function, so once you are done debugging, remove the direct get_client connection and retrieve the client from the context instead.
Load models
We begin by spawning the policy model and the value model. You can specify the tensor parallel degree of each spawned model using .tp().
Load Dataset
We load a dataset from the Hugging Face Dataset Hub as an example.
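Conceptually, turning dataset rows into chat threads means mapping each row to a list of (role, content) turns plus metadata. The sketch below uses plain dicts shaped like a typical prompt dataset and a hypothetical `ChatThread` dataclass with a hypothetical `rows_to_threads` helper; neither is the real StringThread class nor the Adaptive helper functions, whose signatures are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List, Tuple


@dataclass
class ChatThread:
    """Hypothetical stand-in for StringThread: chat turns plus free-form metadata."""
    turns: List[Tuple[str, str]]  # (role, content) pairs
    metadata: Dict[str, Any] = field(default_factory=dict)


def rows_to_threads(rows: Iterable[Dict[str, Any]], prompt_key: str = "prompt") -> List[ChatThread]:
    """Turn each row into a single-turn user thread; other columns become metadata."""
    threads = []
    for row in rows:
        metadata = {k: v for k, v in row.items() if k != prompt_key}
        threads.append(ChatThread(turns=[("user", row[prompt_key])], metadata=metadata))
    return threads


# Made-up rows shaped like a typical prompt dataset.
rows = [
    {"prompt": "How do I reset my password?", "category": "account"},
    {"prompt": "Tell me a joke.", "category": "fun"},
]
threads = rows_to_threads(rows)
print(threads[0].turns, threads[0].metadata)
```

Iterating over a Hugging Face `datasets.Dataset` yields each row as a dict, so a loaded dataset could be passed to a function like this one directly.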
The helper functions make it easy to convert a Hugging Face dataset into a list of StringThread objects, the format for chat messages plus metadata used throughout Adaptive recipes. See Loading datasets and StringThread to learn how to load a dataset that has been uploaded to Adaptive.
Define a Grader
We then define the grader that will be used for feedback during training.
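How grader feedback reaches PPO can be sketched under loud assumptions. `KeywordSafetyGrader` is a hypothetical toy that replaces the Llama 3.3 70B judge with a keyword check, the value estimate and log-probabilities are made-up numbers, and the advantage is simplified to reward minus the value model's estimate (no multi-step advantage estimation). The sketch shows an async `setup()` called once before grading, and the standard PPO clipped surrogate objective, which limits how far the policy's probability ratio can move in one update.

```python
import asyncio
import math
from typing import List, Tuple


class KeywordSafetyGrader:
    """Hypothetical toy grader: a keyword check instead of an LLM judge."""

    def __init__(self, banned_words: List[str]):
        self.banned_words = [w.lower() for w in banned_words]
        self.ready = False

    async def setup(self) -> None:
        # A judge-based grader would spawn or load the judge model here.
        self.ready = True

    async def grade(self, completion: str) -> float:
        if not self.ready:
            raise RuntimeError("call setup() before grading")
        text = completion.lower()
        # 1.0 = judged safe, 0.0 = judged unsafe.
        return 0.0 if any(w in text for w in self.banned_words) else 1.0


def ppo_clipped_objective(logp_new: float, logp_old: float, advantage: float,
                          clip_eps: float = 0.2) -> float:
    """PPO clipped surrogate for one sample (to be maximized)."""
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)


async def demo() -> Tuple[float, float, float]:
    grader = KeywordSafetyGrader(banned_words=["explosive"])
    await grader.setup()  # prepare once, before any grading
    reward = await grader.grade("Here is a safe recipe for pancakes.")
    value_estimate = 0.6  # made-up value-model output for this prompt
    advantage = reward - value_estimate  # simplified single-step advantage
    objective = ppo_clipped_objective(logp_new=-1.0, logp_old=-1.2, advantage=advantage)
    return reward, advantage, objective


print(asyncio.run(demo()))
```

Because the probability ratio here (about 1.22) exceeds 1 + clip_eps, the objective is capped at the clipped value, which is what keeps a single positive reward from pushing the policy too far.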
await safety_grader.setup() prepares the grader for training or evaluation; in this case, it spawns the judge model.
Full recipe
my_custom_recipe.py

