adaptive_harmony
Step by step guide
In this recipe, we train a model with PPO on completion safety feedback, judged by Llama 3.3 70B as a grader.

1. Create a new Python file
Custom recipes are written as single Python files. You can store them anywhere in your codebase.
Let's create a recipe file and fill it with this recipe skeleton:

my_custom_recipe.py

The @recipe_main decorator defines a single async function in the file as the main entrypoint that Adaptive Engine should run when the recipe is launched. This decorator is required in order to upload a recipe to Adaptive.

When you first start writing a recipe, we suggest you instantiate a harmony client using get_client directly, as explained in Harmony client and local testing; this makes the recipe easier to run and debug. You can then just run python my_custom_recipe.py without concerning yourself with RecipeContext. When you upload your recipe to Adaptive, however, RecipeContext is a mandatory input argument for your @recipe_main function. At the end of debugging, you can remove this connection and retrieve the client from the context.
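The skeleton described above might look like the sketch below. Since the real adaptive_harmony import path and signatures are not shown on this page, we define trivial stand-ins for recipe_main and RecipeContext so the sketch is self-contained; only the overall shape (a decorated async entrypoint taking a context) comes from the docs.

```python
# Sketch of the recipe skeleton; recipe_main and RecipeContext below are
# local stand-ins, not the real adaptive_harmony imports.
import asyncio


def recipe_main(fn):
    # Stand-in for Adaptive's @recipe_main: marks the single async
    # function that Adaptive Engine runs when the recipe is launched.
    fn.__is_recipe_main__ = True
    return fn


class RecipeContext:
    # Stand-in: on Adaptive, the context is where you retrieve the
    # harmony client once you are done debugging locally.
    def client(self):
        ...


@recipe_main
async def main(context: RecipeContext):
    # While debugging locally you might build a client with get_client()
    # directly; once uploaded, retrieve it from the context instead.
    print("recipe started")


if __name__ == "__main__":
    asyncio.run(main(RecipeContext()))
```

Running the file directly (python my_custom_recipe.py) executes the entrypoint without Adaptive Engine, which is convenient during development.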
2. Load models
We begin by spawning the policy model and the value model. You can specify the tensor parallel degree for each spawned model using .tp().
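The builder-style call might look like the sketch below. Everything here except the .tp() method named above is an assumption, so ModelBuilder and SpawnedModel are stand-in classes written to illustrate the fluent pattern, not the real API.

```python
# Illustrative stand-ins only: ModelBuilder / SpawnedModel are not the
# real adaptive_harmony classes. The one detail taken from the docs is
# that .tp() sets the tensor parallel degree of a spawned model.
import asyncio
from dataclasses import dataclass


@dataclass
class SpawnedModel:
    name: str
    tp_degree: int


class ModelBuilder:
    def __init__(self, name: str):
        self.name = name
        self.tp_degree = 1

    def tp(self, degree: int) -> "ModelBuilder":
        # Tensor parallel degree: how many GPUs the model is sharded over.
        self.tp_degree = degree
        return self

    async def spawn(self) -> SpawnedModel:
        return SpawnedModel(self.name, self.tp_degree)


async def load_models():
    # Hypothetical setup: shard the policy over 2 GPUs, keep the value
    # model on a single GPU.
    policy = await ModelBuilder("policy").tp(2).spawn()
    value = await ModelBuilder("value").spawn()
    return policy, value


policy, value = asyncio.run(load_models())
print(policy.tp_degree, value.tp_degree)  # 2 1
```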
3. Load Dataset
We load a dataset from the Hugging Face Dataset Hub as an example.
The helper functions facilitate converting a Hugging Face dataset to a list of StringThread, the format for chat messages plus metadata used throughout Adaptive recipes. See Loading datasets and StringThread to find out how to load a dataset that has been uploaded to Adaptive.
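A conversion along these lines can be sketched as follows. The StringThread here is a minimal stand-in (a list of role/content turns plus a metadata dict); the real class lives in adaptive_harmony and its exact fields may differ.

```python
# Sketch: turning Hugging Face-style chat rows into StringThread objects.
# StringThread below is a stand-in, not the real adaptive_harmony class.
from dataclasses import dataclass, field


@dataclass
class StringThread:
    turns: list            # (role, content) pairs
    metadata: dict = field(default_factory=dict)


def to_threads(rows):
    threads = []
    for row in rows:
        turns = [(m["role"], m["content"]) for m in row["messages"]]
        threads.append(StringThread(turns, {"source": "hf_dataset"}))
    return threads


# A row shaped like a typical chat dataset entry:
rows = [{"messages": [
    {"role": "user", "content": "Is this safe?"},
    {"role": "assistant", "content": "Here is a safe answer."},
]}]
threads = to_threads(rows)
print(threads[0].turns[0])  # ('user', 'Is this safe?')
```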
4. Define a Grader
We then define the grader that will be used for feedback during training. grader.setup() handles preparing the grader for training or evaluation; in this case, it actually spawns the judge model.
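The grader's shape can be sketched like this. The class name, scoring rule, and judge handle are assumptions; the two details taken from the text are that setup() is where the judge model gets spawned and that an LLM judge scores completions for safety.

```python
# Hedged sketch of a grader whose setup() spawns the judge model.
# SafetyGrader and its scoring rule are stand-ins for illustration.
import asyncio


class SafetyGrader:
    def __init__(self, judge_name="llama-3.3-70b"):
        self.judge_name = judge_name
        self.judge = None

    async def setup(self):
        # In a real recipe this is where the judge model would be
        # spawned; here we just record that setup happened.
        self.judge = f"spawned:{self.judge_name}"

    async def grade(self, completion: str) -> float:
        assert self.judge is not None, "call setup() first"
        # Stand-in scoring: a real grader would prompt the judge model
        # and parse its verdict into a reward.
        return 0.0 if "unsafe" in completion else 1.0


async def demo():
    grader = SafetyGrader()
    await grader.setup()
    return await grader.grade("Here is a safe answer.")


score = asyncio.run(demo())
print(score)  # 1.0
```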
5. Adapt the model
Finally, we pass all of our models, grader, and parameters to the PPO trainer.
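The final wiring might look like the sketch below. PPOTrainer and all of its argument names are stand-ins invented for illustration; only the idea of handing the policy, value model, grader, dataset, and hyperparameters to a PPO trainer comes from the text.

```python
# Stand-in PPOTrainer, not the real adaptive_harmony trainer: it only
# illustrates the call shape of passing models, grader, and parameters.
import asyncio


class PPOTrainer:
    def __init__(self, policy, value, grader, dataset, **params):
        self.policy, self.value = policy, value
        self.grader, self.dataset = grader, dataset
        self.params = params

    async def train(self):
        # A real PPO loop would: sample completions from the policy,
        # score them with the grader, and update policy/value weights.
        return {"steps": self.params.get("num_steps", 0)}


async def demo():
    trainer = PPOTrainer("policy", "value", "grader", ["thread"],
                         learning_rate=1e-6, num_steps=100)
    return await trainer.train()


result = asyncio.run(demo())
print(result)  # {'steps': 100}
```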
Full recipe
my_custom_recipe.py