Overview
Training callbacks allow you to execute custom logic periodically during training runs. They’re useful for monitoring model performance, generating sample outputs, saving checkpoints, and evaluating on validation sets without interrupting the main training loop. All trainer classes accept a callbacks parameter where you can pass a list of callback instances.
How callbacks work
Callbacks are triggered at regular intervals based on training progress (measured as a percentage of the job’s completion). Each callback:
- Runs periodically - You specify how often (e.g., every 10% of training via frequency=0.1)
- Returns metrics - Callbacks return dictionaries that can be logged to your metric logger
Using callbacks
Pass callbacks to any training class via the callbacks parameter:
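The snippet below is a minimal sketch of this pattern. The trainer class, its other constructor arguments, and the import path are placeholders (assumptions); only the callbacks parameter and the GenerateSamplesCallback name come from this page.

```python
# Sketch only: the import path, the trainer class, and its other arguments
# are placeholders; the `callbacks` parameter is the documented hook.
from adaptive.callbacks import GenerateSamplesCallback  # assumed import path

callbacks = [
    GenerateSamplesCallback(frequency=0.1),  # constructor arguments are assumptions
]

trainer = SFTTrainer(          # hypothetical trainer class
    model=model,
    dataset=train_dataset,
    callbacks=callbacks,       # documented parameter: list of callback instances
)
trainer.train()
```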
Built-in callbacks
GraderEvalCallback
Evaluates your model on a validation set using a grader. This is the most common callback for monitoring training progress on held-out data. It logs the following metrics:
- validation/rewards/* - All metrics from the grader’s get_logs() method
- validation/generation_length_mean - Average generation length
- validation/generation_length_std - Standard deviation of generation length
- validation/num_samples - Number of samples evaluated
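For illustration, a hedged instantiation sketch; the constructor parameters shown (grader, validation_set, temperature, frequency) are assumptions rather than the documented signature, while the metric keys above are documented.

```python
# Sketch only: parameter names are assumptions, not the documented signature.
grader_eval = GraderEvalCallback(
    grader=my_grader,            # grader whose get_logs() metrics appear under validation/rewards/*
    validation_set=val_prompts,  # held-out prompts, without completions
    temperature=0.0,             # deterministic decoding, as recommended in the best practices below
    frequency=0.1,               # evaluate every 10% of training
)
```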
ValidationLossCallback
Computes the negative log-likelihood loss on a validation set. Useful for monitoring overfitting in supervised fine-tuning. It logs the following metric:
- validation/loss - Average negative log-likelihood on the validation set
GenerateSamplesCallback
Generates and logs sample completions periodically. Useful for qualitatively inspecting model outputs during training. It logs the following metrics:
- generation/samples - Table with columns: system, prompt, response
- generation/generation_length_mean - Average completion length
- generation/generation_length_std - Standard deviation of completion length
- generation/num_samples - Number of samples generated
Creating custom callbacks
To create your own callback, inherit from RecipeCallback and implement the callback method (a sketch follows the notes below):
- frequency - How often to trigger (e.g., 0.1 = every 10% of training)
- log_key_prefix - Optional prefix for all logged metric keys
- Return a dictionary of metrics to be logged
- Use async/await for any I/O operations
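Here is a rough sketch of a custom callback. The import path, the base-class constructor call, and the async __call__ method name and signature are assumptions; the documented contract is only what the notes above state (run periodically, use async/await for I/O, return a metrics dict). Check the RecipeCallback base class in your SDK for the exact interface.

```python
from adaptive.callbacks import RecipeCallback  # assumed import path


class TokenBudgetCallback(RecipeCallback):
    """Hypothetical callback that reports how many validation prompts exceed a token budget."""

    def __init__(self, prompts, tokenizer, max_tokens, frequency=0.1, log_key_prefix="budget"):
        # `frequency` and `log_key_prefix` mirror the documented options above;
        # passing them to the base-class constructor is an assumption.
        super().__init__(frequency=frequency, log_key_prefix=log_key_prefix)
        self.prompts = prompts
        self.tokenizer = tokenizer
        self.max_tokens = max_tokens

    async def __call__(self):
        # Method name and signature are assumptions; the documented behavior is
        # simply to return a dictionary of metrics to be logged.
        lengths = [len(self.tokenizer.encode(p)) for p in self.prompts]
        over_budget = sum(1 for n in lengths if n > self.max_tokens)
        return {
            "over_budget_fraction": over_budget / max(len(lengths), 1),
            "mean_prompt_tokens": sum(lengths) / max(len(lengths), 1),
        }
```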
Best practices
- Start with low frequency - Callbacks can slow down training if you run them too often. Start with frequency=0.1 or 0.2 and adjust as needed.
- Use small validation sets - Keep validation sets small (e.g., 100-500 samples) for faster evaluation.
- Combine callbacks strategically:
  - GraderEvalCallback - Essential for RL training to monitor reward on validation data
  - GenerateSamplesCallback - Helpful for debugging and qualitative inspection
  - ValidationLossCallback - Useful for SFT to detect overfitting
- Set appropriate temperatures:
  - Use temperature=0.0 for deterministic evaluation (recommended for validation)
  - Use temperature=1.0 for diverse samples in GenerateSamplesCallback
- Mind your validation data - Make sure validation prompts don’t overlap with training data so the metrics reflect true generalization. Also, don’t include completions in validation data; otherwise the evaluated model will see them when asked to generate new ones.
Example: comprehensive monitoring
Here’s a complete example combining multiple callbacks:
- Evaluate on the validation set every 10% of training
- Generate sample completions every 10% for qualitative inspection
- Log all metrics to your configured metric logger (W&B, MLflow, etc.)
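The sketch below is a hedged reconstruction of such a setup. The import paths, the trainer class, and the constructor parameters are assumptions; the callback names, the callbacks parameter, the 10% frequency, and the temperature recommendations come from this page.

```python
# Sketch only: import paths, the trainer class, and most constructor
# arguments are placeholders for whatever your recipe actually uses.
from adaptive.callbacks import GraderEvalCallback, GenerateSamplesCallback  # assumed paths

callbacks = [
    # Reward on held-out prompts every 10% of training (deterministic decoding).
    GraderEvalCallback(
        grader=my_grader,
        validation_set=val_prompts,
        temperature=0.0,
        frequency=0.1,
    ),
    # Sample completions every 10% for qualitative inspection (diverse decoding).
    GenerateSamplesCallback(
        prompts=sample_prompts,
        temperature=1.0,
        frequency=0.1,
    ),
]

# Hypothetical trainer; metrics returned by the callbacks flow to whichever
# metric logger (W&B, MLflow, etc.) the trainer is configured with.
trainer = RLTrainer(
    model=model,
    dataset=train_dataset,
    callbacks=callbacks,
)
trainer.train()
```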

