Launching model training from custom recipes
data_set
: List of StringThread objects containing training examples
model
: Training model instance
logger
: Logger for tracking training metrics
lr
: Learning rate
samples_per_batch
: Batch size
max_grad_norm
: Gradient clipping norm

data_set
: List of StringThread prompts
model
: Policy model for training
value_model
: Value model for advantage estimation
scoring_fn
: Function that returns reward scores
lr_policy
: Policy learning rate
lr_value
: Value learning rate
kl_beta
: KL divergence penalty coefficient
clip_range
: PPO clipping range

data_set
: List of StringThread prompts
model
: Training model
scoring_fn
: Function that returns reward scores
completions_per_sample
: Number of completions per prompt
lr
: Learning rate
kl_beta
: KL divergence penalty coefficient

data_set
: List of tuples containing (preferred_response, non_preferred_response)
model
: Training model
logger
: Logger for tracking metrics
lr
: Learning rate
samples_per_batch
: Batch size
beta
: DPO beta parameter
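
To make the roles of max_grad_norm, clip_range, kl_beta, completions_per_sample, and beta more concrete, here is a minimal sketch of the textbook formulas these hyperparameters usually control. It is not the recipes' actual implementation: every function name below is hypothetical, the code uses plain Python floats rather than tensors, and it assumes the third parameter list describes a group-relative method that normalizes rewards across the completions of one prompt.

```python
import math
import statistics


def clip_grad_norm(grads, max_grad_norm):
    """Rescale a flat list of gradients so its L2 norm is at most max_grad_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_grad_norm or norm == 0.0:
        return list(grads)
    scale = max_grad_norm / norm
    return [g * scale for g in grads]


def ppo_clipped_objective(ratio, advantage, clip_range):
    """Standard PPO surrogate: the ratio is clipped to [1 - clip_range, 1 + clip_range]."""
    clipped_ratio = max(1.0 - clip_range, min(1.0 + clip_range, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


def kl_penalized_reward(reward, logp_policy, logp_ref, kl_beta):
    """Subtract a per-sample KL estimate (log-prob difference), weighted by kl_beta."""
    return reward - kl_beta * (logp_policy - logp_ref)


def group_relative_advantages(rewards):
    """Normalize the rewards of the completions sampled for one prompt.

    len(rewards) plays the role of completions_per_sample (an assumption).
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0.0:
        # All completions scored the same, so none is preferred over another.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta):
    """Standard DPO loss for one preference pair: -log(sigmoid(beta * margin))."""
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = beta * margin
    # Numerically stable softplus(-x), which equals -log(sigmoid(x)).
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))
```

Read this way, a larger kl_beta or beta keeps the trained model closer to its reference, a smaller clip_range limits how far a single update can move the policy, and max_grad_norm caps the size of each gradient step.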