Adapting models to align with human or AI feedback through fine-tuning is at the core of Adaptive Engine.
It enables users to bootstrap initial models that beat frontier performance on specific tasks using only synthetic data, and then continuously improve them with production feedback.
Adaptive Engine has built-in, robust recipes for supervised fine-tuning and reinforcement learning. These recipes can also be customized,
allowing you to tweak and explore hyperparameters.
Adaptive Engine supports different training objectives:
- Adapt using existing feedback - fine-tunes a model to improve an outcome you have previously logged via the UI or SDK.
- Teach behaviour with natural language guidelines - provide simple textual guidelines defining what constitutes a good or bad completion for your use case; an AI judge uses them to align your model with the desired behaviour. Reference completions and existing feedback are not required; only prompts are used.
- Reward with external feedback endpoint - set up an external endpoint that scores completions during training. This enables any custom reward function and is particularly useful for tasks where execution feedback is available, such as database queries or code execution. Read more about how to configure a reward server; a minimal, illustrative endpoint sketch also follows the SDK example below.
- Supervised fine-tuning - standard SFT; fine-tunes your model on reference completions, with no reinforcement learning involved.
You can use the Adaptive SDK to launch any of the above:
Teach behaviour with natural language guidelines (from logged interactions)
```python
# Teach behaviour with natural language guidelines
# Each guideline must have a name and a description, which is the body of the guideline
# It is recommended that each guideline aims to instill a single behaviour,
# with little to no overlap amongst different guidelines
guidelines = [
    {
        "name": "overpromise",
        "description": "The response should avoid unsupported promises or guarantees about product features or performance"
    },
    {
        "name": "professional-tone",
        "description": "The tone of the response should be professional, empathetic, and aligned with customer service standards"
    },
    {
        "name": "call-to-action",
        "description": "The response should encourage follow-up if additional help is needed, to ensure the customer’s issue is fully resolved"
    }
]

adapt_job = adaptive.training.jobs.create(
    model="llama_3.1_8b_instruct",
    output_model_name="llama-8b-support-guidelines",
    data_source="COMPLETIONS",
    data_config={"selection_type": "LAST", "max_samples": 5000},  # Train on the last 5000 logged interactions
    alignment_objective={
        "guidelines": {
            "judge_model": "llama-3.3-70b-instruct",
            "judge_model_prompt": guidelines
        }
    },
)
```
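For the external feedback endpoint objective, a reward server is simply an HTTP service that scores completions during training. The sketch below is illustrative only: the route, request fields, and response schema are assumptions, not Adaptive Engine's actual reward-server protocol, which is described in the reward server documentation.

```python
# Minimal, illustrative reward endpoint. The request/response shape here is an
# assumption for illustration; see the reward server docs for the real schema.
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/reward", methods=["POST"])
def reward():
    payload = request.get_json()
    completion = payload.get("completion", "")
    # Toy scoring rule: favour completions that invite a follow-up from the customer.
    score = 1.0 if "let us know" in completion.lower() else 0.0
    return jsonify({"reward": score})

if __name__ == "__main__":
    app.run(port=8000)
```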
If you want more control over training, you can customize the training method, sample selection, and hyperparameters in your config.
See the SDK Reference for the full training config specification.
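As a rough sketch of what a customized job might look like, the call below adds a config alongside the objective. The keys inside config are placeholders showing where the training method, sample selection, and hyperparameters would go; the authoritative field names are in the SDK Reference.

```python
# Sketch of a customized training job. The config keys below are placeholders,
# not the authoritative schema; see the SDK Reference for the real field names.
adapt_job = adaptive.training.jobs.create(
    model="llama_3.1_8b_instruct",
    output_model_name="llama-8b-support-guidelines-custom",
    data_source="COMPLETIONS",
    data_config={"selection_type": "LAST", "max_samples": 5000},
    alignment_objective={
        "guidelines": {
            "judge_model": "llama-3.3-70b-instruct",
            "judge_model_prompt": guidelines,
        }
    },
    config={
        "training_method": "PPO",   # placeholder: choose the RL or SFT recipe
        "learning_rate": 1e-5,      # placeholder hyperparameter
        "num_epochs": 2,            # placeholder hyperparameter
    },
)
```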
.create creates and registers a new model that you can deploy for inference, or A/B test
against the base model or other models for validation.
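As a loose illustration of a typical follow-up (the method names below are hypothetical and not the documented SDK surface), you might wait for the job to finish, then deploy the new model or put it in an A/B test:

```python
# Hypothetical follow-up: these calls are assumptions for illustration only,
# not the documented Adaptive SDK API.
adapt_job.wait()  # assumed: block until training finishes

# Deploy the trained model for inference (assumed call).
adaptive.deployments.create(model="llama-8b-support-guidelines")

# Or compare it against the base model in an A/B test (assumed call).
adaptive.ab_tests.create(
    models=["llama_3.1_8b_instruct", "llama-8b-support-guidelines"],
)
```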