Adapting models to align with human or AI feedback through fine-tuning is at the core of Adaptive Engine. It enables users to bootstrap initial models to beat frontier performance on specific tasks using only synthetic data, and then continuously improve them with production feedback.

Adaptive Engine has built-in, robust recipes for supervised fine-tuning and reinforcement learning. These recipes can also be customized, allowing you to tweak and explore hyperparameters.

Adaptive Engine supports different training objectives:

  1. Adapt using existing feedback - fine-tunes a model to improve an outcome you have previously logged via UI or SDK.
  2. Teach behaviour with natural language guidelines - provide simple textual guidelines that define what constitutes a good or bad completion for your use case; an AI judge uses them to align your model with the desired behaviour. Only prompts are needed; reference completions and existing feedback are not required.
  3. Reward with external feedback endpoint - set up an external endpoint that provides feedback on completions during training. This supports any custom reward function and is particularly useful for tasks where execution feedback is available, such as database queries or code execution. Read more about how to configure a reward server.
  4. Supervised fine-tuning - standard SFT; fine-tunes your model using reference completions, no reinforcement learning involved.
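
To make the idea of execution feedback in objective 3 concrete, here is a minimal sketch of a reward function for database queries. It is an illustration only, not the Adaptive reward server interface: the function name sql_execution_reward, its parameters, and the fixture schema are all hypothetical, and it uses only Python's standard sqlite3 module. It runs a generated query against an in-memory database and returns 1.0 when the result matches the expected rows, 0.0 otherwise.

```python
import sqlite3


def sql_execution_reward(completion: str, expected_rows: list, setup_sql: str) -> float:
    """Hypothetical reward: 1.0 if the generated query returns expected_rows, else 0.0."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(setup_sql)  # build the fixture schema and data
        rows = conn.execute(completion).fetchall()
    except (sqlite3.Error, sqlite3.Warning):
        # Invalid SQL or a runtime failure earns no reward.
        return 0.0
    finally:
        conn.close()
    # Compare as sorted lists so row order does not affect the score.
    return 1.0 if sorted(rows) == sorted(expected_rows) else 0.0


# Hypothetical fixture used only for this illustration.
SETUP = """
CREATE TABLE orders (id INTEGER, status TEXT);
INSERT INTO orders VALUES (1, 'open'), (2, 'closed'), (3, 'open');
"""

good = sql_execution_reward(
    "SELECT id FROM orders WHERE status = 'open'", [(1,), (3,)], SETUP
)
bad = sql_execution_reward(
    "SELECT id FROM missing_table WHERE status = 'open'", [(1,), (3,)], SETUP
)
```

A function like this would sit behind the external endpoint and score each completion during training. How the endpoint receives completions and returns scores depends on the reward server configuration, which this sketch does not cover.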

You can use the Adaptive SDK to launch any of the training objectives listed above:

Create an Adaptive client first

Adapt on existing feedback (from dataset)
# Train on uploaded dataset
dataset_key = "support-dataset"

adapt_job = adaptive.training.jobs.create(
    model="llama-3.1-8b-instruct",
    output_model_name="llama-8b-support-acceptance",
    data_source="DATASET",
    data_config={"dataset": dataset_key},
    feedback_type="DIRECT",
    alignment_objective={"metric": {"metric_key": "acceptance"}},
)

If you want more control over training, you can customize the training method, sample selection and hyperparameters in your config. See the SDK Reference for the full training config specification.

Calling .create creates and registers a new model, which you can deploy for inference or A/B test against the base model or other models for validation.