In Adaptive Engine, you can evaluate models to determine which off-the-shelf model performs best on your task, or how much your fine-tuned model improved over its base model or other models after training.

Launch an evaluation

Evaluation is done by launching a run of an evaluation recipe. An evaluation recipe is a recipe that produces an EvaluationArtefact. Adaptive provides a built-in recipe that covers most evaluation use-cases. For more tailored usage (for example, using custom graders), you can create your own recipe with your own completion grading strategy by following our guides Custom Graders and Write an Evaluation Recipe.
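
For illustration, here is a minimal sketch of what launching such a run could look like from a Python script. The client class, method names, and parameters (AdaptiveClient, recipes.launch, and so on) are hypothetical placeholders rather than the actual Adaptive SDK surface; refer to the guides above for the real interface.

```python
# Hypothetical sketch only: class, method, and parameter names below are
# illustrative placeholders, not the actual Adaptive SDK.
from adaptive import AdaptiveClient  # assumed client import

client = AdaptiveClient(api_key="YOUR_API_KEY")  # assumed authentication

# Launch a run of the built-in evaluation recipe against two models.
run = client.recipes.launch(
    recipe="evaluation",                             # assumed built-in recipe name
    models=["my-finetuned-model", "my-base-model"],  # models to compare
    dataset="my-eval-dataset",                       # dataset of samples to grade
)
run.wait()  # block until the evaluation run has finished
```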

Visualize results

Once an evaluation run is finished, it will produce an Evaluation Artefact. As illustrated in the sketch after the list below, this will contain:
  1. An evaluation score table that summarises all models' scores for the graders that were used during the eval
  2. A detailed list of interactions from all graded samples in the evaluated dataset.
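
Continuing the hypothetical sketch above, reading these two parts programmatically could look like the following; the accessor and field names (artefact(), score_table, interactions) are assumptions, not the documented API.

```python
# Hypothetical sketch only: accessor and field names are illustrative placeholders.
artefact = run.artefact()  # assumed accessor for the produced EvaluationArtefact

# 1. Score table: one score per (model, grader) pair.
for row in artefact.score_table:
    print(row.model, row.grader, row.score)

# 2. Graded interactions: each sample's prompt, completion, and assigned grades.
for interaction in artefact.interactions:
    print(interaction.prompt, interaction.completion, interaction.grades)
```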

Evaluation table