Graders evaluate LLM completions and produce metric scores. Use them as reward signals for training or to assess model performance during evaluation.

Create an AI judge

AI judges use an LLM to grade completions based on a criterion you define:
adaptive.graders.create.binary_judge(
    key="helpful-judge",
    criteria="The response directly answers the user's question without going off-topic",
    judge_model="llama-3.1-8b-instruct",
    feedback_key="helpfulness",
)
Parameter      Type   Required   Description
key            str    Yes        Unique identifier
criteria       str    Yes        What constitutes a pass (natural language)
judge_model    str    Yes        Model to use as judge
feedback_key   str    Yes        Feedback key to write scores to
The judge returns PASS/FAIL for each completion along with reasoning.
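Conceptually, a binary judge renders its prompt, queries the judge model, and parses a PASS/FAIL verdict from the reply. A minimal sketch of that flow in plain Python (the `call_model` stub stands in for a real LLM call, and the parsing logic is illustrative, not Adaptive's actual implementation):

```python
def parse_verdict(raw: str) -> tuple[bool, str]:
    """Split a judge reply into (passed, reasoning).

    Assumes the judge emits PASS or FAIL on the first line,
    followed by free-text reasoning.
    """
    first, _, rest = raw.strip().partition("\n")
    passed = first.strip().upper().startswith("PASS")
    return passed, rest.strip()

def judge_completion(criteria: str, completion: str, call_model) -> dict:
    """Render a judge prompt, call the model, return a score + reasoning."""
    prompt = (
        f"You are a judge. Evaluate based on: {criteria}.\n"
        "Answer PASS or FAIL on the first line, then explain.\n\n"
        f"Response to evaluate:\n{completion}"
    )
    passed, reasoning = parse_verdict(call_model(prompt))
    return {"score": 1.0 if passed else 0.0, "reasoning": reasoning}

# Stubbed model call for demonstration:
result = judge_completion(
    "The response directly answers the user's question",
    "Paris is the capital of France.",
    call_model=lambda p: "PASS\nThe answer is direct and on-topic.",
)
print(result["score"])  # 1.0
```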

Prompt templates for AI judges

AI judges use Handlebars templates for their prompts. Template variables give you access to the conversation context, the completion, and metadata.
Basic syntax:
{{{completion}}}            — insert without HTML escaping (always use for text)
{{#if metadata.domain}}     — conditional block
...
{{else}}
...
{{/if}}     
{{#each turns}}             — loop over a list (list of turn objects below)
Turn {{@index}}:
{{role}}: {{content}}
{{/each}}
Example template:
System: "You are a judge. Evaluate based on: {{criteria}}.
         Output this JSON schema: {{output_schema}}"

User: "Context:\n{{context_str_without_last_user}}
       Question:\n{{last_user_turn_content}}
       Response:\n{{completion}}"
Variable                          Description
completion                        The assistant's completion being evaluated
last_user_turn_content            Content of the final user turn
context_str                       Full conversation context as a formatted string
context_str_without_last_user     Context excluding the final user turn
turns                             All turns as a list of {role, content} dicts
context_turns                     All turns except the completion
context_turns_without_last_user   Context turns without the last user turn
metadata                          Thread metadata dict
output_schema                     Expected output JSON schema
template_variables                Custom variables passed at judge initialization
Use triple braces ({{{var}}}) for variables that may contain HTML entities or special characters.
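The difference between `{{var}}` and `{{{var}}}` is HTML escaping. A toy Python sketch of that behavior (a flat-variable renderer for illustration only, not a full Handlebars implementation):

```python
import html
import re

def render(template: str, variables: dict) -> str:
    """Toy renderer: {{{var}}} inserts raw text, {{var}} HTML-escapes it."""
    def sub(match: re.Match) -> str:
        raw, name = match.group(1), match.group(2)
        value = str(variables.get(name, ""))
        return value if raw else html.escape(value)
    # Group 1 captures the optional third brace (triple-stash form).
    return re.sub(r"\{\{(\{)?\s*(\w+)\s*\}?\}\}", sub, template)

completion = 'He said "x < y" & left.'
print(render("Raw: {{{completion}}}", {"completion": completion}))
# Raw: He said "x < y" & left.
print(render("Escaped: {{completion}}", {"completion": completion}))
# Escaped: He said &quot;x &lt; y&quot; &amp; left.
```

This is why `{{{completion}}}` is recommended for model text: double braces would turn quotes and angle brackets into HTML entities before the judge ever sees them.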

Grader types

Type                Method                       Use when
AI judge            create.binary_judge()        Criteria can be expressed in natural language
Pre-built           create.prebuilt_judge()      RAG evaluation (faithfulness, relevancy)
External endpoint   create.external_endpoint()   Scoring requires an external system
Custom              create.custom()              Python logic in recipes

Pre-built graders

For RAG applications, use pre-built graders optimized by Adaptive:
adaptive.graders.create.prebuilt_judge(
    key="rag-faithfulness",
    type="FAITHFULNESS",
    judge_model="llama-3.1-8b-instruct",
)
  • Faithfulness: Does the completion adhere to provided context?
  • Context Relevancy: Is the retrieved context relevant to the query?
  • Answer Relevancy: Does the completion answer the question?
Faithfulness breaks the completion into atomic claims and checks each against the context:
score = claims supported by context / total claims
Pass context as document turns in the input messages. Each retrieved chunk should be a separate turn.
Sample:
{"role": "document", "content": "Tim Berners-Lee created the first website."}
{"role": "document", "content": "Tim Berners-Lee invented the world wide web."}
{"role": "user", "content": "Who published the first website?"}
Completion: “Tim Berners-Lee published the first website in August 1990.”
Score: 0.5 (first claim supported, date claim unsupported)
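The arithmetic behind that score is straightforward; in practice the judge model extracts the claims and decides support, so here those verdicts are given as inputs:

```python
def faithfulness_score(claim_verdicts: list[bool]) -> float:
    """Fraction of the completion's atomic claims supported by context."""
    if not claim_verdicts:
        return 0.0
    return sum(claim_verdicts) / len(claim_verdicts)

# Two claims in the sample completion:
#   1. "Tim Berners-Lee published the first website"  -> supported
#   2. "... in August 1990"                           -> not in the context
print(faithfulness_score([True, False]))  # 0.5
```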
Context Relevancy checks if retrieved chunks are relevant to the query:
score = relevant chunks / total chunks

Answer Relevancy checks if the completion addresses the question:
score = relevant claims / total claims
Extra information not requested by the user lowers the score.
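Both relevancy metrics are the same kind of ratio, applied to chunks (context relevancy) or claims (answer relevancy); the individual relevance judgments come from the judge model. A sketch:

```python
def relevancy_score(judgments: list[bool]) -> float:
    """relevant items / total items, over retrieved chunks or claims."""
    if not judgments:
        return 0.0
    return sum(judgments) / len(judgments)

# Context relevancy: 2 of 3 retrieved chunks judged relevant to the query.
print(round(relevancy_score([True, True, False]), 3))  # 0.667

# Answer relevancy: adding an unrequested claim lowers the score.
print(relevancy_score([True, True]))                   # 1.0
print(round(relevancy_score([True, True, False]), 3))  # 0.667
```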
For reward servers and custom graders, see Reward Servers and Custom Recipes. See the SDK Reference for all grader methods.