## Create an AI judge
AI judges use an LLM to grade completions based on a criterion you define:

| Parameter | Type | Required | Description |
|---|---|---|---|
| key | str | Yes | Unique identifier |
| criteria | str | Yes | What constitutes a pass (natural language) |
| judge_model | str | Yes | Model to use as judge |
| feedback_key | str | Yes | Feedback key to write scores to |
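As a sketch of how these four required parameters fit together, the snippet below assembles a judge definition and validates it. `build_judge_spec` is a local illustration, not an Adaptive SDK function; in the real SDK you would pass the same arguments to `create.binary_judge()`, and the example values (`polite-tone`, the model name, the feedback key) are assumptions.

```python
# Sketch: assembling an AI judge spec with the four required parameters.
# build_judge_spec is a local helper for illustration only; the Adaptive SDK
# call it stands in for is create.binary_judge().

REQUIRED = ("key", "criteria", "judge_model", "feedback_key")

def build_judge_spec(**params: str) -> dict:
    """Check that every required judge parameter is present and non-empty."""
    missing = [p for p in REQUIRED if not params.get(p)]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    return params

spec = build_judge_spec(
    key="polite-tone",          # unique identifier (example value)
    criteria="The completion is polite and professional in tone.",
    judge_model="gpt-4o",       # assumed judge model name
    feedback_key="politeness",  # feedback key the scores are written to
)
print(spec["key"])  # → polite-tone
```

Keeping the criterion in natural language is the point of an AI judge: the judge model reads `criteria` and returns a pass/fail grade under `feedback_key`.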
## Grader types
| Type | Method | Use when |
|---|---|---|
| AI judge | create.binary_judge() | Criteria can be expressed in natural language |
| Pre-built | create.prebuilt_judge() | RAG evaluation (faithfulness, relevancy) |
| External endpoint | create.external_endpoint() | Scoring requires an external system |
| Custom | create.custom() | Python logic in recipes |
## Pre-built graders
For RAG applications, use pre-built graders optimized by Adaptive:

- Faithfulness: Does the completion adhere to the provided context?
- Context Relevancy: Is the retrieved context relevant to the query?
- Answer Relevancy: Does the completion answer the question?
### How pre-built graders work
Faithfulness breaks the completion into atomic claims and checks each against the context. Pass context as document turns in the input messages; each retrieved chunk should be a separate turn.

Sample completion: "Tim Berners-Lee published the first website in August 1990."
Score: 0.5 (first claim supported, date claim unsupported)

Context Relevancy checks if retrieved chunks are relevant to the query.

Answer Relevancy checks if the completion addresses the question. Extra information not requested by the user lowers the score.


