# Graders

> Score completions for training and evaluation

Graders evaluate LLM completions and provide quantitative feedback. Use them as reward signals for training or to assess model performance during evaluation.

<Tabs>
  <Tab title="SDK" icon="code">
    ## Create an AI judge

    AI judges use an LLM to grade completions based on a criterion you define:

    ```python theme={null}
    adaptive.graders.create.binary_judge(
        key="helpful-judge",
        criteria="The response directly answers the user's question without going off-topic",
        judge_model="llama-3.1-8b-instruct",
        feedback_key="helpfulness",
    )
    ```

    | Parameter      | Type | Required | Description                                |
    | -------------- | ---- | -------- | ------------------------------------------ |
    | `key`          | str  | Yes      | Unique identifier                          |
    | `criteria`     | str  | Yes      | What constitutes a pass (natural language) |
    | `judge_model`  | str  | Yes      | Model to use as judge                      |
    | `feedback_key` | str  | Yes      | Feedback key to write scores to            |

    For each completion, the judge returns a PASS or FAIL verdict along with its reasoning.
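    If you maintain several criteria, the same call can be driven from a list of specs. The sketch below is illustrative only: `create_binary_judges` is a hypothetical helper, `adaptive` is assumed to be an already configured SDK client, the second spec's values are made up, and a `MagicMock` stands in for the client so the snippet runs without a deployment.

    ```python
    from unittest.mock import MagicMock

    # The four parameters the table above marks as required.
    REQUIRED = ("key", "criteria", "judge_model", "feedback_key")


    def create_binary_judges(adaptive, specs):
        """Create one binary judge per spec (hypothetical helper, not part of the SDK)."""
        # Validate every spec first so nothing is created if one of them is incomplete.
        for spec in specs:
            missing = [name for name in REQUIRED if name not in spec]
            if missing:
                raise ValueError(f"judge spec is missing required parameters: {missing}")
        return [adaptive.graders.create.binary_judge(**spec) for spec in specs]


    specs = [
        {
            "key": "helpful-judge",
            "criteria": "The response directly answers the user's question without going off-topic",
            "judge_model": "llama-3.1-8b-instruct",
            "feedback_key": "helpfulness",
        },
        {
            # Illustrative values, not taken from the documentation.
            "key": "polite-judge",
            "criteria": "The response uses a polite, professional tone",
            "judge_model": "llama-3.1-8b-instruct",
            "feedback_key": "politeness",
        },
    ]

    # Stand-in for a real client; replace with your configured `adaptive` object.
    fake_client = MagicMock()
    create_binary_judges(fake_client, specs)
    ```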

    ## Grader types

    | Type              | Method                       | Use when                                      |
    | ----------------- | ---------------------------- | --------------------------------------------- |
    | AI judge          | `create.binary_judge()`      | Criteria can be expressed in natural language |
    | Pre-built         | `create.prebuilt_judge()`    | RAG evaluation (faithfulness, relevancy)      |
    | External endpoint | `create.external_endpoint()` | Scoring requires an external system           |
    | Custom            | `create.custom()`            | Python logic in recipes                       |

    ## Pre-built graders

    For RAG applications, use pre-built graders optimized by Adaptive:

    ```python theme={null}
    adaptive.graders.create.prebuilt_judge(
        key="rag-faithfulness",
        type="FAITHFULNESS",
        judge_model="llama-3.1-8b-instruct",
    )
    ```

    * **Faithfulness**: Does the completion adhere to the provided context?
    * **Context Relevancy**: Is the retrieved context relevant to the query?
    * **Answer Relevancy**: Does the completion answer the question?

    <Accordion title="How pre-built graders work">
      **Faithfulness** breaks the completion into atomic claims and checks each against the context:

      ```
      score = claims supported by context / total claims
      ```

      Pass context as `document` turns in the input messages. Each retrieved chunk should be a separate turn.
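      As a minimal sketch of that layout, the hypothetical helper below (`build_rag_messages` is an illustrative name, not an SDK function) emits one `document` turn per retrieved chunk, followed by the user turn:

      ```python
      def build_rag_messages(chunks, question):
          """Return input messages with one `document` turn per retrieved chunk."""
          # Hypothetical helper for illustration; only the message shape comes from the docs.
          messages = [{"role": "document", "content": chunk} for chunk in chunks]
          messages.append({"role": "user", "content": question})
          return messages


      messages = build_rag_messages(
          chunks=[
              "Tim Berners-Lee created the first website.",
              "Tim Berners-Lee invented the world wide web.",
          ],
          question="Who published the first website?",
      )
      ```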

      **Sample:**

      ```json theme={null}
      {"role": "document", "content": "Tim Berners-Lee created the first website."}
      {"role": "document", "content": "Tim Berners-Lee invented the world wide web."}
      {"role": "user", "content": "Who published the first website?"}
      ```

      Completion: "Tim Berners-Lee published the first website in August 1990."

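      Score: 0.5. The completion makes two claims: that Tim Berners-Lee published the first website (supported by the context) and that this happened in August 1990 (not in the context), so 1 of 2 claims is supported.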
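      The arithmetic behind this score can be sketched as follows. The claim split and the supported/unsupported labels are written by hand to mirror the sample; in practice the judge model produces them, and `faithfulness_score` is a hypothetical name used only for illustration.

      ```python
      def faithfulness_score(claim_supported):
          """Fraction of claims that the context supports."""
          if not claim_supported:
              raise ValueError("at least one claim is required")
          return sum(claim_supported.values()) / len(claim_supported)


      # Hand-labelled claims from the sample completion (assumed split, for illustration).
      claims = {
          # Backed by the first document turn.
          "Tim Berners-Lee published the first website": True,
          # No document turn mentions a date.
          "The first website was published in August 1990": False,
      }

      score = faithfulness_score(claims)
      print(score)  # 0.5
      ```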

      ***

      **Context Relevancy** checks if retrieved chunks are relevant to the query:

      ```
      score = relevant chunks / total chunks
      ```

      ***

      **Answer Relevancy** checks if the completion addresses the question:

      ```
      score = relevant claims / total claims
      ```

      Extra information not requested by the user lowers the score.
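      Both ratios above share the same shape: the number of items judged relevant divided by the total. A minimal sketch, with hand-written relevance labels standing in for the judge model's decisions (the helper name and the example labels are assumptions for illustration):

      ```python
      def relevant_fraction(labels):
          """Share of items (chunks or claims) labelled relevant."""
          if not labels:
              raise ValueError("at least one item is required")
          return sum(1 for is_relevant in labels if is_relevant) / len(labels)


      # Context relevancy: three retrieved chunks, two judged relevant to the query.
      context_relevancy = relevant_fraction([True, True, False])

      # Answer relevancy: a focused answer versus the same answer with an unrequested aside.
      focused_answer = relevant_fraction([True, True])
      padded_answer = relevant_fraction([True, True, False])

      print(round(context_relevancy, 2))  # 0.67
      print(focused_answer, round(padded_answer, 2))  # 1.0 0.67
      ```

      The padded answer scores lower even though nothing in it is wrong, because the unrequested claim counts toward the total but not toward the relevant claims.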
    </Accordion>

    For external endpoint graders (reward servers) and custom graders, see [Reward Servers](/v0.12/advanced/reward-servers) and [Custom Recipes](/v0.12/harmony/overview).

    See [SDK Reference](/v0.12/reference/sdk) for all grader methods.
  </Tab>

  <Tab title="UI" icon="mouse-pointer">
    ## Create a grader

    Navigate to your use case and open the **Graders** tab. Click **Create Grader** and select a type.

    For AI judges, provide a natural language criterion that defines what a passing completion looks like.

    <Frame caption="Define a criterion for AI judge graders">
      <img src="https://mintlify.s3.us-west-1.amazonaws.com/adaptiveml/static/ai_judge_shot.png" />
    </Frame>

    Add examples to improve judge accuracy. Include both passing and failing examples with justifications.

    ## Use in recipes

    After creating a grader, select it when configuring training or evaluation recipes. The grader scores each completion and provides feedback for the run.
  </Tab>
</Tabs>
