Define judges that will score model answers.
BinaryJudgeScorer evaluates outputs against specific criteria and returns a binary score (1 or 0).
All of the response formatting and parsing logic is already implemented in the class; you only need to write your criteria in the criteria field.
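Because BinaryJudgeScorer handles formatting and parsing internally, you never write that logic yourself. To make the 1/0 contract concrete, here is a hypothetical sketch of what such logic could look like. The prompt template, the helper names build_judge_prompt and parse_binary_verdict, and the "last standalone 0 or 1 wins" parsing rule are all assumptions for illustration, not the class's actual implementation.

```python
import re


def build_judge_prompt(criteria: str, answer: str) -> str:
    # Hypothetical template: the user-supplied criteria is the only variable part.
    return (
        "Evaluate the answer against the criteria below.\n"
        f"Criteria: {criteria}\n"
        f"Answer: {answer}\n"
        "Reply with 1 if the answer meets the criteria, otherwise 0."
    )


def parse_binary_verdict(completion: str) -> int:
    # Hypothetical rule: take the last standalone "0" or "1" in the judge's reply.
    matches = re.findall(r"\b[01]\b", completion)
    if not matches:
        raise ValueError(f"No binary verdict found in: {completion!r}")
    return int(matches[-1])


prompt = build_judge_prompt("The answer is polite.", "Thanks for asking!")
print(parse_binary_verdict("The answer is polite. Verdict: 1"))  # 1
```

Keeping the criteria as the single user-controlled input is what lets the rest of the prompt and the parsing stay fixed.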
FaithfulnessScorer evaluates how faithful a response is to the input context.
It scores each sentence in the last assistant turn as either fully supported by the context (1) or not (0). The final score is the average over all sentences.
The context is the rest of the thread, excluding the system prompt.
It requires an input language code to help split the response into sentences.
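The per-sentence averaging can be illustrated with a minimal sketch. Assumptions: split_sentences is a naive regex splitter (the real scorer uses the language code to split sentences instead), naive_support is a hypothetical substring check standing in for the LLM judge, and returning 0.0 for an empty answer is an arbitrary choice made here.

```python
import re
from typing import Callable


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ".", "!" or "?" followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def faithfulness_score(
    answer: str, context: str, is_supported: Callable[[str, str], bool]
) -> float:
    sentences = split_sentences(answer)
    if not sentences:
        return 0.0  # assumption: an empty answer scores 0
    # Each sentence gets 1 if fully supported by the context, else 0.
    labels = [1 if is_supported(s, context) else 0 for s in sentences]
    return sum(labels) / len(labels)


def naive_support(sentence: str, context: str) -> bool:
    # Hypothetical stand-in for the judge: exact substring match.
    return sentence.rstrip(".!?").lower() in context.lower()


context = "Paris is the capital of France. It hosts the Louvre."
answer = "Paris is the capital of France. It has ten million bridges."
print(faithfulness_score(answer, context, naive_support))  # 0.5
```

One supported sentence out of two gives 0.5, which is the sentence-level average described above.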
CombinedScorer allows you to combine multiple scorers into a single scoring function.
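The idea of merging several scorers into one can be sketched as follows. This is not CombinedScorer itself: the combine helper, the weighted-mean aggregation, and the local ScoreWithMetadata stand-in (with score and metadata fields) are assumptions for illustration, since the aggregation rule CombinedScorer uses is not stated here.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable


@dataclass
class ScoreWithMetadata:
    # Hypothetical stand-in; the real class lives in harmony.scoring.base_scorer.
    score: float
    metadata: dict[str, Any] = field(default_factory=dict)


ScoreFn = Callable[[str], Awaitable[ScoreWithMetadata]]


def combine(scorers: dict[str, ScoreFn], weights: dict[str, float]) -> ScoreFn:
    total = sum(weights[name] for name in scorers)

    async def combined(answer: str) -> ScoreWithMetadata:
        # Run all scorers concurrently; gather preserves input order.
        results = await asyncio.gather(*(fn(answer) for fn in scorers.values()))
        by_name = dict(zip(scorers, results))
        value = sum(weights[n] * r.score for n, r in by_name.items()) / total
        # Keep the individual scores so the combined result stays explainable.
        return ScoreWithMetadata(value, {n: r.score for n, r in by_name.items()})

    return combined


async def politeness(answer: str) -> ScoreWithMetadata:
    return ScoreWithMetadata(1.0 if "please" in answer.lower() else 0.0)


async def brevity(answer: str) -> ScoreWithMetadata:
    return ScoreWithMetadata(1.0 if len(answer) < 50 else 0.0)


scorer = combine({"politeness": politeness, "brevity": brevity},
                 {"politeness": 1.0, "brevity": 3.0})
result = asyncio.run(scorer("Restart the router."))
print(result.score)  # 0.75
```

Storing each sub-score in the metadata makes it easy to see which criterion pulled the combined score down.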
Custom scorers are built on the Scorer class. Here's how to build your own scorer:
The Scorer abstract class has only one method to implement, the async method score, which should return a ScoreWithMetadata object.
Both Scorer and ScoreWithMetadata are imported from harmony.scoring.base_scorer.
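A minimal sketch of a custom scorer follows. To keep it runnable on its own, it defines local stand-ins for Scorer and ScoreWithMetadata instead of importing them from harmony.scoring.base_scorer; the field names (score, metadata), the score(answer) signature, and the KeywordScorer example are assumptions, so check the real base class before copying this pattern.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any


# Hypothetical stand-ins: in real code, import both from harmony.scoring.base_scorer.
@dataclass
class ScoreWithMetadata:
    score: float
    metadata: dict[str, Any] = field(default_factory=dict)


class Scorer(ABC):
    @abstractmethod
    async def score(self, answer: str) -> ScoreWithMetadata:
        """The single async method a subclass must implement."""


class KeywordScorer(Scorer):
    """Hypothetical scorer: 1.0 if the answer mentions every required keyword."""

    def __init__(self, keywords: list[str]):
        self.keywords = [k.lower() for k in keywords]

    async def score(self, answer: str) -> ScoreWithMetadata:
        text = answer.lower()
        missing = [k for k in self.keywords if k not in text]
        # Report which keywords were missing alongside the numeric score.
        return ScoreWithMetadata(score=0.0 if missing else 1.0,
                                 metadata={"missing": missing})


result = asyncio.run(
    KeywordScorer(["refund", "30 days"]).score("Refunds are accepted within 30 days.")
)
print(result.score)  # 1.0
```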
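Alternatively, you can build a scorer from a plain function with the from_function classmethod. In this case, the scoring function can return a float instead of a ScoreWithMetadata object.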
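The convenience of returning a bare float can be shown with a hypothetical re-implementation of the from_function pattern. FunctionScorer, its support for both sync and async functions, and the local ScoreWithMetadata stand-in are assumptions made for this sketch; they are not harmony's code.

```python
import asyncio
import inspect
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable, Union


@dataclass
class ScoreWithMetadata:
    # Hypothetical stand-in for harmony.scoring.base_scorer.ScoreWithMetadata.
    score: float
    metadata: dict[str, Any] = field(default_factory=dict)


class FunctionScorer:
    """Hypothetical sketch of what a from_function-style wrapper might do."""

    def __init__(self, fn: Callable[[str], Union[float, Awaitable[float]]]):
        self.fn = fn

    @classmethod
    def from_function(cls, fn: Callable[[str], Union[float, Awaitable[float]]]) -> "FunctionScorer":
        return cls(fn)

    async def score(self, answer: str) -> ScoreWithMetadata:
        value = self.fn(answer)
        if inspect.isawaitable(value):  # accept async scoring functions too
            value = await value
        # Wrap the plain float so callers always receive a ScoreWithMetadata.
        return ScoreWithMetadata(score=float(value))


def is_short(answer: str) -> float:
    return 1.0 if len(answer.split()) <= 20 else 0.0


scorer = FunctionScorer.from_function(is_short)
print(asyncio.run(scorer.score("Yes, that works.")).score)  # 1.0
```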
The model.render_schema function takes a Pydantic model and generates a schema description that can be used to instruct the model on the expected output format.
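As a rough illustration of turning a Pydantic model into format instructions, the sketch below uses Pydantic v2's model_json_schema(). The render_schema function here is a hypothetical stand-in, and both the Verdict model and the instruction wording are assumptions; the real model.render_schema may produce a different description.

```python
import json

from pydantic import BaseModel, Field


class Verdict(BaseModel):
    reasoning: str = Field(description="Short justification for the score")
    score: int = Field(ge=0, le=1, description="1 if the criteria are met, else 0")


def render_schema(model_cls: type[BaseModel]) -> str:
    # Hypothetical stand-in: embed the JSON Schema in a plain-text instruction.
    schema = model_cls.model_json_schema()
    return "Respond with a JSON object matching this schema:\n" + json.dumps(schema, indent=2)


print(render_schema(Verdict))
```

Field descriptions and constraints (such as ge/le) end up in the schema, so they double as instructions to the model.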
The generate_and_validate function generates an answer with the model, validates the completion against a Pydantic model, and retries if validation fails.
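The generate, validate, and retry loop can be sketched as below, assuming Pydantic v2 (model_validate_json raises ValidationError for both malformed JSON and schema mismatches). The generate callable, the make_fake_model helper, the max_retries default, and raising RuntimeError after the last attempt are assumptions for this sketch, not the library's actual signature.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

from pydantic import BaseModel, ValidationError

T = TypeVar("T", bound=BaseModel)


class Verdict(BaseModel):
    reasoning: str
    score: int


async def generate_and_validate(
    generate: Callable[[str], Awaitable[str]],
    prompt: str,
    model_cls: type[T],
    max_retries: int = 3,
) -> T:
    last_error = None
    for _ in range(max_retries):
        completion = await generate(prompt)
        try:
            # Parse and validate the raw completion in one step.
            return model_cls.model_validate_json(completion)
        except ValidationError as err:
            last_error = err  # remember the failure and try again
    raise RuntimeError(f"No valid completion after {max_retries} attempts") from last_error


def make_fake_model(replies: list[str]) -> Callable[[str], Awaitable[str]]:
    # Hypothetical model stub that returns canned replies in order.
    it = iter(replies)

    async def generate(prompt: str) -> str:
        return next(it)

    return generate


fake = make_fake_model(["not json", '{"reasoning": "ok", "score": 1}'])
verdict = asyncio.run(generate_and_validate(fake, "Judge this answer.", Verdict))
print(verdict.score)  # 1
```

A natural variation is to append the validation error to the prompt before retrying, so the model can correct itself; whether generate_and_validate does this is not stated here.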