Evaluate models using AI feedback
Retrieved context chunks are passed as document turns in the input messages. You can add all retrieved context in a single turn, but preferably each retrieved chunk should be added as a separate turn.
Sample input/output
Each retrieved chunk is added as a separate document turn in the input messages.
The output score is an average: the number of chunks that contain information relevant to answering the user query, divided by the total number of input chunks (which equals the number of document turns in the prompt).
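The score computation above can be sketched as follows. This is a minimal illustration, assuming hypothetical per-chunk relevance judgments already returned by the AI judge; the variable names are not part of any actual API.

```python
# Hypothetical per-chunk judgments from the AI judge:
# 1 = chunk contains information relevant to the user query, 0 = not relevant.
# One entry per document turn in the prompt.
chunk_relevance = [1, 0, 1, 1]

# Output score: relevant chunks divided by total number of input chunks.
score = sum(chunk_relevance) / len(chunk_relevance)
print(score)  # 0.75
```

With four document turns of which three are judged relevant, the score is 3/4 = 0.75.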
Sample input/output
For each guideline, the judge marks the completion as either compliant (score = 1) or not compliant (score = 0) with the guideline.
The output score is the average of the scores across all individual guidelines.
The textual judgement attached to each completion contains one reasoning trace per guideline, indicating why the completion did or did not adhere to the guideline.
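The per-guideline scoring and averaging can be sketched as below. This is an illustrative example only: the guideline texts, reasoning traces, and data layout are assumptions, not the actual output format of the evaluator.

```python
# Hypothetical per-guideline results from the AI judge:
# score is 1 (compliant) or 0 (not compliant), each with a reasoning trace
# explaining why the completion did or did not adhere to the guideline.
guideline_results = [
    {"guideline": "Respond in a formal tone", "score": 1, "reasoning": "..."},
    {"guideline": "Cite at least one source", "score": 0, "reasoning": "..."},
]

# Output score: average of the individual guideline scores.
score = sum(r["score"] for r in guideline_results) / len(guideline_results)
print(score)  # 0.5
```

Here one of two guidelines is judged compliant, so the completion's overall score is 0.5.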
Sample input/output
Tips on writing effective evaluation guidelines