Structured generation - Adaptive ML Documentation

The adaptive_harmony library provides methods for generating LLM outputs that adhere to a specific JSON schema, and for rendering simplified, LLM-readable JSON schemas that you can include in a prompt to describe the expected output structure. Both rely on annotated Pydantic models such as the following:

from typing import Literal
from pydantic import BaseModel, Field

class BinaryJudgeOutput(BaseModel):
    reasoning: str = Field(
        description="Reasoning to support the rationale behind the score."
    )
    grade: Literal["PASS", "FAIL"] = Field(
        description="The grade for the sample."
    )

The benefit of relying on Pydantic models is that the model definition becomes the single source of truth for three things: the output structure instructions in the prompt, the parsing of a text response back into that model, and typed access to known properties on the object that results from parsing and validation.
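As a minimal sketch of the parsing and validation side of this, the snippet below uses plain Pydantic, without adaptive_harmony. It assumes Pydantic v2 (for model_validate_json), and raw_response is a hand-written stand-in for an LLM reply, not real model output.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class BinaryJudgeOutput(BaseModel):
    reasoning: str = Field(
        description="Reasoning to support the rationale behind the score."
    )
    grade: Literal["PASS", "FAIL"] = Field(
        description="The grade for the sample."
    )


# Hypothetical LLM reply, hard-coded for illustration.
raw_response = '{"reasoning": "The answer follows the criterion.", "grade": "PASS"}'

# Parse and type-validate the text in one step.
parsed = BinaryJudgeOutput.model_validate_json(raw_response)
print(parsed.grade)

# A grade outside the Literal is rejected at validation time.
try:
    BinaryJudgeOutput.model_validate_json('{"reasoning": "n/a", "grade": "MAYBE"}')
    rejected = False
except ValidationError:
    rejected = True
print(rejected)
```

Because the allowed grades live in the model definition, adding a new grade means changing one Literal rather than the prompt, the parser, and the calling code separately.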

Instruct LLM to follow desired output structure

Expanding on the example above, BinaryJudgeOutput is the response structure we would expect from an LLM judge that classifies completions as “PASS” or “FAIL” according to some user-defined eval criteria. A simple system prompt we could use for this judge would be something like:

You are an evaluator of human to AI interactions.
You will be given a full interaction between a human and an AI model, as well as an evaluation criterion.
Your task is to evaluate the AI’s response against the criterion. If the response respects and complies with the criterion, you must grade it with a “PASS”, otherwise you must grade it with a “FAIL”.
You must reason about the interaction and whether it respects the criterion in a short paragraph before you decide on the final grade.
You must return your output as a valid JSON string that strictly adheres to the following schema, with no preamble or postamble:

{json_schema}

Render simplified JSON schema

You can then render a simplified JSON schema from your model definition, to substitute for the {json_schema} placeholder in the prompt:

from adaptive_harmony.core.structured_output import render_schema

schema_str = render_schema(BinaryJudgeOutput)

print(schema_str)

Output:

{
  "reasoning": str,
  "grade": Literal["PASS", "FAIL"]
}

reasoning: Reasoning to support the rationale behind the score.
grade: The grade for the sample.
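The rendered schema can then be substituted into the prompt template. The sketch below uses only the standard library: system_prompt_template is a shortened stand-in for the judge prompt, and schema_str is copied by hand from the output above rather than produced by calling render_schema, so the exact whitespace is an assumption.

```python
# Shortened stand-in for the judge system prompt shown earlier.
system_prompt_template = (
    "You must return your output as a valid JSON string that strictly adheres "
    "to the following schema, with no preamble or postamble:\n\n{json_schema}"
)

# Stand-in for the string returned by render_schema(BinaryJudgeOutput),
# copied from the output above.
schema_str = '''{
  "reasoning": str,
  "grade": Literal["PASS", "FAIL"]
}

reasoning: Reasoning to support the rationale behind the score.
grade: The grade for the sample.'''

# str.format only interprets braces in the template itself, so the braces
# inside the substituted schema are inserted verbatim.
system_prompt = system_prompt_template.format(json_schema=schema_str)
print(system_prompt)
```

One design consequence: the template must not contain other literal braces (for example an inline JSON sample), or they need to be escaped as {{ and }}.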

Generate with a response model

For Pydantic-validated output without prompt engineering, pass the model class directly to generate(). The schema is enforced via constrained decoding and the result is parsed into your model.

from adaptive_harmony import InferenceModel, StringThread

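async def main():
    # spawn an inference model beforehand
    model: InferenceModel
    # thread to generate a response for
    thread: StringThread

    # returns the response thread and the validated Pydantic instance
    result, parsed = await model.generate(thread, BinaryJudgeOutput)
    print(parsed.grade)  # "PASS" or "FAIL"
    print(parsed.reasoning)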

generate(thread, response_model) returns a (StringThread, BaseModel) tuple: the first item is the model’s response thread, and the second is the validated Pydantic instance. generate_tokens(thread, response_model) works the same way but returns a (TokenizedThread, BaseModel) tuple, which is useful for RL training, where you need the raw tokens to compute log-probabilities. The JSON Schemas sent to the underlying engine include additionalProperties: false on every object schema, so the output is compatible with strict-mode JSON validators (e.g., OpenAI’s strict mode).
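To make "additionalProperties: false on every object schema" concrete, here is a standard-library sketch of that transformation applied to a JSON Schema dictionary. It is not the library's implementation: forbid_extra_properties is a hypothetical helper, and the nested details field is invented to show that nested objects are covered too.

```python
import copy
import json


def forbid_extra_properties(schema: dict) -> dict:
    """Return a copy of a JSON Schema with additionalProperties: false
    set on every object schema, including nested ones."""
    schema = copy.deepcopy(schema)

    def visit(node):
        if isinstance(node, dict):
            if node.get("type") == "object":
                node["additionalProperties"] = False
            for value in node.values():
                visit(value)
        elif isinstance(node, list):
            for item in node:
                visit(item)

    visit(schema)
    return schema


# Hand-written schema for illustration; "details" is a hypothetical field.
schema = {
    "type": "object",
    "properties": {
        "reasoning": {"type": "string"},
        "grade": {"type": "string", "enum": ["PASS", "FAIL"]},
        "details": {
            "type": "object",
            "properties": {"criterion": {"type": "string"}},
        },
    },
    "required": ["reasoning", "grade"],
}

strict_schema = forbid_extra_properties(schema)
print(json.dumps(strict_schema, indent=2))
```

With this flag set, a validator rejects any key that the schema does not declare, which is the property strict-mode validators check for.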

Generate text and validate Pydantic model

generate_and_validate is the older path. It instructs the LLM via prompt and validates the response after generation, with one automatic retry on parse failure. Prefer generate(thread, MyModel) above for new code; use generate_and_validate when you cannot use constrained decoding (e.g., when targeting a model that does not support it).

from adaptive_harmony import HarmonyClient, InferenceModel, StringThread
from adaptive_harmony.core.utils import stringify_thread
from adaptive_harmony.core.structured_output import JsonParseError, render_schema

async def main():
    # instantiate this before
    client: HarmonyClient
    # spawn an inference model before with client.model().spawn_inference()
    model: InferenceModel
    # original interaction being evaluated
    original_thread: StringThread
    # system prompt template from above
    system_prompt_template: str
    # convert original thread into a string representation
    stringified_original_thread = stringify_thread(original_thread)

    # build a thread to prompt the judge LLM with, filling the
    # {json_schema} placeholder with the simplified schema
    judging_thread = (
        StringThread
        .system(system_prompt_template.format(json_schema=render_schema(BinaryJudgeOutput)))
        .user(f"INTERACTION TO EVALUATE:\n{stringified_original_thread}")
    )

    # returns both the generated text and the validated Pydantic model
    try:
        output_text, output_pydantic_model = await model.generate_and_validate(
            thread=judging_thread,
            pydantic_model=BinaryJudgeOutput
        )
    except JsonParseError:
        print("Model failed to output valid structure")

By default, .generate_and_validate() retries generation once with correction instructions if the LLM fails to comply with the specified format. You can control how many retries are attempted by passing max_parsing_retries to the method.
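The retry behaviour described above can be pictured with the following standard-library sketch. It is an illustration of the general pattern, not the library's implementation: generate_with_retries and fake_generate are hypothetical names, the wording of the correction note is invented, and the check is plain JSON parsing rather than Pydantic validation. Only the max_parsing_retries parameter name comes from the method described above.

```python
import json
from typing import Callable


def generate_with_retries(
    generate: Callable[[str], str],
    prompt: str,
    max_parsing_retries: int = 1,
) -> dict:
    """Call `generate`, retrying with a correction note when the reply
    is not valid JSON. Raises ValueError once retries are exhausted."""
    current_prompt = prompt
    for _ in range(max_parsing_retries + 1):
        text = generate(current_prompt)
        try:
            return json.loads(text)
        except json.JSONDecodeError as err:
            # Feed the failure back so the next attempt can correct it.
            current_prompt = (
                f"{prompt}\n\nYour previous reply was not valid JSON "
                f"({err.msg}). Reply with the JSON object only."
            )
    raise ValueError(f"No valid JSON after {max_parsing_retries + 1} attempts")


# Fake generator standing in for an LLM: the first reply has a preamble,
# the second is valid JSON.
replies = iter([
    "Sure! Here is the grade: PASS",
    '{"reasoning": "Meets the criterion.", "grade": "PASS"}',
])
calls = []


def fake_generate(prompt: str) -> str:
    calls.append(prompt)
    return next(replies)


result = generate_with_retries(fake_generate, "Grade this interaction.")
print(result["grade"], len(calls))
```

Raising max_parsing_retries trades extra generation cost for a higher chance of recovering from a malformed reply.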
