
# Structured generation

> Generate text with Adaptive models that follows a desired JSON schema

The `adaptive_harmony` library provides methods for generating structured LLM outputs that adhere to a specific JSON schema, and for rendering simplified, LLM-readable JSON schemas that you can include in a prompt to describe the expected output structure. Both work from annotated Pydantic models such as the following:

```python theme={null}
from typing import Literal
from pydantic import BaseModel, Field

class BinaryJudgeOutput(BaseModel):
    reasoning: str = Field(
        description="Reasoning to support the rationale behind the score."
    )
    grade: Literal["PASS", "FAIL"] = Field(
        description="The grade for the sample."
    )
```

The benefit of relying on Pydantic models is that the model definition becomes the single source of truth for three things: the output-structure instructions in the prompt, the parsing of a text response back into that model, and access to known, type-validated properties on the parsed object.
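
As a minimal illustration of the parsing and validation side, the sketch below uses only Pydantic (assuming Pydantic v2) and a hand-written JSON string in place of a real LLM response. It does not use `adaptive_harmony` at all.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class BinaryJudgeOutput(BaseModel):
    reasoning: str = Field(
        description="Reasoning to support the rationale behind the score."
    )
    grade: Literal["PASS", "FAIL"] = Field(
        description="The grade for the sample."
    )


# Hand-written stand-in for the text an LLM judge might return.
raw_response = '{"reasoning": "The answer cites the requested source.", "grade": "PASS"}'

# Parse and type-validate in one step; the result has typed attributes.
parsed = BinaryJudgeOutput.model_validate_json(raw_response)
print(parsed.grade)

# A grade outside the Literal is rejected, and the error names the field.
bad_field = None
try:
    BinaryJudgeOutput.model_validate_json('{"reasoning": "n/a", "grade": "MAYBE"}')
except ValidationError as err:
    bad_field = err.errors()[0]["loc"]
    print("invalid field:", bad_field)
```

Because the same class defines the fields, their types, and their descriptions, a change to the model propagates to parsing and validation without further edits.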

### Instruct LLM to follow desired output structure

Expanding on the example above, `BinaryJudgeOutput` is the response structure we would expect from an LLM judge that classifies completions as "PASS" or "FAIL" according to some user-defined eval criteria.

A simple system prompt we could use for this judge would be something like:

> You are an evaluator of human to AI interactions. <br />
>
> You will be given a full interaction between a human and an AI model, as well as an evaluation criterion. <br />
>
> Your task is to evaluate the AI's response against the criterion. If the response respects and complies with the criterion, you must grade it with a "PASS", otherwise you must grade it with a "FAIL". <br />
> You must reason about the interaction and whether it respects the criterion in a short paragraph before you decide on the final grade. <br />
> You must return your output as a valid JSON string that strictly adheres to the following schema, with no preamble or postamble: <br /><br />
> `{json_schema}`
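
The `{json_schema}` placeholder can be filled with Python's built-in `str.format`. The sketch below is only an illustration: the prompt text is abbreviated, and `example_schema` is a hand-written stand-in for whatever schema string you choose to insert.

```python
# Abbreviated version of the system prompt above, with a named placeholder.
SYSTEM_PROMPT_TEMPLATE = (
    "You are an evaluator of human to AI interactions.\n"
    "You must return your output as a valid JSON string that strictly adheres "
    "to the following schema, with no preamble or postamble:\n\n"
    "{json_schema}"
)

# Hand-written stand-in for a schema description (an assumption for this sketch).
example_schema = '{\n  "reasoning": str,\n  "grade": Literal["PASS", "FAIL"]\n}'

# Braces inside the substituted value are left untouched by str.format;
# only braces in the template itself are interpreted as placeholders.
system_prompt = SYSTEM_PROMPT_TEMPLATE.format(json_schema=example_schema)
print(system_prompt)
```

If the template itself needs literal braces (for example, an inline JSON sample), they must be doubled as `{{` and `}}` so that `str.format` does not treat them as placeholders.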

### Render simplified JSON schema

You can then render a simplified JSON schema from your model definition and substitute it for the `{json_schema}` placeholder in the prompt:

```python theme={null}
from adaptive_harmony.core.structured_output import render_schema

schema_str = render_schema(BinaryJudgeOutput)

print(schema_str)
```

Output:

```
{
  "reasoning": str,
  "grade": Literal["PASS", "FAIL"]
}

reasoning: Reasoning to support the rationale behind the score.
grade: The grade for the sample.
```
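
For comparison, the full JSON Schema that Pydantic itself emits for the same model is considerably more verbose. The sketch below assumes Pydantic v2 and uses only Pydantic's own `model_json_schema()`; it is shown to motivate the simplified rendering, not to describe how `render_schema` works internally.

```python
import json
from typing import Literal

from pydantic import BaseModel, Field


class BinaryJudgeOutput(BaseModel):
    reasoning: str = Field(
        description="Reasoning to support the rationale behind the score."
    )
    grade: Literal["PASS", "FAIL"] = Field(
        description="The grade for the sample."
    )


# Standard JSON Schema as a dict: includes titles, types, enums and required fields.
full_schema = BinaryJudgeOutput.model_json_schema()
print(json.dumps(full_schema, indent=2))
```

The standard schema carries keys such as `properties`, `required`, `title`, and `enum`, which are useful to validators but add noise when the goal is simply to tell an LLM what shape to produce.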

## Generate with a response model

For Pydantic-validated output without prompt engineering, pass the model class directly to `generate()`. The schema is enforced via constrained decoding and the result is parsed into your model.

```python theme={null}
from adaptive_harmony import InferenceModel, StringThread

async def main():
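    # spawn an inference model beforehand with client.model().spawn_inference()
    model: InferenceModel
    # thread to complete, built beforehand
    thread: StringThread

    # BinaryJudgeOutput is the Pydantic model defined at the top of this page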

    result, parsed = await model.generate(thread, BinaryJudgeOutput)
    print(parsed.grade)  # "PASS" or "FAIL"
    print(parsed.reasoning)
```

`generate(thread, response_model)` returns a `(StringThread, BaseModel)` tuple: the first item is the model's response thread, and the second is the validated Pydantic instance. `generate_tokens(thread, response_model)` works the same way but returns a `(TokenizedThread, BaseModel)` tuple, which is what you want for RL training, where you need the raw tokens to compute log-probabilities.

JSON Schemas emitted to the underlying engine include `additionalProperties: false` on every object schema, so output is compatible with strict-mode JSON validators (e.g., OpenAI's strict mode).
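
To see what `additionalProperties: false` means in practice, the sketch below reproduces the same constraint with plain Pydantic (assuming Pydantic v2) by setting `extra="forbid"`. The `StrictJudgeOutput` class is a hypothetical name used only here, and this is not a description of how `adaptive_harmony` builds the schema it sends to the engine.

```python
from typing import Literal

from pydantic import BaseModel, ConfigDict, ValidationError


class StrictJudgeOutput(BaseModel):
    # extra="forbid" makes Pydantic emit "additionalProperties": false
    # and reject unknown keys during validation.
    model_config = ConfigDict(extra="forbid")

    reasoning: str
    grade: Literal["PASS", "FAIL"]


schema = StrictJudgeOutput.model_json_schema()
print(schema["additionalProperties"])

# An object carrying a key that the schema does not declare is rejected.
extra_error_type = None
try:
    StrictJudgeOutput.model_validate_json(
        '{"reasoning": "ok", "grade": "PASS", "confidence": 0.9}'
    )
except ValidationError as err:
    extra_error_type = err.errors()[0]["type"]
    print("rejected:", extra_error_type)
```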

## Generate text and validate Pydantic model

`generate_and_validate` is the older path. It instructs the LLM via prompt and validates the response after generation, with one automatic retry on parse failure. Prefer `generate(thread, MyModel)` above for new code; use `generate_and_validate` when you cannot use constrained decoding (e.g., when targeting a model that does not support it).

```python theme={null}
from adaptive_harmony import HarmonyClient, InferenceModel, StringThread
from adaptive_harmony.core.utils import stringify_thread
from adaptive_harmony.core.structured_output import JsonParseError, render_schema

async def main():
    # instantiate the client beforehand
    client: HarmonyClient
    # spawn an inference model beforehand with client.model().spawn_inference()
    model: InferenceModel
    # original interaction being evaluated
    original_thread: StringThread
    # system prompt template from above, containing the {json_schema} placeholder
    system_prompt_template: str
    # convert original thread into a string representation
    stringified_original_thread = stringify_thread(original_thread)

    # build a thread to prompt the judge LLM with,
    # filling the {json_schema} placeholder with the rendered schema
    judging_thread = (
        StringThread
        .system(system_prompt_template.format(json_schema=render_schema(BinaryJudgeOutput)))
        .user(f"INTERACTION TO EVALUATE:\n{stringified_original_thread}")
    )

    # returns both the generated text and the validated Pydantic model
    try:
        output_text, output_pydantic_model = await model.generate_and_validate(
            thread=judging_thread,
            pydantic_model=BinaryJudgeOutput
        )
    except JsonParseError:
        print("Model failed to output valid structure")
```

<Info>
  By default, `.generate_and_validate()` retries generation once with correction instructions if the LLM fails to comply with the specified format. You can control how many retries are attempted by passing `max_parsing_retries` to the method.
</Info>
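
The retry behavior described above can be pictured with the following sketch. It is not the library's implementation: `generate_with_retries` and `flaky_model` are hypothetical names, the stub stands in for a real model call, and only the `max_parsing_retries` parameter name is taken from the text above.

```python
import asyncio
from typing import Awaitable, Callable, Literal

from pydantic import BaseModel, ValidationError


class BinaryJudgeOutput(BaseModel):
    reasoning: str
    grade: Literal["PASS", "FAIL"]


async def generate_with_retries(
    generate_text: Callable[[str], Awaitable[str]],  # hypothetical LLM call
    prompt: str,
    max_parsing_retries: int = 1,
) -> tuple[str, BinaryJudgeOutput]:
    """Generate, validate, and retry with correction instructions on failure."""
    if max_parsing_retries < 0:
        raise ValueError("max_parsing_retries must be >= 0")
    attempt_prompt = prompt
    for attempt in range(max_parsing_retries + 1):
        text = await generate_text(attempt_prompt)
        try:
            return text, BinaryJudgeOutput.model_validate_json(text)
        except ValidationError as err:
            if attempt == max_parsing_retries:
                raise  # retries exhausted: surface the parse failure
            # Ask again, telling the model what was wrong with its last output.
            attempt_prompt = (
                f"{prompt}\n\nYour previous output was invalid:\n{err}\n"
                "Return only valid JSON that follows the schema."
            )
    raise AssertionError("unreachable")


async def flaky_model(prompt: str) -> str:
    """Stub model: fails on the first call, complies once corrected."""
    if "previous output was invalid" in prompt:
        return '{"reasoning": "Meets the criterion.", "grade": "PASS"}'
    return "Sure! Here is my answer: PASS"


text, parsed = asyncio.run(
    generate_with_retries(flaky_model, "Judge this interaction.")
)
print(parsed.grade)
```

With `max_parsing_retries=0`, the same stub raises on its first invalid output, which mirrors catching a parse error when no retries remain.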
