# Monitoring

> Watch training telemetry across runs in real time

The Monitoring tab surfaces training and RL telemetry — loss curves, reward signals, gradient norms, validation metrics — streamed live from a running job and persisted for the run's lifetime. Pick multiple runs to compare them side-by-side on the same axes.

Monitoring covers training-time signals only. For inference-time signals (time to first token, latency, token counts) and post-completion feedback, see [Metrics](/v0.14/core/metrics).

<Tabs>
  <Tab title="SDK" icon="code">
    ## Default metrics

    Pre-built training recipes emit the metrics below automatically, with nothing extra required in your run config:

    | Recipe                                 | Auto-emitted metrics                                                                                                   |
    | -------------------------------------- | ---------------------------------------------------------------------------------------------------------------------- |
    | `sft`                                  | `train/loss`, `train/gradient_norm`, `val/loss` (when validation enabled), `val/<grader_key>` (when grader configured) |
    | `preference_rlhf`, `metric_rlhf`, `rl` | `train/loss`, `train/reward`, `train/kl`, `train/gradient_norm`, plus stage-specific metrics for multi-stage runs      |
    | `eval`                                 | Per-grader scalar scores aggregated across the dataset                                                                 |

    Metric streams flush to the platform every 0.5 seconds and appear in the UI within seconds of being emitted.

    ## Log custom metrics from a Harmony recipe

    For [custom recipes](/v0.14/harmony/overview), get a logger from the recipe context and call it like a function:

    ```python
    from adaptive_harmony.metric_logger import get_prod_logger

    @recipe_main
    async def my_recipe(config: MyConfig, ctx: RecipeContext):
        logger = get_prod_logger(ctx)

        # Wire the dashboard URL into the run record so users can click through
        if logger.training_monitoring_link:
            ctx.job.set_monitoring_link(logger.training_monitoring_link)

        # num_steps, train_one_step, and current_lr stand in for your own training loop
        for step in range(num_steps):
            loss = await train_one_step(...)
            logger({"train/loss": loss, "train/lr": current_lr})
    ```

    The logger accepts any `Mapping[str, int | float | Table]`. Each call advances the internal step counter once.
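
    Since the step counter moves on every call, metrics that belong to the same training step are probably best passed in one mapping. The sketch below is not the Harmony logger: `StepCountingLogger` is a hypothetical stand-in that only mimics the one-call-per-step behavior described above, to show how splitting a step across two calls would misalign the curves.

    ```python
    from __future__ import annotations

    from collections.abc import Mapping


    class StepCountingLogger:
        """Hypothetical stand-in (not the Harmony logger): one call equals one step."""

        def __init__(self) -> None:
            self.step = 0
            self.history: list[tuple[int, dict[str, float]]] = []

        def __call__(self, metrics: Mapping[str, int | float]) -> None:
            # Record the metrics against the current step, then advance once per call.
            self.history.append((self.step, dict(metrics)))
            self.step += 1


    combined = StepCountingLogger()
    combined({"train/loss": 0.91, "train/lr": 3e-4})  # both metrics share step 0

    split = StepCountingLogger()
    split({"train/loss": 0.91})  # recorded at step 0
    split({"train/lr": 3e-4})    # recorded at step 1, so the two no longer line up
    ```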

    ### Logging tables

    For structured per-step data — sample completions, gradient breakdowns, layer statistics — log a `Table`:

    ```python
    from adaptive_harmony.logging_table import Table

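    samples = Table()
    # Each row here holds a prompt ID, the completion text, and its score.
    # completion_text, score, and the "_2" variants are placeholders for your own values.
    samples.add_row(["prompt-1", completion_text, score])
    samples.add_row(["prompt-2", completion_text_2, score_2])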

    logger({"eval/samples": samples})
    ```

    Tables appear in the run detail view, paginated and searchable.

    <Accordion title="Backend selection">
      `get_prod_logger` auto-selects a logging backend from the environment variables present in the recipe sandbox, in this order of precedence: WandB → MLflow → TensorBoard → stdout. When `ADAPTIVE_BASE_URL` and `ADAPTIVE_API_KEY` are set, the Adaptive monitoring backend runs alongside the selected backend rather than replacing it, so metrics reach the platform UI even if you also log to a third-party tracker.

      Set `ADAPTIVE_MONITORING_DISABLED=1` to opt out of the Adaptive backend (e.g., for local-only development).
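
      The selection rules can be sketched roughly as follows. This is an illustration of the precedence described here, not the library's implementation: `select_backends` is a hypothetical function, and the variable checked for each third-party backend is an assumption, since only the ordering and the three `ADAPTIVE_*` variables are documented on this page.

      ```python
      from __future__ import annotations

      from collections.abc import Mapping

      # Assumption: the trigger variable per third-party backend is illustrative only.
      # This page states the priority order, not which variables are inspected.
      THIRD_PARTY_PRIORITY = [
          ("wandb", "WANDB_API_KEY"),
          ("mlflow", "MLFLOW_TRACKING_URI"),
          ("tensorboard", "TENSORBOARD_LOGDIR"),  # hypothetical variable name
      ]


      def select_backends(env: Mapping[str, str]) -> list[str]:
          """Return the backends this sketch would enable for the given environment."""
          # First match wins; stdout is the fallback when nothing is configured.
          primary = next(
              (name for name, var in THIRD_PARTY_PRIORITY if env.get(var)), "stdout"
          )
          backends = [primary]

          # The Adaptive backend is additive, unless explicitly disabled.
          has_credentials = bool(env.get("ADAPTIVE_BASE_URL")) and bool(env.get("ADAPTIVE_API_KEY"))
          disabled = env.get("ADAPTIVE_MONITORING_DISABLED") == "1"
          if has_credentials and not disabled:
              backends.append("adaptive")
          return backends


      example_env = {
          "MLFLOW_TRACKING_URI": "http://localhost:5000",
          "ADAPTIVE_BASE_URL": "https://example.invalid",
          "ADAPTIVE_API_KEY": "placeholder-key",
      }
      selected = select_backends(example_env)  # ["mlflow", "adaptive"]
      ```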
    </Accordion>

    See [SDK Reference](/v0.14/reference/sdk) for monitoring methods.
  </Tab>

  <Tab title="UI" icon="mouse-pointer">
    ## Watch a run

    Open the **Monitoring** tab in your project. Each running or completed run appears in the list with status, recipe, and headline metric. Click a run to open its detail view — charts update in real time as the job emits metrics (no refresh required).

    Use the smoothing controls (EMA / SMA / Gaussian) to smooth noisy curves such as RL reward while preserving the underlying trend.
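
    For intuition about what the EMA and SMA options do to a curve, here is a minimal sketch. It does not reproduce the platform's implementation, which is not documented on this page; the `alpha` and `window` parameters and the sample reward values are illustrative assumptions.

    ```python
    from __future__ import annotations


    def ema(values: list[float], alpha: float = 0.5) -> list[float]:
        """Exponential moving average: blend each new value with the previous average."""
        smoothed: list[float] = []
        for value in values:
            previous = smoothed[-1] if smoothed else value
            smoothed.append(alpha * value + (1 - alpha) * previous)
        return smoothed


    def sma(values: list[float], window: int = 3) -> list[float]:
        """Simple moving average over a trailing window (shorter at the series start)."""
        smoothed: list[float] = []
        for i in range(len(values)):
            chunk = values[max(0, i - window + 1) : i + 1]
            smoothed.append(sum(chunk) / len(chunk))
        return smoothed


    rewards = [0.0, 1.0, 0.0, 1.0]          # made-up, deliberately noisy series
    ema_rewards = ema(rewards)              # [0.0, 0.5, 0.25, 0.625]
    sma_rewards = sma(rewards, window=2)    # [0.0, 0.5, 0.5, 0.5]
    ```

    A smaller `alpha` or a larger `window` flattens the curve more, at the cost of reacting later to real changes.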

    ## Compare runs side-by-side

    Select multiple runs from the list and open **Compare**. The compare view overlays each run's curves on the same axes and shows a config diff — every recipe argument that differs between the runs is highlighted. Use this for hyperparameter sweeps, A/B comparisons of dataset versions, or picking the best checkpoint to promote.
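
    Conceptually, the config diff amounts to comparing two sets of recipe arguments key by key. The sketch below assumes flat dictionaries with made-up argument names; `config_diff` is a hypothetical helper for reasoning about what the view highlights, not a platform API.

    ```python
    from __future__ import annotations

    from typing import Any

    _MISSING = "<unset>"  # placeholder shown when one run lacks the argument


    def config_diff(a: dict[str, Any], b: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
        """Return {argument: (value_in_a, value_in_b)} for every argument that differs."""
        diff: dict[str, tuple[Any, Any]] = {}
        for key in sorted(a.keys() | b.keys()):
            left, right = a.get(key, _MISSING), b.get(key, _MISSING)
            if left != right:
                diff[key] = (left, right)
        return diff


    # Hypothetical recipe arguments for two runs in a sweep
    run_a = {"learning_rate": 1e-5, "batch_size": 32, "dataset": "support-v1"}
    run_b = {"learning_rate": 3e-5, "batch_size": 32, "dataset": "support-v2", "kl_beta": 0.1}

    changed = config_diff(run_a, run_b)
    # changed covers dataset, kl_beta, and learning_rate; batch_size is identical and omitted
    ```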

    Run selection is unrestricted — you can compare any two (or more) runs in the project, even across different recipe types.

    ## Pick a checkpoint

    Charts are most useful right before promotion: scrub the run timeline, identify the step where validation loss bottoms out (or reward stabilizes), then jump to the run's **Checkpoints** tab to promote the matching saved checkpoint. See [Promoted checkpoints](/v0.14/core/models#promoted-checkpoints).

    The platform does not auto-rank checkpoints by metric — promotion is a manual decision based on what you see in the charts.
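
    If you copy validation metrics out of the charts, the manual choice can be reasoned about as in the sketch below. All values are made up for illustration, both helpers are hypothetical, and no platform API is involved: the code finds the step with the lowest validation loss, then the saved checkpoint closest to it.

    ```python
    from __future__ import annotations


    def best_step(val_loss: dict[int, float]) -> int:
        """Step at which validation loss is lowest."""
        return min(val_loss, key=val_loss.get)


    def nearest_checkpoint(target_step: int, checkpoint_steps: list[int]) -> int:
        """Saved checkpoint step closest to the target (ties go to the earlier step)."""
        return min(checkpoint_steps, key=lambda step: (abs(step - target_step), step))


    # Made-up readings: validation loss by step, and the steps that have saved checkpoints
    val_loss = {100: 1.42, 200: 1.10, 300: 0.97, 400: 1.03, 500: 1.15}
    checkpoints = [250, 500]

    step = best_step(val_loss)                          # 300
    candidate = nearest_checkpoint(step, checkpoints)   # 250
    ```

    When the nearest checkpoint sits some distance from the minimum, check the curve at that step before promoting it.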
  </Tab>
</Tabs>

## Retention

Monitoring metrics live as long as the run record does. Deleting a run deletes its metrics; there is no separate TTL for metrics.
