
The Monitoring tab surfaces training and RL telemetry — loss curves, reward signals, gradient norms, validation metrics — streamed live from a running job and persisted for the run’s lifetime. Select multiple runs to compare them side by side on the same axes. Monitoring covers training-time signals only; for inference-time signals (TTFT, latency, token counts) and post-completion feedback, see Metrics.

Default metrics

Pre-built training recipes auto-emit the metrics below — no extra logging code or run-config changes needed:
Recipe | Auto-emitted metrics
sft | train/loss, train/gradient_norm, val/loss (when validation is enabled), val/<grader_key> (when a grader is configured)
preference_rlhf, metric_rlhf, rl | train/loss, train/reward, train/kl, train/gradient_norm, plus stage-specific metrics for multi-stage runs
eval | Per-grader scalar scores aggregated across the dataset
Metric streams flush to the platform every 0.5 seconds and appear in the UI within seconds of being emitted.
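
For orientation, and assuming the pre-built recipes emit through the same logging call documented below, each step's payload is a flat mapping from metric key to scalar value. A hypothetical sft step (values invented):

step_metrics = {
    "train/loss": 0.412,
    "train/gradient_norm": 1.27,
    "val/loss": 0.455,  # present only when validation is enabled
}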

Log custom metrics from a Harmony recipe

For custom recipes, get a logger from the recipe context and call it like a function:
from adaptive_harmony.metric_logger import get_prod_logger

@recipe_main
async def my_recipe(config: MyConfig, ctx: RecipeContext):
    # Bind a logger to this run's context
    logger = get_prod_logger(ctx)

    # Wire the dashboard URL into the run record so users can click through
    if logger.training_monitoring_link:
        ctx.job.set_monitoring_link(logger.training_monitoring_link)

    # num_steps, train_one_step, and current_lr are placeholders for your own loop
    for step in range(num_steps):
        loss = await train_one_step(...)
        logger({"train/loss": loss, "train/lr": current_lr})
The logger accepts any Mapping[str, int | float | Table]. Each call advances the internal step counter by one.
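
One practical consequence of the one-call, one-step rule (a sketch, assuming values passed in a single call are all recorded at that call's step):

# Both metrics land on the same step, since this is one call:
logger({"train/loss": 0.91, "train/lr": 3e-4})

# A second call advances the counter, so this lands on the next step:
logger({"train/loss": 0.87, "train/lr": 3e-4})

# To keep related curves aligned on the x-axis, batch a step's metrics
# into a single call rather than issuing one call per metric.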

Logging tables

For structured per-step data — sample completions, gradient breakdowns, layer statistics — log a Table:
from adaptive_harmony.logging_table import Table

samples = Table()
samples.add_row(["prompt-1", completion_text, score])
samples.add_row(["prompt-2", completion_text_2, score_2])

logger({"eval/samples": samples})
Tables appear in the run detail view, paginated and searchable.
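
Because the mapping accepts scalars and Tables together, a step's table can be logged alongside its scalar metrics in a single call so they share a step index. A sketch reusing names from the snippets above:

# One call, one step: the table and the scalars stay aligned.
logger({
    "train/loss": loss,
    "eval/samples": samples,
})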
get_prod_logger auto-selects a logging backend based on the environment variables present in the recipe sandbox, in priority order: WandB → MLflow → TensorBoard → stdout. The Adaptive monitoring backend is added in addition (not as a replacement) when ADAPTIVE_BASE_URL and ADAPTIVE_API_KEY are set, so metrics always reach the platform UI even if you also log to a third-party tracker. Set ADAPTIVE_MONITORING_DISABLED=1 to opt out of the Adaptive backend (e.g., for local-only development).
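
A minimal sketch of the local-only opt-out, assuming the variable is read when get_prod_logger constructs the logger and so must be set beforehand:

import os

from adaptive_harmony.metric_logger import get_prod_logger

# Disable the Adaptive backend for this process only (assumption: the
# variable is consulted at logger construction time).
os.environ["ADAPTIVE_MONITORING_DISABLED"] = "1"

logger = get_prod_logger(ctx)  # falls back to WandB/MLflow/TensorBoard/stdout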
See the SDK Reference for monitoring methods.

Retention

Monitoring metrics are tied to the run’s lifetime — they live as long as the run record does. Deleting a run deletes its metrics. There is no separate TTL for metrics.