# Monitoring

> Read the inference dashboard — latency breakdown, throughput, error rate, and the window/bucket time controls.

The **Inference Stats** dashboard shows how your runtime is performing in real time:
latency, throughput, error rate, and where time is spent in the pipeline. It updates
continuously while a datasource is streaming.

## Key indicators

- **Latency percentiles** — p50 / p95 / p99 of end-to-end inference time.
- **Throughput** — predictions per second.
- **Error rate** — share of inferences that failed.
- **Execution provider** — whether inference is running on TensorRT, CUDA, or CPU.
  A GPU box that falls back to CPU shows up here.
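
To make the first three indicators concrete, here is a minimal sketch of how they could be computed from raw samples. It is illustrative only, not the dashboard's implementation: the function names, the nearest-rank percentile method, and the assumption that every inference (including failed ones) contributes a latency sample are all hypothetical.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # pct is an integer (50, 95, 99), so pct * n / 100 avoids float rounding surprises.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms, failed, window_seconds):
    """Summarize one time window. Assumes one latency sample per inference (hypothetical)."""
    total = len(latencies_ms)
    return {
        "p50": percentile(latencies_ms, 50),
        "p95": percentile(latencies_ms, 95),
        "p99": percentile(latencies_ms, 99),
        "throughput": total / window_seconds,  # predictions per second
        "error_rate": failed / total,          # share of inferences that failed
    }


stats = summarize(list(range(1, 101)), failed=5, window_seconds=10)
```

With 100 samples of 1 to 100 ms over 10 seconds and 5 failures, `stats` holds a p50 of 50 ms, a p99 of 99 ms, a throughput of 10 predictions per second, and an error rate of 0.05.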

## Latency breakdown

Each inference is decomposed so you can see where time goes:

- **Queue wait** — time spent waiting in the request queue
- **Pre-process** — input normalization
- **Model exec** — pure inference time
- **Post-process** — output decoding

If latency rises, the breakdown tells you whether the model itself slowed down
(**Model exec** grows) or the box is saturated upstream (**Queue wait** grows).
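
The comparison described above can be sketched as a small helper that reports which stage grew the most between a baseline and a current reading. The stage keys and the `slowest_growing_stage` function are hypothetical names chosen for this example; the dashboard itself does not necessarily expose the breakdown in this form.

```python
# Stage names mirror the four breakdown entries above; the keys are hypothetical.
STAGES = ("queue_wait", "pre_process", "model_exec", "post_process")


def total_ms(breakdown):
    """End-to-end latency as the sum of the four stages."""
    return sum(breakdown[stage] for stage in STAGES)


def slowest_growing_stage(baseline, current):
    """Return the stage whose latency increased the most, in milliseconds."""
    return max(STAGES, key=lambda stage: current[stage] - baseline[stage])


baseline = {"queue_wait": 2.0, "pre_process": 3.0, "model_exec": 20.0, "post_process": 2.0}
current = {"queue_wait": 40.0, "pre_process": 3.0, "model_exec": 21.0, "post_process": 2.0}

culprit = slowest_growing_stage(baseline, current)
```

In this example the total rises from 27 ms to 66 ms and `culprit` is `"queue_wait"`, which points at upstream saturation rather than a slower model.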

## Window and bucket controls

Two controls shape the time view:

- **Window** — how far back you look (for example `5m`, `1h`, `24h`).
- **Bucket** — how wide each point on the chart is (for example `10s`, `1m`).

The number of points on the chart is `window ÷ bucket`; for example, a `1h` window
with `1m` buckets draws 60 points. When you change the window, the dashboard
auto-snaps the bucket so charts stay readable: wider windows use wider buckets.
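
As a worked illustration of `window ÷ bucket`, the sketch below parses the duration strings and picks a bucket for a given window. The candidate bucket list and the 120-point cap are assumptions made up for this example; the dashboard's actual snapping rule is not documented here.

```python
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}

# Hypothetical values: the real candidate buckets and point cap may differ.
CANDIDATE_BUCKETS = ["10s", "1m", "5m", "1h"]
MAX_POINTS = 120


def to_seconds(duration):
    """Convert strings such as '10s', '5m', or '24h' to seconds."""
    return int(duration[:-1]) * UNIT_SECONDS[duration[-1]]


def point_count(window, bucket):
    """Number of chart points: window divided by bucket."""
    return to_seconds(window) // to_seconds(bucket)


def snap_bucket(window):
    """Pick the narrowest candidate bucket that keeps the chart at or under MAX_POINTS."""
    for bucket in CANDIDATE_BUCKETS:
        if point_count(window, bucket) <= MAX_POINTS:
            return bucket
    return CANDIDATE_BUCKETS[-1]
```

Under these assumptions, `snap_bucket("5m")` keeps `10s` buckets (30 points), `snap_bucket("1h")` moves to `1m` (60 points), and `snap_bucket("24h")` moves to `1h` (24 points).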

**How the chart stays steady**

The latest point is aligned to a bucket boundary, so a partially filled current
bucket never shows up as an artificially low count. Empty buckets are filled with
zeros, so the line stays continuous even when traffic is sparse.
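
One way to get both properties is sketched below. It rests on an assumed reading of the text above: the end of the window is floored to the last completed bucket boundary, and every bucket in the window is emitted even if nothing landed in it. The `bucketize` function and its integer epoch-second inputs are hypothetical.

```python
from collections import Counter


def bucketize(timestamps, now, window_s, bucket_s):
    """Count events per bucket over a window that ends on a bucket boundary.

    All arguments are integer seconds; window_s should be a multiple of bucket_s.
    """
    end = (now // bucket_s) * bucket_s  # floor to the last completed boundary
    start = end - window_s
    counts = Counter()
    for ts in timestamps:
        if start <= ts < end:  # drops events in the still-filling bucket
            counts[(ts // bucket_s) * bucket_s] += 1
    # Emit every bucket in order, using 0 where no events were seen.
    return [(t, counts.get(t, 0)) for t in range(start, end, bucket_s)]


series = bucketize([41, 45, 72, 99, 103], now=105, window_s=60, bucket_s=10)
```

Here the event at second 103 falls in the incomplete bucket and is left out, and the quiet buckets starting at 50, 60, and 80 appear with a count of 0 instead of being skipped.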

**Data retention**

Live metrics are retained for **3 days**. Windows longer than that return no data. A
brand-new install needs a short warm-up before the charts show traffic.
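
If you script against the dashboard, a guard like the following can catch a window that exceeds retention before you request it. This is a sketch only: `within_retention` is a hypothetical helper, and the `d` (days) suffix is an assumption, since the examples on this page only show `s`, `m`, and `h`.

```python
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}  # "d" is an assumed suffix
RETENTION_SECONDS = 3 * 86400  # 3 days, per the retention note above


def to_seconds(duration):
    """Convert strings such as '5m', '24h', or '3d' to seconds."""
    return int(duration[:-1]) * UNIT_SECONDS[duration[-1]]


def within_retention(window):
    """True if the window is no longer than the retention period."""
    return to_seconds(window) <= RETENTION_SECONDS
```

For example, `within_retention("24h")` is true, while `within_retention("7d")` is false and would come back empty.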

## Reading a fresh install

If the chart says "no data", the most common reasons are:

- No datasource is **streaming** yet — enable and **Start** a source.
- The install is still warming up — wait for the first window of samples.

## If something goes wrong

For empty charts, unexpected CPU fallback, or a rising error rate, see the
[Troubleshooting](/troubleshooting/) runbook.

## Next steps

  - [Check versions](/operate/versioning/) — Confirm frontend and backend versions match.
  - [Connect a datasource](/configure/input-datasources/) — Start streaming to populate the dashboard.