Monitoring

The Inference Stats dashboard shows how your runtime is performing in real time: latency, throughput, error rate, and where time is spent in the pipeline. It updates continuously while a datasource is streaming.

Key indicators

Latency percentiles — p50 / p95 / p99 of end-to-end inference time.
Throughput — predictions per second.
Error rate — share of inferences that failed.
Execution provider — whether inference is running on TensorRT, CUDA, or CPU. A GPU box that falls back to CPU shows up here.
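
As a rough illustration of how the first three indicators relate to raw samples, here is a minimal sketch. It is not the dashboard's actual implementation: the `Sample` record, the `summarize` helper, and the nearest-rank percentile method are assumptions made for this example. It also assumes throughput counts every inference, including failed ones.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical record for one inference (not a real dashboard type)."""
    latency_ms: float  # end-to-end inference time
    ok: bool           # False if the inference failed


def percentile(values, p):
    """Nearest-rank percentile of a non-empty list; p is in (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize(samples, window_seconds):
    """Compute the key indicators over one window of samples."""
    if not samples:
        raise ValueError("no samples in window")
    latencies = [s.latency_ms for s in samples]
    failed = sum(1 for s in samples if not s.ok)
    return {
        "p50": percentile(latencies, 50),
        "p95": percentile(latencies, 95),
        "p99": percentile(latencies, 99),
        # Assumption: failed inferences still count toward throughput.
        "throughput": len(samples) / window_seconds,
        "error_rate": failed / len(samples),
    }


# 100 inferences over 10 seconds, latencies 1..100 ms, the first 5 failed.
samples = [Sample(float(i), i > 5) for i in range(1, 101)]
stats = summarize(samples, window_seconds=10)
print(stats["p95"], stats["throughput"], stats["error_rate"])  # 95.0 10.0 0.05
```

Percentiles matter here because a healthy p50 can hide a slow tail; p95 and p99 show what the slowest requests experience.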

Latency breakdown

Each inference is decomposed so you can see where time goes:

Queue wait — time spent waiting in the request queue
Pre-process — input normalization
Model exec — pure inference time
Post-process — output decoding

If latency rises, the breakdown tells you whether the model itself slowed down (Model exec grows) or the machine is saturated upstream (Queue wait grows).
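
One way to read the breakdown is to compare each stage against a known-good baseline and see which stage grew the most. The sketch below assumes a breakdown is available as a plain dictionary of milliseconds per stage; the key names and the `biggest_regression` helper are hypothetical, chosen only to mirror the four stages listed above.

```python
# Hypothetical stage keys mirroring the four stages in the breakdown.
STAGES = ("queue_wait", "pre_process", "model_exec", "post_process")


def total_ms(breakdown):
    """End-to-end latency is the sum of the four stages."""
    return sum(breakdown[stage] for stage in STAGES)


def biggest_regression(baseline, current):
    """Return (stage, delta_ms) for the stage whose time grew the most."""
    deltas = {stage: current[stage] - baseline[stage] for stage in STAGES}
    stage = max(deltas, key=deltas.get)
    return stage, deltas[stage]


baseline = {"queue_wait": 2, "pre_process": 1, "model_exec": 10, "post_process": 1}
current = {"queue_wait": 30, "pre_process": 1, "model_exec": 11, "post_process": 1}

print(total_ms(baseline), total_ms(current))  # 14 43
print(biggest_regression(baseline, current))  # ('queue_wait', 28)
```

In this made-up example the model is barely slower, but requests wait much longer in the queue, which points to upstream saturation rather than a model problem.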

Window and bucket controls

Two controls shape the time view:

Window — how far back you look (for example 5m, 1h, 24h).
Bucket — how wide each point on the chart is (for example 10s, 1m).

The number of points on the chart is window ÷ bucket; for example, a 1h window with 1m buckets gives 60 points. When you change the window, the dashboard auto-snaps the bucket so charts stay readable: wider windows use wider buckets.
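
The arithmetic can be sketched as follows. The point count follows directly from window ÷ bucket. The snapping rule is only a guess at how such a control might behave: the candidate bucket list and the 120-point cap are hypothetical values, not the dashboard's documented settings.

```python
UNITS = {"s": 1, "m": 60, "h": 3600}

# Hypothetical values for illustration; the real dashboard may differ.
CANDIDATE_BUCKETS = ("10s", "1m", "5m", "1h")
MAX_POINTS = 120


def seconds(text):
    """Convert a duration such as '10s', '5m', or '24h' to seconds."""
    return int(text[:-1]) * UNITS[text[-1]]


def point_count(window, bucket):
    """Number of whole buckets that fit in the window."""
    return seconds(window) // seconds(bucket)


def snap_bucket(window):
    """Pick the narrowest candidate bucket that keeps the chart under MAX_POINTS."""
    for bucket in CANDIDATE_BUCKETS:
        if point_count(window, bucket) <= MAX_POINTS:
            return bucket
    return CANDIDATE_BUCKETS[-1]


print(point_count("1h", "1m"))  # 60
for window in ("5m", "1h", "24h"):
    print(window, snap_bucket(window))
# 5m 10s
# 1h 1m
# 24h 1h
```

Under these assumed values, a wider window always lands on an equal or wider bucket, which matches the behaviour described above.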

Reading a fresh install

If the chart says “no data”, the most common reasons are:

No datasource is streaming yet — enable and Start a source.
The install is still warming up — wait for the first window of samples.

If something goes wrong

For empty charts, unexpected CPU fallback, or a rising error rate, see the Troubleshooting runbook.

Next steps

Check versions: confirm that frontend and backend versions match.

Connect a datasource: start streaming to populate the dashboard.