Monitoring
The Inference Stats dashboard shows how your runtime is performing in real time: latency, throughput, error rate, and where time is spent in the pipeline. It updates continuously while a datasource is streaming.
Key indicators
- Latency percentiles: p50 / p95 / p99 of end-to-end inference time.
- Throughput: predictions per second.
- Error rate: share of inferences that failed.
- Execution provider: whether inference is running on TensorRT, CUDA, or CPU. A GPU box that falls back to CPU shows up here.
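As a rough illustration of how the first three indicators relate to raw samples, the sketch below computes them for one window. It is not the dashboard's implementation: the `Sample` type, the `summarize` function, and the nearest-rank percentile method are all assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    latency_ms: float  # end-to-end inference time for one request
    ok: bool           # False if the inference failed


def percentile(sorted_values, q):
    """Nearest-rank percentile of an ascending list; q is in (0, 100]."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = math.ceil(q * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


def summarize(samples, window_seconds):
    """Compute the key indicators for the samples seen in one window."""
    latencies = sorted(s.latency_ms for s in samples)
    failed = sum(1 for s in samples if not s.ok)
    return {
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
        "throughput_per_s": len(samples) / window_seconds,
        "error_rate": failed / len(samples),
    }


# Made-up data: 100 requests over 10 seconds, every tenth one failing.
samples = [Sample(latency_ms=float(i), ok=(i % 10 != 0)) for i in range(1, 101)]
stats = summarize(samples, window_seconds=10)
```

Other percentile definitions (for example, linear interpolation) give slightly different values on small samples, so numbers from a sketch like this may not match the dashboard exactly.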
Latency breakdown
Each inference is decomposed so you can see where time goes:
- Queue wait: time spent waiting in the request queue
- Pre-process: input normalization
- Model exec: pure inference time
- Post-process: output decoding
If latency rises, the breakdown tells you whether the model itself slowed down or the box is saturated upstream.
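One way to read the breakdown is to compare the current per-stage times against a known-good baseline and see which stage grew the most. The sketch below does this under assumptions: the stage keys, the `largest_regression` helper, and all of the numbers are hypothetical, not values or names exposed by the dashboard.

```python
STAGES = ("queue_wait", "pre_process", "model_exec", "post_process")


def largest_regression(baseline, current):
    """Return (stage, delta_ms) for the stage whose time grew the most."""
    deltas = {stage: current[stage] - baseline[stage] for stage in STAGES}
    stage = max(deltas, key=deltas.get)
    return stage, deltas[stage]


# Illustrative per-stage times in milliseconds (made-up values).
baseline = {"queue_wait": 2.0, "pre_process": 1.5, "model_exec": 12.0, "post_process": 1.0}
current = {"queue_wait": 30.0, "pre_process": 1.5, "model_exec": 12.5, "post_process": 1.0}

stage, delta_ms = largest_regression(baseline, current)
```

In this made-up case the growth is almost entirely in queue wait while model exec is nearly flat, which points at upstream saturation rather than a slower model.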
Window and bucket controls
Two controls shape the time view:
- Window: how far back you look (for example 5m, 1h, 24h).
- Bucket: how wide each point on the chart is (for example 10s, 1m).
The number of points is window ÷ bucket. The dashboard auto-snaps the bucket when
you change the window so charts stay readable — wider windows use wider buckets.
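The window ÷ bucket relationship can be sketched as follows. The duration parser and point count follow directly from the text, but the snapping rule is only a guess at the behavior: the `BUCKETS` ladder and the `MAX_POINTS` budget are hypothetical values chosen for the example, not the dashboard's actual settings.

```python
UNITS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text):
    """Convert a string such as '5m' or '24h' to seconds."""
    return int(text[:-1]) * UNITS[text[-1]]


def point_count(window, bucket):
    """Number of chart points: window divided by bucket."""
    return parse_duration(window) // parse_duration(bucket)


# Hypothetical bucket ladder and point budget (assumptions for illustration).
BUCKETS = ("10s", "1m", "5m", "15m", "1h")
MAX_POINTS = 120


def snap_bucket(window):
    """Pick the narrowest bucket that keeps the chart within MAX_POINTS."""
    for bucket in BUCKETS:
        if point_count(window, bucket) <= MAX_POINTS:
            return bucket
    return BUCKETS[-1]
```

Under these assumed settings a 5m window keeps 10s buckets (30 points), a 1h window moves to 1m buckets (60 points), and a 24h window moves to 15m buckets (96 points).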
Reading a fresh install
If the chart says “no data”, the most common reasons are:
- No datasource is streaming yet — enable and Start a source.
- The install is still warming up — wait for the first window of samples.
If something goes wrong
- For empty charts, unexpected CPU fallback, or a rising error rate, see the Troubleshooting runbook.
Next steps
- Check versions: confirm frontend and backend versions match.
- Connect a datasource: start streaming to populate the dashboard.