Troubleshooting
Find your symptom in the table, then jump to the fix. Every runbook page follows the same shape: Symptom → Confirm → Fix → Prevent.
Find your symptom
| If you see… | Go to |
|---|---|
| GPU installed but inference runs on CPU; log shows `CUDA failure 500: named symbol not found` | GPU not used — CUDA error 500 |
| Inference container exits/restarts when GPU is enforced and the host GPU is broken | GPU not used — CUDA error 500 |
| Host GPU works, but provider shows a lower tier than expected (e.g. CUDA instead of TensorRT, or CPU) | Running on CPU when GPU expected |
| Log line `Failed to load library libonnxruntime_providers_tensorrt.so` | Running on CPU when GPU expected |
| Enabling a datasource bounces back to Disabled with `model_not_paired` or `adapter_connect_failed` | Datasource down or faulted |
| A streaming OPC-UA / MQTT / CSV source stops; its row shows a fault indicator / red banner | Datasource down or faulted |
| Datasource is Enabled but no predictions appear | Datasource down or faulted |
| Settings → About highlights the frontend / backend version in amber | Frontend / backend version mismatch |
| Version reads `0.0.0-dev…` or ends in `-dirty` | Frontend / backend version mismatch |
| Disk filling up; database file far larger than the data it holds; deleting rows doesn’t shrink it | Disk fills / database grows unbounded |
| A container shows `unhealthy` in `docker ps` but the app serves fine | Container marked unhealthy but service works |
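The rows that quote a literal log string or version pattern can be matched mechanically. The following is a minimal Python sketch, assuming you already have the log line, or the version string from Settings → About, as plain text. The function and constant names are hypothetical helpers for illustration, not part of the product; the matched substrings and page titles are copied from the table.

```python
from typing import Optional

# Page titles copied from the "Go to" column (\u2014 is the dash in the first title).
GPU_CUDA_500 = "GPU not used \u2014 CUDA error 500"
CPU_FALLBACK = "Running on CPU when GPU expected"
DATASOURCE = "Datasource down or faulted"
VERSION_MISMATCH = "Frontend / backend version mismatch"

# Hypothetical signature list: substrings quoted in the table, checked in order.
LOG_SIGNATURES = [
    ("CUDA failure 500", GPU_CUDA_500),
    ("libonnxruntime_providers_tensorrt.so", CPU_FALLBACK),
    ("model_not_paired", DATASOURCE),
    ("adapter_connect_failed", DATASOURCE),
]


def page_for_log_line(line: str) -> Optional[str]:
    """Return the runbook page for a log line, or None if no signature matches."""
    for needle, page in LOG_SIGNATURES:
        if needle in line:
            return page
    return None


def page_for_version(version: str) -> Optional[str]:
    """Route development ("0.0.0-dev...") or "-dirty" builds to the version-mismatch page."""
    v = version.strip()
    if v.startswith("0.0.0-dev") or v.endswith("-dirty"):
        return VERSION_MISMATCH
    return None
```

For example, `page_for_log_line("Failed to load library libonnxruntime_providers_tensorrt.so")` returns `"Running on CPU when GPU expected"`. Symptoms without a distinctive string (an `unhealthy` container, a growing database file, an Enabled datasource with no predictions) still need the table itself.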