GPU not used — CUDA error 500
Symptom
The box has an NVIDIA GPU, but inference runs on CPU. The inference container log shows:
    CUDA failure 500: named symbol not found... resolved to CPU only
In the dashboard, the Python Inference card shows CPU (fallback) instead of TensorRT or CUDA. With strict GPU enforcement enabled, the container exits instead and keeps restarting.
Confirm
The decisive test is NVIDIA’s own compute sample. Run it on the host:
    docker run --rm --gpus all nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0
- Healthy host → prints Test PASSED.
- Broken host → prints Failed to allocate device vector A (error code named symbol not found)!
If the GPUs are listed by nvidia-smi -L but this sample fails, the GPU is unusable for compute in containers. The problem is the host GPU stack, not the Xisom image.
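The rule above (GPUs listed, sample fails) can be sketched as a small shell helper. This is a minimal sketch, not part of the Xisom tooling: the function name `classify_gpu_host` and its three result labels are hypothetical, and it assumes `nvidia-smi -L` prints one line per device starting with `GPU <index>:`.

```shell
# Hypothetical helper: classify the host from two captured outputs.
#   $1 = output of `nvidia-smi -L`
#   $2 = output of the vectorAdd compute sample
classify_gpu_host() {
  if printf '%s\n' "$2" | grep -q 'Test PASSED'; then
    # The sample ran a real kernel, so compute works in containers.
    echo "healthy"
  elif printf '%s\n' "$1" | grep -q '^GPU [0-9]'; then
    # Devices are enumerated but compute fails: the case on this page.
    echo "listed-but-no-compute"
  else
    echo "no-gpu-visible"
  fi
}

# Usage on a real host (not run here):
#   classify_gpu_host "$(nvidia-smi -L 2>&1)" \
#     "$(docker run --rm --gpus all nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0 2>&1)"
```

A result of `listed-but-no-compute` corresponds to the broken-host case described here; `no-gpu-visible` points to a different problem, such as no driver or no device at all.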
You can also confirm the fallback in the running container’s log:
    docker compose -f docker-compose.release.yml logs inference | grep -E "CUDA failure 500|resolved to CPU"
This is a host-level GPU-compute problem in the platform driver layer, so the fix is on the host, not in the Xisom stack.
1. Update the NVIDIA GPU driver on the host to a current build with full CUDA compute support. On a Windows + WSL2 host, also run wsl --update and then wsl --shutdown afterwards.
2. Re-wire the container runtime to the refreshed driver:
    sudo nvidia-ctk runtime configure --runtime=docker
    sudo systemctl restart docker
3. Re-run the compute sample until it prints Test PASSED:
    docker run --rm --gpus all nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0
4. Restart the inference container and confirm it picked up the GPU. No config change is needed; the default execution mode auto-selects the best available provider:
    docker compose -f docker-compose.release.yml restart inference
    docker compose -f docker-compose.release.yml logs inference | grep "EP enabled"
    # expect: "TensorRT EP enabled" or "CUDA EP enabled"
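The final check can be scripted by mapping the log text to a provider name. The sketch below is illustrative only: `active_provider` is a hypothetical function, and it matches just the log strings quoted on this page. It checks the "EP enabled" markers first, so it assumes the log passed in covers only the current start.

```shell
# Hypothetical helper: read inference log text on stdin and print the
# provider it indicates, using only the log strings quoted on this page.
active_provider() {
  log="$(cat)"
  case "$log" in
    *"TensorRT EP enabled"*)                  echo "tensorrt" ;;
    *"CUDA EP enabled"*)                      echo "cuda" ;;
    *"CUDA failure 500"*|*"resolved to CPU"*) echo "cpu-fallback" ;;
    *)                                        echo "unknown" ;;
  esac
}

# Usage on a real host (not run here):
#   docker compose -f docker-compose.release.yml logs --since 5m inference | active_provider
```

Limiting the log window with `--since` keeps fallback lines from an earlier, broken start out of the result, because a restarted container keeps its previous log output.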
While the host is still broken
Run cleanly on CPU instead of fighting the GPU probe. Set EXECUTION_MODE=cpu (or FORCE_CPU=1) on the inference service and restart. The box keeps predicting on CPU until the driver is fixed.
Do not enable strict GPU enforcement (STRICT_EP=1) with a GPU mode on a broken host — it will correctly refuse to start rather than silently run on CPU, which is the opposite of what you want during the workaround.
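One way to apply the CPU workaround without editing the release file is a Compose override. This is a sketch under assumptions: the override file name `docker-compose.cpu.yml` and the function `write_cpu_override` are hypothetical, and it assumes the inference service reads EXECUTION_MODE from its container environment.

```shell
# Hypothetical helper: write a Compose override that pins the
# "inference" service to CPU. Default file name is an assumption.
write_cpu_override() {
  cat > "${1:-docker-compose.cpu.yml}" <<'EOF'
services:
  inference:
    environment:
      EXECUTION_MODE: "cpu"
EOF
}

# Usage on a real host (not run here):
#   write_cpu_override docker-compose.cpu.yml
#   docker compose -f docker-compose.release.yml -f docker-compose.cpu.yml up -d inference
```

The usage line runs `up -d` rather than `restart` because Compose recreates a container only on `up`; a plain `restart` reuses the existing container and would not pick up a changed environment. Delete the override file and run `up -d` again once the driver is fixed.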
Prevent
- Make a silent CPU fallback loud where GPU is mandatory. On boxes that must run on GPU, pin the provider and enable strict enforcement (EXECUTION_MODE=cuda + STRICT_EP=1) so a missing GPU library fails the container at startup instead of quietly degrading to CPU.
- Treat the compute sample as the acceptance gate after any GPU-driver or host update: nvidia-smi passing is not sufficient evidence that inference will use the GPU.
- First run is slow, not stuck. On first GPU start the engine is compiled and cached (2–3 minutes). Subsequent restarts are fast. Don’t mistake the long startup window for the failure above.
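The acceptance gate from the second bullet could be wired into a post-update script along these lines. This is a sketch only: `gpu_acceptance_gate` is a hypothetical name, and the optional command argument exists so the function can be exercised without a GPU.

```shell
# Hypothetical gate: run a command (the compute sample by default) and
# succeed only if its output contains "Test PASSED".
gpu_acceptance_gate() {
  if [ "$#" -eq 0 ]; then
    set -- docker run --rm --gpus all \
      nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0
  fi
  # Capture stdout and stderr; the command's own exit status is ignored
  # and the verdict comes from the output text alone.
  out="$("$@" 2>&1)" || true
  printf '%s\n' "$out" | grep -q 'Test PASSED'
}

# Usage after a driver or host update (not run here):
#   gpu_acceptance_gate || { echo "GPU compute check failed" >&2; exit 1; }
```

Keying the verdict to the sample’s output rather than to nvidia-smi matches the point above: device enumeration can succeed while compute in containers is still broken.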
Related
- Execution provider fell back to CPU: when the host GPU is fine but the provider still drops a tier.
- Hardware Setup: supported GPU tiers.
- Observability & Alerts: where the active provider is shown.