# GPU not used — CUDA error 500

> The inference container logs "CUDA failure 500" and falls back to CPU even though a GPU is installed.

## Symptom

The box has an NVIDIA GPU, but inference runs on CPU. The inference container log shows:

```
CUDA failure 500: named symbol not found
... resolved to CPU only
```

In the dashboard, the **Python Inference** card shows CPU (fallback) instead of `TensorRT` or `CUDA`. With strict GPU enforcement (`STRICT_EP=1`) enabled, the container exits and keeps restarting instead of falling back.

`nvidia-smi` reporting your GPU as present does **not** mean CUDA compute works. `nvidia-smi` only exercises the management layer; running CUDA work goes through the compute driver, and the compute driver is what failed here.

## Confirm

The decisive test is NVIDIA's own compute sample. Run it on the host:

```bash
docker run --rm --gpus all nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0
```

- **Healthy host** → prints `Test PASSED`.
- **Broken host** → `Failed to allocate device vector A (error code named symbol not found)!`

If the GPUs are listed by `nvidia-smi -L` but this sample fails, the GPU is unusable for compute in containers. The problem is the host GPU stack, not the Xisom image.
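
For scripted checks, the pass/fail reading above can be wrapped in a small helper. This is a minimal sketch, not part of the Xisom tooling: `classify_sample_output` is a hypothetical name, and it looks only for the two strings quoted above, so any other output (for example a Docker error) is reported as `unknown`.

```bash
# Hypothetical helper: classify the compute sample's output.
# Reads the sample output on stdin and prints one of:
#   healthy         -> the sample printed "Test PASSED"
#   broken-compute  -> the sample hit "named symbol not found"
#   unknown         -> anything else (image pull error, no GPU runtime, ...)
classify_sample_output() {
  case "$(cat)" in
    *"Test PASSED"*)            echo "healthy" ;;
    *"named symbol not found"*) echo "broken-compute" ;;
    *)                          echo "unknown" ;;
  esac
}

# Usage on a host with Docker and a GPU:
#   docker run --rm --gpus all nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0 2>&1 \
#     | classify_sample_output
```

Treat `unknown` as a prompt to read the raw output; it usually points at a different problem than the one on this page.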

You can also confirm the fallback in the running container's log:

```bash
docker compose -f docker-compose.release.yml logs inference | grep -E "CUDA failure 500|resolved to CPU"
```

## Fix

The failure is in the host's GPU driver layer, so the fix is on the host, not in the Xisom stack.

1. Update the **NVIDIA GPU driver** on the host to a current build with full CUDA compute support. On a Windows + WSL2 host, run `wsl --update` and then `wsl --shutdown` after the driver update.

   Updating Docker or the container platform alone does **not** fix this. The compute failure lives in the GPU driver layer below the container runtime. A platform update can even reset GPU passthrough settings and make things worse.

2. Re-register the NVIDIA runtime with Docker so containers pick up the refreshed driver (`nvidia-ctk` ships with the NVIDIA Container Toolkit):

   ```bash
   sudo nvidia-ctk runtime configure --runtime=docker
   sudo systemctl restart docker
   ```

3. Re-run the compute sample until it prints `Test PASSED`:

   ```bash
   docker run --rm --gpus all nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0
   ```

4. Restart the inference container and confirm it picked up the GPU. No config change is needed — the default execution mode auto-selects the best available provider:

   ```bash
   docker compose -f docker-compose.release.yml restart inference
   docker compose -f docker-compose.release.yml logs inference | grep "EP enabled"
   # expect: "TensorRT EP enabled" or "CUDA EP enabled"
   ```
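
The check in step 4 can be reduced to a single answer with a small log filter. This is a sketch under assumptions: `active_provider` is a hypothetical helper, and it relies only on the log strings quoted on this page (`TensorRT EP enabled`, `CUDA EP enabled`, `resolved to CPU`). It reports the most recent match, because the log may still hold lines from before the restart.

```bash
# Hypothetical helper: report the provider named by the latest matching log line.
# Reads the inference log on stdin and prints one of:
#   tensorrt | cuda | cpu-fallback | unknown
active_provider() {
  # Keep only the newest provider-related line; "|| true" keeps the
  # function usable under "set -e" when grep finds nothing.
  case "$(grep -E 'TensorRT EP enabled|CUDA EP enabled|resolved to CPU' | tail -n 1 || true)" in
    *"TensorRT EP enabled"*) echo "tensorrt" ;;
    *"CUDA EP enabled"*)     echo "cuda" ;;
    *"resolved to CPU"*)     echo "cpu-fallback" ;;
    *)                       echo "unknown" ;;
  esac
}

# Usage:
#   docker compose -f docker-compose.release.yml logs inference | active_provider
```

An `unknown` result right after a restart may simply mean the provider line has not been logged yet.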

### While the host is still broken

Run cleanly on CPU instead of fighting the GPU probe. Set `EXECUTION_MODE=cpu` (or `FORCE_CPU=1`) on the inference service and restart. The box keeps predicting on CPU until the driver is fixed.

Do **not** combine strict GPU enforcement (`STRICT_EP=1`) with a GPU execution mode on a broken host. Strict mode does its job and refuses to start rather than silently run on CPU, which leaves you with no inference at all during the workaround.
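
One way to apply the CPU setting without editing the release file is a second Compose file layered on top of it. This is a sketch under assumptions: the file name `cpu-override.yml` and the helper `write_cpu_override` are hypothetical, and it assumes the inference service reads `EXECUTION_MODE` from its container environment, as the instruction above implies.

```bash
# Hypothetical helper: write a Compose override that pins the
# inference service to CPU. $1 is the path of the file to create.
write_cpu_override() {
  cat > "$1" <<'EOF'
services:
  inference:
    environment:
      EXECUTION_MODE: "cpu"
EOF
}

# Usage (file name is an example, pick your own):
#   write_cpu_override cpu-override.yml
#   docker compose -f docker-compose.release.yml -f cpu-override.yml up -d inference
```

Once the driver is fixed, recreating the service without the second `-f` drops the override again.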

## Prevent

- **Make a silent CPU fallback loud where GPU is mandatory.** On boxes that *must* run on GPU, pin the provider and enable strict enforcement (`EXECUTION_MODE=cuda` + `STRICT_EP=1`) so a missing GPU library or a failed CUDA initialization stops the container at startup instead of quietly degrading to CPU.
- **Treat the compute sample as the acceptance gate** after any GPU-driver or host update — `nvidia-smi` passing is not sufficient evidence that inference will use the GPU.
- **First run is slow, not stuck.** On first GPU start the engine is compiled and cached (2–3 minutes). Subsequent restarts are fast. Don't mistake the long startup window for the failure above.
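
Because a first GPU start can legitimately take minutes, an automated restart check should poll with a timeout instead of failing on the first miss. The sketch below is illustrative only: `wait_for_ep` is a hypothetical helper, its exit codes are its own convention, and it matches only the `EP enabled` and `resolved to CPU` strings quoted on this page.

```bash
# Hypothetical helper: wait_for_ep <timeout_s> <interval_s> <command...>
# Re-runs <command> every <interval_s> seconds (use a value >= 1) and
# inspects its output. Returns:
#   0  a GPU provider line ("... EP enabled") appeared
#   2  the log shows a CPU fallback ("resolved to CPU")
#   1  neither appeared before the timeout
wait_for_ep() {
  local timeout_s="$1" interval_s="$2" waited=0 out
  shift 2
  while [ "$waited" -le "$timeout_s" ]; do
    # Capture the log command's output; tolerate a failing command.
    out="$("$@" 2>&1 || true)"
    case "$out" in
      *"EP enabled"*)      return 0 ;;
      *"resolved to CPU"*) return 2 ;;
    esac
    sleep "$interval_s"
    waited=$((waited + interval_s))
  done
  return 1
}

# Usage: allow up to 5 minutes, polling every 10 seconds.
#   wait_for_ep 300 10 docker compose -f docker-compose.release.yml logs --since 5m inference
```

Limiting the log window with `--since` keeps lines from an earlier run from being mistaken for the current start.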

## Related

- [Execution provider fell back to CPU](/troubleshooting/execution-provider-fallback/) — when the host GPU is fine but the provider still drops a tier.
- [Hardware Setup](/install-deploy/hardware-setup/) — supported GPU tiers.
- [Observability & Alerts](/operate/monitoring/) — where the active provider is shown.
