
GPU not used — CUDA error 500

The box has an NVIDIA GPU, but inference runs on CPU. The inference container log shows:

CUDA failure 500: named symbol not found
... resolved to CPU only

In the dashboard, the Python Inference card shows CPU (fallback) instead of TensorRT or CUDA. With strict GPU enforcement enabled (STRICT_EP=1), the container exits instead and keeps restarting.

The decisive test is NVIDIA’s own compute sample. Run it on the host:

docker run --rm --gpus all nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0
  • Healthy host → prints Test PASSED.
  • Broken host → prints Failed to allocate device vector A (error code named symbol not found)!

If the GPUs are listed by nvidia-smi -L but this sample fails, the GPU is unusable for compute in containers. The problem is the host GPU stack, not the Xisom image.
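To use the compute sample as a scripted check, its output can be classified with a small shell helper. This is a minimal sketch: `sample_verdict` is a hypothetical name, and it matches only the two messages quoted above, so any other output is reported as unknown rather than guessed at.

```shell
# Classify the output of the vectoradd compute sample, read from stdin.
# Prints one of: pass, symbol-not-found, unknown.
# sample_verdict is a hypothetical helper name, not part of any shipped tooling.
sample_verdict() {
  out=$(cat)
  case "$out" in
    *"Test PASSED"*) echo "pass" ;;
    *"named symbol not found"*) echo "symbol-not-found" ;;
    *) echo "unknown" ;;
  esac
}

# Usage on a real host (needs Docker and the NVIDIA container runtime):
#   docker run --rm --gpus all nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0 2>&1 | sample_verdict
```

A `symbol-not-found` verdict alongside a working `nvidia-smi -L` is the combination described above: the GPU is visible but unusable for compute in containers.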

You can also confirm the fallback in the running container’s log:

docker compose -f docker-compose.release.yml logs inference | grep -E "CUDA failure 500|resolved to CPU"

This is a host-level GPU compute problem in the driver layer, so the fix belongs on the host, not in the Xisom stack.

  1. Update the NVIDIA GPU driver on the host to a current build with full CUDA compute support. On a Windows + WSL2 host, also run wsl --update and then wsl --shutdown once the driver is installed.

  2. Re-wire the container runtime to the refreshed driver:

    sudo nvidia-ctk runtime configure --runtime=docker
    sudo systemctl restart docker
  3. Re-run the compute sample until it prints Test PASSED:

    docker run --rm --gpus all nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0
  4. Restart the inference container and confirm it picked up the GPU. No config change is needed — the default execution mode auto-selects the best available provider:

    docker compose -f docker-compose.release.yml restart inference
    docker compose -f docker-compose.release.yml logs inference | grep "EP enabled"
    # expect: "TensorRT EP enabled" or "CUDA EP enabled"
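Because `restart` keeps the same container, the log can still hold lines from earlier starts, so a plain grep may match a stale message. The sketch below reports the provider from the last matching line instead. `resolved_provider` is a hypothetical helper name, and the patterns are only the log strings quoted on this page. If a single start prints more than one of them, the last one wins, which may not reflect the provider actually in use.

```shell
# Report the most recent execution-provider decision in an inference log (stdin).
# Prints one of: tensorrt, cuda, cpu-fallback, unknown.
# resolved_provider is a hypothetical helper; patterns come from the log lines quoted above.
resolved_provider() {
  # Keep only the last line that mentions a provider decision.
  last=$(grep -E 'TensorRT EP enabled|CUDA EP enabled|resolved to CPU' | tail -n 1) || true
  case "$last" in
    *"TensorRT EP enabled"*) echo "tensorrt" ;;
    *"CUDA EP enabled"*) echo "cuda" ;;
    *"resolved to CPU"*) echo "cpu-fallback" ;;
    *) echo "unknown" ;;
  esac
}

# Usage:
#   docker compose -f docker-compose.release.yml logs inference | resolved_provider
```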

If the driver cannot be fixed right away, run cleanly on CPU instead of fighting the GPU probe. Set EXECUTION_MODE=cpu (or FORCE_CPU=1) on the inference service and restart. The box keeps predicting on CPU until the driver is fixed.

Do not combine strict GPU enforcement (STRICT_EP=1) with a GPU mode on a broken host. Strict mode correctly refuses to start rather than silently running on CPU, which is the opposite of what you want during the workaround.
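One way to apply the CPU workaround without editing the release file is a separate Compose override. This is a sketch under assumptions: the file name `docker-compose.cpu.yml` and the helper `write_cpu_override` are hypothetical, and it assumes the inference service reads EXECUTION_MODE from its container environment. If your deployment sets these variables elsewhere (for example in an env file), change them there instead.

```shell
# Write a Compose override that pins the inference service to CPU.
# $1: path of the override file to create (hypothetical name; any path works).
write_cpu_override() {
  cat > "$1" <<'EOF'
services:
  inference:
    environment:
      EXECUTION_MODE: "cpu"
EOF
}

# Usage:
#   write_cpu_override docker-compose.cpu.yml
#   docker compose -f docker-compose.release.yml -f docker-compose.cpu.yml up -d inference
```

Once the driver is fixed, drop the second `-f` argument and recreate the service to return to the default auto-selection.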

  • Make a silent CPU fallback loud where GPU is mandatory. On boxes that must run on GPU, pin the provider and enable strict enforcement (EXECUTION_MODE=cuda + STRICT_EP=1) so a missing GPU library fails the container at startup instead of quietly degrading to CPU.
  • Treat the compute sample as the acceptance gate after any GPU-driver or host update. A passing nvidia-smi is not sufficient evidence that inference will use the GPU.
  • First run is slow, not stuck. On first GPU start the engine is compiled and cached (2–3 minutes). Subsequent restarts are fast. Don’t mistake the long startup window for the failure above.
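To avoid judging a first start too early, a script can poll the log for a few minutes before declaring failure. The following is a minimal sketch: `wait_for_gpu_provider` is a hypothetical helper, and it assumes the "EP enabled" line appears once the provider is ready. It also matches "EP enabled" lines left over from earlier starts, so use it on a freshly created container or a time-limited log view.

```shell
# Poll a log-producing command until an "EP enabled" line appears or time runs out.
# $1: timeout in seconds, $2: poll interval in seconds, remaining args: command printing the log.
# Returns 0 when the line is seen, 1 on timeout.
# wait_for_gpu_provider is a hypothetical helper name.
wait_for_gpu_provider() {
  timeout_s=$1
  interval_s=$2
  shift 2
  elapsed=0
  while [ "$elapsed" -lt "$timeout_s" ]; do
    # Capture the whole log first so the check does not depend on pipe timing.
    log=$("$@" 2>&1) || true
    case "$log" in
      *"EP enabled"*) return 0 ;;
    esac
    sleep "$interval_s"
    elapsed=$((elapsed + interval_s))
  done
  return 1
}

# Usage (allow 5 minutes, well past the 2-3 minute first-run compile):
#   wait_for_gpu_provider 300 10 docker compose -f docker-compose.release.yml logs inference
```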