1574671 : Cannot find GPU using CUDA

Created: 2026-02-06T22:54:49Z - current status: new

Anonymized Summary:

A user is encountering an error while attempting to run the PyNX module in a JupyterHub notebook on the Maxwell cluster (specifically on the allgpu partition). The error indicates that the GPU (NVIDIA A100-PCIE-40GB) cannot be initialized, despite the notebook running on a GPU-enabled node. The issue appears to stem from OpenCL/CUDA compatibility or misconfiguration in the custom kernel (pynx-py312-env).

Core Issue:

The PyNX module fails to detect or initialize the GPU, returning:

Failed initialising GPU. Please check GPU name [NVIDIA A100-PCIE-40GB] or CUDA/OpenCL installation.

Even though the job is running on the allgpu partition, the environment may not be properly configured for CUDA/OpenCL access.

Possible Solutions/Next Steps:

1. Verify CUDA/OpenCL Environment

Ensure the custom kernel (pynx-py312-env) has the correct CUDA toolkit and OpenCL drivers installed.
Load the appropriate CUDA module (e.g., module load maxwell cuda/11.8) before launching the notebook.
Check if the kernel includes PyCUDA or PyOpenCL dependencies.
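
The dependency check can be run from inside the kernel itself. The following is a minimal sketch using only the Python standard library; `find_gpu_bindings` is a hypothetical helper name, and `pycuda` and `pyopencl` are the import names of PyCUDA and PyOpenCL.

```python
import importlib.util


def find_gpu_bindings(names=("pycuda", "pyopencl")):
    """Return {module_name: bool} telling whether each module can be found."""
    # find_spec locates a top-level module without importing it, so a broken
    # GPU installation cannot crash this check.
    return {name: importlib.util.find_spec(name) is not None for name in names}


for name, found in find_gpu_bindings().items():
    print(f"{name}: {'installed' if found else 'missing'}")
```

A module reported as missing means the binding is not installed in the environment behind the kernel. A module reported as installed can still fail at import time if the CUDA or OpenCL libraries are not reachable.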

2. Check GPU Visibility

Run a simple test in the notebook to confirm GPU visibility (this assumes PyTorch is installed in the kernel; TensorFlow and CuPy offer equivalent checks):

    import torch  # or tensorflow/cupy

    print(torch.cuda.is_available())      # Should return True
    print(torch.cuda.get_device_name(0))  # Should match the GPU name (e.g., A100)

If the import fails or is_available() returns False, the environment may lack GPU support.
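
If no deep-learning framework is installed in the kernel, GPU visibility can also be checked at the driver level. This is a sketch, assuming `nvidia-smi` is on the `PATH` of the node; `parse_gpu_names` and `list_gpus` are hypothetical helper names.

```python
import shutil
import subprocess


def parse_gpu_names(csv_text):
    """Turn the CSV output of nvidia-smi into a list of GPU names."""
    return [line.strip() for line in csv_text.splitlines() if line.strip()]


def list_gpus():
    """Return GPU names reported by nvidia-smi, or [] if none can be queried."""
    if shutil.which("nvidia-smi") is None:
        return []  # driver tools are not on PATH
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True, text=True, timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []  # the driver could not be queried
    return parse_gpu_names(result.stdout)


print(list_gpus())
```

An empty list points at the node or the driver. A list that contains the A100 while PyNX still fails points at the Python environment instead.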

3. Kernel Configuration

If the kernel was created manually, ensure it includes:
- CUDA-compatible Python packages (e.g., cupy, pycuda).
- Correct paths to CUDA libraries (e.g., LD_LIBRARY_PATH).
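Example for a conda environment (this assumes pynx is available from the channel; on conda-forge the CUDA 11.x runtime is packaged as cudatoolkit, while the name cuda-toolkit is used for CUDA 12 and later):

    mamba create -n pynx-env python=3.12 pynx cudatoolkit=11.8 -c conda-forge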
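
Whether the library paths are correct can be tested from within the kernel by asking the dynamic loader to open the relevant libraries. This is a sketch under the assumption of a Linux node, where the NVIDIA driver library is named `libcuda.so.1` and the OpenCL ICD loader `libOpenCL.so.1`; `split_library_path` and `can_load` are hypothetical helper names.

```python
import ctypes
import os


def split_library_path(value):
    """Split an LD_LIBRARY_PATH-style string into its non-empty entries."""
    return [entry for entry in value.split(os.pathsep) if entry]


def can_load(library_name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
        return True
    except OSError:
        return False


entries = split_library_path(os.environ.get("LD_LIBRARY_PATH", ""))
print("LD_LIBRARY_PATH entries:", entries)

# libcuda.so.1 ships with the NVIDIA driver; libOpenCL.so.1 is the ICD loader.
for name in ("libcuda.so.1", "libOpenCL.so.1"):
    print(name, "loadable" if can_load(name) else "NOT loadable")
```

The check runs with the environment the kernel process was started with, which is the environment PyNX sees.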

4. JupyterHub-Specific Checks

Confirm the notebook is running on a GPU node (check !nvidia-smi in a cell).
If using a batch job, ensure the script includes:

    #SBATCH --partition=allgpu
    #SBATCH --gres=gpu:1   # Request 1 GPU
    module load maxwell cuda/11.8
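
The allocation can also be inspected from a notebook cell through the environment variables Slurm and CUDA use. This is a sketch; `summarize_gpu_allocation` is a hypothetical helper name, and which of these variables are set depends on how the site configures Slurm and JupyterHub.

```python
import os


def summarize_gpu_allocation(env):
    """Collect the Slurm/CUDA variables that describe the GPU allocation."""
    keys = (
        "SLURM_JOB_ID",
        "SLURM_JOB_PARTITION",
        "SLURM_JOB_GPUS",
        "CUDA_VISIBLE_DEVICES",
    )
    summary = {key: env.get(key) for key in keys}
    # An empty CUDA_VISIBLE_DEVICES hides every GPU from CUDA applications.
    summary["gpus_hidden"] = env.get("CUDA_VISIBLE_DEVICES") == ""
    return summary


for key, value in summarize_gpu_allocation(os.environ).items():
    print(f"{key} = {value!r}")
```

A partition other than allgpu, or `gpus_hidden` being True, would explain a GPU that exists on the node but cannot be initialised.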

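5. PyNX-Specific Debugging

PyNX may require the CUDA or OpenCL backend to be selected explicitly. Set the environment variable before importing PyNX so that it is in place when the backend is initialised (the variable name PYNX_CUDA is unverified and should be confirmed against the PyNX documentation):

    import os

    # Force CUDA backend (if supported)
    os.environ["PYNX_CUDA"] = "1"

    from pynx import *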
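
Before changing PyNX settings, it can help to ask the GPU bindings directly which devices they see. This is a sketch, assuming PyCUDA and/or PyOpenCL are the bindings installed in the kernel; the helper names are hypothetical, and both functions return an empty list instead of raising when a binding is missing or cannot reach a device.

```python
def list_opencl_devices():
    """Return (platform, device) name pairs seen by PyOpenCL, or [] on failure."""
    try:
        import pyopencl as cl
        return [(p.name, d.name) for p in cl.get_platforms() for d in p.get_devices()]
    except Exception as exc:  # missing package, no ICD file, no devices
        print("PyOpenCL check failed:", exc)
        return []


def list_cuda_devices():
    """Return device names seen by PyCUDA, or [] on failure."""
    try:
        import pycuda.driver as drv
        drv.init()  # initialise the CUDA driver API
        return [drv.Device(i).name() for i in range(drv.Device.count())]
    except Exception as exc:  # missing package, no driver, no devices
        print("PyCUDA check failed:", exc)
        return []


print("OpenCL:", list_opencl_devices())
print("CUDA:  ", list_cuda_devices())
```

If both lists are empty while nvidia-smi shows the GPU, the problem lies in the kernel environment and not in PyNX itself.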

6. Fallback: Use Pre-Installed Environments

If the custom kernel is problematic, test PyNX in a pre-installed environment (e.g., tensorflow-2.11 or rapids-22.04):

    module load maxwell conda/3.9 cuda/11.8
    . mamba-init
    mamba activate tensorflow-2.11

References:

Maxwell TensorFlow Documentation
Maxwell CUDA/GPU Setup (for environment configuration)
PyNX Documentation (for backend-specific settings)