1571646 : running calculations on maxwell server
Created: 2026-01-28T07:30:25Z - current status: new
Summary of Issues

A user running Quantum ESPRESSO (QE) on the Maxwell cluster reports the following concerns:

1. GPU acceleration is not detected in QE calculations, despite QE supporting NVIDIA GPUs.
2. How to verify GPU usage on a specific node (e.g., `max-wng001`).
3. How to request high-performance nodes with the best GPUs via Slurm (`#SBATCH` directives).
4. The parameter `-pd .true.` in the `mpirun` command is unclear and hard to find in the QE manuals.
Solutions & Recommendations

1. Enabling GPU Acceleration in QE

- Issue: The script uses `qe/7.4.1-cuda`, but GPU acceleration is not detected.
- Solution:
  - Ensure the GPU-aware MPI environment is correctly loaded. The user's script already loads `qe/7.4.1-cuda`, but additional Slurm directives are needed:

    ```bash
    #SBATCH --gres=gpu:1                   # Request 1 GPU per node
    #SBATCH --partition=maxgpu             # Already present in the script
    #SBATCH --constraint="P100|V100|A100"  # Request specific GPU models (optional)
    ```

  - Verify GPU support in QE by checking the output log for:

    ```
    Using GPU acceleration (CUDA)
    ```

  - If missing, ensure the QE build (`qe/7.4.1-cuda`) was compiled with GPU support. Contact support if unsure.
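The log check above can be scripted, which is handy when many runs need to be inspected. A minimal sketch; the sample log content and the exact wording of the GPU banner are assumptions, since QE's banner text varies across versions:

```bash
#!/bin/sh
# Sketch: scan a QE output log for GPU-related banner lines.
# The sample log below is illustrative; a real GPU-enabled QE run
# prints a GPU banner whose exact wording depends on the version.
cat > sample_qe.out <<'EOF'
     Program PWSCF v.7.4.1 starts ...
     GPU acceleration is ACTIVE.
     Parallel version (MPI), running on 4 processors
EOF

if grep -qi "gpu" sample_qe.out; then
    echo "GPU lines found:"
    grep -i "gpu" sample_qe.out
else
    echo "No GPU-related output: the build may lack CUDA support."
fi
```

On the cluster, point the same `grep` at the real `slurm-*.out` or QE output file instead of the sample.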
2. Checking GPU Usage on a Node

- Issue: How to verify GPU usage on a node (e.g., `max-wng001`).
- Solution:
  - Method 1: SSH into the node and run:

    ```bash
    ssh max-wng001
    nvidia-smi
    ```

    This shows active GPU processes, memory usage, and the GPU model.
  - Method 2: Use Slurm's `srun` to check GPU status without SSH:

    ```bash
    srun --jobid=<JOBID> --overlap nvidia-smi
    ```

  - Method 3: Check job logs for GPU-related output (e.g., `slurm-*.out`).
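For non-interactive checks inside a job script, `nvidia-smi` can also emit machine-readable CSV via `--query-gpu`. A parsing sketch, assuming no live GPU is available here: the hardcoded sample stands in for real output and mirrors the field layout of `--query-gpu=name,utilization.gpu,memory.used --format=csv,noheader`:

```bash
#!/bin/sh
# Inside a running job, real data would come from:
#   nvidia-smi --query-gpu=name,utilization.gpu,memory.used --format=csv,noheader
# A hardcoded sample stands in here (no GPU assumed on this machine).
cat > gpu_stats.csv <<'EOF'
NVIDIA A100-SXM4-40GB, 87 %, 12345 MiB
EOF

# Report whether any device shows nonzero utilization.
awk -F', ' '{ gsub(/ %/, "", $2); if ($2 + 0 > 0) active++ }
            END { print (active ? "GPU in use" : "GPU idle") }' gpu_stats.csv
```

Replacing the here-doc with the real `nvidia-smi` call turns this into a quick in-job sanity check that the GPU is actually busy.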
3. Requesting High-Performance Nodes

- Issue: How to target nodes with the best GPUs via `AVAIL_FEATURES` in `sinfo`.
- Solution:
  - Use `#SBATCH --constraint` to request specific hardware. Example:

    ```bash
    #SBATCH --constraint="A100|V100"  # Prioritize A100 or V100 GPUs
    #SBATCH --partition=maxgpu        # Already present
    ```

  - For CPU performance, combine constraints (e.g., `75F3` for AMD EPYC):

    ```bash
    #SBATCH --constraint="A100&75F3"  # A100 GPU + EPYC 75F3 CPU
    ```

  - List available features with:

    ```bash
    sinfo -o "%N %f"  # Shows nodes and their features
    ```
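The `sinfo` listing can be filtered to show only nodes carrying a desired feature. A sketch using sample output; the here-doc stands in for real `sinfo -N -o "%N %f"` output, and the node names and feature strings below are illustrative, not actual Maxwell data:

```bash
#!/bin/sh
# On the cluster, real data would come from:  sinfo -N -o "%N %f"
# The sample below imitates that two-column output (features made up).
cat > sinfo_sample.txt <<'EOF'
max-wng001 V100,INTEL
max-wng002 A100,75F3,AMD
max-wng003 P100,INTEL
EOF

# Print only node names whose feature list mentions A100:
grep 'A100' sinfo_sample.txt | awk '{ print $1 }'
```

Piping the real `sinfo` output through the same `grep`/`awk` pair gives the candidate node list for a `--constraint` request.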
4. Clarifying -pd .true. in QE

- Issue: The parameter `-pd .true.` is hard to find in the QE manuals.
- Solution:
  - In recent QE versions, `-pd` is shorthand for `-pencil_decomposition`: setting it to `.true.` switches the distributed FFT from slab to pencil decomposition, which is mainly relevant for GPU-accelerated runs. It appears in the parallelization section of the QE user guide, though it is easy to miss.
  - It should not be confused with `-nd`, which sets the number of processes in the linear-algebra group and takes an integer, not `.true.`.
  - You can also check which flags the installed build accepts (e.g., `/software/qe/7.4.1-cuda/bin/pw.x --help`).
  - Recommendation: Run the job with and without `-pd .true.` and compare results and timings. If issues arise, consult the Maxwell cluster's QE documentation or support team.
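The recommendation above can be checked empirically. A hypothetical A/B comparison sketch: it assumes a GPU-enabled `pw.x` build on the cluster, `file.inp` and `$N` as in the script below, and relies on QE's usual final timing line (of the form `PWSCF : ... CPU ... WALL`):

```bash
# Hypothetical A/B test of the flag (requires a GPU-enabled pw.x build);
# timings are entirely system- and input-dependent.
mpirun -N "$N" pw.x -pd .true.  -i file.inp > with_pd.out
mpirun -N "$N" pw.x -pd .false. -i file.inp > without_pd.out

# Compare the final wall-clock timing lines of the two runs:
grep -H "PWSCF.*WALL" with_pd.out without_pd.out
```

Keep the flag only if the `-pd .true.` run is correct and at least as fast.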
Revised Script Example

```bash
#!/bin/bash
#SBATCH --time=7-0:00:00
#SBATCH --partition=maxgpu
#SBATCH --job-name=moo3.ph6
#SBATCH --output=slurm-%j-%x-%N.out
#SBATCH --gres=gpu:1              # Request 1 GPU
#SBATCH --constraint="A100|V100"  # Target high-end GPUs

unset LD_PRELOAD
cd "/path/to/working/directory"

source /etc/profile.d/modules.sh
module purge
module load maxwell qe/7.4.1-cuda

REF=$(date +'%Y%m%d%H%M%S')

export OMPI_MCA_btl=^openib,uct,ofi
export OMPI_MCA_mtl=^ofi
export OMPI_MCA_pml=ucx
export OMPI_MCA_opal_warn_on_missing_libcuda=0
export OMP_NUM_THREADS=1

# Launch one MPI rank per GPU visible to the job
N=$(nvidia-smi -L | wc -l)

# -pd .true. omitted here; re-add only if confirmed beneficial
mpirun -N "$N" pw.x -i file.inp > "file.$REF.out"
```