1571646 : running calculations on maxwell server

Created: 2026-01-28T07:30:25Z - current status: new

Anonymized summary of the user's query, with suggested solutions:


Summary of Issues

A user running Quantum Espresso (QE) on the Maxwell cluster reports the following concerns:

  1. GPU acceleration is not detected in QE calculations, despite QE supporting NVIDIA GPUs.
  2. How to verify GPU usage on a specific node (e.g., max-wng001).
  3. How to request high-performance nodes with the best GPUs via Slurm (#SBATCH directives).
  4. The parameter -pd .true. in the mpirun command is unclear (undocumented in the QE manuals).


Solutions & Recommendations

1. Enabling GPU Acceleration in QE

  • Issue: The script uses qe/7.4.1-cuda, but GPU acceleration is not detected.
  • Solution:
    • The script already loads qe/7.4.1-cuda, but the GPU itself must be requested through additional Slurm directives; loading the CUDA module alone does not allocate one:

```bash
#SBATCH --gres=gpu:1                   # Request 1 GPU per node
#SBATCH --partition=maxgpu             # Already present in the script
#SBATCH --constraint="P100|V100|A100"  # Request specific GPU models (optional)
```

    • Verify GPU support in the QE output log: GPU-enabled builds print a banner such as "GPU acceleration is ACTIVE." (the exact wording varies between QE versions).
    • If no such line appears, ensure the QE build (qe/7.4.1-cuda) was compiled with GPU support. Contact support if unsure.
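To confirm from a finished job's log that a GPU build was used, the banner can be checked with grep. A minimal self-contained sketch; the sample log excerpt below is hypothetical, and on Maxwell you would grep the real slurm-*.out instead:

```shell
# Hypothetical excerpt of a pw.x output header, written to a temp file
# so the check can be tried anywhere (banner wording varies by QE version)
cat > /tmp/pw_sample.out <<'EOF'
     Parallel version (MPI & OpenMP), running on       8 processor cores
     GPU acceleration is ACTIVE.
EOF

# Case-insensitive search for the GPU banner
grep -iq "gpu acceleration" /tmp/pw_sample.out && echo "GPU build detected"
```

If the grep finds nothing in a real log, the run was almost certainly CPU-only even though the CUDA module was loaded.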

2. Checking GPU Usage on a Node

  • Issue: How to verify GPU usage on a node (e.g., max-wng001).
  • Solution:
  • Method 1: SSH into the node and run nvidia-smi:

```bash
ssh max-wng001
nvidia-smi
```

    • This shows active GPU processes, memory usage, and the GPU model.
  • Method 2: Use Slurm's srun to attach to the job's allocation without SSH:

```bash
srun --jobid=<JOBID> --overlap nvidia-smi
```
  • Method 3: Check job logs for GPU-related output (e.g., slurm-*.out).
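The `nvidia-smi -L | wc -l` counting idiom used in the job script below can be sanity-checked without a GPU node. A sketch against a hypothetical two-GPU listing (the UUIDs and model names are made up):

```shell
# Hypothetical `nvidia-smi -L` output for a node with two A100s
cat > /tmp/smi_sample.txt <<'EOF'
GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-aaaa)
GPU 1: NVIDIA A100-SXM4-40GB (UUID: GPU-bbbb)
EOF

# nvidia-smi -L prints one line per GPU, so the line count is the GPU count
wc -l < /tmp/smi_sample.txt
```

Note that inside a job with `--gres=gpu:1`, `nvidia-smi -L` typically lists only the GPU allocated to the job, not all GPUs on the node.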

3. Requesting High-Performance Nodes

  • Issue: How to target nodes with the best GPUs via AVAIL_FEATURES in sinfo.
  • Solution:
  • Use #SBATCH --constraint to request specific hardware. Example:

```bash
#SBATCH --constraint="A100|V100"   # Prioritize A100 or V100 GPUs
#SBATCH --partition=maxgpu         # Already present
```

  • For CPU performance, combine constraints with `&` (e.g., 75F3 for AMD EPYC):

```bash
#SBATCH --constraint="A100&75F3"   # A100 GPU + EPYC 75F3 CPU
```

  • List available features with:

```bash
sinfo -o "%N %f"   # Shows nodes and their feature tags
```
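The `sinfo -o "%N %f"` listing can also be filtered with standard text tools to find candidate nodes before writing a constraint. A self-contained sketch over hypothetical sample output (real node names and feature strings are site-specific):

```shell
# Hypothetical sample of `sinfo -o "%N %f"` output: node name, feature tags
cat > /tmp/sinfo_sample.txt <<'EOF'
max-wng001 GPU,A100,75F3
max-wng002 GPU,V100,Gold-6240
max-wnc003 75F3
EOF

# Keep only nodes whose feature list mentions A100 or V100
awk '$2 ~ /A100|V100/ {print $1}' /tmp/sinfo_sample.txt
```

Any feature tag printed in the second column can be used verbatim in an `#SBATCH --constraint` expression.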

4. Clarifying -pd .true. in QE

  • Issue: The parameter -pd .true. is undocumented in QE manuals.
  • Solution:
  • This flag is not listed in the main QE manuals. Note that QE's documented parallelization flags (-ni, -nk, -nb, -nt, -nd) all take integer arguments; in particular, -nd sets the number of linear-algebra (diagonalization) groups, not a boolean, so -pd .true. is not a variant of those.
  • In recent GPU-enabled QE builds, -pd .true. reportedly switches the 3D-FFT data distribution to a pencil decomposition, an option that appears in GPU benchmark scripts but not in the main manual; treat this interpretation as unconfirmed.
  • It could also come from a custom wrapper script (check /software/qe/7.4.1-cuda/bin/pw.x --help).
  • Recommendation: Remove -pd .true. and test whether the job runs correctly and at comparable speed. If issues arise, consult the Maxwell cluster's QE documentation or support team.

Revised Script Example

```bash
#!/bin/bash
#SBATCH --time=7-0:00:00
#SBATCH --partition=maxgpu
#SBATCH --job-name=moo3.ph6
#SBATCH --output=slurm-%j-%x-%N.out
#SBATCH --gres=gpu:1              # Request 1 GPU
#SBATCH --constraint="A100|V100"  # Target high-end GPUs

unset LD_PRELOAD
cd "/path/to/working/directory"

source /etc/profile.d/modules.sh
module purge
module load maxwell qe/7.4.1-cuda

REF=$(date +'%Y%m%d%H%M%S')

export OMPI_MCA_btl=^openib,uct,ofi
export OMPI_MCA_mtl=^ofi
export OMPI_MCA_pml=ucx
export OMPI_MCA_opal_warn_on_missing_libcuda=0
export OMP_NUM_THREADS=1

# Count the GPUs allocated to this job (with --gres=gpu:1, only the
# allocated GPU is visible to nvidia-smi)
N=$(nvidia-smi -L | wc -l)

# One MPI rank per allocated GPU (-N is Open MPI's ranks-per-node option);
# -pd .true. removed unless confirmed necessary
mpirun -N "$N" pw.x -i file.inp > "file.$REF.out"
```

Sources

  1. Maxwell QE Documentation
  2. Maxwell Slurm GPU Jobs
  3. Quantum Espresso GPU Support (external)