# 1581246 : Unusually slow Maxwell nodes

Created: 2026-03-03T14:43:28Z - current status: new

Here is the anonymized, summarized report and suggested solution:

## Summary of Issue
A user reports performance inconsistencies when running multiple jobs using the pycalibration package (used for detector calibration at [RESEARCH_FACILITY]) on the Maxwell cluster. While some jobs complete within the expected timeframe (~30–60 minutes), others on specific nodes take significantly longer (5–8 hours). The issue persists regardless of memory allocation (tested with 500 GB and 700 GB) and across different datasets.
The user observed that slow performance correlates with specific nodes (visible in attached screenshots, now anonymized as [NODE_1], [NODE_2], etc.).
## Possible Causes & Solutions
- Hardware Heterogeneity
  - The Maxwell cluster includes nodes with varying hardware (e.g., different CPU models, memory speeds, or interconnects). Slower nodes may have older CPUs (e.g., the Intel Xeon E5-2640 v4, which is known to cause MPI errors) or less performant memory subsystems.
  - Solution:
    - Explicitly request nodes with uniform hardware using `--constraint` in the Slurm script. For example:
      ```bash
      #SBATCH --constraint='Gold-6240|EPYC-7402'  # Target newer, homogeneous hardware
      ```
    - Verify node specifications with `sinfo -o "%N %c %m %f"` to identify outliers.
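As a quick sanity check, the `sinfo` output above can be filtered for nodes whose feature tag deviates from the majority. A minimal sketch; the node names and feature strings below are made-up sample data, and on Maxwell you would pipe the real `sinfo -o "%N %c %m %f"` output into the same awk filter:

```shell
# Sample of what `sinfo -o "%N %c %m %f"` might print (made-up values)
sinfo_sample='max-wn001 40 772000 Gold-6240
max-wn002 40 772000 Gold-6240
max-wn003 32 256000 E5-2640'

# Print nodes whose feature tag (column 4) differs from the most common one
outliers=$(echo "$sinfo_sample" | awk '{count[$4]++; feat[NR]=$4; node[NR]=$1}
  END {
    max = 0
    for (f in count) if (count[f] > max) { max = count[f]; best = f }
    for (i = 1; i <= NR; i++) if (feat[i] != best) print node[i], feat[i]
  }')
echo "$outliers"
```

In this sample, only the node with the odd feature tag is printed, which is exactly the shortlist to test or exclude.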
- Memory Contention
  - While the user requested 700 GB, the cluster does not enforce consumable memory. Other jobs on the same node may compete for memory bandwidth, causing slowdowns.
  - Solution:
    - Use the script template from random-samples->Ensuring minimum memory per core to dynamically adjust cores based on available memory:
      ```bash
      mem_per_core=$((40*1024))  # 40 GB per core, in MB
      for node in $(srun hostname -s | sort -u); do
        slots=$(( $(sinfo -n $node --noheader -o '%m') / $mem_per_core ))
        echo "$node slots=$slots" >> $HOSTFILE
      done
      mpirun --hostfile $HOSTFILE ...
      ```
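The slots arithmetic in that template reduces to one integer division. A self-contained sketch with a made-up node memory value (on the cluster this number comes from `sinfo -n $node --noheader -o '%m'`):

```shell
mem_per_core=$((40*1024))   # 40 GB per core, expressed in MB (the unit of Slurm's %m)
node_mem_mb=772000          # made-up example: a ~754 GB node
slots=$(( node_mem_mb / mem_per_core ))
echo "slots=$slots"         # 772000 / 40960 -> 18 ranks fit with 40 GB each
```

Integer division rounds down, so each rank is guaranteed at least the requested 40 GB even when the node memory is not an exact multiple.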
- I/O Bottlenecks
  - Pycalibration may involve heavy I/O (e.g., reading/writing large datasets). Slower nodes might have degraded storage performance or network latency.
  - Solution:
    - Ensure jobs use `/beegfs` (high-performance storage) and avoid `/tmp` or home directories.
    - Add `--exclusive` to the Slurm script to prevent resource sharing:
      ```bash
      #SBATCH --exclusive
      ```
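To tell whether shared storage is the bottleneck on a suspect node, a quick write-throughput spot check can help. A minimal sketch; `SCRATCH_DIR` is an assumed variable you should point at your `/beegfs` directory (it falls back to `/tmp` here only so the snippet runs anywhere):

```shell
scratch="${SCRATCH_DIR:-/tmp}"       # point this at your /beegfs directory on Maxwell
f="$scratch/iotest.$$"
# Write 64 MB and flush to disk so the reported rate is not just page cache;
# dd prints the achieved throughput on its last line.
dd if=/dev/zero of="$f" bs=1M count=64 conv=fdatasync 2>&1 | tail -1
rm -f "$f"
```

Running the same check on a fast node and a slow node gives a direct comparison; a large gap points at storage or network rather than the CPU.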
- MPI/Threading Issues
  - If pycalibration uses MPI or multithreading, misconfiguration (e.g., oversubscribing cores) could cause slowdowns.
  - Solution:
    - Limit threads per MPI rank (e.g., 2–4 threads per rank) and ensure the total core count matches the node's physical cores:
      ```bash
      total_cores=$(nproc)
      np=$(( $total_cores / 4 ))  # Example: 4 threads per rank
      mpirun -np $np --map-by node ...
      ```
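The core-matching rule can be made explicit by also exporting `OMP_NUM_THREADS`, so ranks × threads never exceeds the physical core count. A sketch, where 4 threads per rank is just the example value from above:

```shell
threads_per_rank=4
total_cores=$(nproc)
np=$(( total_cores / threads_per_rank ))
[ "$np" -ge 1 ] || np=1             # guard for nodes with fewer than 4 cores
export OMP_NUM_THREADS=$threads_per_rank
echo "ranks=$np threads_per_rank=$OMP_NUM_THREADS"
# then launch as in the snippet above: mpirun -np $np --map-by node ...
```

Without the export, an OpenMP-enabled library typically defaults to one thread per core in every rank, which oversubscribes the node by a factor of `threads_per_rank`.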
- Node-Specific Problems
  - Some nodes may have underlying hardware issues (e.g., failing memory, overheating).
  - Solution:
    - Report the slow nodes (`[NODE_1]`, `[NODE_2]`, etc.) to Maxwell support for diagnostics.
    - Exclude problematic nodes using `--exclude` in the Slurm script:
      ```bash
      #SBATCH --exclude=[NODE_1],[NODE_2]
      ```
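Since the exclude list tends to grow while support investigates, it can be kept in a small file and turned into the `--exclude` argument at submit time. A sketch; `badnodes.txt` and the node names are hypothetical:

```shell
# Hypothetical list of suspect nodes, one per line
printf '%s\n' 'max-wn003' 'max-wn017' > badnodes.txt
exclude=$(paste -sd, badnodes.txt)            # join the lines with commas
echo "#SBATCH --exclude=$exclude"
# or pass it at submit time instead: sbatch --exclude="$exclude" job.sh
rm -f badnodes.txt
```

This way, adding a newly identified bad node is a one-line append instead of an edit to every job script.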
## Recommended Next Steps

- Test with Homogeneous Hardware: Submit a test job with `--constraint='Gold-6240'` to isolate hardware-related slowdowns.
- Monitor Resource Usage: Use `sacct` or `seff <JOBID>` to check CPU/memory utilization during slow runs. Look for:
  - High `%mem` or `%cpu` usage by other jobs on the same node.
  - I/O wait times (`iowait`, shown as `wa` in `top`).
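When reading the accounting numbers, the key derived figure is CPU efficiency, TotalCPU / (Elapsed × AllocCPUS), which is what `seff` reports: a job that is slow *and* inefficient is waiting (memory bandwidth, I/O) rather than computing. A sketch of the arithmetic with made-up example values:

```shell
elapsed_s=$(( 6*3600 ))       # 6 h wall clock on a slow node (example value)
totalcpu_s=$(( 120*3600 ))    # 120 CPU-hours actually consumed (example value)
alloc_cpus=40                 # cores allocated to the job (example value)
eff=$(( 100 * totalcpu_s / (elapsed_s * alloc_cpus) ))
echo "CPU efficiency: ${eff}%"   # 50% here; values far below 100% suggest waiting, not computing
```

Comparing this figure between a fast run and a slow run of the same dataset separates "the node computes slowly" from "the job sits idle on that node".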
- Consult pycalibration Documentation: Verify whether the package has known issues with specific MPI implementations or threading models.
- Contact Support: If the issue persists, provide:
  - The exact Slurm script used.
  - Output of `scontrol show job <JOBID>` for slow jobs.
  - Logs showing resource usage (e.g., `top` snapshots).