1579769 : max-p3ag037

Created: 2026-02-25T17:19:43Z - current status: new


Anonymized Summary: A new compute node has been added to the GPFS cluster and is ready for use. The remaining task is to integrate it into the SLURM workload manager, which requires discussion due to multiple possible configurations.

Suggested Solution:

  1. Clarify SLURM Integration Requirements:

  - Determine the intended use case (e.g., partition assignment, resource limits, or preemption rules).
  - Decide whether the node should be part of an existing partition (e.g., maxgpu, compgpu, or hpcgwgpu) or a new one.
  - Define SLURM constraints (e.g., GPU type, memory, or CPU architecture) based on the node's hardware.
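To capture the node's hardware for the constraints in step 1, SLURM's own node daemon can report a ready-made configuration line (this assumes SLURM is already installed on the new node):

```
# Run on the new node itself; prints a NodeName=... line with the
# detected CPUs, sockets, cores, threads, and RealMemory, suitable
# as a starting point for the slurm.conf entry.
slurmd -C
```

GPU (GRES) details are not included in this output and still have to be added by hand.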

  2. Review SLURM Configuration Options:

  - Update the SLURM configuration file (slurm.conf) to include the new node, specifying its resources (e.g., GPUs, CPUs, memory).
  - Adjust partition definitions if needed (e.g., adding the node to compgpu for general access or hpcgwgpu for industry collaborations).
  - Set job limits (e.g., concurrent jobs per user) if applicable (e.g., for compgpu).

  3. Coordinate with Cluster Administrators:

  - Schedule a call or meeting to align on the integration approach, especially if the node serves a specific purpose (e.g., AI workloads via the HPC Gateway).
  - Test the integration with a small batch job to verify functionality.
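As a sketch of the slurm.conf changes in step 2, the node and partition entries might look like the following. The CPU count, memory, and GPU GRES values are placeholders, not the node's actual hardware, and `<existing-nodes>` stands for the partition's current host list:

```
# Hypothetical entry for the new node; replace the resource values
# with those reported by the node (e.g., via `slurmd -C`) and the
# correct GPU type/count.
NodeName=max-p3ag037 CPUs=64 RealMemory=512000 Gres=gpu:4 State=UNKNOWN

# Add the node to an existing partition, e.g., compgpu:
PartitionName=compgpu Nodes=<existing-nodes>,max-p3ag037 State=UP
```

After editing, `scontrol reconfigure` (or a restart of slurmctld) applies the change cluster-wide.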
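For the verification step, a minimal batch job pinned to the new node could look like this (the partition name and GPU request are assumptions to be adjusted to the chosen configuration):

```
#!/bin/bash
#SBATCH --job-name=node-smoketest
#SBATCH --partition=compgpu
#SBATCH --nodelist=max-p3ag037
#SBATCH --gres=gpu:1
#SBATCH --time=00:05:00

# Confirm the job landed on the intended node and the GPU is visible.
hostname
nvidia-smi
```

Submit with `sbatch`, then check the state via `squeue` and inspect the job's output file for the expected hostname and GPU listing.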

References:
- Maxwell Cluster Documentation: SLURM Batch Jobs
- Maxwell Blog: New GPU Partitions (2025-09-29) (for partition-specific rules)