1576259 : XFEL GPU nodes for Maxwell

Created: 2026-02-12T13:58:13Z - current status: new


Anonymized & Summarized Issue

Summary: A user is asking about recent hardware offers for GPU nodes to replace aging infrastructure in a high-performance computing (HPC) cluster. The request covers:

- Scope: ~20 compute nodes plus display nodes.
- Market challenges: significant price increases (e.g., memory up to 500%, SSDs/HDDs 40–80%), volatile GPU pricing, and erratic delivery times.
- GPU preferences:
  - High FP64 performance: options like NVIDIA B200 or H200 (1/2/4 GPUs per system).
  - Lower FP64 performance: options like B300, L40S, or RTX Pro 6000 (for display nodes).
- Vendor considerations: Dell (shorter quote validity) vs. Megware (longer delivery times).
- Request: guidance on preferred models and quotes tailored to their use case.

Solution/Next Steps:

1. Clarify Requirements:
   - Specify the intended workload (e.g., AI, simulations, visualization) to determine GPU suitability (e.g., FP64 vs. FP8/INT8).
   - Confirm the node configuration (e.g., GPUs per system, memory needs, CPU preferences).
   - State whether budget or delivery time takes priority.

2. Leverage Existing Cluster Knowledge:
   - The Maxwell cluster’s comgpu partition (shared with industry) uses H200 GPUs (4 per node, 1.5 TB RAM), which may serve as a reference for high-memory, high-FP64 workloads.
   - For display nodes, consider GPUs optimized for visualization (e.g., L40S or RTX Pro 6000).

3. Vendor Engagement:
   - Request quotes from both Dell (shorter quote validity) and Megware (longer lead times) for comparison.
   - Highlight the need for HBM (critical for GPUs like H200/B200) and confirm availability of 64 GB/128 GB DIMMs.

4. Cost Mitigation:
   - Explore bulk discounts or phased procurement to offset price volatility.
   - Consider refurbished/off-lease hardware for non-critical nodes if budget is constrained.

5. Contact Support:
   - Reach out to the cluster’s support team (maxwell.service@desy.de) with a brief description of the intended applications to refine recommendations.
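
To use the comgpu partition as a reference configuration, its node specifications can be read from the scheduler. The following is a minimal sketch, assuming the cluster is managed by Slurm (the ticket does not say so) and that `sinfo` is available on a login node. The helper names `parse_sinfo` and `query_partition` and the sample line are hypothetical and only illustrate the idea.

```python
import subprocess

# sinfo format fields: %N node name, %G generic resources (GRES),
# %m memory per node in MB, %c CPUs per node.
SINFO_FORMAT = "%N|%G|%m|%c"


def parse_sinfo(text):
    """Parse 'node|gres|memory_mb|cpus' lines into a list of dictionaries."""
    nodes = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, gres, mem_mb, cpus = line.split("|")
        nodes.append({
            "node": name,
            "gres": gres,
            # sinfo may append '+' when values differ within a group of nodes
            "memory_gb": int(mem_mb.rstrip("+")) // 1024,
            "cpus": int(cpus.rstrip("+")),
        })
    return nodes


def query_partition(partition="comgpu"):
    """Run sinfo for one partition (requires the Slurm client tools)."""
    cmd = ["sinfo", "-h", "-N", "-p", partition, "-o", SINFO_FORMAT]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_sinfo(result.stdout)


# Hypothetical sample line, not real output from the cluster.
sample = "nodeA|gpu:h200:4|1536000|128\n"
reference_nodes = parse_sinfo(sample)
```

On a login node, `query_partition("comgpu")` would return one record per node, which can be attached to a quote request as the baseline configuration.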

Sources Referenced:

1. Maxwell Cluster GPU Partitions (2025-09-29)
2. Hardware Specifications for comgpu Partition