1576259: XFEL GPU nodes for Maxwell
Created: 2026-02-12T13:58:13Z, current status: new
Anonymized & Summarized Issue
Summary: A user is asking about current hardware offers for GPU nodes to replace aging infrastructure in a high-performance computing (HPC) cluster. The request covers:
- Scope: ~20 compute nodes plus display nodes.
- Market challenges: steep price increases (e.g., memory up to 500%, SSDs/HDDs 40–80%), volatile GPU pricing, and erratic delivery times.
- GPU preferences:
  - High FP64 performance: NVIDIA B200 or H200 (1, 2, or 4 GPUs per system).
  - Lower FP64 performance: B300, L40S, or RTX Pro 6000 (L40S and RTX Pro 6000 also as display-node candidates).
- Vendor considerations: Dell (shorter quote validity) vs. Megware (longer delivery times).
- Request: guidance on preferred models, and quotes tailored to the intended use case.
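To make the FP64 split and the 1/2/4 GPUs-per-system options concrete when asking vendors for quotes, here is a minimal Python sketch. The model grouping and the ~20-node count come from the request; the names `shortlist` and `gpu_counts` are hypothetical, and nothing here reflects actual pricing or availability.

```python
from __future__ import annotations

# Grouping taken from the request: high-FP64 vs. lower-FP64 candidates.
HIGH_FP64 = ("B200", "H200")
LOW_FP64 = ("B300", "L40S", "RTX Pro 6000")


def shortlist(needs_fp64: bool) -> tuple[str, ...]:
    """Hypothetical helper: return candidate GPU models for a workload class."""
    return HIGH_FP64 if needs_fp64 else LOW_FP64


def gpu_counts(nodes: int = 20, per_system: tuple[int, ...] = (1, 2, 4)) -> dict[int, int]:
    """Total GPUs to quote for each GPUs-per-system option (default: ~20 nodes)."""
    return {g: nodes * g for g in per_system}


if __name__ == "__main__":
    print(shortlist(needs_fp64=True))  # ('B200', 'H200')
    print(gpu_counts())                # {1: 20, 2: 40, 4: 80}
```

The 4-GPU case (80 GPUs for 20 nodes) matches the density of the existing comgpu nodes, which makes it a natural baseline for a first quote.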
Solution/Next Steps:
1. Clarify requirements:
   - Specify the intended workload (e.g., AI, simulations, visualization) to determine GPU suitability (FP64 vs. FP8/INT8).
   - Confirm the node configuration (GPUs per system, memory needs, CPU preferences).
   - Prioritize between budget constraints and delivery timelines.
2. Leverage existing cluster knowledge:
   - The Maxwell cluster's comgpu partition (shared with industry) uses H200 GPUs (4 per node, 1.5 TB RAM), which can serve as a reference for high-memory, high-FP64 workloads.
   - For display nodes, consider visualization-oriented GPUs (e.g., L40S or RTX Pro 6000).
3. Vendor engagement:
   - Request quotes from both Dell (faster delivery, shorter quote validity) and Megware (longer lead times) for comparison.
   - State the required HBM capacity (fixed by the GPU model, e.g., H200 vs. B200, since HBM is on the GPU package) and confirm availability of 64 GB/128 GB host DIMMs.
4. Cost mitigation:
   - Explore bulk discounts or phased procurement to offset price volatility.
   - Consider refurbished/off-lease hardware for non-critical nodes if the budget is constrained.
5. Contact support:
   - Reach out to the cluster's support team (maxwell.service@desy.de) with a brief description of the intended applications to refine recommendations.
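The vendor and cost trade-offs above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: all validity, delivery, approval, and price figures are invented placeholders (not Dell or Megware terms), and `Quote`, `quote_usable`, and `phased_cost` are hypothetical names. The sketch checks whether a quote can still be ordered before it expires and delivered by a deadline, and compares buying all ~20 nodes at once with a two-phase purchase.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    """A vendor quote; all figures are hypothetical placeholders."""
    vendor: str
    validity_days: int  # how long the quoted price holds
    delivery_days: int  # lead time after the order is placed


def quote_usable(q: Quote, approval_days: int, deadline_days: int) -> bool:
    """True if the order can be placed before the quote expires and the
    hardware arrives by the deadline (all days counted from today)."""
    order_day = approval_days
    return order_day <= q.validity_days and order_day + q.delivery_days <= deadline_days


def phased_cost(nodes_per_phase: list[int], price_per_node: list[int]) -> int:
    """Total cost when nodes are bought in tranches at different unit prices."""
    if len(nodes_per_phase) != len(price_per_node):
        raise ValueError("one price is needed per phase")
    return sum(n * p for n, p in zip(nodes_per_phase, price_per_node))


if __name__ == "__main__":
    # Hypothetical quotes: short validity/fast delivery vs. long validity/slow delivery.
    fast = Quote("vendor A", validity_days=14, delivery_days=60)
    slow = Quote("vendor B", validity_days=60, delivery_days=150)
    for q in (fast, slow):
        # vendor A False (quote expires before approval), vendor B True
        print(q.vendor, quote_usable(q, approval_days=30, deadline_days=180))
    # 20 nodes at once vs. two tranches of 10 with the later price falling or rising.
    print(phased_cost([20], [100]))           # 2000
    print(phased_cost([10, 10], [100, 90]))   # 1900
    print(phased_cost([10, 10], [100, 120]))  # 2200
```

Phasing only saves money if later prices fall; with rising prices (the last case) it costs more, so it spreads price risk rather than guaranteeing savings.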
Sources Referenced:
1. Maxwell Cluster GPU Partitions (2025-09-29)
2. Hardware Specifications for comgpu Partition