1580332 : High number of files in /gpfs/cfel/user/goercksj
Created: 2026-02-27T13:23:35Z - current status: new
Anonymized Summary
A user has generated a large number of small files (approx. 78 million) in a GPFS-CFEL user directory (/gpfs/cfel/user/[USERNAME]), specifically under SPIND/output/run_*_zarr/. While the total storage usage is only ~377 GiB (well below quota), the sheer volume of files accounts for ~45% of all files on the /gpfs/cfel filesystem. The system administrator seeks clarification on whether this file generation is intentional or part of the user’s workflow.
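To put the numbers above in perspective, the following is a small back-of-the-envelope calculation using only the approximate figures quoted in this summary (377 GiB, 78 million files, 45% share). The variable names are illustrative. It suggests an average file size of roughly 5 KiB and a filesystem-wide total of about 173 million files.

```python
# Rough estimates derived from the approximate figures in the ticket summary.
GIB = 2**30

total_bytes = 377 * GIB      # ~377 GiB used by the directory
user_files = 78_000_000      # ~78 million files in the directory
share_of_fs = 0.45           # ~45% of all files on /gpfs/cfel

avg_file_bytes = total_bytes / user_files
fs_total_files = user_files / share_of_fs

print(f"average file size: {avg_file_bytes / 1024:.1f} KiB")
print(f"implied files on filesystem: {fs_total_files / 1e6:.0f} million")
```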
Suggested Solution/Next Steps
- Workflow Review:
- If the file generation is intentional (e.g., part of a data processing pipeline), the user should:
- Consolidate files where possible (e.g., merge small files into larger archives, or write to a single-file container such as HDF5 or a zipped Zarr store rather than a directory store with one file per chunk).
- Clean up obsolete files regularly to avoid hitting system limits (e.g., inode exhaustion).
- Redirect output to a more suitable filesystem (e.g., dCache for long-term storage or DUST for temporary/reproducible data).
- If the file generation is unintended (e.g., a bug or misconfiguration), the user should:
- Investigate the application’s output logic (e.g., check if intermediate files can be disabled or batched).
- Test with smaller datasets to validate the workflow.
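As an illustration of the consolidation step, here is a minimal sketch that packs one output directory into a single uncompressed zip archive using only the Python standard library. The function name `pack_directory` and the example paths are hypothetical. Whether the downstream application can read the archive directly (for example through a zip-backed Zarr store) is an assumption that should be verified before any original files are removed.

```python
import os
import zipfile


def pack_directory(src_dir, archive_path):
    """Pack every regular file under src_dir into one uncompressed zip.

    Returns the number of files written. The archive must be created
    outside src_dir, otherwise it would be picked up by the walk.
    """
    src_dir = os.fspath(src_dir)
    count = 0
    with zipfile.ZipFile(archive_path, mode="w",
                         compression=zipfile.ZIP_STORED,
                         allowZip64=True) as zf:
        for dirpath, _dirnames, filenames in os.walk(src_dir):
            for name in filenames:
                full = os.path.join(dirpath, name)
                # Store paths relative to src_dir, with forward slashes.
                rel = os.path.relpath(full, src_dir).replace(os.sep, "/")
                zf.write(full, arcname=rel)
                count += 1
    return count
```

A call such as `pack_directory("SPIND/output/run_0001_zarr", "SPIND/output/run_0001_zarr.zip")` (the run name is a placeholder) would replace many small files with one archive per run. The original directory should be kept until the archive has been checked.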
- Filesystem-Specific Recommendations:
- GPFS-CFEL is optimized for fast, large-file I/O but may struggle with high file counts due to metadata overhead. Alternatives:
- dCache: For long-term storage of large datasets (contact osm.service@desy.de if group space is needed).
- DUST: For temporary/reproducible data (but requires manual cleanup).
- Scratch/TMP: For short-lived intermediate files (but subject to automatic deletion).
- Administrative Actions:
- If the workflow cannot be optimized, the user should request a quota adjustment (e.g., for inodes) by contacting CFEL IT or maxwell.service@desy.de.
- Monitor the directory growth to prevent system-wide performance degradation.
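For the monitoring step, a user-side check could look like the sketch below, which counts regular files under each immediate subdirectory of a given root. The function names are illustrative and not part of any site tooling. A full scan of tens of millions of files is itself metadata-heavy, so it is probably best run rarely and on one run directory at a time.

```python
import os
from collections import Counter


def _count_files(path):
    """Count regular files below path iteratively, without following symlinks."""
    total = 0
    stack = [path]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    total += 1
    return total


def count_files_per_subdir(root):
    """Return a Counter mapping each immediate subdirectory of root to its file count.

    Files located directly in root are counted under the key ".".
    """
    counts = Counter()
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                counts[entry.name] = _count_files(entry.path)
            elif entry.is_file(follow_symlinks=False):
                counts["."] += 1
    return counts
```

Calling `count_files_per_subdir("SPIND/output").most_common(10)` would list the ten subdirectories holding the most files, which helps identify the runs that dominate the file count.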
Sources
- Maxwell Storage Systems Overview
- Where to Store Scientific Data
- GPFS-CFEL Documentation (CFEL-specific policies).