1580332 : High number of files in /gpfs/cfel/user/goercksj

Created: 2026-02-27T13:23:35Z - current status: new


Anonymized Summary

A user has generated a large number of small files (approx. 78 million) in a GPFS-CFEL user directory (/gpfs/cfel/user/[USERNAME]), specifically under SPIND/output/run_*_zarr/. While the total storage usage is only ~377 GiB (well below quota), the sheer volume of files accounts for ~45% of all files on the /gpfs/cfel filesystem. The system administrator seeks clarification on whether this file generation is intentional or part of the user’s workflow.
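To reproduce the file count on a smaller subtree before discussing it with the user, a plain directory walk is enough (a minimal sketch; the `run_*_zarr` layout is taken from the ticket, everything else is generic). Note that on GPFS a metadata policy scan run by the administrators is far faster than walking tens of millions of inodes from a client node.

```python
import os

def count_files(root: str) -> int:
    """Count regular files under root, recursing into subdirectories.

    os.walk does not follow symlinked directories by default, so each
    file is counted once.
    """
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += len(filenames)
    return total

# Hypothetical usage against one run directory from the ticket:
# count_files("/gpfs/cfel/user/[USERNAME]/SPIND/output/run_0001_zarr")
```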


Suggested Solution/Next Steps

  1. Workflow Review:
    • If the file generation is intentional (e.g., part of a data processing pipeline), the user should:
      • Consolidate files where possible (e.g., merge small files into larger archives or use efficient formats such as HDF5 or Zarr groups).
      • Clean up obsolete files regularly to avoid hitting system limits (e.g., inode exhaustion).
      • Redirect output to a more suitable filesystem (e.g., dCache for long-term storage or DUST for temporary/reproducible data).
    • If the file generation is unintended (e.g., a bug or misconfiguration), the user should:
      • Investigate the application’s output logic (e.g., check whether intermediate files can be disabled or batched).
      • Test with smaller datasets to validate the workflow.
  2. Filesystem-Specific Recommendations:
    • GPFS-CFEL is optimized for fast, large-file I/O but can struggle with high file counts due to metadata overhead. Alternatives:
      • dCache: for long-term storage of large datasets (contact osm.service@desy.de if group space is needed).
      • DUST: for temporary/reproducible data (requires manual cleanup).
      • Scratch/TMP: for short-lived intermediate files (subject to automatic deletion).
  3. Administrative Actions:
    • If the workflow cannot be optimized, the user should request a quota adjustment (e.g., for inodes) by contacting CFEL IT or maxwell.service@desy.de.
    • Monitor the directory’s growth to prevent system-wide performance degradation.
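The consolidation recommendation can be sketched with the standard library alone (an illustration, not the user's actual pipeline; the `run_*_zarr` naming is from the ticket, the tar approach is an assumption — for Zarr data specifically, a zipped Zarr store would preserve direct read access):

```python
import os
import tarfile

def consolidate_run(run_dir: str, archive_path: str) -> int:
    """Pack every file under run_dir into one uncompressed tar archive.

    Returns the number of files added. One tar file replaces thousands
    of inodes; the originals can be deleted once the archive is verified.
    """
    count = 0
    with tarfile.open(archive_path, "w") as tar:
        for dirpath, _dirnames, filenames in os.walk(run_dir):
            for name in filenames:
                path = os.path.join(dirpath, name)
                # Store paths relative to the run directory so the
                # archive unpacks cleanly anywhere.
                tar.add(path, arcname=os.path.relpath(path, run_dir))
                count += 1
    return count
```

Run per `run_*_zarr` directory, this turns millions of small files into a few hundred archives, which is far gentler on GPFS metadata and also suits migration to dCache.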

Sources

  1. Maxwell Storage Systems Overview
  2. Where to Store Scientific Data
  3. GPFS-CFEL Documentation (CFEL-specific policies)