CVMFS¶
CVMFS (CernVM File System) is a read-only, HTTP-based distributed file system developed at CERN, designed to deliver software and data to computing nodes at scale. It presents a POSIX file system interface (mounted under /cvmfs/
CVMFS is the standard mechanism for distributing experiment software stacks in the Worldwide LHC Computing Grid (WLCG) and is used at this site to provide software environments for experiments including ATLAS, CMS, Belle II, and LHCb. Repository content is published centrally, replicated to a local stratum-1 server, and served to clients via a site Squid proxy cache. Clients cache accessed content on local disk, so repeated reads of the same files incur no network overhead. All data are integrity-verified via hash-based content addressing, and files with identical content are automatically deduplicated.
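The effect of hash-based content addressing can be illustrated with a small toy example. This is only a sketch of the idea and not how CVMFS is implemented internally: the helpers `store_object` and `verify_object`, the scratch directory layout, and the use of `sha1sum` from GNU coreutils are assumptions made for illustration.

```shell
# Toy content-addressed store: a file is stored under the hash of its content,
# so files with identical content end up as exactly one object.
# Hypothetical helpers, not part of CVMFS.
store_object() {
    # $1 = store directory, $2 = file to add
    digest=$(sha1sum "$2" | cut -d' ' -f1)
    # Copy only if no object with this hash exists yet (deduplication).
    [ -e "$1/$digest" ] || cp "$2" "$1/$digest"
    printf '%s\n' "$digest"
}

verify_object() {
    # $1 = store directory, $2 = object name, which is the expected hash
    actual=$(sha1sum "$1/$2" | cut -d' ' -f1)
    [ "$actual" = "$2" ]
}

scratch=$(mktemp -d)
mkdir "$scratch/store"
printf 'same content\n'  > "$scratch/a.txt"
printf 'same content\n'  > "$scratch/b.txt"
printf 'other content\n' > "$scratch/c.txt"

hash_a=$(store_object "$scratch/store" "$scratch/a.txt")
hash_b=$(store_object "$scratch/store" "$scratch/b.txt")
hash_c=$(store_object "$scratch/store" "$scratch/c.txt")

# Three files were added, but only two distinct objects are stored.
object_count=$(ls "$scratch/store" | wc -l | tr -d ' ')
echo "objects stored: $object_count"
```

Because an object's name is its hash, a reader can recompute the hash after download and compare it with the name, which is what `verify_object` mimics.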
CVMFS is intended for mostly static files that are read by a large number of clients, e.g., program files or static configuration data.
CVMFS repositories are normally automounted on a client. If a CVMFS repository is not visible under /cvmfs/<repository.name.null>, it should be sufficient to trigger its mount with ls /cvmfs/<repository.name.null> to make it appear.
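In job scripts it can be convenient to wrap this check in a small function. The following is a minimal sketch; the function name `ensure_repo_mounted` is hypothetical, and the function does nothing more than the `ls` access described above.

```shell
# Hypothetical helper: access the repository path once so that the automounter
# can mount it, then report whether the path is usable.
ensure_repo_mounted() {
    # $1 = repository mount path, e.g. /cvmfs/<repository.name.null>
    if ls "$1" >/dev/null 2>&1; then
        echo "available: $1"
    else
        echo "not available: $1" >&2
        return 1
    fi
}

# Usage in a job script (replace the placeholder with a real repository name):
#   ensure_repo_mounted /cvmfs/<repository.name.null> || exit 1
```

If the path stays unavailable, the CVMFS client tool `cvmfs_config probe` may help to narrow down whether the client configuration or the network path is at fault.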
CVMFS Repository Maintenance¶
Any changes have to be prepared as transactions on a CVMFS repository's stratum-0 instance. A privileged user can open a repository for a new transaction. On the writable transaction storage, files can be added, removed, or modified. When a repository maintainer is satisfied with the changes, they can be published as a new commit to the CVMFS repository. On publish, CVMFS indexes the changes, splits larger files into smaller chunks for better distribution, deduplicates file chunks that are already present, and creates the new commit. The stratum-1 instance replicates new commits at intervals of about 15 minutes. Once a new commit has propagated to a client, it becomes the version visible under the client's repository mount path.
Preparing a CVMFS commit/release¶
- Log in as a privileged user to the repository's CVMFS stratum-0 server.
- Open the repository for a new transaction with
  cvmfs_server transaction <repository.name.null>
- Add, remove, or modify files on the stratum-0 under
  /cvmfs/<repository.name.null>
- Publish the changes as a new commit with
  cvmfs_server publish
- Under the DESY setup, changes might take up to 15 minutes to fully propagate to the clients.
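The steps above can be sketched as a small guarded script. This is an illustration under assumptions, not the site's actual deployment procedure: the repository name `example.repository.null`, the source and target paths, and the helpers `run` and `publish_release` are hypothetical. With `DRY_RUN=1`, which is the default here, the commands are only printed, so the sketch can be tried outside a stratum-0.

```shell
# Sketch of a guarded publish on the stratum-0.
# With DRY_RUN=1 (the default here) commands are only printed, not executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

publish_release() {
    # $1 = repository name, $2 = source directory, $3 = target path below /cvmfs/<repo>
    repo=$1; src=$2; target=$3
    run cvmfs_server transaction "$repo" || return 1
    # If copying fails, discard the open transaction instead of publishing it.
    if run cp -a "$src" "/cvmfs/$repo/$target"; then
        run cvmfs_server publish "$repo"
    else
        run cvmfs_server abort -f "$repo"
        return 1
    fi
}

# Hypothetical repository name and paths, for illustration only.
publish_release example.repository.null ./build/release-1 releases/release-1
```

Aborting on a failed copy keeps a half-written release from becoming visible to clients; nothing reaches the stratum-1 until `cvmfs_server publish` has succeeded.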
Overview¶
CVMFS consists of a caching hierarchy that allows for a scalable distribution of files. At the base is a CVMFS stratum-0 instance hosting the files of a repository; it is the only instance where data can be updated on request, i.e., added, removed, or changed. Since such a repository server is not intended for high-I/O or CPU-intensive workloads, a repository's data are replicated to a stratum-1 instance, which hosts one or more local as well as remote repositories. Further stratum-1 instances, located globally at some larger sites, can in turn replicate repositories from a stratum-1 instance to shorten the distance to client servers or computers. Such clients normally do not access a stratum-1 directly but go through Squid pull-through caches that each site runs locally. Thus, a client asks one of its local Squid servers for a file. The Squid server either has the file already cached from a previous request or, on a miss, asks one of its known stratum-1 instances for the file and, upon retrieval, sends it to the client.
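The lookup chain described above can be mimicked with a toy model in which every tier is a plain directory. This is a sketch for illustration only: the function `fetch`, the directory names, and the file name `libfoo.so` are hypothetical, and real CVMFS transfers happen over HTTP rather than through `cp`.

```shell
# Toy model of the lookup chain: client cache -> site Squid -> stratum-1.
# Each tier is represented by a directory.
fetch() {
    # $1 = file name, $2 = client cache dir, $3 = Squid dir, $4 = stratum-1 dir
    if [ -e "$2/$1" ]; then
        # Already in the local client cache: no network access needed.
        echo "client-cache"
    elif [ -e "$3/$1" ]; then
        # Proxy hit: the client stores a local copy.
        cp "$3/$1" "$2/$1"
        echo "squid"
    elif [ -e "$4/$1" ]; then
        # Proxy miss: the proxy fetches from the stratum-1, keeps a copy,
        # and hands the file on to the client.
        cp "$4/$1" "$3/$1"
        cp "$3/$1" "$2/$1"
        echo "stratum-1"
    else
        echo "not-found"
        return 1
    fi
}

tiers=$(mktemp -d)
mkdir "$tiers/client" "$tiers/squid" "$tiers/stratum1"
echo "payload" > "$tiers/stratum1/libfoo.so"

first=$(fetch libfoo.so "$tiers/client" "$tiers/squid" "$tiers/stratum1")
second=$(fetch libfoo.so "$tiers/client" "$tiers/squid" "$tiers/stratum1")
echo "first request served by:  $first"
echo "second request served by: $second"
```

The first request travels all the way to the stratum-1, while the repeated request is answered from the client's local cache.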
Grid CVMFS¶
In the WLCG Grid, the different levels of the CVMFS hierarchy are also distributed horizontally, so that a client can fall back to another instance if the preferred or closest instance is not available. All data in a Grid CVMFS repository are available globally.
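On the client side, such fallbacks are expressed in the CVMFS client configuration, typically in /etc/cvmfs/default.local. The fragment below is a sketch with hypothetical host names under example.org; the actual values are site-specific and are usually managed by the site administrators.

```shell
# Client configuration fragment (hypothetical host names).
# Proxies separated by "|" form one load-balanced group;
# ";" starts the next group, which is used if the previous group fails.
CVMFS_HTTP_PROXY="http://squid1.example.org:3128|http://squid2.example.org:3128;http://squid-backup.example.org:3128"

# Several stratum-1 servers separated by ";" act as fallbacks for each other.
# @fqrn@ is replaced by the fully qualified repository name.
CVMFS_SERVER_URL="http://stratum1-a.example.org/cvmfs/@fqrn@;http://stratum1-b.example.org/cvmfs/@fqrn@"
```

A site-local-only setup would list only local proxies and a single local stratum-1 in the same two parameters.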
Site local-only CVMFS¶
If some data are not supposed to be available globally, a CVMFS hierarchy can also be set up in a local-only approach, i.e., to keep all requests limited to local clients. Since no fallback instances have to be configured, requests can be answered faster without walking through all configured instances.
Advanced Usage¶
Repository Namespace Housekeeping¶
It is strongly recommended to plan a namespace scheme before filling a CVMFS repository with data, i.e., to consider how to organize data and releases in a well-structured way. Different repositories such as atlas.cern.ch, cms.cern.ch, hmz.desy.de, lhcb.cern.ch, belle.kek.jp, belle.cern.ch, ... approach their namespaces in varying ways and could offer ideas.
Examples might be:
Date Tagged Releases¶
/cvmfs/<repository.name.null>/releases/
/cvmfs/<repository.name.null>/releases/20260328.1
/cvmfs/<repository.name.null>/releases/20260328.2
/cvmfs/<repository.name.null>/releases/20260329.1
/cvmfs/<repository.name.null>/releases/20260331.1
/cvmfs/<repository.name.null>/releases/current --symlink--> ./20260331.1
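A layout like this can be maintained with a few lines of shell inside an open transaction. The sketch below runs against a scratch directory instead of a real repository; the helper `next_release_name` is hypothetical and simply picks the next free counter for a given date.

```shell
# Hypothetical helper: pick the next free YYYYMMDD.N name below a releases directory.
next_release_name() {
    # $1 = releases directory, $2 = date stamp (YYYYMMDD)
    n=1
    while [ -e "$1/$2.$n" ]; do
        n=$((n + 1))
    done
    printf '%s.%s\n' "$2" "$n"
}

# Scratch directory standing in for /cvmfs/<repository.name.null>/releases.
releases=$(mktemp -d)/releases
mkdir -p "$releases/20260328.1" "$releases/20260328.2"

name=$(next_release_name "$releases" 20260328)
mkdir "$releases/$name"

# Relative link target, so the link stays valid under any mount point.
# -n replaces an existing "current" link instead of descending into it.
ln -sfn "./$name" "$releases/current"

echo "new release: $name"
echo "current -> $(readlink "$releases/current")"
```

Since clients only see published commits, the new release directory and the updated `current` link become visible together once the transaction is published.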
Container Hash Identifiers¶
/cvmfs/<repository.name.null>/container/src/9b40c1d08f7dbd0f014e1d7170e41be5dd69218b
/cvmfs/<repository.name.null>/container/src/e19baea1142f766e3f3a8b81e7dba1fb7162544d
/cvmfs/<repository.name.null>/container/src/65c1218597ac23fe16e4768968ad930959cca3db
/cvmfs/<repository.name.null>/container/humanreadablename --symlink--> ./src/65c1218597ac23fe16e4768968ad930959cca3db
CVMFS Catalogs¶
As a CVMFS repository can potentially contain a large number of files, it can maintain separate catalogs for the content of directory trees to improve metadata lookup performance. To tell CVMFS during a transaction to create a catalog for a given directory tree, such as a new release, create an empty .cvmfscatalog file at the base of the directory tree, e.g., touch /cvmfs/<repository.name.null>/releases/MYNEWRELEASE/.cvmfscatalog
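When every release should get its own catalog, the marker files can be created in a loop during a transaction. The following sketch uses a scratch directory in place of a real repository; the helper `mark_catalogs` and the one-catalog-per-release policy are assumptions for illustration.

```shell
# Hypothetical helper: mark every release directory as the root of its own catalog.
mark_catalogs() {
    # $1 = releases directory
    for dir in "$1"/*/; do
        # Skip the unexpanded pattern if there are no subdirectories.
        if [ ! -d "$dir" ]; then continue; fi
        # Skip symlinks such as "current" so the same tree is not marked twice.
        if [ -L "${dir%/}" ]; then continue; fi
        # Create the empty marker file only if it is missing.
        if [ ! -e "${dir}.cvmfscatalog" ]; then
            touch "${dir}.cvmfscatalog"
        fi
    done
}

# Scratch directory standing in for /cvmfs/<repository.name.null>.
root=$(mktemp -d)
mkdir -p "$root/releases/20260328.1" "$root/releases/20260329.1"
ln -s ./20260329.1 "$root/releases/current"

mark_catalogs "$root/releases"
catalog_count=$(find "$root/releases" -name .cvmfscatalog | wc -l | tr -d ' ')
echo "catalog markers: $catalog_count"
```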
Named Commits aka tagged Snapshots¶
A newly published commit consists of the changes relative to the previous commit. For performance reasons, all published commits older than two weeks are automatically merged into a single commit by a garbage collection run, which discards all file chunks that are no longer referenced by the current releases. However, a commit can be explicitly tagged with a name as a dedicated snapshot, e.g.,
cvmfs_server publish -a mytag2026.14.32 -m "comment what makes this commit so special" preprod.desy.de
Such a named snapshot is not automatically garbage collected and stays available in the background. The repository state of such a named snapshot can also be accessed by prepared clients via the hidden path
/cvmfs/<repository.name.null>/.cvmfs/snapshots/...
As the usage might be confusing for a normal user, it is suggested to use the snapshot feature sparingly. Also, since named snapshots are not automatically garbage collected, it is suggested to occasionally clean up and remove unused snapshots for performance reasons.
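Such a cleanup can be done with the `tag` subcommand of `cvmfs_server` on the stratum-0. The sketch below reuses the repository and tag name from the publish example above; the helpers `run`, `list_tags`, and `remove_tag` are hypothetical, and with `DRY_RUN=1`, the default here, the commands are only printed. It is worth checking the options against the installed `cvmfs_server` version before running it for real.

```shell
# Dry-run wrapper: with DRY_RUN=1 (default here) commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# List all named snapshots (tags) of a repository; $1 = repository.
list_tags() {
    run cvmfs_server tag -l "$1"
}

# Remove one named snapshot; $1 = repository, $2 = tag name.
remove_tag() {
    run cvmfs_server tag -r "$2" "$1"
}

list_tags preprod.desy.de
remove_tag preprod.desy.de mytag2026.14.32
```

Once a tag is removed, its file chunks are no longer protected and can be discarded by a later garbage collection run if nothing else references them.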
GitLab Runner¶
A CVMFS repository instance can be integrated into a GitLab pipeline for automatic building and publishing. As a stratum-0 instance has only limited compute resources, all heavy build steps should be performed by another instance in the pipeline. For example, a generic runner could build a new container release and push it to the DESY container registry. A follow-up test could pull the container and check that it passes some sanity checks. If the test passes, the GitLab pipeline could trigger the actual deployment runner on the CVMFS stratum-0 instance, which would pull the container and prepare a new release for distribution with CVMFS.
