Welcome to dCache. dCache is a distributed storage system for storing very large amounts of data with no hard limit on capacity; it is used to provide storage in the petabyte range. This makes it well suited as a storage system for data-intensive experiments.
dCache is a joint effort between the Deutsches Elektronen-Synchrotron (DESY) in Hamburg, the Nordic Data Grid Facility (NDGF, based in Copenhagen), and the Fermi National Accelerator Laboratory near Chicago, with significant contributions and support from the University of California, San Diego; INFN, Bari; Rutherford Appleton Laboratory, UK; and CERN in Geneva.
dCache can use hierarchical storage management (e.g., hard disk and tape) and provides mechanisms to automatically increase performance, balance loads, and increase resilience and availability. It also supplies advanced control systems to manage data as well as data flows. An ordinary file system (btrfs, ext4, XFS, ZFS) is used to store data on the storage nodes.
There are several ways of accessing data stored in dCache:
NFS v4.1 (Chimera)
HTTP and WebDAV
GridFTP (GSI-FTP)
xrootd
SRM (versions 1.1 and 2.2)
dCap and GSIdCap
dCache supports certificate-based authentication through the Grid Security Infrastructure, which is used by the GSI-FTP and GSIdCap transfer protocols and by the SRM management protocol.
Certificate authentication is also available for HTTP and WebDAV.
dCache also supports fine-grained authorization, with support for POSIX file permissions and NFS-style access control lists.
Other features of dCache are:
Resilience and high availability, which can be implemented in different ways by keeping multiple replicas of the same file.
Easy migration of data via the migration module.
A powerful cost calculation system that allows control of the data flow (reading from and writing to pools, between pools, and between pools and tape).
Load balancing and performance tuning by hot pool replication (via cost calculation and replicas created by pool-to-pool transfers).
Space management and support for space tokens.
Garbage collection of replicas, depending on their flags, age, et cetera.
Detailed logging and debugging as well as accounting and statistics.
XML information provider with detailed live information about the cluster.
Scriptable administration interface with a terminal-based front-end.
Web interface presenting the most important live information.
Ensuring data integrity through checksumming.
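As an illustration of what checksum-based integrity checking involves, the following minimal sketch computes a checksum for a local file and compares it with an expected value. Adler-32 is chosen here only as an example algorithm; the function names `adler32_of_file` and `verify` are hypothetical helpers and not part of dCache, and which checksum types a given dCache installation records depends on its configuration.

```python
import zlib


def adler32_of_file(path, chunk_size=1024 * 1024):
    """Return the Adler-32 checksum of a file as an 8-digit hex string.

    The file is read in chunks so that large files do not have to fit
    into memory. (Hypothetical helper, for illustration only.)
    """
    checksum = 1  # Adler-32 starting value
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            checksum = zlib.adler32(chunk, checksum)
    return f"{checksum & 0xFFFFFFFF:08x}"


def verify(path, expected_hex):
    """Return True if the file's checksum matches the expected hex value."""
    return adler32_of_file(path) == expected_hex.lower()
```

A client could, for example, compute the checksum before upload and compare it with the value reported by the storage system after the transfer; a mismatch indicates that the stored copy differs from the original.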
dCache / SRM can transparently manage data distributed among dozens of disk storage nodes (sometimes spread over several countries). The system has been shown to significantly improve the efficiency of connected tape storage systems through caching, gather-and-flush, and scheduled staging techniques. Furthermore, it optimizes the throughput to and from data clients by dynamically replicating datasets when load hot spots are detected. The system is tolerant of failures of its data servers, which allows administrators to deploy commodity disk storage components.
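The idea of cost-based pool selection and hot spot replication can be sketched roughly as follows. This is a toy model under stated assumptions, not dCache's actual cost calculation: the `Pool` fields, the weights, the `cost` formula, and the threshold are all hypothetical and chosen only to show the general shape of the technique.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """Hypothetical snapshot of a pool's state (not a dCache data type)."""
    name: str
    active_transfers: int
    max_transfers: int  # assumed to be greater than zero
    free_bytes: int
    total_bytes: int    # assumed to be greater than zero


def load(pool):
    """Fraction of the pool's transfer slots currently in use."""
    return pool.active_transfers / pool.max_transfers


def cost(pool, load_weight=1.0, space_weight=1.0):
    """Toy cost: weighted sum of transfer load and used-space fraction."""
    used = 1.0 - pool.free_bytes / pool.total_bytes
    return load_weight * load(pool) + space_weight * used


def select_pool(pools):
    """Pick the pool with the lowest cost for the next transfer."""
    return min(pools, key=cost)


def hot_pools(pools, threshold=0.8):
    """Pools whose load meets the threshold; candidates whose popular
    files could be replicated to another pool to spread the load."""
    return [p for p in pools if load(p) >= threshold]
```

In this model, a heavily loaded pool has a high cost and is avoided for new transfers, while pools reported by `hot_pools` would be the ones whose frequently read files are copied elsewhere by pool-to-pool transfers.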
Access to the data is provided by various standard protocols. Furthermore, the software comes with an implementation of the Storage Resource Manager protocol (SRM), which is an open standard for grid middleware to communicate with site-specific storage fabrics.