
Chapter 30. Glossary

The following terms are used in dCache.

tertiary storage system

A mass storage system which stores data and is connected to the dCache system. Each dCache pool will write files to it as soon as they have been completely written to the pool (if the pool is not configured as an LFS). The tertiary storage system is not part of dCache. However, it is possible to connect any mass storage system as a tertiary storage system to dCache via a simple interface.

tape backend

A tertiary storage system which stores data on magnetic tapes.

Hierarchical Storage Manager (HSM)

See tertiary storage system.

HSM Type

The type of HSM which is connected to dCache as a tertiary storage system. The choice of the HSM type influences the communication between dCache and the HSM. Currently there are osm and enstore. osm is used for most HSMs (TSM, HPSS, ...).

Large File Store (LFS)

A Large File Store is the name for a dCache instance that is acting as a filesystem independent of, or in cooperation with, an HSM system. When dCache is acting as an LFS, files may be stored and later read without involving any HSM system.

Whether a dCache instance provides an LFS depends on whether there are pools configured to do so. The LFS option, specified for each pool within the poollist file, describes how that pool should behave. This option can take three possible values:

none

The pool does not contribute to any LFS capacity. All newly written files are regarded as precious and sent to the HSM backend.

precious

Newly created files are regarded as precious but are not scheduled for the HSM store procedure. Consequently, these files will only disappear from the pool when deleted in the namespace.

volatile (or transient)

Newly created files are regarded as cached and are not scheduled for the HSM store procedure. Though they will never be stored on tape, these files are part of the aging procedure and will be removed as soon as new space is needed.

Note

The volatile LFS mode is deprecated and should not be used.
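
The three LFS values can be summarised as a small lookup table. The following Python fragment is only an illustrative sketch of the behaviour described above; the names LFS_BEHAVIOUR and describe_lfs are hypothetical and are not part of dCache.

```python
# Illustrative summary of the LFS option values described in this entry.
# The names used here are hypothetical; dCache does not provide this table.
LFS_BEHAVIOUR = {
    # mode: (state of newly written files, scheduled for the HSM store?)
    "none": ("precious", True),
    "precious": ("precious", False),
    "volatile": ("cached", False),   # deprecated mode
    "transient": ("cached", False),  # alternative name for volatile
}


def describe_lfs(mode: str) -> str:
    """Return a one-line description of how a pool treats new files."""
    state, to_hsm = LFS_BEHAVIOUR[mode]
    action = "sent to the HSM" if to_hsm else "not sent to the HSM"
    return f"new files are {state} and {action}"
```

For instance, describe_lfs("precious") reports that new files are precious and not sent to the HSM.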

to store

Copying a file from a dCache pool to the tertiary storage system.

to restore

Copying a file from the tertiary storage system to one of the dCache pools.

to stage

See to restore.

transfer

Any kind of transfer performed by a dCache pool. There are store, restore, pool to pool (client and server), read, and write transfers. The latter two are client transfers.

See Also mover.

mover

The process/thread within a pool which performs a transfer. Each pool has a limited number of movers that may be active at any time; if this limit is reached then further requests for data are queued.

In many protocols, end clients connect to a mover to transfer file contents. To support this, movers must speak the protocol the end client is using.

See Also transfer.
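
The mover limit described in this entry can be pictured with a toy model. The Python sketch below is not dCache code: the class MoverQueue and its methods are hypothetical, and serving queued requests in first-in, first-out order is an assumption made for the illustration.

```python
from collections import deque


class MoverQueue:
    """Toy model: at most max_active movers run; further requests wait."""

    def __init__(self, max_active: int):
        self.max_active = max_active
        self.active = set()     # requests currently served by a mover
        self.waiting = deque()  # requests queued until a mover is free

    def submit(self, request_id: str) -> str:
        """Start a mover if the limit allows it, otherwise queue the request."""
        if len(self.active) < self.max_active:
            self.active.add(request_id)
            return "active"
        self.waiting.append(request_id)
        return "queued"

    def finish(self, request_id: str) -> None:
        """Mark a transfer as done and start the oldest queued request."""
        self.active.discard(request_id)
        if self.waiting and len(self.active) < self.max_active:
            self.active.add(self.waiting.popleft())
```

With a limit of one, a second request is queued until the first transfer finishes.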

The dCacheSetup File

This is the primary configuration file of a dCache host. It is located at $dcache_home/config/dCacheSetup (typically /etc/dcache/dCacheSetup). Each domain uses the file config/<domainName>Setup which is in fact a symbolic link to config/dCacheSetup. The config/dCacheSetup file might even be the same across the hosts of a dCache instance.

Primary Network Interface

poollist File

The poollist files are a collection of files in the /etc/dcache directory. Each poollist file describes the set of pools that should be available for a given node. These files have a filename like <hostname>.poollist, where <hostname> is the simple hostname of the node the pools are to run on.

The file consists of one or more lines, with each line describing a pool.
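
As a small illustration of the naming rule above, the following Python sketch builds the path of the poollist file for a host. The function name poollist_path is hypothetical, and taking the part of the hostname before the first dot as the simple hostname is an assumption.

```python
import posixpath


def poollist_path(hostname: str, config_dir: str = "/etc/dcache") -> str:
    """Return the expected poollist file path for the given host."""
    # Assumption: the simple hostname is the part before the first dot.
    simple_name = hostname.split(".", 1)[0]
    return posixpath.join(config_dir, simple_name + ".poollist")
```

For a host called pool1.example.org (an invented name), this yields /etc/dcache/pool1.poollist.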

Location Manager

The location manager is a cell that instructs newly started domains which domain they should connect to. This allows domains to form arbitrary network topologies; although, by default, a dCache instance will form a star topology with the dCacheDomain domain at the centre.

Cell

A cell is a collection of Java threads that provide a discrete and simple service within dCache. Each cell is hosted within a domain.

Cells have an address derived from concatenating their name, the @ symbol and their containing domain name.
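
The address rule can be sketched in a few lines of Python. The helper names cell_address and split_cell_address are hypothetical and only illustrate the name@domain form described above.

```python
from typing import Tuple


def cell_address(cell_name: str, domain_name: str) -> str:
    """Concatenate the cell name, the @ symbol and the domain name."""
    return f"{cell_name}@{domain_name}"


def split_cell_address(address: str) -> Tuple[str, str]:
    """Split an address of the form name@domain into its two parts."""
    cell_name, separator, domain_name = address.partition("@")
    if not separator or not cell_name or not domain_name:
        raise ValueError(f"not a cell address: {address!r}")
    return cell_name, domain_name
```

For example, the System cell of the dCacheDomain domain would have the address System@dCacheDomain.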

Domain

A domain is a collection of one or more cells that provide a set of related services within a dCache instance. Each domain requires its own Java Virtual Machine. A typical domain might provide external connectivity (i.e., a door) or manage the pools hosted on a machine.

Each domain has at least one cell, called the System cell, and many tunnel cells for communicating with other domains. To provide a useful service, a domain will contain other cells that provide specific behaviour.

Door

Door is the generic name for special cells that provide the first point of access for end clients to communicate with a dCache instance. There are different door implementations (e.g., GSIdCap door and GridFTP door), allowing a dCache instance to support multiple communication protocols.

A door will (typically) bind to a well-known port number depending on the protocol the door supports. This allows for only a single door instance per machine for each protocol.

A door will typically identify which pool will satisfy the end user’s operation and redirect the client to the corresponding pool. In some cases this is not possible: some protocols (such as GridFTP version 1) do not allow servers to redirect end clients, and in other cases pool servers may be behind a firewall that prevents direct access. When direct end-client access is not possible, the door may act as a data proxy, streaming data to the client.

By default, each door is hosted in a dedicated domain. This allows easy control of whether a protocol is supported from a particular machine.

Java Virtual Machine (JVM)

Java programs are typically compiled into a binary form called Java byte-code. Byte-code is comparable to the format that computers understand natively; however, no mainstream processor understands Java byte-code. Instead, compiled Java programs typically require a translation layer for them to run. This translation layer is called a Java Virtual Machine (JVM). It is a standardised execution environment that Java programs may run within. A JVM is typically represented as a process within the host computer.

Well Known Cell

A well-known cell is a cell that registers itself centrally. Within the admin interface, a well-known cell may be referred to by just its cell name.

Pinboard

The pinboard is a collection of messages describing events within dCache and is similar to a log file. Each cell will (typically) have its own pinboard.

Breakeven Parameter

Secondary Network Interface

least recently used (LRU) File

Default Mover Queue

Namespace

The namespace is a core component of dCache. It maps each stored file to a unique identification number and allows storing of metadata against either files or directories.

dCache supports two (independent) namespace implementations: pnfs and Chimera.

pnfs filesystem

pnfs is a filesystem that uses a database to store all information, including the contents of files. This filesystem is made available via NFS, so authorised hosts can mount pnfs and use it like any other file system.

dCache may use pnfs as its namespace. Although it is possible to store file contents in pnfs, dCache does not do this. Instead dCache stores the file data on one (or more) pools.

pnfs includes some unique additional properties. These include dot commands, pnfs IDs, levels, directory tags and wormholes.

pnfs dot command

To configure and access some of the special features of the pnfs filesystem, special files may be read, written to or created. These files all start with a dot (or period) and are followed by one or more parameters, each contained within a set of parentheses; for example, the file .(tag)(<foo>) is the pnfs dot command for reading or writing the <foo> directory tag value.

Care must be taken when accessing a dot command from a shell. Shells will often expand parentheses so the filename must be protected against this; for example, by quoting the filename or by escaping the parentheses.
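
The quoting requirement can be demonstrated with Python's standard shlex module, which produces a shell-safe form of a filename. This is only a sketch: the helper names tag_dot_command and shell_safe are hypothetical, and the tag name foo is an example value.

```python
import shlex


def tag_dot_command(tag_name: str) -> str:
    """Build the dot-command filename for a directory tag."""
    return f".(tag)({tag_name})"


def shell_safe(filename: str) -> str:
    """Quote a filename so that a POSIX shell does not interpret parentheses."""
    return shlex.quote(filename)
```

Here shell_safe(tag_dot_command("foo")) returns the filename wrapped in single quotes, '.(tag)(foo)', which can be passed to a command such as cat without the shell interpreting the parentheses.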

pnfs level

In pnfs, each file can have up to eight independent contents; these file-contents, called levels, may be accessed independently. dCache will store some file metadata in levels 1 and 2, but dCache will not store any file data in pnfs.

pnfs directory tag

pnfs includes the concept of tags. A tag is a keyword-value pair associated with a directory. Subdirectories inherit tags from their parent directory. New values may be assigned, but tags cannot be removed. The dot command .(tag)(<foo>) may be used to read or write tag <foo>’s value. The dot command .(tags)() may be read for a list of all tags in that file’s subdirectory.

More details on directory tags are given in the section called “Directory Tags”.

pnfs ID

Each component (file, directory, etc) in a pnfs instance has a unique ID: a 24-digit hexadecimal number. This unique ID is used in dCache to refer to files without knowing the component’s name or in which directory the component is located.

More details on pnfs IDs are given in the section called “pnfsIDs”.

Pool to Pool Transfer

A pool-to-pool transfer is one where a file is transferred from one dCache pool to another. This is typically done to satisfy a read request, either as a load-balancing technique or because the file is not available on pools that the end user has access to.

Storage Class

The storage class is a string of the form


      <StoreName>:<StorageGroup>@<type-of-storage-system>
    

containing exactly one @-symbol.

  • <StoreName>:<StorageGroup> is a string describing the storage class in a syntax which depends on the storage system.

  • <type-of-storage-system> denotes the type of storage system in use.

    In general use <type-of-storage-system>=osm.

A storage class is used by a tertiary storage system to decide where to store the file (i.e. on which set of tapes). dCache can use the storage class for a similar purpose, namely to decide on which pools the file can be stored.
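
The form of a storage class string can be illustrated with a small parser. The Python sketch below is not taken from dCache: the names StorageClass and parse_storage_class are hypothetical, and splitting the part before the @-symbol at its first colon is an assumption, since that syntax depends on the storage system.

```python
from typing import NamedTuple


class StorageClass(NamedTuple):
    store_name: str
    storage_group: str
    storage_type: str


def parse_storage_class(text: str) -> StorageClass:
    """Split <StoreName>:<StorageGroup>@<type-of-storage-system> into parts."""
    if text.count("@") != 1:
        raise ValueError("a storage class contains exactly one @-symbol")
    name_part, storage_type = text.split("@")
    # Assumption: the first colon separates the store name from the group.
    store_name, separator, storage_group = name_part.partition(":")
    if not separator:
        raise ValueError("expected <StoreName>:<StorageGroup> before the @-symbol")
    return StorageClass(store_name, storage_group, storage_type)
```

With the invented value myStore:myGroup@osm, the parser returns the store name myStore, the storage group myGroup and the storage system type osm.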

batch File

A batch file describes which cells in a domain are to be started and with what options. Batch files typically have filenames formed by combining the name of a domain with .batch; for example, the dCacheDomain domain has a corresponding batch file dCache.batch.

Although the cells in a domain may be configured by altering the corresponding batch file, most typical changes can be made by editing the dCacheConfig file, and this is the preferred method.

Context

Wormhole

A wormhole is a feature of the pnfs filesystem. A wormhole is a file that is accessible in all directories; however, the file is not returned when scanning a directory (e.g., using the ls command).

More details on wormholes are given in the section called “Global Configuration with Wormholes”.

Chimera

Chimera is a namespace implementation that is similar to pnfs but provides better integration with a relational database. Because of this, it allows additional functionality to be added, so some dCache features require a Chimera namespace.

Many pnfs features are available in Chimera, including levels, directory tags and many of the dot commands.

Chimera ID

A Chimera ID is a 36-digit hexadecimal number that uniquely identifies a file or directory. It is equivalent to a pnfs ID.
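
Because a pnfs ID has 24 hexadecimal digits and a Chimera ID has 36, the two can be told apart by length. The following Python sketch illustrates this; the function name classify_id is hypothetical and the check looks only at the textual form of the ID.

```python
import re

# Patterns based on the digit counts given in this glossary.
PNFS_ID = re.compile(r"[0-9A-Fa-f]{24}")
CHIMERA_ID = re.compile(r"[0-9A-Fa-f]{36}")


def classify_id(value: str) -> str:
    """Return 'pnfs', 'chimera' or 'unknown' based on the form of the ID."""
    if PNFS_ID.fullmatch(value):
        return "pnfs"
    if CHIMERA_ID.fullmatch(value):
        return "chimera"
    return "unknown"
```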

Replica

It is possible that dCache will choose to make a file accessible from more than one pool using a pool-to-pool copy. If this happens, then each copy of the file is a replica.

A file is independent of which pool is storing the data whereas a replica is uniquely specified by the pnfs ID and the pool name it is stored on.

Precious Replica

A precious replica is a replica that should be stored on tape.

Cached Replica

A cached replica is a replica that should not be stored on tape.

Replica Manager

The replica manager keeps track of the number of replicas of each file within a certain subset of pools and makes sure this number is always within a specified range. This way, the system makes sure that enough replicas of each file are present and accessible at all times. This is especially useful to ensure resilience of the dCache system, even if the hardware is not reliable. The replica manager cannot be used when the system is connected to a tertiary storage system. The activation and configuration of the replica manager is described in Chapter 7, The replica Service (Replica Manager).
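
The idea of keeping the replica count within a range can be sketched as a simple decision function. This Python fragment is a simplified illustration, not the replica manager's actual logic; the name replica_action and the returned labels are hypothetical.

```python
def replica_action(count: int, minimum: int, maximum: int) -> str:
    """Decide what to do for a file with the given number of replicas."""
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    if count < minimum:
        return "replicate"  # too few replicas: create another copy
    if count > maximum:
        return "reduce"     # too many replicas: remove a copy
    return "ok"             # the count is within the specified range
```

With a range of 2 to 3, a file with one replica would be replicated, and a file with four replicas would have one removed.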

Storage Resource Manager (SRM)

An SRM provides a standardised webservice interface for managing a storage resource (e.g. a dCache instance). It is possible to reserve space, initiate file storage or retrieval, and replicate files to another SRM. The actual transfer of data is not done via the SRM itself but via any protocol supported by both parties of the transfer. Authentication and authorisation are done with the grid security infrastructure. dCache comes with an implementation of an SRM which can turn any dCache instance into a grid storage element.

pnfs Companion

The pnfs companion is a (database) table that stores dCache specific information; specifically, on which pools a file may be found. dCache can operate without a companion and will store file location information within a level.

Storing replica location information in the companion database greatly improves the performance of dCache as the location information is often queried by the pool manager.

Although a companion database may be used with Chimera, doing so provides no performance improvements and is not recommended.

Billing/Accounting

Accounting information is stored either in a text file or in a PostgreSQL database by the billing cell, which is usually started in the httpdDomain domain. This is described in Chapter 24, Accounting.

Pool Manager

The pool manager is the cell running in the dCacheDomain domain. It is a central component of a dCache instance and decides which pool is used for an incoming request.

Cost Module

The cost module is a Java class responsible for combining the different types of cost associated with a particular operation; for example, if a file is to be stored, the cost module will combine the storage costs and CPU costs for each candidate target pool. The pool manager will choose the candidate pool with the least combined cost.

Pool Selection Unit

The pool selection unit is a Java class responsible for determining the set of candidate pools for a specific transaction. A detailed account of its configuration and behaviour is given in the section called “The Pool Selection Mechanism”.

Pin Manager

The pin manager is a cell by default running in the utility domain. It is a central service that can pin files to a pool for a certain time. It is used by the SRM to satisfy prestage requests.

Space Manager

The (SRM) Space Manager is a cell by default running in the srm domain. It is a central service that records reserved space on pools. A space reservation may either last for a specific duration or never expire. The Space Manager is used by the SRM to satisfy space reservation requests.

Pool

A pool is a cell responsible for storing retrieved files and for providing access to that data. Data access is supported via movers. A machine may have multiple pools, perhaps due to that machine’s storage being split over multiple partitions.

A pool must have a unique name and all pool cells on a particular machine are hosted in a domain that derives its name from the host machine’s name.

The list of directories that are to store pool data is found in the poollist file, which is located on the pool node.

sweeper

A sweeper is an activity located on a pool. It is responsible for deleting files on the pool that have been marked for removal. Files can be marked for removal because their corresponding namespace entry has been deleted or because the local file is a cache copy and more disk space is needed.
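
The second case, freeing space by removing cache copies, can be sketched as follows. This Python fragment is an assumption-laden illustration and not the sweeper's implementation: the names Replica and select_for_removal are hypothetical, and removing the least recently used cached copies first is an assumed policy.

```python
from typing import List, NamedTuple


class Replica(NamedTuple):
    file_id: str
    size: int           # size in bytes
    last_access: float  # timestamp of the last access
    state: str          # "cached" or "precious"


def select_for_removal(replicas: List[Replica], bytes_needed: int) -> List[str]:
    """Pick cached replicas, least recently used first, until enough is freed."""
    chosen = []
    freed = 0
    cached = sorted(
        (r for r in replicas if r.state == "cached"),
        key=lambda r: r.last_access,
    )
    for replica in cached:
        if freed >= bytes_needed:
            break
        chosen.append(replica.file_id)
        freed += replica.size
    return chosen
```

Precious replicas are never selected in this sketch, which reflects that only cache copies are removed to make space.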

HSM sweeper

The HSM sweeper, if enabled, is a component that is responsible for removing files from the HSM when the corresponding namespace entry has been removed.

cost

The pool manager determines the pool used for storing a file by calculating a cost value for each available pool. The pool with the lowest cost is used. The costs are calculated by the cost module. The total cost is a linear combination of the performance cost and the space cost, i.e.,

      cost = ccf * performance_cost + scf * space_cost

where ccf and scf are configurable with the command set pool decision.
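
The selection by lowest cost can be sketched in Python. This is an illustration only: it assumes that the total cost is the weighted sum of a pool's performance cost and space cost with the factors ccf and scf, and the function names, pool names and cost values are invented for the example.

```python
from typing import Dict, Tuple


def total_cost(performance_cost: float, space_cost: float,
               ccf: float, scf: float) -> float:
    """Combine the two cost types linearly (assumed form of the total cost)."""
    return ccf * performance_cost + scf * space_cost


def cheapest_pool(pools: Dict[str, Tuple[float, float]],
                  ccf: float, scf: float) -> str:
    """Return the name of the pool with the lowest total cost.

    pools maps a pool name to (performance_cost, space_cost).
    """
    def cost_of(name: str) -> float:
        performance_cost, space_cost = pools[name]
        return total_cost(performance_cost, space_cost, ccf, scf)

    return min(pools, key=cost_of)
```

Setting scf to 0 in this sketch makes the choice depend on the performance cost alone, and setting ccf to 0 makes it depend on the space cost alone.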

performance cost

See Also cost.

space cost

See Also cost.