The Pool Selection Mechanism

release notes | Book: 1.9.5, 1.9.12 (opt, FHS), 2.11 (FHS), 2.12 (FHS), 2.13 (FHS), 2.14 (FHS), | Wiki | Q&A black_bg

Web: Multi-page, Single page | PDF: A4-size, Letter-size | eBook: epub black_bg

The Pool Selection Mechanism
Prev	Chapter 8. The `poolmanager` Service	Next

> > >

The Pool Selection Mechanism

The PSU generates a list of allowed storage-pools for each incoming transfer-request. The PSU configuration described below tells the PSU which combinations of transfer-request and storage-pool are allowed. Imagine a two-dimensional table with a row for each possible transfer-request and a column for each pool - each field in the table containing either “yes” or “no”. For an incoming transfer-request the PSU will return a list of all pools with “yes” in the corresponding row.

Instead of “yes” and “no” the table really contains a preference - a non-negative integer. However, the PSU configuration is easier to understand if this is ignored.

Actually maintaining such a table in memory (and as user in a configuration file) would be quite inefficient, because there are many possibilities for the transfer-requests. Instead, the PSU consults a set of rules in order to generate the list of allowed pools. Each such rule is called a link because it links a set of transfer-requests to a group of pools.

[return to top]

Links

A link consists of a set of unit groups and a list of pools. If all the unit groups are matched, the pools belonging to the link are added to the list of allowable pools.

A link is defined in the file /var/lib/dcache/config/poolmanager.conf by

psu create link <link> <unitgroup>
psu set link <link> -readpref=<rpref> -writepref=<wpref> -cachepref=<cpref> -p2ppref=<ppref>
psu add link <link> <poolgroup>

For the preference values see the section called “Preference Values for Type of Transfer”.

The main task is to understand how the unit groups in a link are defined. After we have dealt with that, the preference values will be discussed and a few examples will follow.

The four properties of a transfer request, which are relevant for the PSU, are the following:

Location of the File

The location of the file in the file system is not used directly. Each file has the following two properties which can be set per directory:

Storage Class. The storage class is a string. It is used by a tertiary storage system to decide where to store the file (i.e. on which set of tapes) and dCache can use the storage class for a similar purpose (i.e. on which pools the file can be stored.). A detailed description of the syntax and how to set the storage class of a directory in the namespace is given in the section called “Storage Classes”.
Cache Class. The cache class is a string with essentially the same functionality as the storage class, except that it is not used by a tertiary storage system. It is used in cases, where the storage class does not provide enough flexibility. It should only be used, if an existing configuration using storage classes does not provide sufficient flexibility.

IP Address

The IP address of the requesting host.

Protocol / Type of Door

The protocol respectively the type of door used by the transfer.

Type of Transfer

The type of transfer is either read, write, p2p request or cache.

A request for reading a file which is not on a read pool will trigger a p2p request and a subsequent read request. These will be treated as two separate requests.

A request for reading a file which is not stored on disk, but has to be staged from a connected tertiary storage system will trigger a cache request to fetch the file from the tertiary storage system and a subsequent read request. These will be treated as two separate requests.

Each link contains one or more unit groups, all of which have to be matched by the transfer request. Each unit group in turn contains several units. The unit group is matched if at least one of the units is matched.

[return to top]

Types of Units

There are four types of units: network (-net), protocol (-protocol), storage class (-store) and cache class (-dcache) units. Each type imposes a condition on the IP address, the protocol, the storage class and the cache class respectively.

For each transfer at most one of each of the four unit types will match. If more than one unit of the same type could match the request then the most restrictive unit matches.

The unit that matches is selected from all units defined in dCache, not just those for a particular unit group. This means that, if a unit group has a unit that could match a request but this request also matches a more restrictive unit defined elsewhere then the less restrictive unit will not match.

Network Unit

A network unit consists of an IP address and a net mask written as <IP-address>/<net mask>, say 111.111.111.0/255.255.255.0. It is satisfied, if the request is coming from a host with IP address within the subnet given by the address/netmask pair.

psu create ugroup <name-of-unitgroup>
psu create unit -net <IP-address>/<net mask>
psu addto ugroup <name-of-unitgroup> <IP-address>/<net mask>

Protocol Unit

A protocol unit consists of the name of the protocol and the version number written as <protocol-name>/<version-number>, e.g., xrootd/3.

psu create ugroup <name-of-unitgroup>
psu create unit -protocol <protocol-name>/<version-number>
psu addto ugroup <name-of-unitgroup> <protocol-name>/<version-number>

Storage Class Unit

A storage class unit is given by a storage class. It is satisfied if the requested file has this storage class. Simple wild cards are allowed: for this it is important to know that a storage class must always contain exactly one @-symbol as will be explained in the section called “Storage Classes”. In a storage class unit, either the part before the @-symbol or both parts may be replaced by a *-symbol; for example, *@osm and *@* are both valid storage class units whereas something@* is invalid. The *-symbol represents a limited wildcard: any string that doesn’t contain an @-symbol will match.

psu create ugroup <name-of-unitgroup>
psu create unit -store <StoreName>:<StorageGroup>@<type-of-storage-system>
psu addto ugroup <name-of-unitgroup> <StoreName>:<StorageGroup>@<type-of-storage-system>

Cache Class Unit

A cache class unit is given by a cache class. It is satisfied, if the cache class of the requested file agrees with it.

psu create ugroup <name-of-unitgroup>
psu create unit -dcache <name-of-cache-class>
psu addto ugroup <name-of-unitgroup> <name-of-cache-class>

[return to top]

Preference Values for Type of Transfer

The conditions for the type of transfer are not specified with units. Instead, each link contains four attributes -readpref, -writepref, -p2ppref and -cachepref, which specify a preference value for the respective types of transfer. If all the unit groups in the link are matched, the corresponding preference is assigned to each pool the link points to. Since we are ignoring different preference values at the moment, a preference of 0 stands for no and a non-zero preference stands for yes. A negative value for -p2ppref means, that the value for -p2ppref should equal the one for the -readpref.

[return to top]

Multiple non-zero Preference Values

Note

This explanation of the preference values can be skipped at first reading. It will not be relevant, if all non-zero preference values are the same. If you want to try configuring the pool manager right now without bothering about the preferences, you should only use 0 (for no) and, say, 10 (for yes) as preferences. You can choose -p2ppref=-1 if it should match the value for -readpref. The first examples below are of this type.

If several different non-zero preference values are used, the PSU will not generate a single list but a set of lists, each containing pools with the same preference. The Cost Manager will use the list of pools with highest preference and select the one with the lowest cost for the transfer. Only if all pools with the highest preference are offline, the next list will be considered by the Cost Manager. This can be used to configure a set of fall-back pools which are used if none of the other pools are available.

[return to top]

Pool Groups

Pools can be grouped together to pool groups.

psu create pgroup <name-of-poolgroup>
psu create pool <name-of-pool>
psu addto pgroup <name-of-poolgroup> <name-of-pool>

Example:

Consider a host pool1 with two pools, pool1_1 and pool1_2, and a host pool2 with one pool pool2_1. If you want to treat them in the same way, you would create a pool group and put all of them in it:

psu create pgroup normal-pools
psu create pool pool1_1
psu addto pgroup normal-pools pool1_1
psu create pool pool1_2
psu addto pgroup normal-pools pool1_2
psu create pool pool2_1
psu addto pgroup normal-pools pool2_1

If you later want to treat pool1_2 differently from the others, you would remove it from this pool group and add it to a new one:

psu removefrom pgroup normal-pools pool1_2
psu create pgroup special-pools
psu addto pgroup special-pools pool1_2

In the following, we will assume that the necessary pool groups already exist. All names ending with -pools will denote pool groups.

Note that a pool-node will register itself with the PoolManager: The pool will be created within the PSU and added to the pool group default, if that exists. This is why the dCache system will automatically use any new pool-nodes in the standard configuration: All pools are in default and can therefore handle any request.

[return to top]

Define a link

Now we have everything we need to define a link.

psu create ugroup <name-of-unitgroup>
psu create unit - <type> <unit>
psu addto ugroup <name-of-unitgroup> <unit>

psu create pgroup <poolgroup>
psu create pool <pool>
psu addto pgroup <poolgroup> <pool>

psu create link <link> <name-of-unitgroup>
psu set link <link> -readpref=<10> -writepref=<0> -cachepref=<10>-p2ppref=<-1>
psu add link <link>  <poolgroup>

[return to top]

Examples

Find some examples for the configuration of the PSU below.

[return to top]

Separate Write and Read Pools

The dCache we are going to configure receives data from a running experiment, stores the data onto a tertiary storage system, and serves as a read cache for users who want to analyze the data. While the new data from the experiment should be stored on highly reliable and therefore expensive systems, the cache functionality may be provided by inexpensive hardware. It is therefore desirable to have a set of pools dedicated for writing the new data and a separate set for reading.

Example:

The simplest configuration for such a setup would consist of two links “write-link” and “read-link”. The configuration is as follows:

psu create pgroup read-pools
psu create pool pool1
psu addto pgroup read-pools pool1
psu create pgroup write-pools
psu create pool pool2
psu addto pgroup write-pools pool2

psu create unit -net 0.0.0.0/0.0.0.0
psu create ugroup allnet-cond
psu addto ugroup allnet-cond 0.0.0.0/0.0.0.0

psu create link read-link allnet-cond
psu set link read-link -readpref=10 -writepref=0 -cachepref=10
psu add link read-link read-pools

psu create link write-link allnet-cond
psu set link write-link -readpref=0 -writepref=10 -cachepref=0
psu add link write-link write-pools

Why is the unit group allnet-cond necessary? It is used as a condition which is always true in both links. This is needed, because each link must contain at least one unit group.

[return to top]

Restricted Access by IP Address

You might not want to give access to the pools for the whole network, as in the previous example (the section called “Separate Write and Read Pools”), though.

Example:

Assume, the experiment data is copied into the cache from the hosts with IP 111.111.111.201, 111.111.111.202, and 111.111.111.203. As you might guess, the subnet of the site is 111.111.111.0/255.255.255.0. Access from outside should be denied. Then you would modify the above configuration as follows:

psu create pgroup read-pools
psu create pool pool1
psu addto pgroup read-pools pool1
psu create pgroup write-pools
psu create pool pool2
psu addto pgroup write-pools pool2

psu create unit -store *@*

psu create unit -net 111.111.111.0/255.255.255.0
psu create unit -net 111.111.111.201/255.255.255.255
psu create unit -net 111.111.111.202/255.255.255.255
psu create unit -net 111.111.111.203/255.255.255.255

psu create ugroup write-cond
psu addto ugroup write-cond 111.111.111.201/255.255.255.255
psu addto ugroup write-cond 111.111.111.202/255.255.255.255
psu addto ugroup write-cond 111.111.111.203/255.255.255.255

psu create ugroup read-cond
psu addto ugroup read-cond 111.111.111.0/255.255.255.0
psu addto ugroup read-cond 111.111.111.201/255.255.255.255
psu addto ugroup read-cond 111.111.111.202/255.255.255.255
psu addto ugroup read-cond 111.111.111.203/255.255.255.255

psu create link read-link read-cond
psu set link read-link -readpref=10 -writepref=0 -cachepref=10
psu add link read-link read-pools

psu create link write-link write-cond
psu set link write-link -readpref=0 -writepref=10 -cachepref=0
psu add link write-link write-pools

Important

For a given transfer exactly zero or one storage class unit, cache class unit, net unit and protocol unit will match. As always the most restrictive one will match, the IP 111.111.111.201 will match the 111.111.111.201/255.255.255.255 unit and not the 111.111.111.0/255.255.255.0 unit. Therefore if you only add 111.111.111.0/255.255.255.0 to the unit group “read-cond”, the transfer request coming from the IP 111.111.111.201 will only be allowed to write and not to read. The same is true for transfer requests from 111.111.111.202 and 111.111.111.203.

[return to top]

Reserving Pools for Storage and Cache Classes

If pools are financed by one experimental group, they probably do not like it if they are also used by another group. The best way to restrict data belonging to one experiment to a set of pools is with the help of storage class conditions. If more flexibility is needed, cache class conditions can be used for the same purpose.

Example:

Assume, data of experiment A obtained in 2010 is written into subdirectories in the namespace tree which are tagged with the storage class exp-a:run2010@osm, and similarly for the other years. (How this is done is described in the section called “Storage Classes”.) Experiment B uses the storage class exp-b:alldata@osm for all its data. Especially important data is tagged with the cache class important. (This is described in the section called “Cache Class”.) A suitable setup would be

psu create pgroup exp-a-pools
psu create pool pool1
psu addto pgroup exp-a-pools pool1

psu create pgroup exp-b-pools
psu create pool pool2
psu addto pgroup exp-b-pools pool2

psu create pgroup exp-b-imp-pools
psu create pool pool3
psu addto pgroup exp-b-imp-pools pool3

psu create unit -net 111.111.111.0/255.255.255.0
psu create ugroup allnet-cond
psu addto ugroup allnet-cond 111.111.111.0/255.255.255.0

psu create ugroup exp-a-cond
psu create unit -store exp-a:run2011@osm
psu addto ugroup exp-a-cond exp-a:run2011@osm
psu create unit -store exp-a:run2010@osm
psu addto ugroup exp-a-cond exp-a:run2010@osm

psu create link exp-a-link allnet-cond exp-a-cond
psu set link exp-a-link -readpref=10 -writepref=10 -cachepref=10
psu add link exp-a-link exp-a-pools

psu create ugroup exp-b-cond
psu create unit -store exp-b:alldata@osm
psu addto ugroup exp-b-cond exp-b:alldata@osm

psu create ugroup imp-cond
psu create unit -dcache important
psu addto ugroup imp-cond important

psu create link exp-b-link allnet-cond exp-b-cond
psu set link exp-b-link -readpref=10 -writepref=10 -cachepref=10
psu add link exp-b-link exp-b-pools

psu create link exp-b-imp-link allnet-cond exp-b-cond imp-cond
psu set link exp-b-imp-link -readpref=20 -writepref=20 -cachepref=20
psu add link exp-b-link exp-b-imp-pools

Data tagged with cache class “important” will always be written and read from pools in the pool group exp-b-imp-pools, except when all pools in this group cannot be reached. Then the pools in exp-a-pools will be used.

Note again that these will never be used otherwise. Not even, if all pools in exp-b-imp-pools are very busy and some pools in exp-a-pools have nothing to do and lots of free space.

The central IT department might also want to set up a few pools, which are used as fall-back, if none of the pools of the experiments are functioning. These will also be used for internal testing. The following would have to be added to the previous setup:

Example:

psu create pgroup it-pools
psu create pool pool_it
psu addto pgroup it-pools pool_it

psu create link fallback-link allnet-cond
psu set link fallback-link -readpref=5 -writepref=5 -cachepref=5
psu add link fallback-link it-pools

Note again that these will only be used, if none of the experiments pools can be reached, or if the storage class is not of the form exp-a:run2009@osm, exp-a:run2010@osm, or exp-b:alldata@osm. If the administrator fails to create the unit exp-a:run2005@osm and add it to the unit group exp-a-cond, the fall-back pools will be used eventually.

[return to top]

Storage Classes

The storage class is a string of the form <StoreName>:<StorageGroup>@<type-of-storage-system>, where <type-of-storage-system> denotes the type of storage system in use, and <StoreName>:<StorageGroup> is a string describing the storage class in a syntax which depends on the storage system. In general use <type-of-storage-system>=osm.

Consider for example the following setup:

Example:

[root] # /usr/bin/chimera-cli lstag /data/experiment-a
Total: 2
OSMTemplate
sGroup
[root] # /usr/bin/chimera-cli readtag /data/experiment-a OSMTemplate
StoreName myStore
[root] # /usr/bin/chimera-cli readtag /data/experiment-a sGroup
STRING

This is the setup after a fresh installation and it will lead to the storage class myStore:STRING@osm. An adjustment to more sensible values will look like

[root] # echo "StoreName exp-a" | /usr/bin/chimera-cli Writetag /data/experiment-a OSMTemplate
[root] # echo "run2010" | /usr/bin/chimera-cli Writetag /data/experiment-a sGroup

and will result in the storage class exp-a:run2010@osm. To summarize: The storage class will depend on the directory the data is stored in and is configurable.

[return to top]

Cache Class

Storage classes might already be in use for the configuration of a tertiary storage system. In most cases they should be flexible enough to configure the PSU. However, in rare cases the existing configuration and convention for storage classes might not be flexible enough.

Consider for example a situation, where data produced by an experiment always has the same storage class exp-a:alldata@osm. This is good for the tertiary storage system, since all data is supposed to go to the same tape set sequentially. However, the data also contains a relatively small amount of meta-data, which is accessed much more often by analysis jobs than the rest of the data. You would like to keep the meta-data on a dedicated set of dCache pools. However, the storage class does not provide means to accomplish that.

The cache class of a directory is set by the tag cacheClass as follows:

[root] # echo "metaData" > ".(tag)(cacheClass)"

In the above example the meta-data is stored in directories which are tagged in this way.

There is a nice trick for easy checking of the existing tags in one directory:

[root] # grep '' `cat '.(tags)()'`
.(tag)(OSMTemplate):StoreName exp-a
.(tag)(sGroup):run2010
.(tag)(cacheClass):metaData

This only works, if the quote-symbols are used correctly. (tick, tick, back-tick, tick, tick, back-tick).

Tags are inherited by sub-directories: Changing a tag of a directory will change the tag of each sub-directory, if the tag has never been changed for this sub-directory directly. Changing tags breaks these inheritance links. Directories in namespace should never be moved, since this will mess up the inheritance structure and eventually break the whole system.

[return to top]

Link Groups

The PoolManager supports a type of objects called link groups. These link groups are used by the SRM SpaceManager to make reservations against space. Each link group corresponds to a number of dCache pools in the following way: A link group is a collection of links and each link points to a set of pools. Each link group knows about the size of its available space, which is the sum of all sizes of available space in all the pools included in this link group.

To create a new link group login to the Admin Interface and cd to the PoolManager.

(local) admin > cd PoolManager
(PoolManager) admin > psu create linkGroup <linkgroup>
(PoolManager) admin > psu addto linkGroup <linkgroup> <link>
(PoolManager) admin > save

With save the changes will be saved to the file /etc/dcache/PoolManager.conf.

Note

You can also edit the file /etc/dcache/PoolManager.conf to create a new link group. Please make sure that it already exists. Otherwise you will have to create it first via the Admin Interface by

(PoolManager) admin > save

Edit the file /etc/dcache/PoolManager.conf

psu create linkGroup <linkgroup>
psu addto linkGroup <linkgroup> <link>

After editing this file you will have to restart the domain which contains the PoolManager cell to apply the changes.

Note

Administrators will have to take care, that a pool is not present in more than one link group.

Properties of space reservation. The dCache administrator can specify a RetentionPolicy and an AccessLatency for the space reservation, where RetentionPolicy describes the quality of the storage service that will be provided for the data (files) stored in this space reservation and AccessLatency describes the availability of this data.

A link group has five boolean properties called replicaAllowed, outputAllowed, custodialAllowed, onlineAllowed and nearlineAllowed. The values of these properties (true or false) can be configured via the Admin Interface or directly in the file /opt/d-cache/config/PoolManager.conf.

The link groups contained in a space reservation match the RetentionPolicy and the AccessLatency of the space reservation.

(PoolManager) admin > psu set linkGroup custodialAllowed <linkgroup> <true|false>
(PoolManager) admin > psu set linkGroup outputAllowed <linkgroup> <true|false>
(PoolManager) admin > psu set linkGroup replicaAllowed <linkgroup> <true|false>
(PoolManager) admin > psu set linkGroup onlineAllowed <linkgroup> <true|false>
(PoolManager) admin > psu set linkGroup nearlineAllowed <linkgroup> <true|false>

Moreover an attribute can be assigned to a link group.

(PoolManager) admin > psu set linkGroup attribute <linkgroup> <key>=<value>

Example:

Possible assignments for attributes are:

(PoolManager) admin > psu set linkGroup attribute name-of-LinkGroup VO=dteam001
(PoolManager) admin > psu set linkGroup attribute name-of-LinkGroup VO=cms001
(PoolManager) admin > psu set linkGroup attribute name-of-LinkGroup HSM=osm

Note

Please note that that it is up to administrators that the link groups’ attributes are specified correctly.

For example dCache will not complain if the link group that does not support tape backend will be declared as one that supports custodial.

Prev	Up	Next
Chapter 8. The `poolmanager` Service	Home	The Cost Module