The Pool Selection Mechanism

release notes | Book: 1.9.5, 1.9.12 (opt, FHS), 2.6 (FHS), 2.7 (FHS), 2.8 (FHS), 2.9 (FHS), 2.10 (FHS), 2.11 (FHS), | Wiki | Q&A black_bg

Web: Multi-page, Single page | PDF: A4-size, Letter-size | eBook: epub black_bg

The Pool Selection Mechanism
Prev	Chapter 7. The `poolmanager` Service	Next

> > >

The Pool Selection Mechanism

The PSU is responsible for finding the set of pools which can be used for a specific transfer-request. By telling the PSU which pools are permitted for which type of transfer-request, the administrator of the dCache system can adjust the system to any kind of scenario: Separate organizations served by separate pools, special pools for writing the data to a tertiary storage system, pools in a DMZ which serves only a certain kind of data (e.g., for the grid). This section explains the mechanism employed by the PSU and shows how to configure it with several examples.

The PSU generates a list of allowed storage-pools for each incoming transfer-request. The PSU configuration described below tells the PSU which combinations of transfer-request and storage-pool are allowed. Imagine a two-dimensional table with a row for each possible transfer-request and a column for each pool - each field in the table containing either “yes” or “no”. For an incoming transfer-request the PSU will return a list of all pools with “yes” in the corresponding row.

Instead of “yes” and “no” the table really contains a preference - a non-negative integer. However, the PSU configuration is easier to understand if this is ignored.

Actually maintaining such a table in memory (and as user in a configuration file) would be quite inefficient, because there are many possibilities for the transfer-requests. Instead, the PSU consults a set of rules in order to generate the list of allowed pools. Each such rule is called a link because it links a set of transfer-requests to a group of pools.

[return to top]

Links

A link consists of a set of unit groups and a list of pools. If all the unit groups are matched, the pools belonging to the link are added to the list of allowable pools.

A link is defined in the file /var/lib/dcache/config/poolmanager.conf by

psu create link <link> <unitgroup>
psu set link <link> -readpref=<rpref> -writepref=<wpref> -cachepref=<cpref> -p2ppref=<ppref>
psu add link <link> <poolgroup>

For the preference values see the section called “Preference Values for Type of Transfer”.

The main task is to understand how the unit groups in a link are defined. After we have dealt with that, the preference values will be discussed and a few examples will follow.

The four properties of a transfer request, which are relevant for the PSU, are the following:

Location of the File

The location of the file in the file system is not used directly. Each file has the following two properties which can be set per directory:

Storage Class. The storage class is a string. It is used by a tertiary storage system to decide where to store the file (i.e. on which set of tapes) and dCache can use the storage class for a similar purpose (i.e. on which pools the file can be stored.). A detailed description of the syntax and how to set the storage class of a directory in the namespace is given in the section called “Storage Classes”.
Cache Class. The cache class is a string with essentially the same functionality as the storage class, except that it is not used by a tertiary storage system. It is used in cases, where the storage class does not provide enough flexibility. It should only be used, if an existing configuration using storage classes does not provide sufficient flexibility.

IP Address

The IP address of the requesting host.

Protocol / Type of Door

The protocol respectively the type of door used by the transfer.

Type of Transfer

The type of transfer is either read, write, p2p request or cache.

A request for reading a file which is not on a read pool will trigger a p2p request and a subsequent read request. These will be treated as two separate requests.

A request for reading a file which is not stored on disk, but has to be staged from a connected tertiary storage system will trigger a cache request to fetch the file from the tertiary storage system and a subsequent read request. These will be treated as two separate requests.

Each link contains one or more unit groups, all of which have to be matched by the transfer request. Each unit group in turn contains several units. The unit group is matched if at least one of the units is matched.

[return to top]

Types of Units

There are four types of units: network (-net), protocol (-protocol), storage class (-store) and cache class (-dcache) units. Each type imposes a condition on the IP address, the protocol, the storage class and the cache class respectively.

For each transfer at most one of each of the four unit types will match. If more than one unit of the same type could match the request then the most restrictive unit matches.

The unit that matches is selected from all units defined in dCache, not just those for a particular unit group. This means that, if a unit group has a unit that could match a request but this request also matches a more restrictive unit defined elsewhere then the less restrictive unit will not match.

Network Unit

A network unit consists of an IP address and a net mask written as <IP-address>/<net mask>, say 111.111.111.0/255.255.255.0. It is satisfied, if the request is coming from a host with IP address within the subnet given by the address/netmask pair.

psu create ugroup <name-of-unitgroup>
psu create unit -net <IP-address>/<net mask>
psu addto ugroup <name-of-unitgroup> <IP-address>/<net mask>

Protocol Unit

A protocol unit consists of the name of the protocol and the version number written as <protocol-name>/<version-number>, e.g., xrootd/3.

psu create ugroup <name-of-unitgroup>
psu create unit -protocol <protocol-name>/<version-number>
psu addto ugroup <name-of-unitgroup> <protocol-name>/<version-number>

Storage Class Unit

A storage class unit is given by a storage class. It is satisfied if the requested file has this storage class. Simple wild cards are allowed: for this it is important to know that a storage class must always contain exactly one @-symbol as will be explained in the section called “Storage Classes”. In a storage class unit, either the part before the @-symbol or both parts may be replaced by a *-symbol; for example, *@osm and *@* are both valid storage class units whereas something@* is invalid. The *-symbol represents a limited wildcard: any string that doesn’t contain an @-symbol will match.

psu create ugroup <name-of-unitgroup>
psu create unit -store <StoreName>:<StorageGroup>@<type-of-storage-system>
psu addto ugroup <name-of-unitgroup> <StoreName>:<StorageGroup>@<type-of-storage-system>

Cache Class Unit

A cache class unit is given by a cache class. It is satisfied, if the cache class of the requested file agrees with it.

psu create ugroup <name-of-unitgroup>
psu create unit -dcache <name-of-cache-class>
psu addto ugroup <name-of-unitgroup> <name-of-cache-class>

[return to top]

Preference Values for Type of Transfer

The conditions for the type of transfer are not specified with units. Instead, each link contains four attributes -readpref, -writepref, -p2ppref and -cachepref, which specify a preference value for the respective types of transfer. If all the unit groups in the link are matched, the corresponding preference is assigned to each pool the link points to. Since we are ignoring different preference values at the moment, a preference of 0 stands for no and a non-zero preference stands for yes. A negative value for -p2ppref means, that the value for -p2ppref should equal the one for the -readpref.

[return to top]

Multiple non-zero Preference Values

Note

This explanation of the preference values can be skipped at first reading. It will not be relevant, if all non-zero preference values are the same. If you want to try configuring the pool manager right now without bothering about the preferences, you should only use 0 (for no) and, say, 10 (for yes) as preferences. You can choose -p2ppref=-1 if it should match the value for -readpref. The first examples below are of this type.

If several different non-zero preference values are used, the PSU will not generate a single list but a set of lists, each containing pools with the same preference. The Pool Manager will use the list of pools with highest preference and select a pool according to the load balancing policy for the transfer. Only if all pools with the highest preference are offline, the next list will be considered by the Pool Manager. This can be used to configure a set of fall-back pools which are used if none of the other pools are available.

[return to top]

Pool Groups

Pools can be grouped together to pool groups.

psu create pgroup <name-of-poolgroup>
psu create pool <name-of-pool>
psu addto pgroup <name-of-poolgroup> <name-of-pool>

Example:

Consider a host pool1 with two pools, pool1_1 and pool1_2, and a host pool2 with one pool pool2_1. If you want to treat them in the same way, you would create a pool group and put all of them in it:

psu create pgroup normal-pools
psu create pool pool1_1
psu addto pgroup normal-pools pool1_1
psu create pool pool1_2
psu addto pgroup normal-pools pool1_2
psu create pool pool2_1
psu addto pgroup normal-pools pool2_1

If you later want to treat pool1_2 differently from the others, you would remove it from this pool group and add it to a new one:

psu removefrom pgroup normal-pools pool1_2
psu create pgroup special-pools
psu addto pgroup special-pools pool1_2

In the following, we will assume that the necessary pool groups already exist. All names ending with -pools will denote pool groups.

Note that a pool-node will register itself with the PoolManager: The pool will be created within the PSU and added to the pool group default, if that exists. This is why the dCache system will automatically use any new pool-nodes in the standard configuration: All pools are in default and can therefore handle any request.

[return to top]

Storage Classes

The storage class is a string of the form <StoreName>:<StorageGroup>@<type-of-storage-system>, where <type-of-storage-system> denotes the type of storage system in use, and <StoreName>:<StorageGroup> is a string describing the storage class in a syntax which depends on the storage system. In general use <type-of-storage-system>=osm.

Consider for example the following setup:

Example:

[root] # /usr/bin/chimera-cli lstag /data/experiment-a
Total: 2
OSMTemplate
sGroup
[root] # /usr/bin/chimera-cli readtag /data/experiment-a OSMTemplate
StoreName myStore
[root] # /usr/bin/chimera-cli readtag /data/experiment-a sGroup
STRING

This is the setup after a fresh installation and it will lead to the storage class myStore:STRING@osm. An adjustment to more sensible values will look like

[root] # /usr/bin/chimera-cli writetag /data/experiment-a OSMTemplate "StoreName exp-a"
[root] # /usr/bin/chimera-cli writetag /data/experiment-a sGroup "run2010"

and will result in the storage class exp-a:run2010@osm for any data stored in the /data/experiment-a directory.

To summarize: The storage class depends on the directory the data is stored in and is configurable.

[return to top]

Cache Class

Storage classes might already be in use for the configuration of a tertiary storage system. In most cases they should be flexible enough to configure the PSU. However, in rare cases the existing configuration and convention for storage classes might not be flexible enough.

Consider for example a situation, where data produced by an experiment always has the same storage class exp-a:alldata@osm. This is good for the tertiary storage system, since all data is supposed to go to the same tape set sequentially. However, the data also contains a relatively small amount of meta-data, which is accessed much more often by analysis jobs than the rest of the data. You would like to keep the meta-data on a dedicated set of dCache pools. However, the storage class does not provide means to accomplish that.

The cache class of a directory is set by the tag cacheClass as follows:

Example:

[root] # /usr/bin/chimera-cli writetag /data/experiment-a cacheClass "metaData"

In this example the meta-data is stored in directories which are tagged in this way.

Check the existing tags of a directory and their content by:

[root] # /usr/bin/chimera-cli lstag /path/to/directory
Total: numberOfTags
tag1
tag2
...
[root] # /usr/bin/chimera-cli readtag /path/to/directory tag1
contentOfTag1

Note

A new directory will inherit the tags from the parent directory. But updating a tag will not update the tags of any child directories.

[return to top]

Define a link

Now we have everything we need to define a link.

psu create ugroup <name-of-unitgroup>
psu create unit - <type> <unit>
psu addto ugroup <name-of-unitgroup> <unit>

psu create pgroup <poolgroup>
psu create pool <pool>
psu addto pgroup <poolgroup> <pool>

psu create link <link> <name-of-unitgroup>
psu set link <link> -readpref=<10> -writepref=<0> -cachepref=<10>-p2ppref=<-1>
psu add link <link>  <poolgroup>

[return to top]

Examples

Find some examples for the configuration of the PSU below.

[return to top]

Separate Write and Read Pools

The dCache we are going to configure receives data from a running experiment, stores the data onto a tertiary storage system, and serves as a read cache for users who want to analyze the data. While the new data from the experiment should be stored on highly reliable and therefore expensive systems, the cache functionality may be provided by inexpensive hardware. It is therefore desirable to have a set of pools dedicated for writing the new data and a separate set for reading.

Example:

The simplest configuration for such a setup would consist of two links “write-link” and “read-link”. The configuration is as follows:

psu create pgroup read-pools
psu create pool pool1
psu addto pgroup read-pools pool1
psu create pgroup write-pools
psu create pool pool2
psu addto pgroup write-pools pool2

psu create unit -net 0.0.0.0/0.0.0.0
psu create ugroup allnet-cond
psu addto ugroup allnet-cond 0.0.0.0/0.0.0.0

psu create link read-link allnet-cond
psu set link read-link -readpref=10 -writepref=0 -cachepref=10
psu add link read-link read-pools

psu create link write-link allnet-cond
psu set link write-link -readpref=0 -writepref=10 -cachepref=0
psu add link write-link write-pools

Why is the unit group allnet-cond necessary? It is used as a condition which is always true in both links. This is needed, because each link must contain at least one unit group.

[return to top]

Restricted Access by IP Address

You might not want to give access to the pools for the whole network, as in the previous example (the section called “Separate Write and Read Pools”), though.

Example:

Assume, the experiment data is copied into the cache from the hosts with IP 111.111.111.201, 111.111.111.202, and 111.111.111.203. As you might guess, the subnet of the site is 111.111.111.0/255.255.255.0. Access from outside should be denied. Then you would modify the above configuration as follows:

psu create pgroup read-pools
psu create pool pool1
psu addto pgroup read-pools pool1
psu create pgroup write-pools
psu create pool pool2
psu addto pgroup write-pools pool2

psu create unit -store *@*

psu create unit -net 111.111.111.0/255.255.255.0
psu create unit -net 111.111.111.201/255.255.255.255
psu create unit -net 111.111.111.202/255.255.255.255
psu create unit -net 111.111.111.203/255.255.255.255

psu create ugroup write-cond
psu addto ugroup write-cond 111.111.111.201/255.255.255.255
psu addto ugroup write-cond 111.111.111.202/255.255.255.255
psu addto ugroup write-cond 111.111.111.203/255.255.255.255

psu create ugroup read-cond
psu addto ugroup read-cond 111.111.111.0/255.255.255.0
psu addto ugroup read-cond 111.111.111.201/255.255.255.255
psu addto ugroup read-cond 111.111.111.202/255.255.255.255
psu addto ugroup read-cond 111.111.111.203/255.255.255.255

psu create link read-link read-cond
psu set link read-link -readpref=10 -writepref=0 -cachepref=10
psu add link read-link read-pools

psu create link write-link write-cond
psu set link write-link -readpref=0 -writepref=10 -cachepref=0
psu add link write-link write-pools

Important

For a given transfer exactly zero or one storage class unit, cache class unit, net unit and protocol unit will match. As always the most restrictive one will match, the IP 111.111.111.201 will match the 111.111.111.201/255.255.255.255 unit and not the 111.111.111.0/255.255.255.0 unit. Therefore if you only add 111.111.111.0/255.255.255.0 to the unit group “read-cond”, the transfer request coming from the IP 111.111.111.201 will only be allowed to write and not to read. The same is true for transfer requests from 111.111.111.202 and 111.111.111.203.

[return to top]

Reserving Pools for Storage and Cache Classes

If pools are financed by one experimental group, they probably do not like it if they are also used by another group. The best way to restrict data belonging to one experiment to a set of pools is with the help of storage class conditions. If more flexibility is needed, cache class conditions can be used for the same purpose.

Example:

Assume, data of experiment A obtained in 2010 is written into subdirectories in the namespace tree which are tagged with the storage class exp-a:run2010@osm, and similarly for the other years. (How this is done is described in the section called “Storage Classes”.) Experiment B uses the storage class exp-b:alldata@osm for all its data. Especially important data is tagged with the cache class important. (This is described in the section called “Cache Class”.) A suitable setup would be

psu create pgroup exp-a-pools
psu create pool pool1
psu addto pgroup exp-a-pools pool1

psu create pgroup exp-b-pools
psu create pool pool2
psu addto pgroup exp-b-pools pool2

psu create pgroup exp-b-imp-pools
psu create pool pool3
psu addto pgroup exp-b-imp-pools pool3

psu create unit -net 111.111.111.0/255.255.255.0
psu create ugroup allnet-cond
psu addto ugroup allnet-cond 111.111.111.0/255.255.255.0

psu create ugroup exp-a-cond
psu create unit -store exp-a:run2011@osm
psu addto ugroup exp-a-cond exp-a:run2011@osm
psu create unit -store exp-a:run2010@osm
psu addto ugroup exp-a-cond exp-a:run2010@osm

psu create link exp-a-link allnet-cond exp-a-cond
psu set link exp-a-link -readpref=10 -writepref=10 -cachepref=10
psu add link exp-a-link exp-a-pools

psu create ugroup exp-b-cond
psu create unit -store exp-b:alldata@osm
psu addto ugroup exp-b-cond exp-b:alldata@osm

psu create ugroup imp-cond
psu create unit -dcache important
psu addto ugroup imp-cond important

psu create link exp-b-link allnet-cond exp-b-cond
psu set link exp-b-link -readpref=10 -writepref=10 -cachepref=10
psu add link exp-b-link exp-b-pools

psu create link exp-b-imp-link allnet-cond exp-b-cond imp-cond
psu set link exp-b-imp-link -readpref=20 -writepref=20 -cachepref=20
psu add link exp-b-link exp-b-imp-pools

Data tagged with cache class “important” will always be written and read from pools in the pool group exp-b-imp-pools, except when all pools in this group cannot be reached. Then the pools in exp-a-pools will be used.

Note again that these will never be used otherwise. Not even, if all pools in exp-b-imp-pools are very busy and some pools in exp-a-pools have nothing to do and lots of free space.

The central IT department might also want to set up a few pools, which are used as fall-back, if none of the pools of the experiments are functioning. These will also be used for internal testing. The following would have to be added to the previous setup:

Example:

psu create pgroup it-pools
psu create pool pool_it
psu addto pgroup it-pools pool_it

psu create link fallback-link allnet-cond
psu set link fallback-link -readpref=5 -writepref=5 -cachepref=5
psu add link fallback-link it-pools

Note again that these will only be used, if none of the experiments pools can be reached, or if the storage class is not of the form exp-a:run2009@osm, exp-a:run2010@osm, or exp-b:alldata@osm. If the administrator fails to create the unit exp-a:run2005@osm and add it to the unit group exp-a-cond, the fall-back pools will be used eventually.

Prev	Up	Next
Chapter 7. The `poolmanager` Service	Home	The Partition Manager