
The PSU generates a list of allowable storage-pools for each incoming transfer-request. The PSU-configuration described below tells the PSU which combinations of transfer-request and storage-pool are allowed. Imagine a two-dimensional table with a row for each possible transfer-request and a column for each pool - each field in the table containing either “yes” or “no”. For an incoming transfer-request the PSU will return a list of all pools with “yes” in the corresponding row.
Instead of “yes” and “no” the table really contains a preference - a non-negative integer. However, the PSU configuration is easier to understand if this is ignored.
Actually maintaining such a table in memory (and, for the user, in a configuration file) would be quite inefficient, because of the many possibilities for the transfer-requests. Instead, the PSU consults a set of rules in order to generate the list of allowed pools. Each such rule is called a link because it links a set of transfer-requests to a group of pools. A link consists of a set of conditions and a list of pools. If all the conditions are satisfied, the pools belonging to the link are added to the list of allowable pools.
The main task is to understand how the conditions in a link are defined. After we have dealt with that, the preference values will be discussed and a few examples will follow.
The properties of a transfer-request, which are relevant for the PSU, are the following:
- Location of the File
The directory of the file in the file system (perfectly normal file system - pnfs).
- IP Address
The IP address of the requesting host.
- Type of Transfer
The type of transfer is either “read”, “write”, or “cache”. A request for reading a file which is not stored in the cache, but has to be staged from a connected tertiary storage system, will trigger a “cache”-request and a subsequent “read”-request. These will be treated as two separate requests.
The location of the file in the file system is not used directly. Each file has the following two properties which can be set per directory:
- Storage Class
The storage class is a string. It is used by a tertiary storage system to decide where to store the file (i.e., on which set of tapes), and dCache can use the storage class for a similar purpose (i.e., on which pools the file can be stored). A detailed description of the syntax and of how to set the storage class of a directory in the PNFS is given in the section called “Storage Classes”.
- Cache Class
The cache class is a string with essentially the same functionality as the storage class, except that it is not used by a tertiary storage system. It should only be used if an existing configuration using storage classes does not provide sufficient flexibility.
Each link contains one or more conditions, all of which have to be satisfied by the transfer-request. Each condition in turn contains one or more elementary conditions. A condition is satisfied if at least one of its elementary conditions is satisfied. For the mathematically inclined, this logical structure can be expressed by the following formula:
Example:

link == ( elemCond1 or elemCond2 ) and ( elemCond3 or elemCond4 or elemCond5 ) and ... and ( ... ),

where the parentheses are the conditions. The first condition contains the elementary conditions elemCond1 and elemCond2, and the second one contains elemCond3, elemCond4, and elemCond5.
There are three types of elementary conditions: network (-net), storage class (-store), and cache class (-dcache) conditions. Each type imposes a condition on the IP address, the storage class, and the cache class of the request, respectively.
An elementary network condition consists of an IP address and a net mask written as “<IP-address>/<netmask>”, for example “111.111.111.0/255.255.255.0”. It is satisfied if the request comes from a host with an IP address within the subnet given by the address/netmask pair.
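For example, an elementary network condition matching exactly one host could be created with the command used in the examples below (the address is only illustrative):

psu create unit -net <111.111.111.201/255.255.255.255>

A net mask of 255.255.255.255 restricts the condition to a single host, while 0.0.0.0/0.0.0.0 matches any address.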
An elementary storage class condition is given by a storage class. It is satisfied if the requested file has this storage class. Simple wild cards are allowed: for this it is important to know that a storage class must always contain exactly one @-symbol, as will be explained in the section called “Storage Classes”. In an elementary storage class condition, either the part before the @-symbol or both parts may be replaced by a *-symbol; for example, *@osm and *@* are both valid elementary storage class conditions, whereas something@* is invalid. The *-symbol represents a limited wildcard: any string that does not contain an @-symbol will match.
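As a sketch, a specific and a fully wildcarded elementary storage class condition would be created like this (the first storage class is the one used in the examples later in this section):

psu create unit -store <exp-a:run2004@osm>
psu create unit -store <*@*>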
An elementary cache class condition is given by a cache class. It is satisfied if the cache class of the requested file matches it.
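A minimal sketch, using the cache class “important” from the examples below:

psu create unit -dcache <important>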
The conditions for the type of transfer are not specified with elementary conditions. Instead, each link contains three attributes, “-readpref”, “-writepref”, and “-cachepref”, which specify a preference value for the respective type of transfer. If all the conditions in the link are satisfied, the corresponding preference is assigned to each pool the link points to. Since we are ignoring different preference values at the moment, a preference of 0 stands for “no” and a non-zero preference stands for “yes”.
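In this simplified yes/no picture, a link that allows read- and cache-requests but forbids write-requests would be configured like this (read-link is the link name used in the examples below):

psu set link <read-link> -readpref=<10> -writepref=<0> -cachepref=<10>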
The following explanation of the preference values can be skipped at first reading. It will not be relevant if all non-zero preference values are the same. If you want to try configuring the pool manager right now without bothering about the preferences, you should only use 0 (for “no”) and, say, 10 (for “yes”) as preferences. The first examples below are of this type.
If more than one preference value different from zero is used, the PSU will not generate a single list but a set of lists, each containing pools with the same preference. The Cost Manager will use the list of pools with the highest preference and select the one with the lowest cost for the transfer. Only if all pools with the highest preference are unavailable will the next list be considered by the Cost Manager. This can be used to configure a set of fall-back pools which are used if none of the other pools are available.
The syntax of the commands for configuring the PSU will be explained with the examples below. These commands can be issued within the PoolManager-cell to change the configuration while the system is running. The save-command can then be used to save the current configuration to the file config/PoolManager.conf in the dCache program-directory. This file is parsed whenever the dCache system starts up. It is a simple text file containing the corresponding commands. It can therefore also be edited before the system is started. It can also be loaded into a running system with the reload-command.
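As a minimal sketch, assuming the commands are issued within the PoolManager-cell (the prompt shown is only illustrative):

(PoolManager) admin > save
(PoolManager) admin > reload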
Pools can be grouped together into pool groups. Consider a host pool1 with two pools, pool1_1 and pool1_2, and a host pool2 with one pool pool2_1. If you want to treat them in the same way, you would create a pool group and put all of them in it:
psu create pgroup <normal-pools>
psu create pool <pool1_1>
psu addto pgroup <normal-pools> <pool1_1>
psu create pool <pool1_2>
psu addto pgroup <normal-pools> <pool1_2>
psu create pool <pool2_1>
psu addto pgroup <normal-pools> <pool2_1>
If you later want to treat pool1_2 differently from the others, you would remove it from this pool group and add it to a new one:
psu removefrom pgroup <normal-pools> <pool1_2>
psu create pgroup <special-pools>
psu addto pgroup <special-pools> <pool1_2>
In the following, we will assume that the necessary pool groups already exist. All names ending with “-pools” will denote pool groups.
Note that a pool-node will register itself with the pool manager: the pool will be created within the PSU and added to the pool group “default”, if that exists. This is why the dCache system will automatically use any new pool-nodes in the standard configuration: all pools are in “default” and can therefore handle any request.
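If the “default” pool group does not exist yet, it can be created like any other pool group, as sketched below:

psu create pgroup <default>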
The dCache we are going to configure receives data from a running experiment, stores the data onto a tertiary storage system, and serves as a read cache for users who want to analyze the data. While the new data from the experiment should be stored on highly reliable and therefore expensive systems, the cache functionality may be provided by inexpensive hardware. It is therefore desirable to have a set of pools dedicated for writing the new data and a separate set for reading.
The simplest configuration for such a setup would consist of two links “write-link” and “read-link”. The configuration is as follows:
psu create unit -net <0.0.0.0/0.0.0.0>
psu create ugroup <allnet-cond>
psu addto ugroup <allnet-cond> <0.0.0.0/0.0.0.0>
psu create link <read-link> <allnet-cond>
psu set link <read-link> -readpref=<10> -writepref=<0> -cachepref=<10>
psu add link <read-link> <read-pools>
psu create link <write-link> <allnet-cond>
psu set link <write-link> -readpref=<0> -writepref=<10> -cachepref=<0>
psu add link <write-link> <write-pools>
Why is the condition allnet-cond necessary? It is used as a condition which is always true in both links. This is needed because each link must contain at least one condition. The commands contain the words unit and ugroup for historical reasons: they denote elementary conditions and conditions, respectively, in our nomenclature.
You might not want to give access to the pools for the whole network, as in the previous example (the section called “Separate Write and Read Pools”), though. Assume the experiment data is copied into the cache from the hosts with IP addresses 111.111.111.201, 111.111.111.202, and 111.111.111.203. As you might guess, the subnet of the site is 111.111.111.0/255.255.255.0. Access from outside should be denied. Then you would modify the above configuration as follows:
psu create unit -net <111.111.111.0/255.255.255.0>
psu create ugroup <allnet-cond>
psu addto ugroup <allnet-cond> <111.111.111.0/255.255.255.0>
psu create unit -net <111.111.111.201/255.255.255.255>
psu create unit -net <111.111.111.202/255.255.255.255>
psu create unit -net <111.111.111.203/255.255.255.255>
psu create ugroup <write-cond>
psu addto ugroup <write-cond> <111.111.111.201/255.255.255.255>
psu addto ugroup <write-cond> <111.111.111.202/255.255.255.255>
psu addto ugroup <write-cond> <111.111.111.203/255.255.255.255>
psu create link <read-link> <allnet-cond>
psu set link <read-link> -readpref=<10> -writepref=<0> -cachepref=<10>
psu add link <read-link> <read-pools>
psu create link <write-link> <write-cond>
psu set link <write-link> -readpref=<0> -writepref=<10> -cachepref=<0>
psu add link <write-link> <write-pools>
If pools are financed by one experimental group, they probably do not want other groups to use them. The best way to restrict the data belonging to one experiment to a set of pools is with the help of storage class conditions. If more flexibility is needed, cache class conditions can be used for the same purpose.
Assume that data of experiment A obtained in 2004 is written into subdirectories in the PNFS tree which are tagged with the storage class “exp-a:run2004@osm”, and similarly for the other years. (How this is done is described in the section called “Storage Classes”.) Experiment B uses the storage class “exp-b:alldata@osm” for all its data. Especially important data is tagged with the cache class “important”. (This is described in the section called “Cache Class”.) A suitable setup would be
psu create ugroup <exp-a-cond>
psu create unit -store <exp-a:run2003@osm>
psu addto ugroup <exp-a-cond> <exp-a:run2003@osm>
psu create unit -store <exp-a:run2004@osm>
psu addto ugroup <exp-a-cond> <exp-a:run2004@osm>
psu create link <exp-a-link> <allnet-cond> <exp-a-cond>
psu set link <exp-a-link> -readpref=<10> -writepref=<10> -cachepref=<10>
psu add link <exp-a-link> <exp-a-pools>
psu create ugroup <exp-b-cond>
psu create unit -store <exp-b:alldata@osm>
psu addto ugroup <exp-b-cond> <exp-b:alldata@osm>
psu create ugroup <imp-cond>
psu create unit -dcache <important>
psu addto ugroup <imp-cond> <important>
psu create link <exp-b-link> <allnet-cond> <exp-b-cond>
psu set link <exp-b-link> -readpref=<10> -writepref=<10> -cachepref=<10>
psu add link <exp-b-link> <exp-b-pools>
psu create link <exp-b-imp-link> <allnet-cond> <exp-b-cond> <imp-cond>
psu set link <exp-b-imp-link> -readpref=<20> -writepref=<20> -cachepref=<20>
psu add link <exp-b-imp-link> <exp-b-imp-pools>
Data tagged with the cache class “important” will always be written to and read from pools in the pool group exp-b-imp-pools, except when all pools in this group cannot be reached. Then the pools in exp-b-pools will be used. Note again that, for this data, the pools in exp-b-pools will never be used otherwise, not even if all pools in exp-b-imp-pools are very busy and some pools in exp-b-pools have nothing to do and lots of free space.
The central IT department might also want to set up a few pools which are used as a fall-back if none of the pools of the experiments are functioning. These will also be used for internal testing. The following would have to be added to the previous setup:
psu create link <fallback-link> <allnet-cond>
psu set link <fallback-link> -readpref=<5> -writepref=<5> -cachepref=<5>
psu add link <fallback-link> <it-pools>
Note again that these will only be used if none of the experiments' pools can be reached, or if the storage class is not of the form exp-a:run2003@osm, exp-a:run2004@osm, or exp-b:alldata@osm. If the administrator fails to create the elementary condition exp-a:run2005@osm and add it to the condition exp-a-cond, the fall-back pools will eventually be used for the 2005 data.
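The missing commands for the 2005 data would be analogous to those above:

psu create unit -store <exp-a:run2005@osm>
psu addto ugroup <exp-a-cond> <exp-a:run2005@osm>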
The storage class is a string of the form <StoreDescriptor>@<hsm>, where <hsm> denotes the type of tertiary storage system in use, and <StoreDescriptor> is a string describing the storage class in a syntax which depends on the tertiary storage system used. If no tertiary storage system is used, it is probably best to use <hsm>=osm, since this is the best-tested option. In that case, the <StoreDescriptor> has the syntax <Store>:<StorageGroup>. These can be set within PNFS per directory. Consider for example the following setup:
[root] # cd /pnfs/<domain>/<experiment-a>/
[root] # cat ".(tag)(OSMTemplate)"
StoreName myStore
[root] # cat ".(tag)(sGroup)"
STRING
This is the setup after a fresh installation and it will lead to the storage class myStore:STRING@osm. An adjustment to more sensible values will look like

[root] # echo "StoreName exp-a" >! ".(tag)(OSMTemplate)"
[root] # echo "run2004" >! ".(tag)(sGroup)"

and will result in the storage class exp-a:run2004@osm. To summarize: the storage class depends on the directory the data is stored in and is configurable.
Storage classes might already be in use for the configuration of a tertiary storage system. In most cases they should be flexible enough to configure the PSU. However, in rare cases the existing configuration and convention for storage classes might not be flexible enough.
Consider for example a situation where data produced by an experiment always has the same storage class exp-a:alldata@osm. This is good for the tertiary storage system, since all data is supposed to go to the same tape set sequentially. However, the data also contains a relatively small amount of meta-data, which is accessed much more often by analysis jobs than the rest of the data. You would like to keep the meta-data on a dedicated set of dCache pools, but the storage class does not provide the means to accomplish that.
The cache class of a directory is set by the tag cacheClass as follows:

[root] # echo "metaData" >! ".(tag)(cacheClass)"
In the above example the meta-data is stored in directories which are tagged in this way.
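To actually direct these files to a dedicated set of pools, a link with an elementary cache class condition would be added to the PSU configuration. The following is only a sketch; the names meta-cond, meta-link, and meta-pools are made up for illustration, and allnet-cond is assumed to exist as in the earlier examples:

psu create ugroup <meta-cond>
psu create unit -dcache <metaData>
psu addto ugroup <meta-cond> <metaData>
psu create link <meta-link> <allnet-cond> <meta-cond>
psu set link <meta-link> -readpref=<20> -writepref=<20> -cachepref=<20>
psu add link <meta-link> <meta-pools>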
There is a nice trick for easy checking of the existing tags in one directory:
[root] # grep '' `cat '.(tags)()'`
.(tag)(OSMTemplate):StoreName exp-a
.(tag)(sGroup):run2004
.(tag)(cacheClass):metaData
This only works if the quote-symbols are used correctly (tick, tick, back-tick, tick, tick, back-tick).
Tags are inherited by sub-directories: changing a tag of a directory will change the tag of each sub-directory, provided the tag has never been changed for that sub-directory directly. Changing the tag of a sub-directory directly breaks this inheritance link. Directories in PNFS should never be moved, since this will mess up the inheritance structure and eventually break the whole system.