The PSU is responsible for finding the set of pools which can be used for a specific transfer-request. By telling the PSU which pools are permitted for which type of transfer-request, the administrator of the dCache system can adjust the system to any kind of scenario: Separate organizations served by separate pools, special pools for writing the data to a tertiary storage system, pools in a DMZ which serves only a certain kind of data (e.g., for the grid). This section explains the mechanism employed by the PSU and shows how to configure it with several examples.
The PSU generates a list of allowed storage-pools for each incoming transfer-request. The PSU configuration described below tells the PSU which combinations of transfer-request and storage-pool are allowed. Imagine a two-dimensional table with a row for each possible transfer-request and a column for each pool - each field in the table containing either “yes” or “no”. For an incoming transfer-request the PSU will return a list of all pools with “yes” in the corresponding row.
Instead of “yes” and “no” the table really contains a preference - a non-negative integer. However, the PSU configuration is easier to understand if this is ignored.
Actually maintaining such a table in memory (and, for the user, in a configuration file) would be quite inefficient, because there are many possible transfer-requests. Instead, the PSU consults a set of rules in order to generate the list of allowed pools. Each such rule is called a link because it links a set of transfer-requests to a group of pools.
A link consists of a set of unit groups and a list of pools. If all the unit groups are matched, the pools belonging to the link are added to the list of allowable pools.
A link is defined in the file /opt/d-cache/config/PoolManager.conf by
psu create link <link> <unitgroup>
psu set link <link> -readpref=<rpref> -writepref=<wpref> -cachepref=<cpref> -p2ppref=<ppref>
psu add link <link> <poolgroup>
For the preference values see the section called “Preference Values for Type of Transfer”.
The main task is to understand how the unit groups in a link are defined. After we have dealt with that, the preference values will be discussed and a few examples will follow.
The four properties of a transfer request, which are relevant for the PSU, are the following:
The location of the file in the file system is not used directly. Each file has the following two properties which can be set per directory:
Storage Class. The storage class is a string. It is used by a tertiary storage system to decide where to store the file (i.e., on which set of tapes), and dCache can use the storage class for a similar purpose (i.e., on which pools the file can be stored). A detailed description of the syntax and of how to set the storage class of a directory in the namespace is given in the section called “Storage Classes”.
Cache Class. The cache class is a string with essentially the same functionality as the storage class, except that it is not used by a tertiary storage system. It should only be used if an existing configuration based on storage classes does not provide sufficient flexibility.
The IP address of the requesting host.
The protocol, or rather the type of door, used by the transfer.
The type of transfer is either read, write, p2p request or cache.
A request for reading a file which is not on a read pool will trigger a p2p request and a subsequent read request. These will be treated as two separate requests.
A request for reading a file which is not stored on disk, but has to be staged from a connected tertiary storage system, will trigger a cache request to fetch the file from the tertiary storage system and a subsequent read request. These will be treated as two separate requests.
Each link contains one or more unit groups, all of which have to be matched by the transfer request. Each unit group in turn contains several units. The unit group is matched if at least one of the units is matched.
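The matching rule can be sketched as an AND over unit groups of an OR over units. The following Python model is only an illustration under stated assumptions: the `Link` class, `link_matches` and `allowed_pools` are hypothetical names that are not part of dCache, and `matched_units` stands for the set of units the request has already been found to match.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """Hypothetical model of a PSU link (not dCache code)."""
    name: str
    unit_groups: list  # each unit group is a set of unit names
    pools: list        # names of the pools the link points to


def link_matches(link, matched_units):
    """A link matches if every unit group holds at least one matched unit."""
    return all(group & matched_units for group in link.unit_groups)


def allowed_pools(links, matched_units):
    """Collect the pools of all matching links, without duplicates."""
    pools = []
    for link in links:
        if link_matches(link, matched_units):
            for pool in link.pools:
                if pool not in pools:
                    pools.append(pool)
    return pools


net = "111.111.111.0/255.255.255.0"
links = [
    Link("exp-a-link", [{net}, {"exp-a:run2010@osm", "exp-a:run2011@osm"}], ["pool1"]),
    Link("exp-b-link", [{net}, {"exp-b:alldata@osm"}], ["pool2"]),
]
print(allowed_pools(links, {net, "exp-a:run2010@osm"}))
```

Here only the first link has all of its unit groups matched, so only its pool is returned.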
There are four types of units: network (-net), protocol (-protocol), storage class (-store) and cache class (-dcache) units. Each type imposes a condition on the IP address, the protocol, the storage class and the cache class respectively.
For each transfer at most one of each of the four unit types will match. If more than one unit of the same type could match the request then the most restrictive unit matches.
The unit that matches is selected from all units defined in dCache, not just those for a particular unit group. This means that, if a unit group has a unit that could match a request but this request also matches a more restrictive unit defined elsewhere then the less restrictive unit will not match.
A network unit consists of an IP address and a net mask written as <IP-address>/<net mask>, say 111.111.111.0/255.255.255.0. It is satisfied if the request is coming from a host with an IP address within the subnet given by the address/netmask pair.
psu create ugroup <name-of-unitgroup>
psu create unit -net <IP-address>/<net mask>
psu addto ugroup <name-of-unitgroup> <IP-address>/<net mask>
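Whether a given host would satisfy a network unit can be checked offline with Python's standard `ipaddress` module. This is a minimal sketch; the helper name `net_unit_matches` is hypothetical, and dCache's own parsing of the unit string may differ in detail.

```python
import ipaddress


def net_unit_matches(unit, client_ip):
    """Return True if client_ip lies in the subnet '<IP-address>/<net mask>'."""
    # strict=False tolerates host bits being set in the address part
    network = ipaddress.ip_network(unit, strict=False)
    return ipaddress.ip_address(client_ip) in network


print(net_unit_matches("111.111.111.0/255.255.255.0", "111.111.111.201"))
print(net_unit_matches("111.111.111.0/255.255.255.0", "111.111.112.1"))
```

The first call reports a match because the host lies inside the subnet; the second does not.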
A protocol unit consists of the name of the protocol and the version number written as <protocol-name>/<version-number>, e.g., xrootd/3.
psu create ugroup <name-of-unitgroup>
psu create unit -protocol <protocol-name>/<version-number>
psu addto ugroup <name-of-unitgroup> <protocol-name>/<version-number>
A storage class unit is given by a storage class. It is satisfied if the requested file has this storage class. Simple wild cards are allowed: for this it is important to know that a storage class must always contain exactly one @-symbol, as will be explained in the section called “Storage Classes”. In a storage class unit, either the part before the @-symbol or both parts may be replaced by a *-symbol; for example, *@osm and *@* are both valid storage class units whereas something@* is invalid. The *-symbol represents a limited wildcard: any string that doesn’t contain an @-symbol will match.
psu create ugroup <name-of-unitgroup>
psu create unit -store <StoreName>:<StorageGroup>@<type-of-storage-system>
psu addto ugroup <name-of-unitgroup> <StoreName>:<StorageGroup>@<type-of-storage-system>
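The wildcard rules for storage class units can be modelled in a few lines of Python. This is a sketch of the rules as stated above, not dCache's implementation; both function names are hypothetical, and rejecting empty parts around the @-symbol is an added assumption.

```python
def is_valid_store_unit(unit):
    """Accept 'a:b@osm', '*@osm' and '*@*'; reject forms such as 'something@*'."""
    if unit.count("@") != 1:
        return False
    left, right = unit.split("@")
    if not left or not right:  # assumption: neither part may be empty
        return False
    # the part after '@' may only be '*' if the part before it is '*' too
    return not (right == "*" and left != "*")


def store_unit_matches(unit, storage_class):
    """Return True if the storage class satisfies the (possibly wildcarded) unit."""
    if not is_valid_store_unit(unit) or storage_class.count("@") != 1:
        return False
    unit_left, unit_right = unit.split("@")
    class_left, class_right = storage_class.split("@")
    return unit_left in ("*", class_left) and unit_right in ("*", class_right)


print(store_unit_matches("*@osm", "exp-a:run2010@osm"))
print(is_valid_store_unit("something@*"))
```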
A cache class unit is given by a cache class. It is satisfied if the cache class of the requested file agrees with it.
psu create ugroup <name-of-unitgroup>
psu create unit -dcache <name-of-cache-class>
psu addto ugroup <name-of-unitgroup> <name-of-cache-class>
The conditions for the type of transfer are not specified with units. Instead, each link contains four attributes -readpref, -writepref, -p2ppref and -cachepref, which specify a preference value for the respective types of transfer. If all the unit groups in the link are matched, the corresponding preference is assigned to each pool the link points to. Since we are ignoring different preference values at the moment, a preference of 0 stands for no and a non-zero preference stands for yes. A negative value for -p2ppref means that the value for -p2ppref should equal the one for -readpref.
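The handling of the four preference attributes, including the special meaning of a negative -p2ppref, can be illustrated as follows. The function names are hypothetical and the sketch only restates the rule above; it says nothing about the defaults dCache applies when an attribute is omitted.

```python
def effective_prefs(readpref, writepref, cachepref, p2ppref):
    """Return the preference per transfer type; a negative p2ppref follows readpref."""
    if p2ppref < 0:
        p2ppref = readpref
    return {"read": readpref, "write": writepref, "cache": cachepref, "p2p": p2ppref}


def link_allows(prefs, transfer_type):
    """In the simplified yes/no picture, 0 means 'no' and non-zero means 'yes'."""
    return prefs[transfer_type] > 0


prefs = effective_prefs(readpref=10, writepref=0, cachepref=10, p2ppref=-1)
print(prefs)
print(link_allows(prefs, "write"))
```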
This explanation of the preference values can be skipped at first reading. It will not be relevant if all non-zero preference values are the same. If you want to try configuring the pool manager right now without bothering about the preferences, you should only use 0 (for no) and, say, 10 (for yes) as preferences. You can choose -p2ppref=-1 if it should match the value for -readpref. The first examples below are of this type.
If several different non-zero preference values are used, the PSU will not generate a single list but a set of lists, each containing pools with the same preference. The Pool Manager will use the list of pools with the highest preference and select a pool according to the load balancing policy for the transfer. The next list will be considered by the Pool Manager only if all pools with the highest preference are offline. This can be used to configure a set of fall-back pools which are used if none of the other pools are available.
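The grouping by preference and the fall-back to the next list can be sketched like this. It is a simplified model with hypothetical names: each pool is assumed to carry a single preference for the transfer type in question, and the load balancing step that picks one pool from the returned list is left out.

```python
from collections import defaultdict


def pools_by_preference(pool_prefs):
    """Group pools into lists of equal preference, highest preference first."""
    groups = defaultdict(list)
    for pool, pref in pool_prefs.items():
        if pref > 0:  # a preference of 0 means the pool is not allowed
            groups[pref].append(pool)
    return [sorted(groups[pref]) for pref in sorted(groups, reverse=True)]


def candidate_pools(pool_prefs, online):
    """Return the first preference list that contains at least one online pool."""
    for pools in pools_by_preference(pool_prefs):
        available = [pool for pool in pools if pool in online]
        if available:
            return available
    return []


pool_prefs = {"pool3": 20, "pool2": 10, "pool_it": 5, "pool9": 0}
print(pools_by_preference(pool_prefs))
print(candidate_pools(pool_prefs, online={"pool2", "pool_it"}))
```

With pool3 offline, the list with preference 10 is used; pool_it would only be considered if pool2 were offline as well.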
Pools can be grouped together into pool groups.
psu create pgroup <name-of-poolgroup>
psu create pool <name-of-pool>
psu addto pgroup <name-of-poolgroup> <name-of-pool>
Example:
Consider a host pool1 with two pools, pool1_1 and pool1_2, and a host pool2 with one pool pool2_1. If you want to treat them in the same way, you would create a pool group and put all of them in it:
psu create pgroup normal-pools
psu create pool pool1_1
psu addto pgroup normal-pools pool1_1
psu create pool pool1_2
psu addto pgroup normal-pools pool1_2
psu create pool pool2_1
psu addto pgroup normal-pools pool2_1
If you later want to treat pool1_2 differently from the others, you would remove it from this pool group and add it to a new one:
psu removefrom pgroup normal-pools pool1_2
psu create pgroup special-pools
psu addto pgroup special-pools pool1_2
In the following, we will assume that the necessary pool groups already exist. All names ending with -pools will denote pool groups.
Note that a pool-node will register itself with the PoolManager: The pool will be created within the PSU and added to the pool group default, if that exists. This is why the dCache system will automatically use any new pool-nodes in the standard configuration: All pools are in default and can therefore handle any request.
The storage class is a string of the form <StoreName>:<StorageGroup>@<type-of-storage-system>, where <type-of-storage-system> denotes the type of storage system in use, and <StoreName>:<StorageGroup> is a string describing the storage class in a syntax which depends on the storage system. In general use <type-of-storage-system>=osm.
Consider for example the following setup:
Example:
[root] # /opt/d-cache/bin/chimera lstag /data/experiment-a
Total: 2
OSMTemplate
sGroup
[root] # /opt/d-cache/bin/chimera readtag /data/experiment-a OSMTemplate
StoreName myStore
[root] # /opt/d-cache/bin/chimera readtag /data/experiment-a sGroup
STRING
This is the setup after a fresh installation and it will lead to the storage class myStore:STRING@osm. An adjustment to more sensible values will look like
[root] # /opt/d-cache/bin/chimera writetag /data/experiment-a OSMTemplate "StoreName exp-a"
[root] # /opt/d-cache/bin/chimera writetag /data/experiment-a sGroup "run2010"
and will result in the storage class exp-a:run2010@osm for any data stored in the /data/experiment-a directory.
To summarize: The storage class depends on the directory the data is stored in and is configurable.
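How the two tags combine into a storage class can be shown with a small Python helper. The function `storage_class` is hypothetical and only mirrors the two examples above; it assumes the OSMTemplate tag has the form "StoreName <name>" and that the storage system type is osm.

```python
def storage_class(tags, storage_type="osm"):
    """Compose '<StoreName>:<StorageGroup>@<type>' from a directory's tags."""
    keyword, _, store_name = tags["OSMTemplate"].strip().partition(" ")
    if keyword != "StoreName" or not store_name.strip():
        raise ValueError("OSMTemplate tag is expected to read 'StoreName <name>'")
    return f"{store_name.strip()}:{tags['sGroup'].strip()}@{storage_type}"


fresh = {"OSMTemplate": "StoreName myStore", "sGroup": "STRING"}
adjusted = {"OSMTemplate": "StoreName exp-a", "sGroup": "run2010"}
print(storage_class(fresh))
print(storage_class(adjusted))
```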
Storage classes might already be in use for the configuration of a tertiary storage system. In most cases they should be flexible enough to configure the PSU. However, in rare cases the existing configuration and convention for storage classes might not be flexible enough.
Consider, for example, a situation where data produced by an experiment always has the same storage class exp-a:alldata@osm. This is good for the tertiary storage system, since all data is supposed to go to the same tape set sequentially. However, the data also contains a relatively small amount of meta-data, which is accessed much more often by analysis jobs than the rest of the data. You would like to keep the meta-data on a dedicated set of dCache pools. However, the storage class does not provide means to accomplish that.
The cache class of a directory is set by the tag cacheClass as follows:
Example:
[root] # /opt/d-cache/bin/chimera writetag /data/experiment-a cacheClass "metaData"
In this example the meta-data is stored in directories which are tagged in this way.
Check the existing tags of a directory and their content by:
[root] # /opt/d-cache/bin/chimera lstag /path/to/directory
Total: numberOfTags
tag1
tag2
...
[root] # /opt/d-cache/bin/chimera readtag /path/to/directory tag1
contentOfTag1
A new directory will inherit the tags from the parent directory. However, updating a tag afterwards will not update the tags of any child directories that already exist.
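The inheritance behaviour can be pictured with a toy model in which a directory copies its parent's tags once, at creation time. The `Directory` class is purely hypothetical and does not reflect how the namespace stores tags internally.

```python
class Directory:
    """Toy model: tags are copied from the parent when the directory is created."""

    def __init__(self, parent=None):
        # copy, not share: later changes to the parent do not propagate
        self.tags = dict(parent.tags) if parent is not None else {}

    def write_tag(self, name, value):
        self.tags[name] = value


parent = Directory()
parent.write_tag("cacheClass", "metaData")
old_child = Directory(parent)           # inherits cacheClass=metaData
parent.write_tag("cacheClass", "other")
new_child = Directory(parent)           # inherits the updated value
print(old_child.tags["cacheClass"], new_child.tags["cacheClass"])
```

In practice this means that a tag change intended for a whole subtree has to be applied to each existing subdirectory as well.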
Now we have everything we need to define a link.
psu create ugroup <name-of-unitgroup>
psu create unit -<type> <unit>
psu addto ugroup <name-of-unitgroup> <unit>
psu create pgroup <poolgroup>
psu create pool <pool>
psu addto pgroup <poolgroup> <pool>
psu create link <link> <name-of-unitgroup>
psu set link <link> -readpref=<10> -writepref=<0> -cachepref=<10> -p2ppref=<-1>
psu add link <link> <poolgroup>
Find some examples for the configuration of the PSU below.
The dCache we are going to configure receives data from a running experiment, stores the data onto a tertiary storage system, and serves as a read cache for users who want to analyze the data. While the new data from the experiment should be stored on highly reliable and therefore expensive systems, the cache functionality may be provided by inexpensive hardware. It is therefore desirable to have a set of pools dedicated for writing the new data and a separate set for reading.
Example:
The simplest configuration for such a setup would consist of two links “write-link” and “read-link”. The configuration is as follows:
psu create pgroup read-pools
psu create pool pool1
psu addto pgroup read-pools pool1
psu create pgroup write-pools
psu create pool pool2
psu addto pgroup write-pools pool2
psu create unit -net 0.0.0.0/0.0.0.0
psu create ugroup allnet-cond
psu addto ugroup allnet-cond 0.0.0.0/0.0.0.0
psu create link read-link allnet-cond
psu set link read-link -readpref=10 -writepref=0 -cachepref=10
psu add link read-link read-pools
psu create link write-link allnet-cond
psu set link write-link -readpref=0 -writepref=10 -cachepref=0
psu add link write-link write-pools
Why is the unit group allnet-cond necessary? It is used as a condition which is always true in both links. This is needed because each link must contain at least one unit group.
However, you might not want to give the whole network access to the pools, as the previous example (the section called “Separate Write and Read Pools”) does.
Example:
Assume the experiment data is copied into the cache from the hosts with IP 111.111.111.201, 111.111.111.202, and 111.111.111.203. As you might guess, the subnet of the site is 111.111.111.0/255.255.255.0. Access from outside should be denied. Then you would modify the above configuration as follows:
psu create pgroup read-pools
psu create pool pool1
psu addto pgroup read-pools pool1
psu create pgroup write-pools
psu create pool pool2
psu addto pgroup write-pools pool2
psu create unit -store *@*
psu create unit -net 111.111.111.0/255.255.255.0
psu create unit -net 111.111.111.201/255.255.255.255
psu create unit -net 111.111.111.202/255.255.255.255
psu create unit -net 111.111.111.203/255.255.255.255
psu create ugroup write-cond
psu addto ugroup write-cond 111.111.111.201/255.255.255.255
psu addto ugroup write-cond 111.111.111.202/255.255.255.255
psu addto ugroup write-cond 111.111.111.203/255.255.255.255
psu create ugroup read-cond
psu addto ugroup read-cond 111.111.111.0/255.255.255.0
psu addto ugroup read-cond 111.111.111.201/255.255.255.255
psu addto ugroup read-cond 111.111.111.202/255.255.255.255
psu addto ugroup read-cond 111.111.111.203/255.255.255.255
psu create link read-link read-cond
psu set link read-link -readpref=10 -writepref=0 -cachepref=10
psu add link read-link read-pools
psu create link write-link write-cond
psu set link write-link -readpref=0 -writepref=10 -cachepref=0
psu add link write-link write-pools
For a given transfer, at most one unit of each type (storage class, cache class, net and protocol) will match. Since the most restrictive unit always matches, the IP 111.111.111.201 will match the 111.111.111.201/255.255.255.255 unit and not the 111.111.111.0/255.255.255.0 unit. Therefore, if you only add 111.111.111.0/255.255.255.0 to the unit group “read-cond”, a transfer request coming from the IP 111.111.111.201 will only be allowed to write and not to read. The same is true for transfer requests from 111.111.111.202 and 111.111.111.203.
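This pitfall can be reproduced with a short Python model. It assumes that "most restrictive" means the matching subnet with the longest prefix, which is an interpretation rather than something stated by the text, and both function names are hypothetical.

```python
import ipaddress


def most_restrictive_net_unit(all_units, client_ip):
    """Pick, among ALL defined net units, the matching one with the longest prefix."""
    ip = ipaddress.ip_address(client_ip)
    matching = [u for u in all_units if ip in ipaddress.ip_network(u, strict=False)]
    if not matching:
        return None
    return max(matching, key=lambda u: ipaddress.ip_network(u, strict=False).prefixlen)


def ugroup_matches(ugroup_units, all_units, client_ip):
    """A unit group matches only if it contains the globally selected unit."""
    unit = most_restrictive_net_unit(all_units, client_ip)
    return unit is not None and unit in ugroup_units


all_units = [
    "111.111.111.0/255.255.255.0",
    "111.111.111.201/255.255.255.255",
    "111.111.111.202/255.255.255.255",
    "111.111.111.203/255.255.255.255",
]
incomplete_read_cond = {"111.111.111.0/255.255.255.0"}
full_read_cond = set(all_units)

print(ugroup_matches(incomplete_read_cond, all_units, "111.111.111.201"))
print(ugroup_matches(full_read_cond, all_units, "111.111.111.201"))
```

With the incomplete unit group the host 111.111.111.201 is not matched, because the host-specific unit is selected first and is not a member of the group; adding the three host units to the group restores read access.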
If pools are financed by one experimental group, that group will probably not want them to be used by another group. The best way to restrict data belonging to one experiment to a set of pools is with the help of storage class conditions. If more flexibility is needed, cache class conditions can be used for the same purpose.
Example:
Assume data of experiment A obtained in 2010 is written into subdirectories in the namespace tree which are tagged with the storage class exp-a:run2010@osm, and similarly for the other years. (How this is done is described in the section called “Storage Classes”.) Experiment B uses the storage class exp-b:alldata@osm for all its data. Especially important data is tagged with the cache class important. (This is described in the section called “Cache Class”.) A suitable setup would be
psu create pgroup exp-a-pools
psu create pool pool1
psu addto pgroup exp-a-pools pool1
psu create pgroup exp-b-pools
psu create pool pool2
psu addto pgroup exp-b-pools pool2
psu create pgroup exp-b-imp-pools
psu create pool pool3
psu addto pgroup exp-b-imp-pools pool3
psu create unit -net 111.111.111.0/255.255.255.0
psu create ugroup allnet-cond
psu addto ugroup allnet-cond 111.111.111.0/255.255.255.0
psu create ugroup exp-a-cond
psu create unit -store exp-a:run2011@osm
psu addto ugroup exp-a-cond exp-a:run2011@osm
psu create unit -store exp-a:run2010@osm
psu addto ugroup exp-a-cond exp-a:run2010@osm
psu create link exp-a-link allnet-cond exp-a-cond
psu set link exp-a-link -readpref=10 -writepref=10 -cachepref=10
psu add link exp-a-link exp-a-pools
psu create ugroup exp-b-cond
psu create unit -store exp-b:alldata@osm
psu addto ugroup exp-b-cond exp-b:alldata@osm
psu create ugroup imp-cond
psu create unit -dcache important
psu addto ugroup imp-cond important
psu create link exp-b-link allnet-cond exp-b-cond
psu set link exp-b-link -readpref=10 -writepref=10 -cachepref=10
psu add link exp-b-link exp-b-pools
psu create link exp-b-imp-link allnet-cond exp-b-cond imp-cond
psu set link exp-b-imp-link -readpref=20 -writepref=20 -cachepref=20
psu add link exp-b-imp-link exp-b-imp-pools
Data tagged with cache class “important” will always be written to and read from pools in the pool group exp-b-imp-pools, except when none of the pools in this group can be reached. Then the pools in exp-b-pools will be used.
Note again that these will never be used otherwise, not even if all pools in exp-b-imp-pools are very busy and some pools in exp-b-pools have nothing to do and lots of free space.
The central IT department might also want to set up a few pools, which are used as fall-back, if none of the pools of the experiments are functioning. These will also be used for internal testing. The following would have to be added to the previous setup:
Example:
psu create pgroup it-pools
psu create pool pool_it
psu addto pgroup it-pools pool_it
psu create link fallback-link allnet-cond
psu set link fallback-link -readpref=5 -writepref=5 -cachepref=5
psu add link fallback-link it-pools
Note again that these will only be used if none of the experiments' pools can be reached, or if the storage class is not one of exp-a:run2011@osm, exp-a:run2010@osm, or exp-b:alldata@osm. If the administrator fails to create the unit exp-a:run2005@osm and add it to the unit group exp-a-cond, the fall-back pools will eventually be used for data with that storage class.
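The combined behaviour of the experiment links and the fall-back link can be traced with a compact Python model. It is a sketch under assumptions: the same preference is used for all transfer types, exp-b-imp-link is taken to point at exp-b-imp-pools (represented by pool3), and `LINKS` and `select_pools` are hypothetical names.

```python
NET = "111.111.111.0/255.255.255.0"

# link name -> (unit groups, preference, pools); values follow the examples above
LINKS = {
    "exp-a-link": ([{NET}, {"exp-a:run2011@osm", "exp-a:run2010@osm"}], 10, {"pool1"}),
    "exp-b-link": ([{NET}, {"exp-b:alldata@osm"}], 10, {"pool2"}),
    "exp-b-imp-link": ([{NET}, {"exp-b:alldata@osm"}, {"important"}], 20, {"pool3"}),
    "fallback-link": ([{NET}], 5, {"pool_it"}),
}


def select_pools(matched_units, online):
    """Return the online pools of the highest preference that has any online pool."""
    by_pref = {}
    for unit_groups, pref, pools in LINKS.values():
        if pref > 0 and all(group & matched_units for group in unit_groups):
            by_pref.setdefault(pref, set()).update(pools)
    for pref in sorted(by_pref, reverse=True):
        available = sorted(by_pref[pref] & online)
        if available:
            return available
    return []


everything = {"pool1", "pool2", "pool3", "pool_it"}
important = {NET, "exp-b:alldata@osm", "important"}
print(select_pools(important, everything))
print(select_pools(important, everything - {"pool3"}))
# exp-a:run2005@osm has no unit, so only the net unit matches
print(select_pools({NET}, everything))
```

The last call corresponds to the missing exp-a:run2005@osm unit: no experiment link matches, so the request ends up on the fall-back pool.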