
The PSU generates a list of allowable storage-pools for each incoming transfer-request. The PSU-configuration described below tells the PSU which combinations of transfer-request and storage-pool are allowed. Imagine a two-dimensional table with a row for each possible transfer-request and a column for each pool - each field in the table containing either “yes” or “no”. For an incoming transfer-request the PSU will return a list of all pools with “yes” in the corresponding row.
Instead of “yes” and “no” the table really contains a preference - a non-negative integer. However, the PSU configuration is easier to understand if this is ignored.
Actually maintaining such a table in memory (and, for the user, in a configuration file) would be quite inefficient, because of the many possibilities for the transfer-requests. Instead, the PSU consults a set of rules in order to generate the list of allowed pools. Each such rule is called a link because it links a set of transfer-requests to a group of pools. A link consists of a set of conditions and a list of pools. If all the conditions are satisfied, the pools belonging to the link are added to the list of allowable pools.
The main task is to understand how the conditions in a link are defined. After we have dealt with that, the preference values will be discussed and a few examples will follow.
The properties of a transfer-request, which are relevant for the PSU, are the following:
- Location of the File
The directory of the file in the file system (perfectly normal file system - pnfs).
- IP Address
The IP address of the requesting host.
- Type of Transfer
The type of transfer is either “read”, “write”, or “cache”. A request for reading a file which is not stored in the cache, but has to be staged from a connected tertiary storage system, will trigger a “cache”-request and a subsequent “read”-request. These will be treated as two separate requests.
The location of the file in the file system is not used directly. Each file has the following two properties which can be set per directory:
- Storage Class
The storage class is a string. It is used by a tertiary storage system to decide where to store the file (i.e., on which set of tapes), and dCache can use the storage class for a similar purpose (i.e., on which pools the file can be stored). A detailed description of the syntax and of how to set the storage class of a directory in the PNFS is given in the section called “Storage Classes”.
- Cache Class
The cache class is a string with essentially the same functionality as the storage class, except that it is not used by a tertiary storage system. It should only be used if an existing configuration using storage classes does not provide sufficient flexibility.
Each link contains one or more conditions, all of which have to be satisfied by the transfer-request. Each condition in turn contains one or more elementary conditions. A condition is satisfied if at least one of its elementary conditions is satisfied. For the mathematically inclined, this logical structure can be expressed by the following formula:
Example:

link == ( elemCond1 or elemCond2 ) and ( elemCond3 or elemCond4 or elemCond5 ) and ... and ( ... ),

where the parentheses are the conditions. The first condition contains the elementary conditions elemCond1 and elemCond2, and the second one contains elemCond3, elemCond4, and elemCond5.
There are three types of elementary conditions: network (-net), storage class (-store), and cache class (-dcache) conditions. Each type imposes a condition on the IP address, the storage class, and the cache class of the request, respectively.
An elementary network condition consists of an IP address and a net mask written as “<IP-address>/<netmask>”, for example “111.111.111.0/255.255.255.0”. It is satisfied if the request comes from a host with an IP address within the subnet given by the address/netmask pair.
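For example, an elementary network condition matching exactly one host could be created with the command used in the examples below (the address is only illustrative):

psu create unit -net <111.111.111.201/255.255.255.255>

A net mask of 255.255.255.255 restricts the condition to a single host, while 0.0.0.0/0.0.0.0 matches any address.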
An elementary storage class condition is given by a storage class. It is satisfied if the requested file has this storage class. Simple wild cards are allowed: for this it is important to know that a storage class must always contain exactly one @-symbol, as will be explained in the section called “Storage Classes”. In an elementary storage class condition, either the part before the @-symbol or both parts may be replaced by a *-symbol; for example, *@osm and *@* are both valid elementary storage class conditions, whereas something@* is invalid. The *-symbol represents a limited wildcard: any string that does not contain an @-symbol will match.
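As a sketch, a specific and a fully wildcarded elementary storage class condition would be created like this (the first storage class is the one used in the examples later in this section):

psu create unit -store <exp-a:run2004@osm>
psu create unit -store <*@*>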
An elementary cache class condition is given by a cache class. It is satisfied if the cache class of the requested file matches it.
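A minimal sketch, using the cache class “important” from the examples below:

psu create unit -dcache <important>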
The conditions for the type of transfer are not specified with elementary conditions. Instead, each link contains three attributes, “-readpref”, “-writepref”, and “-cachepref”, which specify a preference value for the respective type of transfer. If all the conditions in the link are satisfied, the corresponding preference is assigned to each pool the link points to. Since we are ignoring different preference values at the moment, a preference of 0 stands for “no” and a non-zero preference stands for “yes”.
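In this simplified yes/no picture, a link that allows read- and cache-requests but forbids write-requests would be configured like this (read-link is the link name used in the examples below):

psu set link <read-link> -readpref=<10> -writepref=<0> -cachepref=<10>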
The following explanation of the preference values can be skipped at first reading. It will not be relevant if all non-zero preference values are the same. If you want to try configuring the pool manager right now without bothering about the preferences, you should only use 0 (for “no”) and, say, 10 (for “yes”) as preferences. The first examples below are of this type.
If more than one preference value different from zero is used, the PSU will not generate a single list but a set of lists, each containing pools with the same preference. The Cost Manager will use the list of pools with the highest preference and select the one with the lowest cost for the transfer. Only if all pools with the highest preference are unavailable will the next list be considered by the Cost Manager. This can be used to configure a set of fall-back pools which are used if none of the other pools are available.
The syntax of the commands for configuring the PSU will be explained with the examples below. These commands can be issued within the PoolManager-cell to change the configuration while the system is running. The save-command can then be used to save the current configuration to the file config/PoolManager.conf in the dCache program-directory. This file is parsed whenever the dCache system starts up. It is a simple text file containing the corresponding commands. It can therefore also be edited before the system is started. It can also be loaded into a running system with the reload-command.
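As a minimal sketch, assuming the commands are issued within the PoolManager-cell (the prompt shown is only illustrative):

(PoolManager) admin > save
(PoolManager) admin > reload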
Pools can be grouped together into pool groups. Consider a host pool1 with two pools, pool1_1 and pool1_2, and a host pool2 with one pool pool2_1. If you want to treat them in the same way, you would create a pool group and put all of them in it:
psu create pgroup <normal-pools>
psu create pool <pool1_1>
psu addto pgroup <normal-pools> <pool1_1>
psu create pool <pool1_2>
psu addto pgroup <normal-pools> <pool1_2>
psu create pool <pool2_1>
psu addto pgroup <normal-pools> <pool2_1>
If you later want to treat pool1_2 differently from the others, you would remove it from this pool group and add it to a new one:
psu removefrom pgroup <normal-pools> <pool1_2>
psu create pgroup <special-pools>
psu addto pgroup <special-pools> <pool1_2>
In the following, we will assume that the necessary pool groups already exist. All names ending with “-pools” will denote pool groups.
Note that a pool-node will register itself with the pool manager: the pool will be created within the PSU and added to the pool group “default”, if that exists. This is why the dCache system will automatically use any new pool-nodes in the standard configuration: all pools are in “default” and can therefore handle any request.
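If the “default” pool group does not exist yet, it can be created like any other pool group, as sketched below:

psu create pgroup <default>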
The dCache we are going to configure receives data from a running experiment, stores the data onto a tertiary storage system, and serves as a read cache for users who want to analyze the data. While the new data from the experiment should be stored on highly reliable and therefore expensive systems, the cache functionality may be provided by inexpensive hardware. It is therefore desirable to have a set of pools dedicated for writing the new data and a separate set for reading.
The simplest configuration for such a setup would consist of two links “write-link” and “read-link”. The configuration is as follows:
psu create unit -net <0.0.0.0/0.0.0.0>
psu create ugroup <allnet-cond>
psu addto ugroup <allnet-cond> <0.0.0.0/0.0.0.0>
psu create link <read-link> <allnet-cond>
psu set link <read-link> -readpref=<10> -writepref=<0> -cachepref=<10>
psu add link <read-link> <read-pools>
psu create link <write-link> <allnet-cond>
psu set link <write-link> -readpref=<0> -writepref=<10> -cachepref=<0>
psu add link <write-link> <write-pools>
Why is the condition allnet-cond necessary? It is used as a condition which is always true in both links. This is needed because each link must contain at least one condition. The commands contain the words unit and ugroup for historical reasons: they denote elementary conditions and conditions, respectively, in our nomenclature.
You might not want to give access to the pools for the whole network, as in the previous example (the section called “Separate Write and Read Pools”), though. Assume the experiment data is copied into the cache from the hosts with IP addresses 111.111.111.201, 111.111.111.202, and 111.111.111.203. As you might guess, the subnet of the site is 111.111.111.0/255.255.255.0. Access from outside should be denied. Then you would modify the above configuration as follows:
psu create unit -net <111.111.111.0/255.255.255.0>
psu create ugroup <allnet-cond>
psu addto ugroup <allnet-cond> <111.111.111.0/255.255.255.0>
psu create unit -net <111.111.111.201/255.255.255.255>
psu create unit -net <111.111.111.202/255.255.255.255>
psu create unit -net <111.111.111.203/255.255.255.255>
psu create ugroup <write-cond>
psu addto ugroup <write-cond> <111.111.111.201/255.255.255.255>
psu addto ugroup <write-cond> <111.111.111.202/255.255.255.255>
psu addto ugroup <write-cond> <111.111.111.203/255.255.255.255>
psu create link <read-link> <allnet-cond>
psu set link <read-link> -readpref=<10> -writepref=<0> -cachepref=<10>
psu add link <read-link> <read-pools>
psu create link <write-link> <write-cond>
psu set link <write-link> -readpref=<0> -writepref=<10> -cachepref=<0>
psu add link <write-link> <write-pools>
If pools are financed by one experimental group, they probably do not want other groups to use them. The best way to restrict the data belonging to one experiment to a set of pools is with the help of storage class conditions. If more flexibility is needed, cache class conditions can be used for the same purpose.
Assume that data of experiment A obtained in 2004 is written into subdirectories in the PNFS tree which are tagged with the storage class “exp-a:run2004@osm”, and similarly for the other years. (How this is done is described in the section called “Storage Classes”.) Experiment B uses the storage class “exp-b:alldata@osm” for all its data. Especially important data is tagged with the cache class “important”. (This is described in the section called “Cache Class”.) A suitable setup would be
psu create ugroup <exp-a-cond>
psu create unit -store <exp-a:run2003@osm>
psu addto ugroup <exp-a-cond> <exp-a:run2003@osm>
psu create unit -store <exp-a:run2004@osm>
psu addto ugroup <exp-a-cond> <exp-a:run2004@osm>
psu create link <exp-a-link> <allnet-cond> <exp-a-cond>
psu set link <exp-a-link> -readpref=<10> -writepref=<10> -cachepref=<10>
psu add link <exp-a-link> <exp-a-pools>
psu create ugroup <exp-b-cond>
psu create unit -store <exp-b:alldata@osm>
psu addto ugroup <exp-b-cond> <exp-b:alldata@osm>
psu create ugroup <imp-cond>
psu create unit -dcache <important>
psu addto ugroup <imp-cond> <important>
psu create link <exp-b-link> <allnet-cond> <exp-b-cond>
psu set link <exp-b-link> -readpref=<10> -writepref=<10> -cachepref=<10>
psu add link <exp-b-link> <exp-b-pools>
psu create link <exp-b-imp-link> <allnet-cond> <exp-b-cond> <imp-cond>
psu set link <exp-b-imp-link> -readpref=<20> -writepref=<20> -cachepref=<20>
psu add link <exp-b-imp-link> <exp-b-imp-pools>
Data tagged with the cache class “important” will always be written to and read from pools in the pool group exp-b-imp-pools, except when all pools in this group cannot be reached. Then the pools in exp-b-pools will be used. Note again that, for this data, the pools in exp-b-pools will never be used otherwise, not even if all pools in exp-b-imp-pools are very busy and some pools in exp-b-pools have nothing to do and lots of free space.
The central IT department might also want to set up a few pools which are used as a fall-back if none of the pools of the experiments are functioning. These will also be used for internal testing. The following would have to be added to the previous setup:
psu create link <fallback-link> <allnet-cond>
psu set link <fallback-link> -readpref=<5> -writepref=<5> -cachepref=<5>
psu add link <fallback-link> <it-pools>
Note again that these will only be used if none of the experiments' pools can be reached, or if the storage class is not of the form exp-a:run2003@osm, exp-a:run2004@osm, or exp-b:alldata@osm. If the administrator fails to create the elementary condition exp-a:run2005@osm and add it to the condition exp-a-cond, the fall-back pools will eventually be used for the 2005 data.
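The missing commands for the 2005 data would be analogous to those above:

psu create unit -store <exp-a:run2005@osm>
psu addto ugroup <exp-a-cond> <exp-a:run2005@osm>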
The storage class is a string of the form <StoreDescriptor>@<hsm>, where <hsm> denotes the type of tertiary storage system in use, and <StoreDescriptor> is a string describing the storage class in a syntax which depends on the tertiary storage system used. If no tertiary storage system is used, it is probably best to use <hsm>=osm, since this is the best-tested option. In that case, the <StoreDescriptor> has the syntax <Store>:<StorageGroup>. These can be set within PNFS per directory. Consider for example the following setup:
[root] # cd /pnfs/<domain>/<experiment-a>/
[root] # cat ".(tag)(OSMTemplate)"
StoreName myStore
[root] # cat ".(tag)(sGroup)"
STRING
This is the setup after a fresh installation and it will lead to the storage class myStore:STRING@osm. An adjustment to more sensible values will look like

[root] # echo "StoreName exp-a" >! ".(tag)(OSMTemplate)"
[root] # echo "run2004" >! ".(tag)(sGroup)"

and will result in the storage class exp-a:run2004@osm. To summarize: the storage class depends on the directory the data is stored in and is configurable.
Storage classes might already be in use for the configuration of a tertiary storage system. In most cases they should be flexible enough to configure the PSU. However, in rare cases the existing configuration and convention for storage classes might not be flexible enough.
Consider for example a situation where data produced by an experiment always has the same storage class exp-a:alldata@osm. This is good for the tertiary storage system, since all data is supposed to go to the same tape set sequentially. However, the data also contains a relatively small amount of meta-data, which is accessed much more often by analysis jobs than the rest of the data. You would like to keep the meta-data on a dedicated set of dCache pools, but the storage class does not provide the means to accomplish that.
The cache class of a directory is set by the tag cacheClass as follows:

[root] # echo "metaData" >! ".(tag)(cacheClass)"
In the above example the meta-data is stored in directories which are tagged in this way.
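To actually direct these files to a dedicated set of pools, a link with an elementary cache class condition would be added to the PSU configuration. The following is only a sketch; the names meta-cond, meta-link, and meta-pools are made up for illustration, and allnet-cond is assumed to exist as in the earlier examples:

psu create ugroup <meta-cond>
psu create unit -dcache <metaData>
psu addto ugroup <meta-cond> <metaData>
psu create link <meta-link> <allnet-cond> <meta-cond>
psu set link <meta-link> -readpref=<20> -writepref=<20> -cachepref=<20>
psu add link <meta-link> <meta-pools>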
There is a nice trick for easy checking of the existing tags in one directory:
[root] # grep '' `cat '.(tags)()'`
.(tag)(OSMTemplate):StoreName exp-a
.(tag)(sGroup):run2004
.(tag)(cacheClass):metaData
This only works if the quote-symbols are used correctly (tick, tick, back-tick, tick, tick, back-tick).
Tags are inherited by sub-directories: changing a tag of a directory will change the tag of each sub-directory, provided the tag has never been changed for that sub-directory directly. Changing the tag of a sub-directory directly breaks this inheritance link. Directories in PNFS should never be moved, since this will mess up the inheritance structure and eventually break the whole system.