CHIMERA



The inner dCache components talk to the namespace via a module called PnfsManager, which in turn communicates with the Chimera database using a thin Java layer. In addition to PnfsManager, direct access to the file system view is provided by an NFSv3 and an NFSv4.1 server. Clients can NFS-mount the namespace locally, which offers the opportunity to use OS-level tools like ls, mkdir and mv on Chimera. Direct I/O operations like cp and cat are possible with the NFSv4.1 door.

The properties of Chimera are defined in /usr/share/dcache/defaults/chimera.properties. For customisation, the files /etc/dcache/layouts/mylayout.conf or /etc/dcache/dcache.conf should be modified (see the section called “Defining domains and services”).

Example:

This example shows an extract of the /etc/dcache/layouts/mylayout.conf file for running dCache with NFSv3.

[namespaceDomain]
[namespaceDomain/pnfsmanager]
[namespaceDomain/nfs]
nfs.version=3

Example:

If you want to run the NFSv4.1 server you need to add the corresponding nfs service to a domain in the /etc/dcache/layouts/mylayout.conf file and start this domain.

[namespaceDomain]
[namespaceDomain/pnfsmanager]
[namespaceDomain/nfs]
nfs.version = 4.1

If you wish dCache to access your Chimera with a PostgreSQL user other than chimera then you must specify the username and password in /etc/dcache/dcache.conf.

chimera.db.user=myuser
chimera.db.password=secret

Attribute consistency policy

When a new filesystem object is created in a directory, the directory’s modification and change id attributes must be updated to provide a consistent, up-to-date view of the changes. In highly concurrent environments such updates can create so-called hot inodes and serialize all updates to a single directory, thus reducing namespace throughput.

As such strong consistency is not always required, the POSIX constraints can be relaxed to improve concurrent updates to a single directory. The chimera.attr-consistency property controls how the namespace updates the attributes of a parent directory on modification:

policy   behaviour
strong   creation of a filesystem object immediately updates the parent directory’s mtime, ctime, nlink and generation attributes
weak     creation of a filesystem object eventually updates (after 30 seconds) the parent directory’s mtime, ctime, nlink and generation attributes; multiple concurrent modifications to a directory are aggregated into a single attribute update
soft     same as weak, but reading directory attributes takes pending attribute updates into account

Read-write exported NFS doors SHOULD run with strong or soft consistency to maintain POSIX compliance. Read-only NFS doors may run with weak consistency if out-of-date directory attributes can be tolerated, for example when accessing existing data, or with soft consistency if up-to-date information is desired, typically when looking for newly arrived files written through other doors.

chimera.attr-consistency=strong

Mounting Chimera through NFS

dCache does not need the Chimera filesystem to be mounted, but a mounted file system is convenient for administrative access. This offers the opportunity to use OS-level tools like ls and mkdir for Chimera. However, direct I/O-operations like cp are not possible, since the NFSv3 interface provides the namespace part only. This section describes how to start the Chimera NFSv3 server and mount the name space.

If you want to mount Chimera for easier administrative access, you need to edit the /etc/exports file as the Chimera NFS server uses it to manage exports. If this file doesn’t exist it must be created. The typical exports file looks like this:

/ localhost(rw)
/data
# or
# /data *.my.domain(rw)

As any RPC service, Chimera NFS requires the rpcbind service to run on the host. Moreover, rpcbind has to be configured to accept requests from Chimera NFS.

On RHEL6 based systems you need to add

RPCBIND_ARGS="-i"

into /etc/sysconfig/rpcbind and restart rpcbind. Check your OS manual for details.

service rpcbind restart
|Stopping rpcbind:                     [  OK  ]
|Starting rpcbind:                     [  OK  ]

If your OS does not provide rpcbind, Chimera NFS can use an embedded rpcbind. This requires disabling the portmap service if it exists,

/etc/init.d/portmap stop
|Stopping portmap: portmap

and restart the domain in which the NFS server is running.

dcache restart namespaceDomain

Now you can mount Chimera by

mount localhost:/ /mnt

and create the root of the Chimera namespace, which you can call data:

mkdir -p /mnt/data

If you don’t want to mount chimera you can create the root of the Chimera namespace by

chimera mkdir /data

You can now add directory tags. For more information on tags see the section called “Directory Tags”.

chimera writetag /data sGroup "chimera"
chimera writetag /data OSMTemplate "StoreName sql"

Using DCAP with a mounted file system

If you plan to use dCap with a mounted file system instead of the URL syntax (e.g. dccp /data/file1 /tmp/file1), you need to mount the root of Chimera locally (remote mounts are not allowed yet). This allows establishing wormhole files so that dCap clients can discover the dCap doors.

mount localhost:/ /mnt
mkdir /mnt/admin/etc/config/dCache
touch /mnt/admin/etc/config/dCache/dcache.conf
touch /mnt/admin/etc/config/dCache/'.(fset)(dcache.conf)(io)(on)'
echo "<door host>:<port>" > /mnt/admin/etc/config/dCache/dcache.conf

The default values for ports can be found in Chapter 29, dCache Default Port Values (for dCap the default port is 22125) and in the file /usr/share/dcache/defaults/dcache.properties. They can be altered in /etc/dcache/dcache.conf.

Create the directory in which the users are going to store their data and change to this directory.

mkdir -p /mnt/data
cd /mnt/data

Now you can copy a file into your dCache

dccp /bin/sh test-file
|735004 bytes (718 kiB) in 0 seconds

and copy the data back using the dccp command.

dccp test-file /tmp/testfile
|735004 bytes (718 kiB) in 0 seconds

The file has been transferred successfully.

Now remove the file from the dCache.

rm test-file

When the configuration is complete you can unmount Chimera:

umount /mnt

NOTE

Please note that whenever you need to change the configuration, you have to remount the root localhost:/ to a temporary location like /mnt.

Communicating with Chimera

Many configuration parameters of Chimera and application-specific metadata are accessed by reading, writing, or creating files of the form .(command)(para). For example, the following command prints the ChimeraID of the file /data/some/dir/file.dat:

cat /data/some/dir/'.(id)(file.dat)'
|0004000000000000002320B8

From the point of view of the NFS protocol, the file .(id)(file.dat) in the directory /data/some/dir/ is read. However, Chimera interprets it as the command id with the parameter file.dat executed in the directory /data/some/dir/. The quotes are important, because the shell would otherwise try to interpret the parentheses.

Some of these command files have a second parameter in a third pair of parentheses. Note that files of the form .(command)(para) are not really files: they are not shown when listing directories with ls. However, the command files are listed when they appear explicitly in the argument list of ls, as in

ls -l '.(tag)(sGroup)'
|-rw-r--r-- 11 root root 7 Aug 6 2010 .(tag)(sGroup)

Only a subset of file operations is allowed on these special command files; any other operation results in an appropriate error. Beware that files with names of this form might accidentally be created by typos. Such files will then be shown when listing the directory.

IDs

Each file in Chimera has a unique 18-byte-long ID, referred to as the ChimeraID or the pnfsID. It is comparable to the inode number in other filesystems. The ID used for a file is never reused, even after the file has been deleted. dCache uses the ID for all internal references to a file.

Example:

The ID of the file /example.org/data/examplefile can be obtained by reading the command-file .(id)(examplefile) in the directory of the file.

cat /example.org/data/'.(id)(examplefile)'
|0000917F4A82369F4BA98E38DBC5687A031D

A file in Chimera can be referred to by the ID for most operations.

Example:

The name of a file can be obtained from the ID with the command nameof as follows:

cd /example.org/data/
cat '.(nameof)(0000917F4A82369F4BA98E38DBC5687A031D)'
|examplefile

And the ID of the directory it resides in is obtained by:

cat '.(parent)(0000917F4A82369F4BA98E38DBC5687A031D)'
|0000595ABA40B31A469C87754CD79E0C08F2

In this way, the complete path of a file can be obtained starting from its ID.
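The walk described above can be automated. The sketch below is a hypothetical helper (not part of dCache) that repeatedly reads the .(nameof) and .(parent) command files through a mounted namespace; the mount point variable MNT and the function name are assumptions for illustration:

```shell
#!/bin/bash
# Sketch: reconstruct the full path of a file from its ChimeraID by
# walking up the tree via the .(nameof)(ID) and .(parent)(ID) command
# files. MNT is assumed to point at a mounted Chimera namespace.
chimera_path() {
    local id="$1" path="" name
    # Resolve the object's name, prepend it, then move to the parent ID.
    # The loop ends when no further name or parent can be resolved.
    while name=$(cat "$MNT/.(nameof)($id)" 2>/dev/null); do
        path="/$name$path"
        id=$(cat "$MNT/.(parent)($id)" 2>/dev/null) || break
    done
    echo "$path"
}
```

With the mounted example above, `chimera_path 0000917F4A82369F4BA98E38DBC5687A031D` would print the path of examplefile relative to the mount point.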

Directory tags

In the Chimera namespace, each directory can have a number of tags. These directory tags may be used within dCache to control the file placement policy in the pools (see the section called “The Pool Selection Mechanism”). They might also be used by a tertiary storage system for similar purposes (e.g. controlling the set of tapes used for the files in the directory).

NOTE

Directory tags are not needed to control the behaviour of dCache. dCache works well without directory tags.

Create, list and read directory tags if the namespace is not mounted

You can create tags with

chimera writetag <directory> <tagName> "<content>"

list tags with

chimera lstag <directory>

and read tags with

chimera readtag <directory> <tagName>

Example: Create tags for the directory data with

chimera writetag /data sGroup "myGroup"
chimera writetag /data OSMTemplate "StoreName myStore"

list the existing tags with

chimera lstag /data
|Total: 2
|OSMTemplate
|sGroup

and their content with

chimera readtag /data OSMTemplate
|StoreName myStore
chimera readtag /data sGroup
|myGroup

Create, list and read directory tags if the namespace is mounted

If the namespace is mounted, change to the directory for which the tag should be set and create a tag with

cd <directory>
echo '<content1>' > '.(tag)(<tagName1>)'
echo '<content2>' > '.(tag)(<tagName2>)'

Then the existing tags may be listed with

cat '.(tags)()'
|.(tag)(<tagname1>)
|.(tag)(<tagname2>)

and the content of a tag can be read with

cat '.(tag)(<tagname1>)'
|<content1>
cat '.(tag)(<tagName2>)'
|<content2>

In the following example, two tags are created, listed and their contents shown.

First, create tags for the directory data with

cd data
echo 'StoreName myStore' > '.(tag)(OSMTemplate)'
echo 'myGroup' > '.(tag)(sGroup)'

list the existing tags with

cat '.(tags)()'
|.(tag)(OSMTemplate)
|.(tag)(sGroup)

and their content with

cat '.(tag)(OSMTemplate)'
|StoreName myStore
cat '.(tag)(sGroup)'
|myGroup

A nice trick to list all tags with their contents is

grep "" $(cat  ".(tags)()")
|.(tag)(OSMTemplate):StoreName myStore
|.(tag)(sGroup):myGroup

Directory tags and command files

When creating or changing directory tags by writing to the command file as in

echo '<content>' > '.(tag)(<tagName>)'

one has to take care not to treat the command files in the same way as regular files, because tags are different from files in the following aspects:

  1. The tagName is limited to 62 characters and the content to 512 bytes. Writing more to the command file will be silently ignored.

  2. If a tag which does not exist in a directory is created by writing to it, it is called a primary tag.

  3. Tags are inherited from the parent directory by a newly created subdirectory. Changing a primary tag in one directory will change the tags inherited from it in the same way. Creating a new primary tag in a directory will not create an inherited tag in its subdirectories.

    Moving a directory within the Chimera namespace will not change the inheritance. Therefore, a directory does not necessarily inherit tags from its parent directory. Removing an inherited tag has no effect.

  4. Empty tags are ignored.

Directory tags for dCache

The following directory tags appear in the dCache context:

OSMTemplate:

Must contain a line of the form StoreName <storeName> and specifies the name of the store that is used by dCache to construct the storage class if the HSM Type is osm.

HSMType:

The HSMType tag is normally determined from the other existing tags. E.g., if the tag OSMTemplate exists, HSMType=osm is assumed. With this tag it can be set explicitly. A class implementing that HSM type has to exist. Currently the only implementations are osm and enstore.

sGroup:

The storage group is also used to construct the storage class if the HSMType is osm.

cacheClass:

The cache class is only used to control on which pools the files in a directory may be stored, while the storage class (constructed from the two above tags) might also be used by the HSM. The cache class is only needed if the above two tags are already fixed by HSM usage and more flexibility is needed.

hsmInstance:

If not set, the hsmInstance tag will be the same as the HSMType tag. Setting this tag will only change the name as used in the storage class and in the pool commands.

WriteToken:

Assign a WriteToken tag to a directory in order to be able to write to a space token without using the SRM.

Storage class and directory tags

The storage class is a string of the form StoreName:StorageGroup@hsm-type, where StoreName is given by the OSMTemplate tag, StorageGroup by the sGroup tag and hsm-type by the HSMType tag. As mentioned above, the HSMType tag is assumed to be osm if the tag OSMTemplate exists.
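Putting this together, the construction of a storage class from the tag values can be sketched in shell. The tag values below are copied from the earlier example; how they would be read with chimera readtag is shown as comments:

```shell
# Sketch: compose a storage class from directory tag values.
# The values would come from:
#   chimera readtag /data OSMTemplate   ->  StoreName myStore
#   chimera readtag /data sGroup        ->  myGroup
store_name="myStore"
s_group="myGroup"
hsm_type="osm"     # assumed, because the OSMTemplate tag exists
echo "${store_name}:${s_group}@${hsm_type}"
# prints myStore:myGroup@osm
```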

In the examples above two tags have been created.

Example:

chimera lstag /data
|Total: 2
|OSMTemplate
|sGroup

As the tag OSMTemplate was created, the tag HSMType is assumed to be osm. The storage class of the files which are copied into the directory /data after the tags have been set will be myStore:myGroup@osm.

If directory tags are used to control the behaviour of dCache and/or a tertiary storage system, it is a good idea to plan the directory structure in advance, considering the necessary tags and how they should be set up. Moving directories should be done with great care, or avoided altogether. Inherited tags can only be created by creating a new directory.

Example:

Assume that data of two experiments, experiment-a and experiment-b, is written into a namespace tree with subdirectories /data/experiment-a and /data/experiment-b. As some pools of the dCache are financed by experiment-a and others by experiment-b, each group would prefer that its pools are not used by the other. To achieve this, the directories of experiment-a and experiment-b can be tagged.

chimera writetag /data/experiment-a OSMTemplate "StoreName exp-a"
chimera writetag /data/experiment-b OSMTemplate "StoreName exp-b"

Data from experiment-a taken in 2010 shall be written into the directory /data/experiment-a/2010 and data from experiment-a taken in 2011 shall be written into /data/experiment-a/2011. Data from experiment-b shall be written into /data/experiment-b. Tag the directories correspondingly.

chimera writetag /data/experiment-a/2010 sGroup "run2010"
chimera writetag /data/experiment-a/2011 sGroup "run2011"
chimera writetag /data/experiment-b sGroup "alldata"

List the content of the tags by

chimera readtag /data/experiment-a/2010 OSMTemplate
|StoreName exp-a
chimera readtag /data/experiment-a/2010 sGroup
|run2010
chimera readtag /data/experiment-a/2011 OSMTemplate
|StoreName exp-a
chimera readtag /data/experiment-a/2011 sGroup
|run2011
chimera readtag /data/experiment-b OSMTemplate
|StoreName exp-b
chimera readtag /data/experiment-b sGroup
|alldata

As the tag OSMTemplate was created, the HSMType is assumed to be osm. The storage classes of the files which are copied into these directories after the tags have been set will be

  • exp-a:run2010@osm for the files in /data/experiment-a/2010
  • exp-a:run2011@osm for the files in /data/experiment-a/2011
  • exp-b:alldata@osm for the files in /data/experiment-b

To see how storage classes are used for pool selection have a look at the example ’Reserving Pools for Storage and Cache Classes’ in the PoolManager chapter.

More tags are used by dCache if the HSMType is enstore.