1.9.3 Release Notes

The main focus areas of the 1.9.3 release are ACL support, NFS 4.1 support and asynchronous srmLs support. Other highlights include further refactoring of the pool code, first steps to refactor the pool manager, and support for using JMS for communication between dCache cells.

Upgrade Instructions

Important Changes

Please consider the following changes when upgrading from a version before 1.9.3-1:

Compatibility

It is safe to mix pools of releases 1.9.1 to 1.9.3, and they can be used with any version of the head nodes. Head nodes and doors must be upgraded to 1.9.3 together and cannot be mixed with head nodes or doors of 1.9.1 or 1.9.2. Components of different 1.9.3 releases can be mixed freely. 1.9.0 is no longer supported.

Compatibility Matrix

We distinguish between pool components and head nodes. Any component which is not a pool is considered a head node, including doors. The following table shows compatibility between different releases.

                        1.9.3-4 Head    1.9.3-4 Pool
Head  1.9.1-1..7,9-11   no              yes
      1.9.1-8           no              yes [1]
      1.9.2-1..5,8-11   no              yes
      1.9.2-6,7         no              yes [1]
      1.9.3-1..4        yes             yes
Pool  1.9.1-1..7,9-11   yes             yes
      1.9.1-8           yes [1]         yes
      1.9.2-1..5,8-11   yes             yes
      1.9.2-6,7         yes [1]         yes
      1.9.3-1..4        yes             yes

  [1] The migration module will not work for -target=pgroup and -target=link.

1.9.3-4

Changelog 1.9.3-3 to 1.9.3-4

1.9.3-3

Updates to ftp doors: Fixed race condition.

Updates to srm: Fixed race condition. Security fix. Reduced log verbosity.

Updates to info provider: Added missing XSLT processor. The execution bit of info-based-infoProvider.sh is now set.

Updates to pools: Lowered the log level of the XROOTD mover. Fixed race conditions in DCAP, XROOTD and pool-to-pool support.

Changelog 1.9.3-2 to 1.9.3-3

1.9.3-2

Updates to doors: A problem in the meta data lookup of some doors (at least FTP doors, maybe others) was fixed: the timeout was incorrectly set to several days rather than 5 minutes. In error scenarios this could lead to hanging FTP doors.

Updates to dCacheDomain: Message handling was fixed in PoolManager. The main side effect of this bug was unintended error messages in the log files.

Updates to srm: A missing newline in srm_setup.env was added. This problem made the srm uninstallable in 1.9.3-1.

Updates to scripts: A typo was fixed in the init scripts. This typo caused the xrootd door to be shown as 'stopped' when running dcache status even though it was started.

Updates to pools: For DCAP transfers with non-sequential writes, the checksum computation during transfer was incorrect. This is fixed in this release.

Changelog 1.9.3-1 to 1.9.3-2

1.9.3-1

  1. ACL
  2. NFS 4.1
  3. Asynchronous srmLs
  4. Pool
  5. Pool Manager
  6. FTP
  7. Xrootd
  8. GSI
  9. JMS
  10. Scripts
  11. Miscellaneous
  12. Changelog

ACL

Starting from dCache 1.9.3, support for Access Control Lists (ACLs) is provided according to the "NFS version 4 Protocol" specification.

By default (without any additional ACL configuration) dCache uses UNIX permission checks as before. To enable ACLs, the "ACL Configuration" section in /opt/d-cache/config/dCacheSetup has to be configured (see Cases 2 and 3 below; a configuration sketch follows the list). The default case is to use UNIX permissions (see Case 1 below).

Configuration cases:

  1. Only UNIX permissions are used: permissionHandler=diskCacheV111.services.acl.UnixPermissionHandler
  2. Only ACL permissions are used: permissionHandler=diskCacheV111.services.acl.ACLPermissionHandler
  3. ACL permissions are used first. If ACLs do not state whether an operation is allowed or denied then UNIX permissions are used: permissionHandler=diskCacheV111.services.acl.ACLPermissionHandler,diskCacheV111.services.acl.UnixPermissionHandler
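
For example, to enable case 3, the corresponding line in the "ACL Configuration" section of /opt/d-cache/config/dCacheSetup would look as follows (a sketch based on the handler names above; adapt it to your installation):

# evaluate ACLs first and fall back to UNIX permissions
permissionHandler=diskCacheV111.services.acl.ACLPermissionHandler,diskCacheV111.services.acl.UnixPermissionHandler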

If the migration to Chimera has already been done on your system and the 'chimera' database already exists, then the ACL table has to be added:

 $ psql chimera < /opt/d-cache/libexec/chimera/sql/addACLtoChimeraDB.sql
 $ psql chimera < /opt/d-cache/libexec/chimera/sql/pgsql-procedures.sql

If the 'chimera' database does not exist, please create it using:

$ createdb chimera
$ psql chimera < /opt/d-cache/libexec/chimera/sql/create.sql
$ createlang plpgsql chimera
$ psql chimera < /opt/d-cache/libexec/chimera/sql/pgsql-procedures.sql

Additional information on how to use ACLs in dCache can be found here: http://trac.dcache.org/projects/dcache/wiki/Integrate.

NFS 4.1

The NFS 4.1 protocol is a revision of NFS 4.0. It currently has the status of a Proposed Standard within the IETF. From the point of view of dCache, the most interesting new feature is the support for parallel NFS. This makes it possible for data to flow directly between NFS clients and the dCache pools holding the data.

Experimental NFS 4.1 client support is present in the latest Linux kernels. dCache 1.9.3 introduces experimental NFS 4.1 server support. The dCache project sees NFS 4.1 as an important strategic development; however, the current support should be treated as a technology preview. If you are interested in testing NFS 4.1 support, please read the NFS 4.1 setup instructions.

For more information, please see the dCache NFS 4.1 manual.

Asynchronous srmLs

The SRM specification gives an SRM server the freedom to process an SRM file listing operation either synchronously or asynchronously. When processed synchronously, the TCP connection between the client and the server is maintained while the server processes the request. When processed asynchronously, the server replies immediately that the request has been queued. The client can then periodically poll the server to obtain the result.

Asynchronous processing has the advantage that if the file listing operation is slow for some reason (e.g. when listing large directories, or when the server is overloaded), then the TCP connection can be released and the slot used for other purposes. The downside is that from the point of view of the client, obtaining the result takes more time, since the TCP connection needs to be reestablished to poll for the result. CPU consumption may also increase on both the client and the server side due to the overhead of GSI authentication.

Although asynchronous processing is defined in the SRM specification, many clients do not support it. Most clients are now being updated to correctly handle asynchronous srmLs.

Starting from dCache 1.9.3, both synchronous and asynchronous srmLs processing is supported. Whether to use one or the other is entirely controlled by the server. By default dCache uses synchronous processing. Besides the differences in the communication pattern, dCache internally queues asynchronous requests, meaning that the number of concurrent list operations can be bounded, thus limiting the load on SRM and the name space server. Synchronous requests are always processed immediately.

Asynchronous srmLs is optional and can be switched on by setting the following variable in config/dCacheSetup:

srmAsynchronousLs=true

Further, the behavior of srmLs (both asynchronous and synchronous) can be controlled with two other parameters:

srmLsMaxNumberOfEntries=1000     # 1000 is the default

which sets the maximum number of entries returned by a single srmLs call, and

srmLsMaxNumberOfLevels=100       # 100 is the default

which sets the maximum recursion depth of a single srmLs call (checked when a recursion depth is specified by the srmLs client; the default recursion depth is 1).

The scheduler parameters for asynchronous srmLs are similar to the existing parameters of other SRM schedulers (put, get, copy, bring-online, ...):

srmLsRequestThreadQueueSize=1000
srmLsRequestThreadPoolSize=30
srmLsRequestMaxWaitingRequests=1000
srmLsRequestReadyQueueSize=1000
srmLsRequestMaxReadyRequests=60
srmLsRequestMaxNumberOfRetries=10 
srmLsRequestRetryTimeout=60000
srmLsRequestMaxNumberOfRunningBySameOwner=100
srmLsRequestLifetime=3600000

Pool

The pool code underwent major refactoring in dCache 1.9.1. In dCache 1.9.3 we are continuing this work by restructuring the part of the code that keeps track of which files are stored on the pool, including keeping track of free space, garbage collecting cached files, etc. Besides making the code more maintainable, this work fixes several issues with the garbage collector that were inherent in the old architecture.

Changes in the user interface of the pool are minor. The syntax of some of the rep commands has changed, as have some of the log messages generated while recovering broken meta data during pool startup.

dCache 1.9.1 and 1.9.2 still contained the old pool code, which could be activated by using a different pool.batch file. In dCache 1.9.3 the old pool code has been deleted and can no longer be activated.

The pool to pool component has been refined for better error handling. In particular transfers now time out if the mover on the source pool is lost. The network interface used by the pool to pool destination is now configurable through the pp listen command. If no explicit port is configured for the pool to pool component, then the port range defined by the org.dcache.net.tcp.portrange system property is now used to find a free port. This is the same port range used for DCAP transfers.

The migration module has been extended with several new features. Jobs now retain the last 20 errors; these can be viewed through the migration info command. Migration jobs can now be marked as permanent. A permanent migration job does not terminate when the transfer queue becomes empty; instead it watches whether any files on the pool change state and, if they do, adds those files to the queue if they match the selection criteria. Permanent migration jobs are saved in the pool setup file if the save command is executed. Finally, several new filter options were added: it is now possible to select files based on their size and last access time.
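
As a hedged illustration of these features, such a job might be created in the pool's admin interface roughly as follows. The -permanent, -size and -accessed option names, the range syntax and the pool group name backup-pools are assumptions for the purpose of illustration; lines starting with # are annotations, not commands. Please consult the migration module's help output for the exact syntax.

# create a permanent job that copies small, long-unused files to a pool group
migration copy -target=pgroup -permanent -size=0..1073741824 -accessed=604800.. backup-pools
# inspect the job (job id 1 assumed), including the last 20 retained errors
migration info 1
# persist the permanent job in the pool setup file
save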

A minor technical change is that "pool up" messages are now sent via the broadcast cell. Each pool sends this message every 30 seconds. In previous releases this message was sent directly to the PoolManager; starting with dCache 1.9.3 the message is sent to the broadcast cell, which then forwards it to interested parties. This allows other components to be informed about pool availability without relying on the PoolManager. We highlight this change because concerns about the performance impact on the message passing layer have been voiced. We do not believe that this will be an issue, but please contact us if you observe a problem.

Data is now explicitly synced to disk after a file has been uploaded to a pool. The transfer is not reported as complete to the client until the sync has finished. This reduces the risk of losing the file in case the pool crashes. It will have a performance impact, although the impact is believed to be minor for large files. Notice that data is only forced to disk after a file has been uploaded; if the pool crashes during the upload, then it is unknown how much made it to the disk.

To prevent multiple pool instances from being created for the same pool data, pools now create a lock file upon startup. The pool acquires an exclusive POSIX lock on the lock file and does not release it until the pool shuts down. If a cluster file system supporting POSIX locks is used, then locking also works across multiple hosts.

Pool Manager

The pool manager internally keeps a list of requests being processed. As doors or clients time out and retry, those requests may queue up and thus increase memory usage. In some situations this can exhaust the memory available to the pool manager. The pool manager now discards such requests when it knows that the door or client has timed out.

We have started to refactor the pool manager to clean up the code base and add features that have been requested but were hard to implement in the old architecture. This work mimics the work we did for the pool code in dCache 1.9.1; however, the pool manager changes in dCache 1.9.3 are minor. The biggest change is that we now use the Spring Framework for instantiating components of the pool manager. This will make future refactoring easier. The only visible changes are in the batch file and in the output of the info command. The info command output is now split into sections, similar to how it is done for the pool.

The pool manager now supports the bean commands that were introduced in the new pool in dCache 1.9.1. One new command is worth mentioning: the bean property command makes it possible to query individual values of pool manager or pool components. For instance, to query the amount of free space in a pool, one can use the following command in the admin interface:

[dcache-vm] (pool_0) admin > bean property account.free  
1842163632

The hope is that this will in time be a more stable solution for extracting values from cells compared to parsing the output of the info command. One should however expect the pool manager to change drastically in future releases.

FTP

Important: The weak FTP door (FTP without strong authentication) is no longer read-only by default. If you use the weak FTP door and want it to be read-only, you have to mark it as read-only explicitly by specifying -read-only=true.

We changed several default parameters for GridFTP:

Several fixes in the FTP door improve standard compliance and compatibility. The NLST command now returns an error when invoked on something that is not a directory. The MDTM command now uses RFC 3659 compliant time stamps. UTF-8 encoded file names are now allowed (RFC 3659). The MODE-E-PERF feature string is now published (GFD.20). The MLST and MLSD commands from RFC 3659 are now supported. The ARC grid middleware makes use of these commands.

We are aware that the FTP LIST command is relatively slow. This has not improved in 1.9.3, however we now start streaming the list to the client as we collect it. Previously the complete list would be constructed in memory before a data connection was established to the client. This would cause some clients to time out on large directories.

Xrootd

The xrootd support in dCache has seen a number of minor updates: the verbosity of logging was reduced and log messages have been reclassified. Communication with PnfsManager has been reduced, thus lowering the load on PNFS and reducing the time it takes to open a file.

GSI

We moved from JGlobus 1.4 to JGlobus 1.6. JGlobus provides support for GSI authentication and encryption, proxy certificate handling and FTP client support. Several components in dCache rely on this library, including SRM, GridFTP, GSIDCAP, and srmCopy. The upgrade should be transparent, but we mention it here as it is an important library that may affect compatibility.

JGlobus 1.6 verifies the age of the Certificate Revocation Lists (CRLs). If your CRLs are outdated, you will see authentication failures.

JGlobus 1.6 enforces signing policies. Signing policies are defined by Certificate Authorities and define a name space for certificates signed by that authority. Signing policies are defined in files in /etc/grid-security/certificates/ with the ending .signing_policy.

We upgraded from Bouncy Castle 1.33 to Bouncy Castle 1.43. Bouncy Castle is used for certificate handling.

JMS

dCache has used the Cells framework since its inception. The Cells framework has served dCache well, and we will not stop using it. There are, however, better alternatives for certain aspects of the Cells framework. One of those alternatives is the Java Message Service, JMS for short. JMS specifies an API for message passing. Apache ActiveMQ is a popular and free implementation of the JMS API.

dCache 1.9.3 provides the option to switch to Apache ActiveMQ for communication between dCache cells. Apache ActiveMQ is shipped with dCache as a library, and can be easily configured to run embedded inside dCache. Care has been taken to make the change as transparent as possible, and dCache 1.9.3 supports a mixed mode in which JMS communication is used between some components, while classic cells communication is used between others.

There are currently few benefits to using JMS in dCache 1.9.3. Error handling in case of network problems may be slightly more robust. Besides being able to use an embedded broker, one may choose to use a standalone broker installed and maintained independently of dCache. There are some benefits in such a setup, including the option to use SSL/TLS for message passing and to provide a redundant message passing infrastructure with multiple brokers.

The real benefits of using JMS will not be apparent until developers utilize the much richer JMS API. However once we do so, JMS will no longer be an optional component. To this end the JMS support in dCache 1.9.3 should be seen as a stepping stone to gain experience with Apache ActiveMQ in production environments. We invite sites to try the JMS support and report whether there is any negative (or positive) impact on performance or stability. If we are satisfied with both performance and stability, then we may decide to use JMS exclusively in a future release of dCache.

To enable the embedded JMS broker in dCache, one should define messageBroker=embedded-jms in the dCacheSetup of the dCacheDomain. This will embed an Apache ActiveMQ broker inside the dCacheDomain; the domain will, however, still be able to accept both classic Cells connections and JMS connections. The dCacheDomain will act as a gateway between those two mechanisms. To switch other domains to use JMS for communication, define messageBroker=embedded-jms in their dCacheSetup file.
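
For example (a sketch showing the documented setting; surrounding file content is omitted):

# in the dCacheSetup of the dCacheDomain: start the embedded ActiveMQ broker
# and act as a gateway between classic Cells and JMS communication
messageBroker=embedded-jms

# in the dCacheSetup of any other domain that should communicate via JMS
messageBroker=embedded-jms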

Scripts

The init script /opt/d-cache/bin/dcache and related scripts in /opt/d-cache/bin/jobs/ have been refactored.

PID files are now stored in /var/run/. Most, if not all, supported operating systems erase the content of this directory on reboot. Hence storing PID files in this directory ensures that there are no stale PID files after a reboot. Such stale PID files have in the past caused problems after sporadic reboots, as they could prevent dCache from starting. The location of PID files can be changed by defining the variable pidDir in /opt/d-cache/config/dCacheSetup.
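
For example, to place PID files somewhere other than the default, one might add a line such as the following to /opt/d-cache/config/dCacheSetup (the directory shown is an arbitrary example):

# directory in which the init script stores its PID files
pidDir=/var/run/dcache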

The /opt/d-cache/etc/node_config node configuration file has been extended with a new variable, SERVICES. This variable may be set to a space separated list of services or domains to start on this node. This is an alternative to setting individual service variables to no or yes. For instance rather than:

GRIDFTP=yes
SRM=yes
DCAP=yes

one may now use:

SERVICES=gridftp srm dcap

The advantage of this scheme is that the list is not limited to predefined services and may contain any domain. Thus it is possible, for instance, to add a maintenance domain to the list and have the init script start and stop it.
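
For instance, a hypothetical extra domain named maintenance could be started and stopped by the init script simply by appending it to the list (the name is purely illustrative):

# in /opt/d-cache/etc/node_config
SERVICES=gridftp srm dcap maintenance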

The logic for checking if the name space file system is mounted has been changed. Previously the init script would use df to find the mounted file systems. With Chimera this happened to be slow, as Chimera would dynamically compute how much space was used. Instead the init script now attempts to use a dot command to determine if the name space has been mounted.

A new command was added to recover corrupted Berkeley DB meta data databases found on pools. The command dcache pool reconstruct can in most cases recover all meta data into a new database. Please read the man page for details.

The install script install/install.sh now shares much of the logic with the init script. This should provide a more consistent behaviour. The deprecated etc/pool_path based method of creating pools is still supported, however the overwrite option is no longer respected: the install script always refuses to overwrite existing pools. As always, we encourage careful testing on a preproduction system before upgrading a production installation.

The legacy init scripts bin/dcache-core and bin/dcache-pool have been removed. bin/dcache was introduced as a replacement during the 1.8.0 release cycle.

Miscellaneous

We have added rudimentary support for automatically restarting dCache domains if they experience out of memory conditions and other fatal JVM errors. There may be situations in which the restart logic is not triggered, but we will extend support for this in future releases.

Changelog 1.9.2-8 to 1.9.3-1