1.9.13 Release Notes

During this release cycle much work went into finalizing various aspects of 1.9.12. Still, the 1.9.13 release contains some interesting new features.

Upgrade Instructions

Incompatibilities

Please consider the following changes when upgrading from a version before 1.9.13-1:

Compatibility

It is safe to mix pools of releases 1.9.12 and 1.9.13. Head nodes and doors must be upgraded to 1.9.13 together and cannot be mixed with head nodes or doors of releases before 1.9.13. Components of different 1.9.13 releases can be mixed freely.

Compatibility Matrix

We distinguish between pool components and head nodes. Any component which is not a pool is considered a head node, including doors. The following table shows compatibility between different releases.

              1.9.13 Head   1.9.13 Pool
Head 1.9.12   no            no
Head 1.9.13   yes           yes
Pool 1.9.12   yes           yes
Pool 1.9.13   yes           yes

1.9.13-4

This is the final release of 1.9.13. While earlier releases of 1.9.13 are not compatible with 2.0, 1.9.13-4 is compatible with versions of 2.0.

Service: ftp, gridftp, kerberosftp

Make timeout of write mover creation configurable. The timeout is shared with read mover creation. The timeout can be configured by adjusting the gsiftpPoolTimeout property. A side effect is that the default timeout for write mover creation increases from 10 seconds to 10 minutes.

Service: pool

Fix several HTTP compliance issues related to range requests (i.e. partial reads).

Fix several race conditions that could lead to stale hidden movers, TCP connection leaks and link count inconsistencies.

Service: pnfsmanager, nfsv3, nfsv41

Fix name space corruption when moving directories (chimera only).

Fix RFC 5661 compliance of the ACL delete permission check. A file or directory can now be deleted if the subject has the DELETE_CHILD permission on the parent directory, or the DELETE permission on the entry being deleted, or, if neither is specified, the ADD_FILE permission on the parent directory.
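
The rule above can be sketched as a small decision function. This is only an illustration of the rule as described in this note, not the code used by Chimera; the Verdict type and the may_delete name are hypothetical, and each argument is assumed to be the result of evaluating one permission bit against the relevant ACL.

```python
from enum import Enum


class Verdict(Enum):
    """Hypothetical result of evaluating one permission bit against an ACL."""
    ALLOWED = "allowed"
    DENIED = "denied"
    UNSPECIFIED = "unspecified"


def may_delete(delete_child_on_parent, delete_on_entry, add_file_on_parent):
    """Return True if the entry may be deleted under the rule described above."""
    # Either explicit grant is sufficient on its own.
    if Verdict.ALLOWED in (delete_child_on_parent, delete_on_entry):
        return True
    # ADD_FILE on the parent only counts when neither delete bit is specified.
    if (delete_child_on_parent is Verdict.UNSPECIFIED
            and delete_on_entry is Verdict.UNSPECIFIED):
        return add_file_on_parent is Verdict.ALLOWED
    return False
```

In this reading, an explicit denial of one delete bit, combined with the other being unspecified, does not fall back to ADD_FILE.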

Service: pnfsmanager

Fix instantiation with PNFS backend.

Service: poolmanager

Fix a race condition.

Service: ftp, gridftp, kerberosftp

Add glob support to the NLST command. This allows mget to work in some clients.
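
As a rough illustration of what glob matching means for a name listing, the sketch below filters directory entries with Python's fnmatch module. The exact glob syntax accepted by the door is not described here, so the Python pattern rules are an assumption, and the nlst function is a hypothetical stand-in for the door's listing code.

```python
from fnmatch import fnmatchcase


def nlst(entries, pattern=None):
    """Return the sorted names in `entries` that match the glob `pattern`.

    Without a pattern every entry is listed, as for a plain NLST.
    """
    if pattern is None:
        return sorted(entries)
    # fnmatchcase compares case-sensitively on every platform.
    return sorted(name for name in entries if fnmatchcase(name, pattern))
```

A client issuing mget *.root would then receive only the matching names, for example nlst(["run1.root", "notes.txt"], "*.root") yields ["run1.root"].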

Service: dcap, gsidcap, kerberosdcap

Resolve an issue when the dcap client is in active mode (i.e. when the -A option is used) and the server cannot resolve the host name sent by the client (e.g. when the client is behind NAT). Instead the door now uses the remote address of the control channel when selecting an appropriate pool.

Improve logging in case of errors during GSI or Kerberos sessions.

Service: pinmanager

Improved robustness of pinmanager. In particular an out-of-memory problem has been fixed.

Service: admin

Fix several bugs related to line editing. Add history search (press Ctrl-R).

An unfortunate side effect of these fixes is that the line editing code needs to know the terminal size. Some ssh clients do not transmit terminal resize messages. For those clients, resizing the terminal after logging in to the admin service will break line editing.

Fix listing of admin service in dcache services.

Service: hopping

Add service definition for hopping manager. The name of the new service is hopping.

Service: srm

Improve robustness against ill-formed srmLs requests.

Info provider

Fix GLUE2 compliance. Be sure you have at least v2.0.8 of glue-schema RPM installed on the node running the info provider.

Miscellaneous

Improve documentation in configuration defaults.

Fix resolution of well-known cell names when using JMS messaging.

Changelog 1.9.13-3 to 1.9.13-4

1.9.13-3

Service: poolmanager

Fixed display of the -max-options option.

Fixed a race condition that caused a null pointer exception in rebalancer.

Fixed persistence of the slope setting.

Updated the format of PoolManager.conf such that settings are persisted using the partition manager commands introduced in dCache 1.7. The legacy commands are still supported, but will be removed in a future version. Starting with version 1.9.12-9, the pool manager save command saves in a format that will be compatible with future versions.

Service: spacemanager

Fixed pool selection cost model for systems with space manager enabled. Without the fix a system with space manager would perform pool selection with outdated cost information.

Service: pnfsmanager

Fixed batch file errors for PNFS backend.

Fixed a bug in the PNFS backend that prevented listing through SRM.

Service: pool

Ensure forward compatibility of the HTTP mover.

Fixed a bug in which a failure to stage would leave behind a repository entry in state D. The only way to get rid of the entry was to restart the pool.

Avoid verbose logging of NoSuchElementException when listing movers. The fix possibly solves similar problems for other operations and other services.

Fixed file system size check for the dcache pool create command. It didn't actually check the file system size and in some cases it caused the operation to fail, leaving an incomplete pool behind.

Avoid verbose logging of non-fatal DCAP client failures.

Fixed the 'dcache pool reconstruct' command on Solaris. Without the fix the command would fail with the error "File name too long".

Fixed URI generation in hsmcp.rb. The pool now also logs incorrect URIs returned by the HSM script.

Fixed HTTP range support. Added support for multiple ranges. Fixed path check. These fixes resolve compatibility issues with ARC and ROOT.
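
For readers unfamiliar with multiple ranges, the following sketch parses a Range header of the form defined by the HTTP specification (bytes=first-last, bytes=first-, bytes=-suffix, comma separated). It is not the pool's implementation; the parse_ranges name is hypothetical, and the choice to silently skip unsatisfiable ranges is an assumption made to keep the example short.

```python
def parse_ranges(header, size):
    """Parse a Range header such as 'bytes=0-99,200-' for a file of `size` bytes.

    Returns a list of inclusive (first, last) byte offsets.
    """
    unit, _, spec = header.partition("=")
    if unit.strip() != "bytes":
        raise ValueError("unsupported range unit")
    ranges = []
    for part in spec.split(","):
        first, sep, last = part.strip().partition("-")
        if not sep:
            raise ValueError("malformed range")
        if first == "":
            # Suffix form '-N': the last N bytes of the file.
            length = int(last)
            if length == 0:
                continue
            start, end = max(size - length, 0), size - 1
        else:
            start = int(first)
            end = int(last) if last else size - 1
            if end < start:
                raise ValueError("malformed range")
            # A range running past the end of the file is clipped.
            end = min(end, size - 1)
        # Ranges starting beyond the end of the file are skipped.
        if start < size:
            ranges.append((start, end))
    return ranges
```

For a 1000 byte file, the header bytes=0-99,200-299 describes two separate parts that a server returns in a single multipart response.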

Service: billing

Fixed a race condition related to formatting of timestamps.

Service: ftp, gridftp, kerberosftp

Added a workaround for a bug in Globus Online.

Fixed a race condition that would cause the door to become unresponsive to client commands.

Fixed a potential denial of service problem.

Fixed a null pointer exception triggered by running the info command before the client was authenticated.

Avoid verbose logging when clients do not disconnect cleanly.

Service: ftp, gridftp, kerberosftp, xrootd, webdav

Use correct transaction duration for billing records.

Service: srm, webdav, webadmin

Upgraded to Jetty 7.3.1.

Service: webdav

Made HTML directory listing sortable and fixed minor layout problems.

Service: xrootd

Improved robustness of transfer timeout logic.

Changelog 1.9.13-1 to 1.9.13-3

1.9.13-2

Not released

1.9.13-1

Space manager

Added the database index srmspacefile_pnfspath_stat_idx to avoid bad PostgreSQL query plans triggered by outdated table statistics. The schema is automatically updated the first time space manager starts.

Pool manager

Pool manager is used by doors to perform pool selection. Essentially, pool manager routes transfers to pools, controls staging from tape, and coordinates pool to pool transfers.

In previous versions the retry logic for pool selection failures was placed inside pool manager. As a consequence, doors never knew what was happening inside pool manager: was a file being staged or copied, or was the transfer suspended because the pool with the file was offline? Another consequence was that pool manager needed logic to query file meta data from PNFS manager. This query logic replicated similar logic already present in doors and added latency to the pool selection process.

Starting with dCache 1.9.13 pool manager never retries requests internally. Instead a pool selection failure causes the request to fail and be sent back to the door. It is at the discretion of the door to query PNFS manager for updated meta data and to retry the request.

The consequence is that pool selection latency is reduced and that the retry logic can be tuned for every type of door. For instance, xrootd doors can rely on clients retrying requests and the door thus propagates a failure all the way back to the client. The SRM door on the other hand may return SRM_FILE_UNAVAILABLE, letting the client know that the pool with the file is offline. An FTP door will retry the pool selection internally.

The logic for suspending requests has not changed. A request that repeatedly fails will eventually get suspended. As before, doors will wait for a suspended request to be unsuspended.
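
The door-side retry described in this section can be sketched as follows. This is a simplified model, not code from any door: select_pool and refresh_metadata are hypothetical callables standing in for the messages sent to pool manager and PNFS manager, and the attempt count and delay are arbitrary illustrative values.

```python
import time


class PoolSelectionFailed(Exception):
    """Raised by the hypothetical select_pool callable when selection fails."""


def select_with_retry(select_pool, refresh_metadata, path,
                      attempts=3, delay=0.0):
    """Retry pool selection in the door, refreshing meta data before each try."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        # The door, not pool manager, fetches up-to-date meta data.
        attributes = refresh_metadata(path)
        try:
            return select_pool(path, attributes)
        except PoolSelectionFailed as error:
            last_error = error
            if attempt + 1 < attempts:
                time.sleep(delay)
    # Out of attempts: propagate the failure to the caller.
    raise last_error
```

A door that relies on client retries, as described for xrootd, would correspond to attempts=1 in this model, so that the first failure is passed straight back.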

Pin manager

Pin manager is used by SRM and DCAP to trigger staging from tape and to ensure that the file is not garbage collected for a certain amount of time. It does this by placing a sticky flag (a pin) on the file on one of the pools.

In previous versions pin manager would unconditionally delegate pool selection to pool manager. Starting with dCache 1.9.13, pin manager will handle some cases without delegating pool selection to pool manager. This is the case when a file is already online, or when a disk-only file is offline. In other cases, e.g. when a pool to pool transfer or a stage from tape is required, pin manager continues to delegate pool selection to pool manager.
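
One way to picture the split between the two services is the decision function below. It is a sketch of the cases named above under a simplified model; the pin_route name, its three boolean inputs and the returned labels are all hypothetical and do not correspond to actual pin manager code.

```python
def pin_route(readable_replica, other_online_replica, on_tape):
    """Decide which service handles a pin request in this simplified model.

    readable_replica: a replica exists on an online pool that can serve reads
    other_online_replica: a replica exists on some other online pool
    on_tape: a tape copy of the file exists
    """
    if readable_replica:
        # The file is already online: pin manager selects the pool itself.
        return "pin manager: select pool"
    if not other_online_replica and not on_tape:
        # Disk-only file with all its pools offline: nothing to copy or stage.
        return "pin manager: file unavailable"
    # A pool to pool transfer or a stage from tape is required.
    return "pool manager"
```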

The benefit of running the pool selection algorithm in pin manager is that it reduces latency for the common cases that don't require any internal transfers. It also reduces load on pool manager.

Pool selection in pin manager is implemented by periodically exporting a snapshot of the configuration and pool status information from pool manager. Changes to the pool manager configuration may take up to 30 seconds to propagate to pin manager.

Pools

Minor usability improvements in the command line interface.

The checksum scanner has been extended with configurable continuous background checksumming. Any checksum errors are logged, and the affected files are marked as broken and will not be available for download. The new -scrub option to the csm set policy command allows the feature to be enabled. Consult the help output of that command for information about setting throughput limits and scan frequency.
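
The idea behind scrubbing can be illustrated with a few lines of Python. The sketch assumes Adler-32 checksums and represents each replica as a byte string instead of reading files from a pool; the scrub function and both of its arguments are hypothetical, and throughput limiting is left out.

```python
import zlib


def scrub(replicas, recorded):
    """Return the sorted names of replicas whose data no longer matches.

    replicas: name -> current bytes on disk (stand-in for reading the file)
    recorded: name -> checksum stored when the file was written
    """
    broken = []
    for name, data in replicas.items():
        actual = zlib.adler32(data) & 0xFFFFFFFF
        # A replica without a recorded checksum is also reported here.
        if actual != recorded.get(name):
            broken.append(name)
    return sorted(broken)
```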

PNFS manager

ACL support has been moved into Chimera. ACLs are no longer supported with the PNFS name space backend.

Permission checks have been optimized for Chimera. The consequence is that fewer round trips to the database are made, thus increasing throughput and reducing latency.

Earlier releases of dCache did not implement the correct POSIX semantics for lookup permissions: only the lookup permission of the parent directory was enforced. This was done to improve performance with the PNFS backend. The default behaviour is unchanged in dCache 1.9.13. Setting the new configuration property pnfsVerifyAllLookups to true will turn on POSIX semantics. The property is only supported with the Chimera backend.
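
The difference between the two behaviours can be shown with a short sketch. It is not the PNFS manager implementation: may_lookup and the can_search callback are hypothetical, and the callback is assumed to report whether the subject holds the lookup (search) permission on a given directory.

```python
from pathlib import PurePosixPath


def may_lookup(path, can_search, verify_all=False):
    """Check lookup permission for `path`.

    With verify_all=False only the parent directory is checked (the default
    behaviour); with verify_all=True every ancestor directory is checked, as
    POSIX requires.
    """
    ancestors = list(PurePosixPath(path).parents)  # nearest parent first
    if not ancestors:
        return True  # the root itself has no parent to check
    to_check = ancestors if verify_all else ancestors[:1]
    return all(can_search(str(directory)) for directory in to_check)
```

With a directory /a that the subject may not search, a lookup of /a/b/c succeeds under the default behaviour as long as /a/b is searchable, but fails once all ancestors are verified.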

Chimera

The stored PostgreSQL procedures used by Chimera have been updated. During upgrade, the SQL script to create/update the stored procedures has to be applied:

psql -f /opt/d-cache/libexec/chimera/sql/pgsql-procedures.sql chimera

SRM

The SRM list operation provides information about file locality, among other things. In previous versions the SRM door would query pool manager to compute the file locality for each file being listed. Starting with dCache 1.9.13 the file locality is computed internally in the SRM. The effect is that latency is reduced. The algorithm relies on a periodic snapshot of the pool manager configuration and pool state being transferred from pool manager to the SRM door (similar to how it is done in pin manager).
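
A minimal sketch of such a computation is shown below. The returned strings are file locality values defined by SRM 2.2, but the mapping itself, the file_locality name and the shape of its inputs are assumptions made for illustration; the snapshot is reduced to a set of names of pools that are currently online.

```python
def file_locality(locations, online_pools, on_tape):
    """Derive an SRM file locality from a pool state snapshot.

    locations: names of the pools holding a replica of the file
    online_pools: names of pools that are online according to the snapshot
    on_tape: whether a tape copy of the file exists
    """
    on_disk = any(pool in online_pools for pool in locations)
    if on_disk and on_tape:
        return "ONLINE_AND_NEARLINE"
    if on_disk:
        return "ONLINE"
    if on_tape:
        return "NEARLINE"
    # No reachable replica and no tape copy.
    return "UNAVAILABLE"
```

Because the snapshot is held in the door, no message to pool manager is needed per listed file in this model.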

dCache 1.9.13 introduces the srmPinOnlineFiles property. This property controls whether dCache pins files that have ONLINE access-latency. If set to false then dCache will refrain from pinning ONLINE files; dCache still ensures that the file is available on a read pool before returning the transfer URL to the client, but no guarantee is made that the file will not be garbage collected before the transfer URL expires.
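
The effect of the property on the pinning decision can be summarised in a few lines. The must_pin function is hypothetical, and the default of True is an assumption chosen to mirror the behaviour of previous versions, in which files were always pinned.

```python
def must_pin(access_latency, srm_pin_online_files=True):
    """Return True if the SRM door should ask pin manager to pin the file."""
    if access_latency == "ONLINE":
        # Only ONLINE files are affected by the property.
        return srm_pin_online_files
    # Files with any other access latency are always pinned.
    return True
```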

Pinning ONLINE files

In previous versions of dCache, when an SRM client asked dCache to prepare a file for download, the SRM door would always ask the pin manager to pin the file. This was to ensure that the file is indeed online, that the file's data is available on a pool the user may read from, and that the data will not be garbage collected during the transfer URL's lifetime. A correct implementation of the SRM protocol must provide these three guarantees, so, for the general case, pinning is required even when access latency is ONLINE.

The disadvantage of always pinning ONLINE files is that it introduces latency that, in many cases, is unnecessary; for example, if a file is permanently available on a pool that the end user can read from then pinning the file is unnecessary.

Some dCache deployments only store files on pools that are readable: they have no pools dedicated for writing or staging. Pinning ONLINE files isn't required for such deployments as dCache already makes the necessary guarantees.

Other sites may know that the risk of a replicated file becoming garbage-collected during the lifetime of the transfer URL is small. If it is garbage-collected then opening the file will still succeed, but will incur a delay. The site-admin may know that their user community will accept this small risk in exchange for improved throughput, in which case pinning ONLINE files is unnecessary.

A side effect of disabling srmPinOnlineFiles is that it becomes possible to set up a tapeless system without pin manager. The default access latency in dCache is, however, NEARLINE, even when no HSM system is attached. The access latency has to be changed to ONLINE if dCache is to run without a pin manager.

SRM keeps track of free and used space in dCache. To do this, previous versions of the SRM door would periodically query each pool for such information. Starting with dCache 1.9.13 the information is computed using the snapshot of the pool manager state periodically copied from pool manager to the SRM door. The effect is that the information is more up to date and that fewer queries need to be sent to the pools.

Due to the changes to pin manager, SRM can now report SRM_FILE_UNAVAILABLE if files are offline, that is, when the pools holding the file are down and no tape copy is available.

Upgraded from Axis 1.3 to Axis 1.4. The new version provides minor bug fixes to SOAP handling.

FTP

The plain FTP door now supports gPlazma. The existing configuration properties useGPlazmaAuthorizationModule and useGPlazmaAuthorizationCell are used to control whether and how gPlazma is used.

Please note that password authentication is only supported by gPlazma 2.

NFS

The dCache version and build is now available to NFS 4 clients.

TODO: Principal mapping; what's the status?

gPlazma

The misspelled configuration property gPlazmaNumberOfSimutaneousRequests has been deprecated and replaced by gPlazmaNumberOfSimultaneousRequests.

Chimera client tools

The ACL command line clients have been merged into the Chimera client tools.

Bash integration

A Bash tab-completion script is provided. It provides rudimentary tab completion support for the dcache script.

Changelog 1.9.12-1 to 1.9.13-1

Greyed out entries have been merged into the 1.9.12 branch.