1.9.5 Release Notes

The main focus areas of the 1.9.5 release are optimisations and polish ahead of LHC data taking. This release is expected to be maintained throughout the first LHC data taking run.

Upgrade Instructions

Incompatibilities

Please consider the following changes when upgrading from a version before 1.9.5-1:

Compatibility

It is safe to mix pools of releases 1.9.4 and 1.9.5. Head nodes and doors must be upgraded to 1.9.5 together and cannot be mixed with head nodes or doors of releases before 1.9.5. Components of different 1.9.5 releases can be mixed freely.

Compatibility Matrix

We distinguish between pool components and head nodes. Any component which is not a pool is considered a head node, including doors. The following table shows compatibility between different releases.

                           1.9.5-29 Head   1.9.5-29 Pool
Head   1.9.1-1..7,9..11    no              no
       1.9.1-8             no              no
       1.9.2-1..5,8..11    no              no
       1.9.2-6,7           no              no
       1.9.3-1..4          no              no
       1.9.4-1..7          no              no
       1.9.5-1..28         yes             yes
Pool   1.9.1-1..7,9..11    yes             yes[2]
       1.9.1-8             yes[1]          yes[2]
       1.9.2-1..5,8..11    yes             yes[2]
       1.9.2-6,7           yes[1]          yes[2]
       1.9.3-1..4          yes             yes[2]
       1.9.4-1..7          yes             yes
       1.9.5-1..28         yes             yes

  [1] The migration module will not work for -target=pgroup and -target=link.
  [2] The migration module does not work.
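
For sites that script their upgrade checks, the matrix can be written down as data. The following is a minimal sketch in Python, not part of dCache; the names MATRIX, patches and lookup are made up for this illustration, and the rows simply restate the table and its two notes.

```python
import re

NOTE_1 = "The migration module will not work for -target=pgroup and -target=link."
NOTE_2 = "The migration module does not work."


def patches(*ranges):
    """Expand inclusive (first, last) pairs into a set of patch levels."""
    return {p for first, last in ranges for p in range(first, last + 1)}


# Each result is (compatible, note or None).
NO, YES = (False, None), (True, None)

# Rows: (component kind, series, patch levels,
#        result against a 1.9.5-29 head, result against a 1.9.5-29 pool).
MATRIX = [
    ("head", "1.9.1", patches((1, 11)), NO, NO),
    ("head", "1.9.2", patches((1, 11)), NO, NO),
    ("head", "1.9.3", patches((1, 4)), NO, NO),
    ("head", "1.9.4", patches((1, 7)), NO, NO),
    ("head", "1.9.5", patches((1, 28)), YES, YES),
    ("pool", "1.9.1", patches((1, 7), (9, 11)), YES, (True, NOTE_2)),
    ("pool", "1.9.1", patches((8, 8)), (True, NOTE_1), (True, NOTE_2)),
    ("pool", "1.9.2", patches((1, 5), (8, 11)), YES, (True, NOTE_2)),
    ("pool", "1.9.2", patches((6, 7)), (True, NOTE_1), (True, NOTE_2)),
    ("pool", "1.9.3", patches((1, 4)), YES, (True, NOTE_2)),
    ("pool", "1.9.4", patches((1, 7)), YES, YES),
    ("pool", "1.9.5", patches((1, 28)), YES, YES),
]


def lookup(kind, release):
    """Return the matrix entry for a 'head' or 'pool' of the given release."""
    match = re.fullmatch(r"(\d+\.\d+\.\d+)-(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised release string: {release!r}")
    series, patch = match.group(1), int(match.group(2))
    for row_kind, row_series, levels, head, pool in MATRIX:
        if (row_kind, row_series) == (kind, series) and patch in levels:
            return {"head": head, "pool": pool}
    raise KeyError(f"{kind} {release} is not listed in the matrix")
```

For example, lookup("pool", "1.9.1-8") reports that such a pool works with a 1.9.5-29 head, with the restriction on -target=pgroup and -target=link attached as a note.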

1.9.5-30

Chimera

Added support for chown, chgrp and chmod on directory tags.

1.9.5-29

Pool

Fixed behavior on removal of non-existing files.

Doors

Fixed the request duration value sent to billing. Previously, all doors except dcap sent the request timestamp instead of the time interval.

dcap

Cleaned up log messages in case of invalid GSI/Kerberos credentials.

Detailed changelog 1.9.5-28 to 1.9.5-29

1.9.5-28

Space manager

Fixed pool selection cost model for systems with space manager enabled. Without the fix a system with space manager would perform pool selection with outdated cost information.

FTP

Implemented wildcard matching for the NLST command. This ensures compatibility with the mget command in many clients.
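
To illustrate what wildcard matching for NLST means from a client's point of view, here is a minimal sketch in Python. It uses the shell-style patterns of Python's fnmatch module as a stand-in; the exact pattern syntax accepted by the dCache FTP door is not described in these notes, so treat the matching rules as an assumption. The function name nlst is hypothetical.

```python
from fnmatch import fnmatchcase


def nlst(entries, argument=None):
    """Return the names an NLST reply would contain for the given argument.

    entries  -- names present in the directory
    argument -- optional wildcard pattern such as '*.root'
    """
    if argument is None:
        return sorted(entries)
    # fnmatchcase is case-sensitive on every platform, unlike fnmatch.
    return sorted(name for name in entries if fnmatchcase(name, argument))
```

A client issuing mget *.root typically sends NLST *.root and then retrieves each name in the reply, which is why the server has to expand the pattern.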

Detailed changelog 1.9.5-27 to 1.9.5-28

1.9.5-27

gPlazma

Fixed banning in gplazmalite-vorole-mapping plugin. Adding the entry:

"DN" -

will now ban the DN no matter whether it is used in combination with an FQAN or not. Note that banning in the gplazmalite-vorole-mapping plugin doesn't prevent other plugins from authorizing the request.

Fixed a bug which would cause a missing entry in storage-authzdb to authorize a user rather than reject the request. The problem would only occur if grid-vorolemap mapped a user to a user name that isn't mapped in storage-authzdb.

Info provider

Removed the GlueSAAccessControlBaseRule work-around that publishes legacy values for broken clients. Any problems with lcg-infosites should be solved by upgrading to the latest version of the UI.

Pools

Improve error handling in case file deletion fails.

Pin manager

Fixed a race condition when moving pins on behalf of the migration module.

RPM

Added provides to make the RPM work with gLite meta packages.

Detailed changelog 1.9.5-26 to 1.9.5-27

1.9.5-26

Pin manager, pool manager and DCAP

Fixed an issue that broke cost management when bulk staging files already online. The issue could trigger cost based file replication and restaging of files.

gPlazma

Fixed an issue with multiple FQANs being collapsed when they should not be.

NFS

Fixed a permission problem on rename. Improved performance.

DCAP

Fixed a race condition.

Detailed changelog 1.9.5-25 to 1.9.5-26

1.9.5-25

PNFS manager

Improved stability.

Pool manager

Pool manager now logs when pools change state. The messages are logged at warning level.

Pool

Fixed a bug in the migration module that could cause migration jobs to never terminate. When the transfer queue became empty it would stay in the RUNNING state indefinitely.

Fixed a deadlock in space allocation.

Xrootd door

Added an implementation of the kXR_prepare message. The current implementation confirms the request, but ignores it otherwise. This is allowed according to the spec. A complete implementation will be added in future versions of dCache. The present implementation restores compatibility with the latest CMS software release.

Added logging support for the Netty IO framework on which the xrootd door is implemented.

SRM

Fixed an IllegalArgumentException in case of clock shift.

Fixed misreported file locality.

httpd

Fixed security issue.

Detailed changelog 1.9.5-24 to 1.9.5-25

1.9.5-24

FTP

Fixed partial retrieve of files in GridFTP mode E.

Improved RFC 3659 compliance. This improves compatibility with Uberftp.

Pool

Fixed cancellation of pool to pool transfers. The symptom of the problem was that the task would not be removed from the pool and would stay in Failed forever.

Fixed a bug that caused the pool to reset the file size in the name space after pool to pool transfers and after staging the file from tape.

Fixed a bug that could cause files to be deregistered from the name space.

Solaris

Fixed Solaris compatibility of the shell scripts used by dCache.

Chimera

Fixed a problem in Chimera that caused the SRM to report the locality of a directory as UNAVAILABLE rather than NONE.

Enstore support was added. Please contact support@dcache.org for details.

PinManager

Fixed message timeout handling.

SRM

Fixed an issue with implicit space reservation. The symptom of the problem was that the error message Record(s) Already Exist was generated.

Detailed changelog 1.9.5-23 to 1.9.5-24

1.9.5-23

Space manager

Database access was optimized to make better use of indexes.

Pool

Fixed reading from a pool that was disabled with the -rdonly flag of the pool disable command.

Fixed xrootd mover TCP port allocation to avoid reusing a port until previous transfers on the port have finished.

Fixed implementation of rep ls -l=c command.
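
The xrootd mover port allocation fix above can be pictured as reference counting per port. The following Python sketch is only an illustration of that idea under our own assumptions; it is not the mover's code, and the class and method names are hypothetical.

```python
from collections import deque


class PortAllocator:
    """Hands out ports from a fixed range and counts transfers per port."""

    def __init__(self, first, last):
        self._free = deque(range(first, last + 1))
        self._transfers = {}  # port -> number of unfinished transfers

    def acquire(self):
        """Take a free port and register the first transfer on it."""
        if not self._free:
            raise RuntimeError("no free port in the configured range")
        port = self._free.popleft()
        self._transfers[port] = 1
        return port

    def attach(self, port):
        """Register one more transfer on a port that is already in use."""
        self._transfers[port] += 1

    def finished(self, port):
        """Mark one transfer as done; free the port only when none remain."""
        self._transfers[port] -= 1
        if self._transfers[port] == 0:
            del self._transfers[port]
            self._free.append(port)
```

The point of the design is that a port returns to the free list only after the last transfer using it has finished, so a new mover never collides with an old one.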

SRM

Fixed file locality computation when file is present on a write pool.

Detailed changelog 1.9.5-22 to 1.9.5-23

1.9.5-22

Pools

Fixed several bugs in the xrootd mover. These bugs severely impacted performance, and any site using xrootd in dCache is urged to upgrade its pools.

Failure handling in the migration module has been fixed. The bug caused migration jobs to get stuck.

FTP Doors

A misleading error message for directory listing was fixed.

A socket leak in the proxy component for active transfers was fixed.

PoolManager

The settings made with psu set regex on and psu set allpoolsactive are now stored in config/PoolManager.conf when the save command is executed.

Detailed changelog 1.9.5-21 to 1.9.5-22

1.9.5-21

dCacheConfigure

Configure the information system so that it correctly specifies GlueVOInfoPath.

Improve logging of dCache configuration.

Pool

Fixed a race condition.

FTP

Fixed RFC 3659 compliance in MLSD implementation.

PnfsManager

Fixed bugs in request folding. The bugs only affected systems with pnfsFolding=true and PNFS as the name space backend. The bugs caused failures with DCAP. For other doors, when permissionPolicyEnforcementPoint=PnfsManager, the bugs could in some cases lead to permission checks not being enforced.

SpaceManager

Fixed regression introduced in 1.9.5-17. The regression caused failures with non-voms proxies (e.g. grid-proxy).

Detailed changelog 1.9.5-20 to 1.9.5-21

1.9.5-20

Tape Protection

The improved tape protection system is described in our wiki.

Detailed changelog 1.9.5-19 to 1.9.5-20

1.9.5-19

SRM

Fixed regression introduced in 1.9.5-18.

Detailed changelog 1.9.5-18 to 1.9.5-19

1.9.5-18

FTP

When authorizing stage requests, FTP doors used to only consider the primary FQAN of the user. This has now been fixed to consider all FQANs.

Fixed several race conditions in the proxy component of the FTP door. Those race conditions would cause proxied active transfers to get stuck.
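
The difference between checking only the primary FQAN and checking all FQANs when authorizing stage requests can be sketched as follows. This Python fragment is illustrative only: the real white-list format is not modelled, and a plain set of FQAN strings with exact matching is an assumption made for the example. Both function names are hypothetical.

```python
def may_stage_primary_only(fqans, allowed_fqans):
    """Old behaviour: only the first (primary) FQAN is consulted."""
    return bool(fqans) and fqans[0] in allowed_fqans


def may_stage(fqans, allowed_fqans):
    """New behaviour: the request is allowed if any FQAN is allowed."""
    return any(fqan in allowed_fqans for fqan in fqans)
```

A user whose proxy carries /atlas as primary FQAN and /atlas/Role=production as secondary FQAN is rejected by the first function and accepted by the second when only the production role is white-listed.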

Space Manager

Fixed an issue that caused SRM internal retries of srmCopy requests to fail.

Pool

Fixed a network interface enumeration problem in the xrootd mover. The bug would cause the xrootd mover to fail for some combination of source IP and particular network setups.

Fixed an out of memory situation caused by huge broken control files being read into memory before being identified as corrupted and discarded.

SRM

The output of the administrative info command has been changed such that it lists the protocols actually supported by this dCache installation rather than a hardcoded list of all protocols.

Info provider

Removed the duplicated GlueSAFreeOnlineSize property and added the missing GlueSATotalOnlineSize. Please copy glue-1.3.xml.template to glue-1.3.xml and reapply configuration changes to deploy this fix.

Chimera

Chimera will now honour the file-mode specified by client when creating a new file.

httpd service

Added support for starting the PoolCollector cell and start it by default. The PoolCollector will update the web-page with new pool information as pools change their configuration. Support for this service existed in earlier releases but enabling it required editing the httpd.batch file. Setting the httpdEnablePoolCollector property to any value other than 'yes' will disable starting the PoolCollector cell.

gsiftp, gsidcap, SRM, gPlazma

Fixed a bug that prevented CRL files from being reloaded. The bug would cause the CRL in memory to eventually expire and blacklist certificates from the corresponding CA.

Detailed changelog 1.9.5-17 to 1.9.5-18

1.9.5-17

Space Manager

Fix reserve admin command so it works if no retention policy is specified.

SRM

Fix handling of srmSetPermission operation to allow interoperability with DPM.

Provide Space Manager with all VOMS groups when a user attempts to reserve or release space. (NB: doesn't affect implicit space reservation)

FTP client

Check whether the remote FTP server supports uploading checksums before attempting to upload a file's checksum. This fixes an interoperability issue with DPM.

dcap door

The dcap door now rejects certain malformed URLs created by incorrect client usage of dcap client software.

Web-admin

Cells are initially hidden in the Cell Services page until they start to respond. If a cell should go off-line then it will continue to be displayed and marked as being off-line (as happens prior to 1.6.5-17).

Pool

Relax some overzealous internal checks as the pool accounting design allows for brief periods of inconsistency between certain pool metrics.

Migrating namespace

Fix bug that caused the storage-info migration test to fail.

Info provider

Fix bug where some sites would publish only a subset of their SRM reservations.

JMS

Fix a couple of minor timing issues that resulted in some harmless errors during start-up and incorrect behaviour when resolving well-known cells under rare conditions.

Detailed changelog 1.9.5-16 to 1.9.5-17

1.9.5-16

Space Manager

Fixed update space reservation command such that lifetime="-1" sets reservation to be non-expiring.

FTP door

Implemented gPlazma support in Kerberos FTP door.

Added support for running multiple FTP doors on the same host. The mechanism for doing this is the same as for running multiple DCAP doors on the same host.

PNFS Manager

Improved handling of flush notification. This improves the error reporting when using HSM integration scripts using the URI approach.

Pin Manager

Added the bulk unpin command.

InfoProvider

Added missing GlueChunkKey attributes for nearline and offline accounting.

Pool

Fixed POSIX shell compliance of bin/meta2yaml script.

All

Flagged libexec/infoprovidercms.rb as an executable.

Detailed changelog 1.9.5-15 to 1.9.5-16

1.9.5-15

Chimera

Fixed a problem with setting file mode for files created through the mounted NFS 3 file system. The file mode for such files was previously hardcoded. This problem did not affect files created through dCache.

Pool

Fixed several interoperability issues and bugs in the xrootd mover. If xrootd is used we strongly recommend upgrading to 1.9.5-15.

Pool Manager

Fixed a bug in which Pool Manager could on rare occasions create an orphaned file.

DCAP

Fixed an issue with stage authentication.

All

Fixed a compatibility issue with Solaris grep.

Detailed changelog 1.9.5-14 to 1.9.5-15

1.9.5-14

All

Fixed a Solaris compatibility issue in the init scripts introduced in 1.9.5-13. This bug prevented domains from starting on Solaris.

The log4j related commands previously exposed through the System cell are now available in all cells. The commands still affect the logging of the complete domain.

SRM

Fixed a bug in the init script that caused the init script to fail to stop or restart the SRM. The bug was introduced in 1.9.5-13.

Info

Fixed a potential starvation issue in the info service. This should improve responsiveness of the info service on systems that push lots of updates into the info service.

Detailed changelog 1.9.5-13 to 1.9.5-14

1.9.5-13

SRM

The DB schema detection code for the srmLs related tables has been fixed. The bug caused those tables to be recreated on every restart.

PoolManager

An error propagation bug was fixed. The bug caused failure to resolve client host names to be retried forever.

Chimera

A compatibility issue between the Chimera NFS daemon and some Linux distributions (e.g. Ubuntu) has been fixed. The issue caused errors to be printed during startup of the NFS daemon and the daemon took a long time to start.

DCAP

Failure logging in the DCAP door was improved. A number of non-critical stack traces have been removed.

Pool

The migration module now ignores files that have been deleted in the name space. Before such files would be retried until the file was removed by the cleaner.

The fsync failure handling that was changed in 1.9.5-12 has been changed again. In 1.9.5-12 failure to fsync caused a pool to be disabled. We have discovered a number of common sources to fsync failures that are not to be considered critical. Hence in 1.9.5-13 we changed the behaviour such that failure to fsync causes a transfer to fail, but the pool is not disabled.

A bug related to HSM script failures has been fixed. The bug would cause the pool to essentially ignore an error code returned by the HSM script when trying to stage a file. The stage would still fail if checksum verification was turned on (which it is by default), but it would cause the error message in the log files to be misleading.

All

Error reporting in the dCache batch processor has been improved. The line number and file name are now reported on fatal errors.

A bug in the dCache init script (bin/dcache) related to stopping domains has been fixed. The bug would cause domains that were in the middle of an automatic restart to be ignored by the stop command. Also the start and restart commands would not recognize such domains as already running and would start a second instance. Part of the fix is that the bin/dcache status command now shows when a domain is restarting. One consequence of the fix is that we now store two PID files; one for the Java process and one for the shell script that handles the restart logic. The scripts shipped with dCache have been updated to handle this change, but if you have custom scripts that rely on the PID files, then those need to be updated.

A script compatibility issue with Solaris has been fixed. The bug caused the automatic restart of domains to fail on Solaris.

The init script now uses an environment variable to pass the CLASSPATH to the Java Virtual Machine. This solves a problem where the command line length exceeded the maximum length allowed. It also makes the output of utilities like ps nicer to read as the excessively long CLASSPATH is no longer included.
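
The effect of passing the class path through the environment rather than on the command line can be shown with a short Python sketch. The dCache init script itself is a shell script; the function below, its name and the example class and jar names are hypothetical and only illustrate the mechanism. It relies on the JVM reading the CLASSPATH environment variable when no -cp or -classpath option is given.

```python
import os


def java_command_and_env(main_class, jars, base_env=None):
    """Build a short java command line plus an environment carrying CLASSPATH."""
    env = dict(os.environ if base_env is None else base_env)
    # os.pathsep is ':' on Unix-like systems, which is what the JVM expects there.
    env["CLASSPATH"] = os.pathsep.join(jars)
    # The command line stays short: no -cp option with hundreds of jar paths.
    return ["java", main_class], env
```

The result can be handed to subprocess.run(command, env=env). Tools such as ps then show only the short command, since the class path is no longer part of the argument list.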

Detailed changelog 1.9.5-12 to 1.9.5-13

1.9.5-12

A new version of PNFS was released in November 2009. The new version supports registering all deleted files in a special table in the PNFS database. dCache 1.9.5-12 has new options to configure the PnfsManager to let dCache consult this table when it cannot find a PNFS entry for a file. If found in this table, dCache pools will delete such orphaned files. The configuration options are pnfsDeleteRegistration, pnfsDeleteRegistrationDbUser, pnfsDeleteRegistrationDbPass. Please consult etc/dCacheSetup.template for details.

An issue with recovery of broken SI files in the pool's control/ directory has been fixed. That issue prevented pools with such files from starting. With dCache 1.9.5-12 the SI file is recovered from PnfsManager.

Another problem affecting recovery of meta data on pools was that the recovery logic did not respect the lfs=volatile setting. This has been fixed to correctly mark files on such pools as cached.

Starting with dCache 1.9.3-1, dCache would call fsync(2) after upload to synchronize the new file's in-core state with the storage device. Failure during sync would be logged, but was not considered a fatal error. After talking to several sites, we have decided to consider failure to sync to be a fatal error. Starting with dCache 1.9.5-12 the transfer will be considered failed and the pool will be disabled.
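
As an illustration of what treating a failed sync as a failed transfer means, here is a minimal Python sketch. It is not the pool's code; the exception class and function name are hypothetical, and the reaction of the pool itself (such as disabling it) is outside the sketch.

```python
import os


class TransferFailed(Exception):
    """Raised when an upload could not be stored safely."""


def store_upload(path, chunks):
    """Write the chunks to path and sync them to the storage device."""
    try:
        with open(path, "wb") as f:
            for chunk in chunks:
                f.write(chunk)
            f.flush()              # push Python's buffer to the kernel
            os.fsync(f.fileno())   # ask the kernel to write to the device
    except OSError as exc:
        # A failed write or sync means the data may not be on disk,
        # so the transfer is reported as failed rather than logged and ignored.
        raise TransferFailed(f"could not store {path}: {exc}") from exc
```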

The command to stage files administratively through the PinManager was broken since dCache 1.9.5-1. This problem has been fixed.

Chimera was updated such that it respects subnet masks in the export file.

The callout to the HSM script on dCache pools has been updated such that all HSM locations of a file are passed to the script.

dCache supports running multiple DCAP doors on the same host. Support for this has however been broken since dCache 1.9.3-1 due to bugs in the init script. This has now been fixed.

For other changes, please consult the detailed changelog below.

Detailed changelog 1.9.5-11 to 1.9.5-12

1.9.5-11

An XROOTD write corruption issue has been fixed in the pool. This issue was introduced in version 1.9.5-1 and we recommend that all data written through XROOTD to an affected version of the pool code is verified for integrity. Notice that the dCache checksum cannot be used to verify the integrity of the file, as the checksum is computed from corrupted data. We urge all sites running a prior version of 1.9.5 and using XROOTD for writing into dCache to immediately upgrade to at least version 1.9.5-11. dCache 1.9.6-1 is affected by the same issue. dCache 1.9.6-2 contains the fix.

After reverting to JGlobus 1.4 in dCache 1.9.5-10 we discovered that the problem we tried to avoid also existed in that version of the library. Thus in dCache 1.9.5-11 we are back to using JGlobus 1.7.0. In addition we have found and fixed at least one race condition that could lead to GSI authentication failures in GridFTP and the SRM.

The Kerberos FTP door has been broken since the release of dCache 1.9.5-1. The problem has been resolved in dCache 1.9.5-11.

A couple of NullPointerException fixes have been made in GPlazma and in the pool migration module.

Detailed changelog 1.9.5-10 to 1.9.5-11

1.9.5-10

A security issue was fixed in the DCAP door. We recommend all sites to upgrade the DCAP doors.

JGlobus was downgraded to version 1.4. In dCache 1.9.3 we moved to JGlobus 1.6 and in dCache 1.9.5 we moved to JGlobus 1.7. We have however recently received reports of GSI authentication issues with the SRM. When the issue occurs, certificates from a subset of the CAs are rejected for no apparent reason. A restart resolved the issue, however then certificates from other CAs could begin to fail. Given the current status of the LHC, we decided to downgrade to JGlobus 1.4 until this issue has been resolved. This unfortunately also means we reverted to the pre-1.9.3 behaviour regarding CRL handling.

A race condition was fixed in the pool code. On pools with -replicaOnArrival enabled, this race condition has on some sites been reported to cause 10% upload failures. The symptoms are that the upload hangs at the end of the transfer. We recommend that all sites using the -replicaOnArrival option upgrade their pools.

Since dCache 1.9.3-1, the PoolManager had a bug which caused it to cluster writes. This bug has now been resolved and version 1.9.5-10 should achieve much better write balancing. The write pool selection logic has also been refined to take the file size into account when selecting a write pool. In particular when using random pool selection this should reduce the risk of sending several writes to an almost full pool at once.
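
Why the file size matters for random pool selection can be seen in a small Python sketch. These notes do not describe how PoolManager weighs the file size, so the rule used here, dropping pools whose free space is smaller than the file, is an assumption made for illustration. The function name and the pool names in the usage are hypothetical.

```python
import random


def select_write_pool(free_space, file_size, rng=random):
    """Pick a random pool among those with enough free space.

    free_space -- mapping of pool name to free bytes
    file_size  -- expected size of the file in bytes
    """
    candidates = sorted(
        name for name, free in free_space.items() if free >= file_size
    )
    if not candidates:
        raise LookupError("no pool has enough free space for this file")
    return rng.choice(candidates)
```

Without the size filter, a purely random choice keeps sending writes to an almost full pool with the same probability as to an empty one.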

The info provider now publishes the GlueSESizeTotal, GlueSESizeFree and GlueSAReservedOnlineSize attributes. The name space implementation used is now published as part of the GlueSEImplementationVersion attribute. An example of the new version string is "1.9.5-10 (ns=Chimera)". Finally, the info provider now correctly publishes information related to the default store unit. Please consult the information in etc/glue-1.3.xml.template regarding the default-store-unit-VOs list.

The dCacheConfigure.sh script has been updated to work with the current information provider.

The SRM component had an annoying habit of leaving the files createDDL.jdbc and dropDDL.jdbc in the current working directory during startup. This is now fixed and those files are now always created in /opt/d-cache/etc/. The files are only created if they do not already exist. Once created, the files are not modified.

The SSH admin shell of dCache had an issue with how it closed the connection to the SSH client. This caused trouble in scripts, as the output from the session would sometimes be incomplete. This issue has now been fixed.

A bug related to the -lfs option was fixed in the pool. The bug caused the option -lfs=volatile to be interpreted as if it had been -lfs=precious.

An issue in the interaction between the migration module and pin manager has been resolved: In some cases a pool may have a sticky bit on a file owned by the PinManager which the PinManager no longer has in its database. In previous versions, the PinManager would fail to remove such a sticky bit, thus causing the migration module to retry the move indefinitely. Now the PinManager removes the sticky bit even when it no longer appears in its database.

The GridFTP door now implements the ALLO command. The command was already accepted in previous versions, however it was ignored internally. The ALLO command allows the client to specify an upper bound on the size of a file before upload. dCache now uses this value to choose a pool with sufficient space to hold the file. This is in particular important when using random pool selection. Unfortunately many GridFTP clients do not support the ALLO command.
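
For clients that can be scripted, sending ALLO before the upload is straightforward. The sketch below uses Python's standard ftplib, which speaks plain FTP and has no GSI support, so it illustrates the command sequence only and is not a GridFTP client. The function names are hypothetical.

```python
import os
from ftplib import FTP


def allo_command(size_in_bytes):
    """Format the ALLO command announcing the size of the coming upload."""
    if size_in_bytes < 0:
        raise ValueError("size must not be negative")
    return f"ALLO {size_in_bytes}"


def upload_with_allo(ftp: FTP, local_path, remote_name):
    """Announce the file size with ALLO, then upload the file with STOR."""
    size = os.path.getsize(local_path)
    # voidcmd raises ftplib.error_reply unless the server answers with 2xx.
    ftp.voidcmd(allo_command(size))
    with open(local_path, "rb") as f:
        ftp.storbinary(f"STOR {remote_name}", f)
```

Because the size arrives before the data connection is set up, the server can pick a pool with enough room before the first byte is written.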

A NullPointerException was fixed in the DB backend of the billing cell.

Server side SRM copy now uses 8443 as a default port if the SURL lacks a port number.
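
The defaulting rule for SURLs without a port can be sketched in a few lines of Python. This is an illustration using the standard urllib.parse module, not the SRM code; the function name and the example host are hypothetical.

```python
from urllib.parse import urlsplit

DEFAULT_SRM_PORT = 8443


def srm_endpoint(surl):
    """Return (host, port) for a SURL, using 8443 when no port is given."""
    parts = urlsplit(surl)
    if parts.scheme != "srm" or parts.hostname is None:
        raise ValueError(f"not an SRM URL: {surl!r}")
    port = parts.port if parts.port is not None else DEFAULT_SRM_PORT
    return parts.hostname, port
```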

Several minor fixes have been made in Chimera scripts, NFS4 door, PnfsManager and other components. Please consult the detailed change log below.

Detailed changelog 1.9.5-9 to 1.9.5-10

1.9.5-9

The info provider, glue-1.3.xml, was extended with support for nearline accounting. WLCG requires that this information take into account tape compression; therefore, we cannot use any component within dCache to figure out nearline information. Published information must come from an external source. This external source is an XML file tape-info.xml, which is at a fixed location. The default contents of this file will publish no nearline space, but the site-admin may override the file with the information they wish to publish. The anticipation is that the site admin writes a small script that creates the file with up-to-date information. A description of this file's format is in the /opt/d-cache/share/xml/tape-info-empty.xml file.

The handling of the SRM_PARTIAL_SUCCESS return code was fixed in the server side srmCopy implementation. The bug caused a compatibility issue with Castor.

PoolManager write balancing was broken since 1.9.3-1. This problem would mean that many concurrent writes had a tendency to go to the same pool. With 1.9.5-9 the old behaviour should now be restored.

A race condition in a meta data printing routine was fixed. This race condition could lead to ArrayIndexOutOfBoundsExceptions in SRM and other components.

Version 1.9.5-3 added an explicit mover kill at the end of a DCAP session. In some cases this could lead to an excessive amount of log messages about missing movers in pools. In version 1.9.5-9 we have therefore reduced the log level of this message.

Logging of uncaught exceptions has been improved in several components. In older versions, critical errors could in some cases go unnoticed because the error was not logged.

For other fixes, see the detailed change log below.

Detailed changelog 1.9.5-8 to 1.9.5-9

1.9.5-8

Several minor typos have been fixed in the info provider. Those fixes improve the standard compliance of the info provider output.

The stage protection that was introduced in 1.9.4-1 would break access through SRM when enabled. This problem has now been fixed. The problem only occurred when the stage permission enforcement point was placed in the doors.

Passive mode DCAP reliability has been improved. In rare cases two concurrent transfers could cross and cause file corruption.

In dCache 1.9.3 we upgraded from JGlobus 1.4 to JGlobus 1.6. The upgrade introduced enforcement of Certificate Revocation Lists. Unfortunately this new functionality also introduced a performance regression that would reduce the scalability of all dCache components that rely on GSI. We have submitted a patch for this issue to the JGlobus project. dCache 1.9.5-8 includes this patch.

Detailed changelog 1.9.5-7 to 1.9.5-8

1.9.5-7

Previous releases of 1.9.5 contained a bug in the new directory listing code. The bug caused listings of directories containing more than 100 entries to sometimes be incomplete. The issue affected all doors supporting directory listing, but did not affect listing of individual files. This issue is fixed in 1.9.5-7.

The ls command in the SRM cell (that is the command to list SRM requests, not a directory list command) was non-functional in previous 1.9.5 releases. This functionality has now been restored.

The info provider has seen a few more WLCG compatibility fixes, and the NFS 4.1 door also got a protocol compliance fix.

Detailed changelog 1.9.5-6 to 1.9.5-7

1.9.5-6

We continued to receive reports that 1.9.5-5 generated root owned directories with PNFS. With the help of PIC and FZK we have now finally identified the real cause of the problem. The fix for this problem has been confirmed to work at FZK and PIC, and the fix is included in dCache 1.9.5-6.

We have also received reports about problems with the new info provider included since 1.9.5-4. We believe most of these have been tracked down, including bugs in numerous client applications (see the following Savannah tickets: 57787, 57799, 57840, and 57827). dCache 1.9.5-6 includes fixes and workarounds for all issues we could identify.

Despite our claims that 1.9.5-1 supports ACLs for a number of SRM operations, previous 1.9.5 releases did not interpret ACLs for any SRM requests. This has now been resolved and srmRm, srmMove, srmMkdir and srmLs requests are subject to ACLs. There is still one known issue: ACLs that restrict access based on the source IP of a request do not work with the SRM. This will be fixed in the next release.

In all releases since 1.9.1-1, the pool would in some cases ignore certain pool parameters. In particular the tag.hostname parameter would in some cases be dropped, but other parameters could be affected as well. This has been fixed in 1.9.5-6.

For non-SRM transfers, the message Already have 1 records(s) would be printed in the log file. The transfers would complete and space management was correct. dCache 1.9.5-6 eliminates the log message for non-SRM transfers.

Previous versions of Chimera contained a bug in the PNFS ID to path mapping. The bug broke directory listing from DCAP. This issue has now been fixed.

Other changes:

Detailed changelog 1.9.5-5 to 1.9.5-6

1.9.5-5

Detailed changelog 1.9.5-4 to 1.9.5-5

1.9.5-4

The previous patch level release contained an RPM dependency on the java-package RPM. This broke compatibility with a number of distributions that did not provide this package. The dCache 1.9.5-4 release removes this dependency.

In addition to the dependency change, the following fixes are included:

Detailed changelog 1.9.5-3 to 1.9.5-4

1.9.5-3

Detailed changelog 1.9.5-2 to 1.9.5-3

1.9.5-2

Detailed changelog 1.9.5-1 to 1.9.5-2

1.9.5-1

Permission Checking and ACLs

In dCache, file permission checking such as for create, read, and delete has traditionally been the responsibility of the doors. Starting with the 1.9.5 release, this check can optionally be moved to PnfsManager. Besides the structural benefits of enforcing permissions at a single point, there are performance gains from avoiding extra round trips between the door and PnfsManager. To enable permission checking inside PnfsManager, define permissionPolicyEnforcementPoint to PnfsManager in config/dCacheSetup of doors:

permissionPolicyEnforcementPoint=PnfsManager

Currently PnfsManager based permission checking is only fully supported by the DCAP and FTP doors. For the SRM door, permission checking for srmRm, srmMove, srmMkdir and srmLs is always delegated to PnfsManager, no matter the definition of permissionPolicyEnforcementPoint. This speeds up those operations and enforces ACLs if ACLs are enabled in the PnfsManager. Permission checking for other SRM operations is still performed in the SRM door and is not subject to ACLs yet.

The configuration parameter PermissionHandlerDataSource was removed. Permission handlers in doors now always query meta data from PnfsManager rather than from the mounted name space file system.

Enabling ACLs has been simplified in dCache 1.9.5. To enable ACLs, define aclEnabled to true in config/dCacheSetup of doors and PnfsManager. This must be done in addition to defining the database connection parameters. There is no longer a need to redefine the permissionHandler parameter - it is however still respected if defined. If ACLs are used, these must now also be configured in pnfsDomain or chimeraDomain.

Directory Listing

Directory listing in FTP, SRM and the dirDomain used to be performed on the mounted name space. Starting with 1.9.5, doors request the directory listing from the PnfsManager.

One consequence is that it is no longer required to mount the name space on FTP doors and the dirDomain. SRM still uses the mounted file system for some other list related operations. With PNFS, PnfsManager must have access to the mounted name space. With Chimera, even PnfsManager does not require the mounted file system.

Another consequence is that directory listing through FTP is now significantly faster.

PnfsManager executes the directory listing on dedicated threads. The number of threads used is defined by the parameter pnfsNumberOfListThreads in config/dCacheSetup.

For Chimera, the directoryLookupPool was previously started inside the chimeraDomain. Starting with version 1.9.5, the regular directoryLookupPool in the dirDomain works with Chimera, and thus chimeraDomain no longer contains directoryLookupPool.

Starting dCache as an unprivileged user

Until dCache 1.9.5, there was no support for running dCache as a user different from root. This has now changed. If the variable user is defined in config/dCacheSetup, then the init scripts will drop privileges and start dCache as that user.

Log files are still generated as root, which means they can still be written to the default location of /var/log/. Ownership of PID files is changed to the unprivileged user, which means they can still be written to the default location of /var/run/. To support automatic restart, the dCache init script generates a stop file to suppress restarts when dcache stop is executed. This used to be generated in the jobs/ directory. Starting with dCache 1.9.5, these files are now generated as hidden files in /tmp.

Please take care that the user under which dCache is executed has sufficient privileges. Watch out for the following:

Hot-spot detection

The trigger mechanism for hot-pool replication has been enhanced by integrating an algorithm contributed by Jon Bakken, FNAL. The algorithm ranks pools based on their CPU cost. The n-th percentile pool cost is chosen, where the n-th percentile is the cost of the pool within that ranking: 0% selects the lowest pool cost, 50% selects the median cost and 100% selects the highest pool cost. This cost is used as the threshold for establishing pool-to-pool "on cost" transfers.

In PoolManager, specifying an on-cost value as a number not ending with "%" will result in the old behaviour; all current dCache deployments will have such a value. Specifying a value ending with "%" will result in the percentile cost being calculated dynamically and the resulting value used as the threshold for on-cost pool-to-pool transfers.
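The two forms of the on-cost value can be illustrated with a small Python sketch. It is not dCache code: the function names are hypothetical, and the rule used to map a percentage onto a position in the ranking (rounding down) is an assumption, since these notes only fix the 0%, 50% and 100% cases.

```python
import math


def percentile_cost(pool_costs, percent):
    """Return the cost found at the given percentile of the ranked pool costs.

    Assumption: the rank index is rounded down; the notes only state that
    0% is the lowest cost, 50% the median and 100% the highest cost.
    """
    if not pool_costs:
        raise ValueError("at least one pool cost is required")
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    ranked = sorted(pool_costs)
    index = math.floor((len(ranked) - 1) * percent / 100)
    return ranked[index]


def on_cost_threshold(setting, pool_costs):
    """Interpret an on-cost setting as described in the text.

    A value ending with "%" is resolved dynamically against the current
    pool costs; any other value is used as a fixed threshold.
    """
    setting = setting.strip()
    if setting.endswith("%"):
        return percentile_cost(pool_costs, float(setting[:-1]))
    return float(setting)
```

With five pools whose CPU costs are 0.9, 0.1, 0.5, 0.3 and 0.7, a setting of "50%" resolves to the median cost 0.5 in this sketch, while a setting of "2.0" is used unchanged.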

Stage Protection

Stage protection was added in dCache 1.9.4. In version 1.9.4, stage protection had to be configured in every door and in the PinManager. In version 1.9.5, the stage protection can now optionally be configured in the PoolManager rather than in the doors and PinManager. Thus the white-list only needs to be present on a single node. To enable this, define the following in config/dCacheSetup:

stagePolicyEnforcementPoint=PoolManager

The file name of the white-list must still be configured by setting the stageConfigurationFilePath parameter. However, the parameter only needs to be defined on the nodes which enforce the stage protection, i.e. either on the doors and PinManager, or in PoolManager.

Cell Communication

Robustness of the cell message tunnel has been improved. In particular we moved to the Java NIO API for I/O and disabled Nagle's algorithm on the TCP connections used for cell communication. This has dramatically reduced the latency of cell communication.
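Disabling Nagle's algorithm means setting the TCP_NODELAY option on a socket, so that small messages are sent immediately instead of being held back and coalesced. dCache does this in Java; the Python sketch below only illustrates the socket option itself, and the function name is hypothetical.

```python
import socket


def disable_nagle(sock):
    """Turn off Nagle's algorithm on a TCP socket by setting TCP_NODELAY."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


# The option can be set on a TCP socket before it is connected.
tunnel_socket = disable_nagle(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
nodelay_enabled = tunnel_socket.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
tunnel_socket.close()
```

The trade-off is more, smaller packets on the wire in exchange for lower latency, which suits short request and reply messages.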

PnfsManager

PnfsManager was restructured internally. As a consequence the option -storageinfo-provider is no longer accepted.

PnfsManager now supports an operation to set several attributes of a file in one operation. This has cut down on the number of messages a door or a pool needs to send to PnfsManager during upload of a file. This also reduces the number of PNFS operations required for setting these attributes. However, as a consequence of this change, pools from 1.9.5 releases will not work with older head nodes.

SrmSpaceManager

SrmSpaceManager supports changing the lifetime of a reservation with the update space reservation command.

The SpaceManagerDefaultRetentionPolicy parameter was removed, as it was no longer used. The SpaceManagerDefaultAccessLatency parameter was renamed to DefaultAccessLatencyForSpaceReservation to better reflect its purpose. The old parameter is still respected if it is defined.

PinManager

PinManager now supports the command bulk pin for administratively pinning a large number of files. Which files to pin is defined by a local file on the node hosting the PinManager.

Pools

The migration module can now filter on access latency and retention policy using the -al and -rp options. The -exclude option now supports single-character and multi-character wildcards.

The flush logic was extended to handle FILE_NOT_FOUND errors from PnfsManager: Such an error now causes the file to be deleted from the pool, thus avoiding an infinite retry loop in case the file was not properly registered in the companion. This change will only have an effect with Chimera or with PNFS supporting a trash table.

Every 60 seconds, the pool checks the amount of free space on the file system and adjusts the pool size if that amount is smaller than the configured amount of free space in the pool.
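The check can be sketched as follows in Python. This is an illustration under stated assumptions, not the pool's actual code: the function name and parameters are hypothetical, and the notes do not say what the new size is, so the sketch assumes the pool shrinks to the space it already uses plus what the file system can still provide.

```python
def adjusted_pool_size(configured_size, used_in_pool, fs_free):
    """Return the pool size after one periodic free-space check.

    All values are in bytes. Assumption: when the file system has less free
    space than the pool believes it has, the pool size is reduced so that
    the pool's free space matches the file system's free space.
    """
    pool_free = configured_size - used_in_pool
    if fs_free < pool_free:
        return used_in_pool + fs_free
    return configured_size
```

For example, a pool configured with 100 units of which 40 are used believes it has 60 free; if the file system reports only 30 free, the sketch returns a size of 70. In a real check, the free space could be read with shutil.disk_usage(path).free.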

Info Service

The info service has been refactored to ease unit testing. This is now the most well-tested component in dCache.

External libraries

The JGlobus library used for GSI and GridFTP handling has been updated to version 1.7.

The log4j logging library has been updated to version 1.2.15.

Chimera

Scalability of the Chimera NFS daemon was improved. Log and PID files are now stored in /var/log/ and /var/run/, respectively.

Protocol support

SRM

Verbosity of logging in the SRM has been reduced. Some of the code has been transitioned to use log4j, which exposes more log levels than used in previous versions.

SRM has seen a few performance-related changes. In particular the srmPutDone operation performed at the end of an upload is now faster.

The SRM code has been refactored internally to prepare for multiple SRM doors running on top of the Terracotta distributed shared memory framework. Running multiple SRM doors is not yet supported for production setups; however, much of the infrastructure is in place to support such setups.

Error reporting of srmMkdir and srmRmdir has been improved. In particular we now use specific error codes rather than the generic SRM_FAILURE.

FTP

The legacy callouts to encp from inside the GridFTP door have been removed. Hence the option -encp-put is no longer supported.

Xrootd

In version 1.9.4, the XROOTD door (also known as the XROOTD redirector) was reimplemented for better scalability. In version 1.9.5, the XROOTD mover (also known as the XROOTD data server) underwent the same kind of transformation. The new version uses significantly fewer threads, and we hope it will scale better and be more robust under load than the old version.

The old mover is still shipped with dCache and can be activated by modifying the pool movermap.

NFS 4.1

The NFS 4.1 implementation was refactored for better scalability and thread management.

Detailed changelog 1.9.4-1 to 1.9.5-1