Executive summary

Highlights from this release:

  • major overhaul of admin interface,

  • robust internal delete and flush notification,

  • info-provider generates timestamps on published objects,

  • pool migration module maintains last-access-time,

  • pool implements strict LRU garbage collection,

  • improved shutdown in space-manager,

  • reserving space from a linkgroup may be authorised by GID,

  • better scalability in FTP doors,

  • new admin commands for nfs service,

  • recording of FTS-specific information,

  • xrootd doors support the protocol’s asynchronous open,

  • support for custom HTTP response headers.

Incompatibilities

  • Java 8 is required for running dCache 2.13. Both the Oracle JDK and OpenJDK are supported.

  • The chimera schema is updated upon upgrade.

  • The admin shell has been reimplemented and the user interface has changed.

  • The broadcast service has been dropped and needs to be removed from the layout file.

  • The dir service has been dropped and needs to be removed from the layout file.

  • The loginbroker service has been dropped and needs to be removed from the layout file.

  • Support for JMS brokers (ActiveMQ and OpenMQ) has been removed.

  • Support for Terracotta load balancing has been removed.

  • The httpd service reads certificates from /etc/grid-security and no longer uses Java Key Store files.

  • The dcache import commands have been removed.

  • pnfsmanager and cleaner services need to be configured according to the presence of a replica or spacemanager service.

  • dCache now defaults to expect a spacemanager service. If not present, several services need to be configured not to expect a spacemanager.

  • The default cell name of the spacemanager service was changed from SrmSpaceManager to SpaceManager.

  • The default database names and owners for all services have changed.

  • The pool Berkeley DB meta data format has changed. No change is required upon upgrading, but downgrading is not possible without explicitly converting the format.

Release 2.13.61

admin

Migration move tasks on pools worked correctly, but the migration info command failed with an error stating that the current user (root) was not allowed to execute anything, due to missing ACLs. This is now fixed.

Changelog 2.13.60..2.13.61

33e017e
[maven-release-plugin] prepare release 2.13.61
2c67b5f
admin: Fix Inconsistent ACL enforcement, RT 9207
e07091b
[maven-release-plugin] prepare for next development iteration

Release 2.13.60

httpd

The “Disk Space Usage” webpage (/usageInfo) contains a table showing information about each pool in the dCache cluster. The “Layout” column shows the capacity usage graphically, with different colours showing how much of that pool’s capacity is being used for different tasks. This release fixes the Layout heading to describe a previously undocumented colour.

Changelog 2.13.59..2.13.60

290ec72
[maven-release-plugin] prepare release 2.13.60
822f63e
httpd: Fixed table headers in usageInfo
f3e0be2
[maven-release-plugin] prepare for next development iteration

Release 2.13.59

webdav

A recent update, commit 5abc0e1c, improved the behaviour of the Milton WebDAV libraries if an IOException occurs during an upload. That patch, unfortunately, did not address all issues, and when non-spec-conformant clients are used against dCache, stacktraces can be triggered.

This patch corrects that behaviour. In addition, the status code returned on such errors was changed from 400 to 500, which should signal to clients that they are free to retry the transfer after a timeout.
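
From the client's side, the difference between the two status codes can be sketched as follows. This is a hypothetical helper, not part of dCache or of any client library. It only encodes the general HTTP convention that 4xx responses reject the request itself, whereas 5xx responses report a server-side failure that may be transient.

```python
def may_retry(status_code):
    """Return True if an upload that failed with this HTTP status is worth retrying.

    Hypothetical client-side helper: 5xx means the server failed, so the
    same request may succeed later; 4xx means the request was rejected.
    """
    return 500 <= status_code <= 599
```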

Changelog 2.13.58..2.13.59

3130940f92
[maven-release-plugin] prepare release 2.13.59
882f2f364b
webdav: make Milton work-around more robust
0141baa2f2
[maven-release-plugin] prepare for next development iteration

Release 2.13.58

Changes affecting multiple services

The version of the PostgreSQL driver used by dCache internally was brought up to 9.4.1212. This fixes the issue described in liquibase bug 2939.

system-test

The system-test module, used for demonstration or testing purposes, comes with a built-in X.509 infrastructure. With this release, expired certificates are replaced by new ones.

Changelog 2.13.57..2.13.58

ec161bc28e
[maven-release-plugin] prepare release 2.13.58
864cf7e6e1
system-test: update disposable-CA generated credentials
76b372074e
postgresql driver: update version to 9.4.1212
8dfb654b5a
[maven-release-plugin] prepare for next development iteration

Release 2.13.57

chimera

There was an issue with symbolic links to a directory where the destination contained a trailing slash. This is now fixed.
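
A minimal sketch of the general technique, written in Python rather than as the actual path2inode stored procedure: empty path elements, such as the one produced by a trailing slash, are skipped when the path is split. The function name is hypothetical.

```python
def path_elements(path):
    """Split a path into its non-empty elements.

    Illustrative only: this is not the path2inode stored procedure,
    just the idea of ignoring empty elements while walking a path.
    """
    return [element for element in path.split("/") if element]


# A link destination with a trailing slash yields the same elements
# as the same destination without it.
with_slash = path_elements("/data/dir/")
without_slash = path_elements("/data/dir")
```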

Changelog 2.13.56..2.13.57

adcdac1
[maven-release-plugin] prepare release 2.13.57
758e472
fixed changelog XML
2f66c97
[maven-release-plugin] prepare for next development iteration
94b806d
chimera : handle empty paths elements in path2inode stored procedure

Release 2.13.56

chimera

This release fixes the database query for storing multiple checksums for a file.

ftp

The Socket read method may return zero to indicate that no bytes were read. Although this is not an error, such occurrences caused the transfer to fail.

This is now fixed.
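
The corrected behaviour can be sketched as a copy loop that treats a zero-byte read as "nothing to do yet" rather than as a failure. This is an illustration, not the dCache proxy code: the `read` callable is hypothetical and is assumed to follow the Java convention of returning the number of bytes read, 0 if none were read, and -1 at end of stream.

```python
def relay(read, write):
    """Copy data until end of stream and return the number of bytes copied.

    `read(buffer)` is a hypothetical callable modelled on Java's
    convention: it returns the number of bytes placed in `buffer`,
    0 if nothing was read this time, and -1 at end of stream.
    """
    buffer = bytearray(8192)
    total = 0
    while True:
        count = read(buffer)
        if count == -1:   # end of stream: the transfer is complete
            return total
        if count == 0:    # nothing read: not an error, simply try again
            continue
        write(bytes(buffer[:count]))
        total += count
```

A production loop would wait for readiness instead of retrying immediately; the sketch omits that for brevity.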

Changelog 2.13.55..2.13.56

e517783
[maven-release-plugin] prepare release 2.13.56
81f24b7
chimera: fixed database query for storing multiple checksums for a file.
09a0fce
ftp: do not fail proxy transfer if read returns zero bytes
204f8fe
[maven-release-plugin] prepare for next development iteration

Release 2.13.55

ftp

This release adds an implementation of the MLSC command. As a result, Globus is able to query the contents of dCache directories using FTP without creating additional TCP connections.

Changelog 2.13.54..2.13.55

2eb434e
[maven-release-plugin] prepare release 2.13.55
177aee1
ftp: implement the MLSC command
194a1d7
[maven-release-plugin] prepare for next development iteration

Release 2.13.54

ftp

The current release improves compatibility between the dCache FTP client and the Globus GridFTP server.

srm

During an ATLAS stress-test of tape recalls, it was discovered that various sites had relatively short request lifetimes. However, the SRM spec provides the opportunity for the server to inform the client (FTS, in this case) of what lifetime a request actually has. The current release includes the request’s remaining lifetime in the response from the server.
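
As a rough illustration of the value being reported, the remaining lifetime is the configured lifetime minus the time the request has already existed. The function and parameter names below are hypothetical and do not reflect the SRM implementation.

```python
def remaining_lifetime(lifetime, created_at, now):
    """Seconds left before a request expires, never negative.

    All three arguments are in seconds; the names are illustrative only.
    """
    return max(0, lifetime - (now - created_at))
```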

The current release improves the documentation to give admins a better understanding of how to configure dCache correctly.

Changelog 2.13.53..2.13.54

ea7aee0
[maven-release-plugin] prepare release 2.13.54
0b50e92
ftp: add support for paths relative to home directory
b8838c4
ftp: Add support for SITE WHOAMI command
13d7da2
ftp: update parsing of CLIENTINFO command
99c9da1
srm: include remaining request lifetime in various responses
258a3fd
srm: update srm request.*.lifetime configuration properties documentation
7ad0287
ftp: modify facts describing namespace ownership
09aa4e1
ftp: add support for SITE TASKID command
619d006
ftp: add initial support for checksum performance markers
3c669c7
ftp: show SIZE facts for directories
376e5a6
ftp: add support in OPTS RETR for specifying performance marker frequency
df08fb3
[maven-release-plugin] prepare for next development iteration

Release 2.13.53

xrootd

In https://github.com/xrootd/xrootd/issues/459, it became apparent that dCache could improve xrdcp compatibility by sending checksum information in lower case. This release contains this change, which should improve xrootd operations.

Changelog 2.13.52..2.13.53

98d2e34
[maven-release-plugin] prepare release 2.13.53
3fb86ad
xrootd : use lower case for checksum algorithm names when replying to checksum queries.
a9513b3
[maven-release-plugin] prepare for next development iteration

Release 2.13.52

srm

The SRM code has been made more robust against races between file deletions and copies.

systemtest

The ‘system-test’ script was updated to ensure anonymous dcap tests succeed.

Changelog 2.13.51..2.13.52

7d355f9
[maven-release-plugin] prepare release 2.13.52
31b386f
systemtest: allow anonymous dcap activity
6b0bba4
srm: fix recovery procedure in internal copy if source is deleted
75e7a49
[maven-release-plugin] prepare for next development iteration

Release 2.13.51

dcache

This release updates the jglobus library to fix RFC proxy support.

Changelog 2.13.50..2.13.51

c78fd46
[maven-release-plugin] prepare release 2.13.51
5f68975
pom: update jglobus2 to fix RFC proxy support
3869f4a
[maven-release-plugin] prepare for next development iteration

Release 2.13.50

cleaner

Users reported that they wanted space freed up by cleaner processes to be reported as free as soon as possible. This patch sends notifications about freed up space more often, resulting in quicker status updates.

pool

A problem was fixed that could cause the csm check to fail on pools containing broken files.

Changelog 2.13.49..2.13.50

993f056
[maven-release-plugin] prepare release 2.13.50
a762278
cleaner: Send notification more often
34446c0
pool: Fix csm check command in the pressence of broken files
a58b62f
[maven-release-plugin] prepare for next development iteration

Release 2.13.49

dcap

Connections of non-DCAP clients to a dCache no longer result in stack-traces in the logs.

poolmanager

PoolManager was updated to properly handle the dcache.authz.staging.pep and dcache.authz.staging parameters. This allows stage protection to be enabled properly.

Changelog 2.13.48..2.13.49

8777492
[maven-release-plugin] prepare release 2.13.49
704121e
dcap: don’t create stack-trace if tunnel fails due to bad client
6ffa136
PoolManager : stage protection, fix error in stage.fragment
f26ad6b
[maven-release-plugin] prepare for next development iteration

Release 2.13.48

Changes affecting multiple services

Internal notification processing between cleaners and Pin Manager, Replica Manager and Space Manager was improved and runs quicker now.

billing

If a dCache instance was shut down while the billing service was in the middle of a refresh, an exception was logged and shutdown was delayed. A change in exception handling fixes this rare scenario, ensuring a quick shutdown and no unnecessary log entries.

dcap

This release makes it possible for admins to ban outdated, problematic versions of dcap clients. Some old client versions contain a bug that causes the client to make unsatisfiable requests to a pool with no way for dCache to reject the request: the client will simply retry.

The client version limits are exposed using the new configuration property dcap.limits.client-version. The default is to allow all dcap client versions, unchanged from the previous behaviour.

This release fixes a regression through which Kerberos dcap would not work for host principals containing a ‘-’ character. GSI dcap was not affected.

doors

Fixed a bug in the lb set tags command in doors that prevented setting an empty list of tags.

pool

A bug that caused non-critical stack traces to be logged on the pool after stage or deletion failure from nearline storage has been fixed.

Changelog 2.13.47..2.13.48

0b59736
[maven-release-plugin] prepare release 2.13.48
b910bde
dcap: expose dcap client version limit
3b6e88f
dcap: fix Kerberos dcap if principal contains a ‘-’
6af2dee
dcap: fix regression in handling old version
4dab89f
pool: Suppress two stack traces in nearline storage handling
3b88d07
cleaner: Send notifications concurrently
6bb0e3e
billing: fix stacktrace and slow shutdown if in refresh
0d33435f
doors: Allow setting an empty list of login broker tags
82b4fb4
[maven-release-plugin] prepare for next development iteration

Release 2.13.47

doors

Fixed a bug in which information on stage pool and number of attempts were lost when retrying pool selection requests.

Changelog 2.13.46..2.13.47

4151cd4
[maven-release-plugin] prepare release 2.13.47
0afd0a5
doors: Ensure that pool selection context survives between retries
78a7963
[maven-release-plugin] prepare for next development iteration

Release 2.13.46

billing

Fixed a problem in the output of the dcache billing command using JSON or YAML when the billing format includes a custom date format.

dcap

Add support for the dcap client supplying additional version information.

gplazma

Fix explain login and test login commands so they are able to test logging in with username and password.

Add examples to explain login command help. Our thanks to Onno Zweers for this change.

httpd

Fix regression in the transfers.txt output format.

Update the transfers.html page so it no longer includes <unknown> for default/unknown protocols. With this version of dCache, these are represented by a ? character. The webadmin page is updated to give consistent output.

scripts

Fix the billing indexer to ignore the format string, if present.

Changelog 2.13.45..2.13.46

b8f2059
[maven-release-plugin] prepare release 2.13.46
a2961a0
Active Transfers: substitute ? for <unknown> on html pages
3febc01
common: add support for UserNamePrincipal as user:<name>
d5fb2ab
Added ‘explain login’ examples to help text in Gplazma2LoginStrategy.java
32d32a4
billing: Strip format string from attribute name
829d5ea
transferObserverV1: replace Args with Joiner to construct transfers.txt lines
e168d1f
billing: Make billing indexer work with custom format strings
66526b6
[maven-release-plugin] prepare for next development iteration
eded608
dcap: add support for clients presenting more version metadata

Release 2.13.45

commons

If dCache’s internal ShellApplication framework detects a critical behaviour that might indicate a bug in the application, error messages now include an explicit request to send a mail to the developers and more relevant information for assessing and reproducing the situation.

gplazma2

This release fixes a small bug in gPlazma, which inappropriately tried to handle non-DN subjects in X.509 certificates. These will usually fail in gPlazma anyway, but the reported error was confusing.

srm

Handling of DNS names without trailing dots in certificates has been made more robust and universal.
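
One way to make such comparisons robust is to normalise both names before comparing them. The helper below is a hypothetical sketch of that idea, not the SRM code: it drops a single trailing dot, as a reverse lookup may return a fully qualified name ending in one, and folds case.

```python
def normalise_hostname(name):
    """Drop a single trailing dot and fold case so equivalent names compare equal."""
    if name.endswith("."):
        name = name[:-1]
    return name.lower()
```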

Changelog 2.13.44..2.13.45

dff8c78
[maven-release-plugin] prepare release 2.13.45
6d1268c
commons: log bugs with stack-trace and instructions
7672ba7
gplazma2-xacml: remove erroneous creation of placeholder extensions
b2edf86
[maven-release-plugin] prepare for next development iteration
e156ca8
srm: remove trailing dot from reverse lookup result

Release 2.13.44

cells

In rare cases, an interrupt needed to cleanly shut down the location manager connector would not arrive. This issue was corrected, ensuring more reliable behaviour on cell shutdown.

Fixed a problem in which threads could inappropriately be created as daemon threads, causing problems in killing those threads when the cell shuts down.

Changelog 2.13.43..2.13.44

fca4642
[maven-release-plugin] prepare release 2.13.44
de18b6b
cells: Fix lost interrupt exception
b16ab08
cells: Ensure that newly created threads are non-daemon normal priority threads
1a68ff2
[maven-release-plugin] prepare for next development iteration

Release 2.13.43

commons

The \s admin command uses the toString method to serialise the request so that the remote cell can parse it correctly. This did not always work: ‘=’ characters in arguments were escaped but not unescaped, arguments starting with a ‘-’ character were not escaped, and empty words were lost. This is fixed in the current release, and the \s command works as expected.
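
The property at stake is a round trip: parsing the serialised form must yield exactly the original words. The sketch below demonstrates that property with Python's shlex module as a stand-in. It is not dCache's Args class, and the quoting rules of the two differ.

```python
import shlex


def serialise(words):
    """Join words into one line, quoting whatever the parser would misread."""
    return shlex.join(words)


def parse(line):
    """Split a serialised line back into its words."""
    return shlex.split(line)


# The three problem cases: an '=' in an argument, an argument that
# starts with '-', and an empty word.
words = ["set", "key=value", "-flag", ""]
round_trip = parse(serialise(words))
```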

dcache

The current release fixes a regression in which the exit code of check-config would always be zero even when errors were detected.
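
A correct exit code matters mostly to automation. The sketch below shows how a deployment script might rely on it; the helper name is hypothetical, and the Python interpreter stands in for the real command so that the example runs anywhere.

```python
import subprocess
import sys


def passes(command):
    """Run a command and report success only if its exit code is zero."""
    return subprocess.run(command).returncode == 0


# Stand-ins for a passing and a failing check; on a dCache host the
# command would be ["dcache", "check-config"].
ok = passes([sys.executable, "-c", "raise SystemExit(0)"])
failed = not passes([sys.executable, "-c", "raise SystemExit(1)"])
```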

Changelog 2.13.42..2.13.43

cb2a342
[maven-release-plugin] prepare release 2.13.43
c92dd9d
dcache: Generate proper exit code for check-config command
c1505b5
commons: fix Args string parsing and toString method
ac47c18
[maven-release-plugin] prepare for next development iteration

Release 2.13.42

alarms

Alarm email notifications are now sent only on the first occurrence of a given alarm (i.e., for that alarm instance’s unique ID).

If an alarm has been closed and not deleted, but then occurs again, the counter for receiving that alarm is now reset to 1, in order to treat this as a new (set of) occurrences, and to guarantee a new notification will be sent.
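
The two rules above can be sketched as a small counter keyed by alarm ID. This is an illustrative model under simplifying assumptions, not the alarms service itself; the class and method names are hypothetical.

```python
class AlarmNotifier:
    """Send one email per alarm ID, and again after the alarm is reopened."""

    def __init__(self, send_email):
        self._send_email = send_email
        self._counts = {}

    def record(self, alarm_id):
        """Count an occurrence; notify only when it is the first one."""
        count = self._counts.get(alarm_id, 0) + 1
        self._counts[alarm_id] = count
        if count == 1:
            self._send_email(alarm_id)
        return count

    def close(self, alarm_id):
        """Close without deleting: the next occurrence counts as the first."""
        self._counts[alarm_id] = 0
```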

webadmin

Due to an implementation detail in a library used for the webadmin pages, filtering tables was a bit unintuitive until now: A filter that was set in a certain table column would reappear on tables in other browser windows if they had similar columns.

This behaviour was corrected, and tables on different pages exposed simultaneously in different browser tabs are now filtered independently. However, the fix also has the side effect that now with page reloads and form submissions the filters are cleared. Any commands, however, will always be issued correctly.

Filtering rows in webadmin tables could occasionally lead to unintuitive behaviour with regard to selections: rows hidden by a filter could still be included in a “select all” or “deselect all” operation. This was fixed, and selection or deselection of hidden rows is now prevented.

Issues with filter boxes disappearing or filters resetting have been solved by disabling AJAX auto refresh for the affected pages.

Changelog 2.13.41..2.13.42

656c953
[maven-release-plugin] prepare release 2.13.42
8a40cf9
dcache-webadmin: synchronize client-side filtering with server-side selection of rows on pages using picnet table filters
36e489d
alarms: reset count history on reopened alarm
7e6854c
dcache-webadmin: disable saving table filter settings to browser cookies
d5aa421
dcache-webadmin: disable AJAX autorefresh on pages using picnet table filter library
0740b36
alarms: only send email on first alarm occurrence
0496445
[maven-release-plugin] prepare for next development iteration

Release 2.13.41

chimera

Chimera occasionally suffered from (operationally irrelevant) IllegalStateExceptions. Those are now avoided.

doors

Doors could get stuck temporarily if a file was deleted during pool selection. This has been fixed, and in such cases, transfers are now aborted properly.

Changelog 2.13.40..2.13.41

b83ec5a
[maven-release-plugin] prepare release 2.13.41
1f380e5
doors: Abort transfer if file is deleted during pool selection
64e01e1
chimera: Fix IllegalStateException in inode cache
369e0b7
[maven-release-plugin] prepare for next development iteration

Release 2.13.40

billing

When using the dcache billing command with a non-default date format in the billing file, an unnecessary stack trace was printed. This has been corrected.

srm

The SRM should periodically (by default every 10 minutes) delete obsolete historic data (older than 10 days by default) from the database. For cases where there are problems with that process, error logging and robustness against temporary database problems have been improved.

Changelog 2.13.39..2.13.40

e3f3e15
[maven-release-plugin] prepare release 2.13.40
d01040a
srm: make out-of-date historic data deletion more robust
19ce372
[maven-release-plugin] prepare for next development iteration
4470b99
billing: Removing erroneous stack trace output

Release 2.13.39

ftp

The Apache Commons FtpClient can issue the LIST command with the non-standard -a option, which caused dCache to switch the output format from the long (ls -l-like) to the short (ls-like) response. This is fixed now, and dCache is more compatible with Apache Commons FtpClient.

Changelog 2.13.38..2.13.39

a18bd39
[maven-release-plugin] prepare release 2.13.39
e19c303
ftp: improve compatibility with Apache Commons FtpClient
361870a
[maven-release-plugin] prepare for next development iteration

Release 2.13.38

cells

If the create command in CellShell fails because of unreadable setup files, it throws an IOException. This was incorrectly reported as a bug. Reporting has been corrected now.

In rare cases, active transfers would show up with an incorrect state in the active transfers page of the admin backend. This was fixed, so that transfers which are staging from non-DCAP doors are correctly indicated (in yellow) on the active transfers page, instead of showing up as “No Mover found” (in red).

common

If there is an IOException when trying to read a setup file, the corresponding file name is now listed in the error message.

poolmanager

An issue with PoolManager prevented it from delivering correct cost estimates. This was fixed, resulting in improved estimations of pool load.

Changelog 2.13.37..2.13.38

36cd699
[maven-release-plugin] prepare release 2.13.38
df5e103
cells: IOException is not a bug in create command
5ab196c
common: include filename in error message
3fcccca
poolmanager: Fix incorrect correction of pool cost
83eb6f0
cells: handle empty string pool value on staging in TransferObserver
6a86a29
[maven-release-plugin] prepare for next development iteration

Release 2.13.37

alarms

Log entries that were promoted to alarm status and that show up in the webadmin table can now contain more detailed information.

Changelog 2.13.36..2.13.37

f26fe94
[maven-release-plugin] prepare release 2.13.37
a7c704d
alarms: add ndc info to alarm info
df47f33
[maven-release-plugin] prepare for next development iteration

Release 2.13.36

gplazma2-argus

Fixed a problem with the gPlazma argus plugin that caused it to fail with a ClassNotFoundException.

Changelog 2.13.35..2.13.36

eab2ef8
[maven-release-plugin] prepare release 2.13.36
8e4e670
gplazma2-argus: Update to Argus client 2.2.0 to fix dependency on VOMS library
23946c2
[maven-release-plugin] prepare for next development iteration

Release 2.13.35

alarms

A change to the alarms system improves handling of alarms with unset types.

nfs

A race condition in the NFS door that could result in the creation of multiple inconsistent copies of a file being uploaded has been fixed.

pool

When a transfer’s status was queried before the transfer was initiated, which can occasionally happen for queued requests, exceptions were logged. This behaviour has now been corrected, providing more robust operation.

Several race conditions in the pool’s migration module are fixed now.

A regression was fixed that caused pools to “leak” movers if those were cancelled while still being queued.

spacemanager

Fixes a compatibility problem with NFS in which space manager would fail with a duplicate key error.

srm

A hint describing the need to escape IDs has been added.

The current release fixes issues in which the use of SRM third party copy operations could cause the SRM cell to become unresponsive, possibly even run out of memory.

Changelog 2.13.34..2.13.35

c5efd10
[maven-release-plugin] prepare release 2.13.35
d93faa9
srm: add hint to escape IDs
9f0a13b
nfs: Fix race condition in transfer startup
1dde23d
info: fix broken unit-test
0eec340
srm: Resolve message thead blocking issues with SRM third party copy
45dfaf4
spacemanager: Work around for doors resubmitting PoolAcceptFileMessage
39a82e1
pool: Fix several race conditions in migration module
5cbe420
pool: Fix mover leak
2e2895e
pool: avoid NPE when querying status of a 3rd-party HTTP transfer
7be8cc2
alarms: fix NPE in type setter
ab980e6
[maven-release-plugin] prepare for next development iteration

Release 2.13.34

admin

The admin door now generates SSH keys to ensure compatibility with OpenSSH 7. Additionally, a new property, admin.paths.host-keys, was introduced in the admin.properties file; it specifies the location of the keys.

script

When a pool’s metadata conversion operations would fail, an error caused a script to report successful conversions. This error has been fixed now.

Changelog 2.13.33..2.13.34

6648594
[maven-release-plugin] prepare release 2.13.34
a28fd19
admin: Fix compatibility with OpenSSH 7
685c80d
script: Do not claim success if meta data conversion failed
7bf4f5a
[maven-release-plugin] prepare for next development iteration

Release 2.13.33

admin

First observed on Ubuntu Xenial, dCache fails to install on modern Linux distributions due to the short key length of the SSH 1 keys generated in the post install script. This patch removes those keys and their generation code. dCache has been supporting only modern key formats for quite some time now, so this change should not have any impact on users.

pool

When creating movers, some error conditions are expected to occur and dCache is designed to transparently recover from these. Consequently, this patch lowers the log level for the related error messages to reflect that their causes are harmless.

Fixed a staging problem that would lead to failures in nearline COPY operations.

srm

Some race conditions during SRM startup were fixed. Those race conditions could potentially have led to failures to expire jobs and to wrong job counts in the SRM schedulers.

webdav

Until now, trying to access a file for which the client was not authorized would generate a reply with a status code 200 OK, but an empty body, rather than an error page. This patch corrects that behaviour and also improves exception handling for that case.

Changelog 2.13.32..2.13.33

b220fb5
[maven-release-plugin] prepare release 2.13.33
5d1253c
admin: Drop old ssh 1 keys
7b033db
pool: Lower log level of certain failures to create mover
769596d
pool: fix staging for CopyNearlineStorage
7e78ca4
webdav: Fix error reporting when client is unauthorized
5aff175
srm: Fix job expiration during service startup
dcafde7
[maven-release-plugin] prepare for next development iteration

Release 2.13.32

billing

The data used to create the 24 hour billing overviews is aggregated in hourly intervals before creating the plots. However, if there is very high activity on the system during an entire 24 hour period, there have occasionally been timeouts when querying the database for this aggregate data. This patch makes the data aggregation more robust against such situations, resulting in lower latency for histogram generation and no more timeouts.
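
The aggregation step can be pictured as summing records into hourly buckets held in memory. The sketch below illustrates only that idea; the record layout (epoch seconds, bytes transferred) and the function name are assumptions, not the billing service's data model.

```python
from collections import Counter


def hourly_totals(transfers):
    """Sum transferred bytes per hour.

    `transfers` is an iterable of (epoch_seconds, size_in_bytes) pairs;
    each key of the result is the epoch second at which the hour starts.
    """
    totals = Counter()
    for timestamp, size in transfers:
        totals[timestamp - timestamp % 3600] += size
    return dict(totals)
```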

gplazma

Previously, attempts to authenticate users against an htpasswd entry that was malformed resulted in a stack trace. This patch modifies the error handling so that only a detailed error message (“Bad entry in file: hash does not start ‘$1$’ or ‘$apr1$’”) is logged.
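
The check behind that message can be sketched as follows. This is a hypothetical Python rendering for illustration, not the gPlazma plugin code; it validates only the hash prefix and reports a malformed entry with a descriptive error.

```python
SUPPORTED_PREFIXES = ("$1$", "$apr1$")


def parse_entry(line):
    """Split a 'user:hash' entry, rejecting hashes in an unsupported format."""
    user, _, hashed = line.strip().partition(":")
    if not hashed.startswith(SUPPORTED_PREFIXES):
        raise ValueError(
            "Bad entry in file: hash does not start '$1$' or '$apr1$'")
    return user, hashed
```

A caller would catch the ValueError and log its message instead of letting a stack trace reach the log.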

Changelog 2.13.31..2.13.32

f3609fe
[maven-release-plugin] prepare release 2.13.32
15b85b6
(2.13) billing: use in-memory buffer for hourly aggregate data
fd0e96b
[maven-release-plugin] prepare for next development iteration
b9b2e8c
gplazma: don’t generate a stack-trace if htaccess is malformed

Release 2.13.31

Changes affecting multiple services

Fixed an issue with the dcache dump heap command when called with a simple file name as the output path: the dump could be written to a different directory while the script claimed that the dump had failed.

The dcache dump heap command has a --force option for cases in which the JVM is unresponsive. This option was ignored for processes not running as root. This is fixed now.

cells

Fixed a problem causing FTP and DCAP per connection instances to subscribe to topics they should not subscribe to. This reduces overhead caused by routing updates.

Fixes a problem during shutdown in which communication tunnels between domains were shut down too early.

Fixed an issue that caused log messages to be written with placeholders that had not been replaced by actual values.

A bouncing message bug in System cell is fixed.

pnfsmanager

Setting atime-gap to -1 (the default value) should disable updates of a file’s last access time. However, this was not the case, and atime updates were always enabled. This is fixed now, and last access time updates can be disabled as described in the documentation.
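
The intended decision can be sketched as below. Only the meaning of a negative value comes from the text above; the behaviour for non-negative gaps (update once the new time is at least the gap later than the stored one) is an assumption made for illustration, and the function name is hypothetical.

```python
def should_update_atime(atime_gap, stored_atime, new_atime):
    """Decide whether to store a new last access time (times in seconds).

    A negative gap disables updates entirely. For non-negative gaps this
    sketch assumes an update happens once the new time is at least
    `atime_gap` later than the stored one.
    """
    if atime_gap < 0:
        return False
    return new_atime - stored_atime >= atime_gap
```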

pool

Fixes an issue with pools becoming unresponsive in case of slow DNS reverse lookups.

Fix race condition in request scheduler.

If FTP clients disconnect mid-transfer, pools log a DoorTransferFinished delivery failure as the door is gone. This is fixed now and log messages like Failed to deliver DoorTransferFinishedMessage message are suppressed in pools.

When cancelling a job in a state that doesn’t allow cancellation, an illegal state exception is thrown. This was logged as a bug. This is now fixed.

Pool to pool transfers are supposed to be cancellable. However, due to an implementation issue, migration and rebalance job cancellation could occasionally appear to hang. This is now fixed.

srm

Fixed a bug that caused delivery failures of credential service announcements to be logged.

The ls -completed=n command has been observed to fail with SRMInvalidRequestException. This is fixed now and the output format of listing list requests has been changed to match that of other requests.

Changelog 2.13.30..2.13.31

60866f5
[maven-release-plugin] prepare release 2.13.31
8a28e76
srm: Fix listing of completed list requests
88f341b
dcache: fix heap dump to simple file names
42ffa11
script: Make dump heap --force work for non-root processes
9f4a55a
pool: Fix p2p cancellation
fd7dbce
script: Add missing she-bang
604a445
srm: Do not expose TURL before request is ready
b09a57e
pool: Don’t log illegal state exception on migration job cancellation as a bug
ee522df
pool: Fix race condition in request scheduler
7804a19d
pool: Avoid reverse DNS lookup in HTTP mover
c2a31cd
billing: additional fixes to insert triggers
de9dd24
pool: Suppress logging of delivery failure of DoorTransferFinished
ab1346f
system-test: update disposable-CA generated credentials
5120dc7
pnfsmanager: fix atime update regression
83ef92e
cells: Fix logging formatting string
b74af9a
cells: Avoid bouncing message on no-route errors in System cell
d9b13c4
srm: Suppress message delivery failures for credential service announcements
af50af8
cells: Fix tunnel shutdown order
aac3f33
cells: Do not subscribe to topics in per-session door instances
3a36df6
[maven-release-plugin] prepare for next development iteration

Release 2.13.30

chimera

When accessing a file for reading, the atime value must be updated. Previously, due to an error, the ctime (intended to reflect the time of changes to file attributes) was also changed. This update corrects that problem.

http

In order to increase the performance of the Billing system, reverse DNS lookups were removed from the code. While this will result in IP addresses representing hosts in the billing file, DNS performance no longer impacts overall system performance.

many

When representing checksums in the admin interface and configuration files, checksums are now presented in an improved format.

pool

The nearline storage subsystem uses thread pools to manage its workload. Since some tasks are blocking, very high activity can cause these thread pools to grow beyond effective sizes. This may even lead to the pool becoming unresponsive.

This change introduces a new configuration property, pool.limits.nearline-threads, which limits the thread pool size. The default value, 30, is chosen to be sufficient for almost all imaginable use cases while at the same time avoiding potential problems with resource exhaustion.
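
The effect of such a limit can be illustrated with a fixed-size thread pool: however many blocking tasks are submitted, no more than the configured number run at once, and the rest wait in a queue. The sketch uses Python's standard library for illustration and is not the pool's implementation; only the default of 30 is taken from the text above.

```python
from concurrent.futures import ThreadPoolExecutor

NEARLINE_THREADS = 30  # mirrors the default of pool.limits.nearline-threads


def run_all(tasks, max_threads=NEARLINE_THREADS):
    """Run blocking callables on at most `max_threads` threads.

    Results are returned in submission order; excess tasks wait in the
    executor's queue rather than spawning further threads.
    """
    with ThreadPoolExecutor(max_workers=max_threads) as executor:
        futures = [executor.submit(task) for task in tasks]
        return [future.result() for future in futures]
```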

srm

Due to a timing issue, an initial service announcement in the SRM was sent before any listeners could register for those announcements. Thus, upon startup, a delivery error would be logged. With this patch, sending of the initial message is delayed until after the registration of listeners, and the irrelevant error messages are avoided.

Changelog 2.13.29..2.13.30

3333a7d
[maven-release-plugin] prepare release 2.13.30
b7ab80b
common: fix ChecksumType.toString()
ecc9c34
http: avoid dns reverse lookup on HttpProtocolInfo#toString()
c2bd53c
chimera: do not update ctime on atime only attribute update
9ee5873
[maven-release-plugin] prepare for next development iteration
d6dca39
srm: Delay announcing credential service after cell start
01d1436
pool: Improve scalability of nearline storage subsystem

Release 2.13.29

doors

Fixes a bug in which a host name set in *.net.listen properties was not preserved when publishing a door or generating SURLs.

Changes affecting multiple services

The RPM package now explicitly requires the which package as a dependency.

cells

This change fixes a bug in routing manager that would leave orphaned topic routes in dCache domain.

pool

A regression caused pools to fail, reporting an IllegalArgumentException, when replication manager was enabled. This patch fixes that problem.

Changelog 2.13.28..2.13.29

f1ab4b1
[maven-release-plugin] prepare release 2.13.29
f53bfb4
pool: Fix regression breaking hopping mananger
3ee2582
cells: Fix route removal in routing manager
3934813
rpm: explicitly require which package
01417c5
doors: Preserve name when publishing the address of doors
ef453fd
[maven-release-plugin] prepare for next development iteration

Release 2.13.28

cells

LoginManager would occasionally generate error messages similar to “Discarding listening on $LOCATION 53684’ because its age of 18721640 ms exceeds its time to live of 4500 ms.”. This was due to erroneous reuse of old message envelopes in the internal messaging. This change fixes that problem.

This change addresses a potential problem in which messages sent between cells in the same domain could appear older than they are and thus would risk being discarded due to the time-to-live being expired.

Contains corrections to cells logging. The routing manager pinboard now shows information previously logged to various other cells.

nfs

This change avoids ERR_PERM errors on NFS writes in situations where NFS doors do not receive redirect messages within the allowed timeframe.

pool

This patch avoids errors occurring when mover doors would not receive a start mover request within the allowed timeframe, providing more robustness on heavily loaded instances.

srm

A Tier-1 site reported problems with a major WLCG VO’s read requests. Investigating the source of the problems showed that the srm_ifce library, used by the (outdated) GFAL v1 and the (supported) GFAL v2 SRM libraries, drastically limits the permitted lifetime of requests without providing admins any way to configure this.

For sites seeing errors related to desiredTotalRequestTime being exceeded, this change provides the new configuration option srm.request.maximum-client-assumed-bandwidth in srm.properties as a work-around.

Sites not observing such errors do not need to change anything with regard to this value.

Changelog 2.13.27..2.13.28

7f928ca
[maven-release-plugin] prepare release 2.13.28
05c6c5e
srm: add short request lifetime work-around
ff4ab79
chimera: Fix regression in inheriting ACLs on directory creation (HSQLDB)
036e91c
cells: Set correct logging context in cell callbacks
8e94b3f
cells: Improve robustness of message time to live
91a35b7
cells: Fix erroneous reuse of message envelope in location manager registration
12e857d
nfs: try to re-use transfer class on client retry
f8c9721
pool: handle duplicated start mover requests
b016214
pool: java8 into IoQueueManager
e91ad9a
pool: simplify pool queues
5d4b3a0
[maven-release-plugin] prepare for next development iteration

Release 2.13.27

chimera

Fix a regression causing directories to inherit ACLs as if they were files rather than directories. Soon we will provide a procedure to clean up existing directories with wrongly inherited ACLs.

Changelog 2.13.26..2.13.27

27a7b49
[maven-release-plugin] prepare release 2.13.27
a4ea0fc
chimera: Fix regression in inheriting ACLs on directory creation
3fa0ded
[maven-release-plugin] prepare for next development iteration

Release 2.13.26

ftp

On very short transfers, two internal messages could occasionally arrive in the wrong order. This would cause clients to see the “226 Transfer complete.” message without the “150 Opening BINARY data connection for” immediate reply. This rare issue is now fixed.

pnfsmanager

Billing entries for SRM uploads recently lost the storage class part of the entry. This update fixes that issue.

We observed an error caused by the parallel execution of two uploads, both trying to create the same (previously non-existing) directory. This modification fixes the underlying race condition, allowing such transfers to succeed.

pool

Fixed a regression introduced in 2.13 in which internal options like -c:puts were erroneously included in the call-out to the HSM script.

poolmanager

This modification fixes a potential race condition in pool manager.

This modification fixes a race condition in pool manager that could theoretically have provided erroneous data to pin manager, space manager, srm, xrootd and webdav.

scripts

The ctlcluster utility is now fully supported under Linux.

srm

When writing to a path /a/b/c, if b exists and is a file, SRM previously returned SRM_INTERNAL_ERROR. This modification changes that behaviour so that the more appropriate status SRM_INVALID_PATH is returned.

During SRM-based operations, some sites reported problems with the delegation of user credentials. This modification remedies those problems, while at the same time reducing the CPU usage of establishing 3rd-party SRM connections.

webdav

This modification corrects the error reporting under WebDAV. When attempting to delete a non-existing file, unauthenticated users receive a 401 Unauthorized response, while authenticated users receive a 404 Not Found response.

Changelog 2.13.25..2.13.26

0318b25
[maven-release-plugin] prepare release 2.13.26
2d9ae01
pnfsmanager: Fix regression in SRM billing entries
57ec6b2
pool: Fix filtering of options in script HSM driver
6699508
pnfsmanager: Fix race leading to transaction failures in Chimera
6f92a2b
srm: Disable delegation on srmCopy to or from other SRMs
7e61671
srm: Return SRM_INVALID_PATH when target directory is a file
461aa36
ftp: Fix race on short transfers
979a710
[maven-release-plugin] prepare for next development iteration
7ff568d
Fix compatibility issues with ctlcluster
70ece68
poolmanager: Acquire read lock when serializing cost module and partition manager
e71f5df
poolmanager: Acquire read lock when serializing pool selection unit
9fac027
poolmanager: Fix race condition in pool selection unit
ec574b5
webdav: fix 404 error if attempting to delete a nonexistent file

Release 2.13.25

doors

Fixes a race condition and a responsiveness issue in doors.

pool

Fix regression introduced by single thread model.

spacemanager

Fixes a race condition during space manager startup that could lead to log messages about failed link group updates and failed transfers.

Changelog 2.13.24..2.13.25

957f8ad
[maven-release-plugin] prepare release 2.13.25
4e2ec5a
libs: use nfs4j-0.10.8
da2866d
dcache: Fix compilation regression
3b40077
spacemanager: Fix pool monitor fetch race during startup
1fc57eb
doors: Fix contention point and race in login broker publishing
a55ee13
[maven-release-plugin] prepare for next development iteration

Release 2.13.24

pnfsmanager

With this release, PnfsManager adds safety checks rejecting invalid upload paths that SRM might erroneously supply. This release hardens an installation against possible bugs triggering data loss.

This release adds a check that detects failed or incomplete SRM uploads and prevents the file from being committed to its final path. Common symptoms of this bug were zero sized files that experiment catalogues registered as successfully uploaded.

pool

This patch improves support for HTTP Keep-Alive connections. On closing a connection, in accordance with RFC 2616, an HTTP header indicating closure of the connection is sent out.

On highly loaded systems, DNS slowdowns could lead to pools and/or domains becoming unresponsive. This fix increases resilience against DNS problems.

Increased performance and resilience in situations where NFS clients send many requests at once.

srm

This release adds a check to detect broken uploads using SRM during the final stage of file transmission. While it causes transfers to take a little more time, resilience against upload failures is increased.

webdav

dCache has supported 3rd-party HTTP push requests for WebDAV for some time. This release adds support for 3rd-party HTTP pull requests.

Changelog 2.13.23..2.13.24

3a0e199
[maven-release-plugin] prepare release 2.13.24
9b79c8b
pool: send connection header when closing a connection.
b5b0b14
pool: Avoid lock contention on DNS lookup in p2p component
1a9e04b
srm: Check for broken files during srmPutDone
d98d4a1
pnfsmanager: Check file size and upload completion when committing temporary upload paths
e34706b
pnfsmanager: Protect against erroneous upload paths
a70bb34
pool: use “same thread strategy” for nfs movers
7f4d235
webdav: add support for 3rd-party HTTP pull
18afc47
[maven-release-plugin] prepare for next development iteration

Release 2.13.23

Changes affecting multiple services

Several cases of slow performance were reported when deleting directories in Chimera. This is now fixed.

pool

The dcache pool yaml command dumps the metadata of a pool in the human- and machine-readable YAML format. A regression causing a null pointer exception has been fixed.

When a command for migrating files between pools (e.g. migration concurrency or migration copy) failed because the migration job could not be found, the resulting error was reported as a bug. This is now fixed: the command returns a message indicating that the requested job does not exist.

srm

When a file upload is cancelled, the temporary upload path tracked by SRM could differ from a regular upload path, either because it was changed outside of dCache or because it was recorded by a very old version of dCache. This could result in data loss when cancelling the upload. This release fixes that potential data-loss scenario.

statistics

The statistics service creates static HTML pages that describe dCache usage over time as simple files that the webadmin service can serve. This includes information about pools and store-units. Previously, the statistics web pages did not show information about any pool or store-unit that contains a / in the name. This is now fixed. A side-effect is that the history of any pool or store-unit containing a ^ in the name is lost.

Changelog 2.13.22..2.13.23

893f893
[maven-release-plugin] prepare release 2.13.23
9fea1cc
chimera: Alter statistics target for t_tags(itagid)
3dd6d29
srm: Add safe-guard against invalid file ID in put requests
815a0d8
pool: Don’t consider failure to find migration job a bug
9813dcc
statistics: encode ‘/’ in filenames
24832ea
[maven-release-plugin] prepare for next development iteration
98c6971
pool: Fix NPE in pool yaml tool

Release 2.13.22

alarms

On heavily loaded systems creating alarms at very high frequencies and with comparatively slow databases for alarm logging, components ran out of memory and blocked. This release introduces a mechanism that will drop alarm messages that would cause the system to fail, ensuring continued operation. Note that no messages that are critical to system operation will ever be dropped by this change.
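
The general technique, a bounded queue that discards events on overrun instead of growing without limit, can be sketched in Python as follows. This is only an illustration under assumed names (BoundedAlarmQueue, offer, drain); it is not the dCache implementation and does not model how dCache decides which messages may be dropped.

```python
import queue


class BoundedAlarmQueue:
    """Hypothetical bounded buffer between alarm producers and a slow consumer."""

    def __init__(self, capacity):
        self._queue = queue.Queue(maxsize=capacity)
        self.dropped = 0  # number of events discarded because the queue was full

    def offer(self, event):
        """Enqueue without blocking; return False if the event was dropped."""
        try:
            self._queue.put_nowait(event)
            return True
        except queue.Full:
            # Discard rather than block the producer or grow memory use.
            self.dropped += 1
            return False

    def drain(self):
        """Remove and return all queued events in arrival order."""
        events = []
        while True:
            try:
                events.append(self._queue.get_nowait())
            except queue.Empty:
                return events
```

Because producers never block and the buffer never grows beyond its capacity, a slow alarm database can delay or lose alarm records but cannot exhaust memory in the producing component.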

nfs

Pinning files is now a non-blocking operation. For files stored on tape, this should result in a more responsive system behaviour, avoiding NFS blocking in situations with many concurrent pin requests.

Changelog 2.13.21..2.13.22

bb355ba
[maven-release-plugin] prepare release 2.13.22
71dca46
(2.13) alarms: change executor to have bounded queue and discard events on overrun
5f257a0
nfs: use noitify instead of blocking sendAndWait when sending pin/unpin messages via touch “.(get)(<file_name>)(pin)” command
898fc95
[maven-release-plugin] prepare for next development iteration

Release 2.13.21

Changes affecting multiple services

Sometimes, when the start-up of a cell was interrupted, an error message was logged as a bug. This is now fixed.

info-provider

The GLUE information provider supplies information about the dCache instance, which is important for clients in the WLCG area. Because different doors in dCache can have different roots, clients may need to adjust their path when accessing dCache through different doors. The info-provider is updated so that a new path root property is provided. This allows clients to modify paths, as necessary. Note that the SRM door already supports this translation when redirecting clients for transfers.

Changelog 2.13.20..2.13.21

45e0d20
[maven-release-plugin] prepare release 2.13.21
9f4dc35
info-provider: publish door root path
f28f229
cells: Suppress illegal state exception during initialization
aa5d526
[maven-release-plugin] prepare for next development iteration

Release 2.13.20

nfs

This release includes the PNFS-ID in the door’s logging output when proxying data movement.

spacemanager

Spacemanager backs off when it encounters a problem writing to the database. Previously, if the problem was due to a deadlock, the two tasks involved were delayed by the same amount, making it possible for the subsequent attempts to deadlock as well. This release randomises the delay to reduce the likelihood of this problem occurring.
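
A randomised back-off of this kind can be sketched in Python as below. The names (TransientError, retry_with_jitter) and the delay range are assumptions made for illustration; the actual delays used by spacemanager are not stated here.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for a transient database failure such as a deadlock."""


def retry_with_jitter(operation, attempts=5, base_delay=0.05,
                      sleep=time.sleep, rng=random):
    """Call operation(), retrying on TransientError after a randomised delay."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == attempts:
                raise  # give up after the final attempt
            # Each caller waits between base_delay and 2 * base_delay, so two
            # tasks that collided once are unlikely to retry at the same moment.
            sleep(base_delay + rng.uniform(0, base_delay))
```

With a fixed delay, two deadlocked tasks would wake up together and could collide again; drawing each delay independently spreads the retries apart.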

Changelog 2.13.19..2.13.20

7007e1d
[maven-release-plugin] prepare release 2.13.20
fd4c7ea
spacemanager: Randomize backoff in case of transient errors
e24f480
nfs-proxy: include file’s pnfsid into debug context
ddda974
[maven-release-plugin] prepare for next development iteration

Release 2.13.19

pool

Fix buffer leak in the HTTP mover.

Improve scalability of pools when opening files or changing sticky flags.

Resolve problem where starting a migration job on a pool with many files can result in an unresponsive pool.

This release fixes a bug that caused the output of the rep ls -s admin command to contain negative values.

Improve scalability of the pool, allowing it to remain responsive even when undertaking many pool-to-pool transfers.

xrootd

The prevalence of old versions (3.x) of the xrootd client, along with that version’s unfortunate behaviour when it receives an asynchronous reply, has forced this release to revert the asynchronous response to open requests.

Fix dCache handling of open requests where uploads were considered downloads.

Changelog 2.13.18..2.13.19

ac22958
[maven-release-plugin] prepare release 2.13.19
9756abb
pool: Fix lock contention during heavy p2p activity
829e03c
Revert “pool: Expose Berkeley DB configuration as dCache properties”
8bbd088
pool: Fix buffer leak in HTTP mover
7955b7c
xrootd: Fix classification of uploads
c6d1dd2
pool: Expose Berkeley DB configuration as dCache properties
f27dc3f
xrootd: Roll back asynchronous reply on open
a8ea4e2
pool: Fix lock starvation in migration module
7c99295
pool: Fix accounting error in repository statistics
58087f2
pool: Avoid lock contention when opening files and setting sticky flags
2a4e8bc
pool: Simplify synchronization during repository setup
576cda3
[maven-release-plugin] prepare for next development iteration

Release 2.13.18

Changes affecting multiple services

Don’t log Error while reading from tunnel: java.nio.channels.AsynchronousCloseException when a domain shuts down.

cleaner

Fix cleaner so it no longer sends cleaning requests to pools it knows cannot enact these requests. This prevents the cleaner from spamming a disabled pool.

pool

No longer spam log file with 666:"Client disconnected without closing file." when an xrootd client closes a file it was reading. This is still logged if the client fails to close a file opened for writing.

This release provides more robust monitoring of a pool’s health by periodically testing the data directory in addition to testing the metadata.

With this release, the pool provides a more robust behaviour when the client requests a file without being redirected from a door. The pool will redirect the client back to its previously used door, if known. Additionally, the pool no longer logs when this happens in the pool log file, but only in the pinboard.

This release fixes buffer leaks in the xrootd mover. It also removes endsess spam from pool log files.

The pool’s info output now includes a progress report during initialisation.

Changelog 2.13.17..2.13.18

6bb0336
[maven-release-plugin] prepare release 2.13.18
f437ba1
pool: Upgrade xrootd4j to 3.0.2
5f2d038
pool: Redirect xrootd client to door on failure to open file
2ce299b
pool: Do not fail read if xrootd client doesn’t close file
4ece761
cleaner: Fix black listing in case of disabled pools
da521a8
cells: Don’t log AsynchronousCloseException when tunnel closes
9fca9ea
pool: Show progress during repository initialization
28af1ea
pool: Fix health check of file store
cfffbc2
[maven-release-plugin] prepare for next development iteration

Release 2.13.17

Changes affecting multiple services

Update the Spring, Milton, AspectJ, Jetty and DataNucleus-core libraries to latest version. All dCache services are affected.

gplazma

Fix debug logging of the voms gPlazma plugin so it shows a clear and complete description of which FQAN is mapped; previously the log would contain confusing entries if the first FQAN is not mapped exactly.

pool

If a 3rd-party transfer fails then the pool may log and report incomplete information on why this happened. This release fixes this problem.

srm

This release fixes the interpretation of srm.persistence.enable.store-transient-state when srm.persistence.enable.history is disabled. Sites that have history disabled may put additional load on the SRM database if store-transient-state is true; the old behaviour can be restored by setting srm.persistence.enable.store-transient-state to false.

Fix the report sent to the srm client so that the failure code and error message always match. Earlier releases of this branch sometimes returned an inconsistent combination of failure code and error message.

Fix potential for inconsistent information being logged in the SRM database, where the request state was different than the last logged transition.

Changelog 2.13.16..2.13.17

8e6af59
[maven-release-plugin] prepare release 2.13.17
4946340
srm: Lock job while saving to create consistent persistent state
e717b09
srm: Fix saving of transient states to database
5a4f907
2.13: upgrade third party dependencies
f2a6c19
http-3rd-party: ensure IOException logged with toString
7188e67
srm: Fix race in state reporting
c78067e
gplazma: Improve logging in voms plugin
5134bcb
info: fix test to be less critical on timing
9589a9a
[maven-release-plugin] prepare for next development iteration

Release 2.13.16

Changes affecting multiple services

Enable SRM to discover FTP and dcap doors faster after restart. May also resolve messages like Failed to deliver LoginBrokerInfoRequest message <1449649391771:176> to [>LoginBrokerRequestTopic@local]: Route for >*@srm-bombayDomain< not found at >dCacheDomain<. All ftp and dcap doors should be updated.

Fix tab completion for non-default cells (messaging) topologies; all domains must be upgraded to deploy the fix. Sites with default topologies are unaffected.

For non-default star topologies, there are intermediate domains that pass messages between dCacheDomain and end domains. This release fixes propagation of routing information by these intermediate domains. Sites with default topologies are unaffected; those with non-default topologies should upgrade all intermediate domains.

admin

When using the admin interface, there are three possible behaviours: interactive mode (new interface), no-terminal mode (new interface targeted for scripts) and legacy mode (old interface). The latter two could become stuck in an endless loop if the client disconnects before all data was sent. This is fixed with this release.

pool

Add new options to the rep set sticky command to allow filtering by access-latency, retention-policy, storage class and cache class.

Add an option to migration module to support filtering by cache class.

Add the -meta-only option to migration module. This limits the affected replicas to those where the file’s data is not transferred; i.e., the target is some existing replica of the file. Local replicas that do not exist on any other pool are skipped.

Extend the migration module’s -sticky option to allow negated selection. This is marked by prefixing the sticky owner with a -. For example, -sticky=-system selects replicas that do not have the system sticky.
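
The selection semantics described above can be modelled with a short Python sketch. The function name is hypothetical and the model handles only a single sticky owner; whatever else the real -sticky option accepts is not covered.

```python
def matches_sticky(option_value, sticky_owners):
    """Decide whether a replica is selected by a -sticky=<option_value> filter.

    option_value  -- the text after "-sticky=", e.g. "system" or "-system"
    sticky_owners -- the set of owners holding a sticky flag on the replica
    """
    if option_value.startswith("-"):
        # Negated selection: the replica must NOT be sticky for this owner.
        return option_value[1:] not in sticky_owners
    return option_value in sticky_owners
```

For example, matches_sticky("-system", {"pinmanager"}) selects the replica, whereas matches_sticky("-system", {"system"}) does not.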

Fix persistence of the sticky bits. Previously, if a replica’s list of sticky bits was modified, either by the rep set sticky command or by the migration module updating an existing replica, the sticky flags were held in memory but not written to the Berkeley DB. The next restart of the pool would lose that information, potentially resulting in data loss. Pools that store metadata as files are not affected.

poolmanager

Previous releases of dCache contained a bug where replicas generated by pool-to-pool copies failed to include the access latency and retention policy; this release fixes that bug. While not directly affecting dCache operations, the result is that this information, as recorded by the affected releases, is not reliable.

spacemanager

Fix listing by PNFS-ID. Glob support is removed as it was non-functional.

Changelog 2.13.15..2.13.16

d44d735
[maven-release-plugin] prepare release 2.13.16
98bb612
pool: Do not output expired sticky flags
aafadae
pool: Only save sticky bits if not already set
1496520
poolmanager: Fix missing access latency and retention policy on pool to pool copy
a72b22b
admin: Fix endless loop in non-interactive mode
82e4af5
pool: Make bulk sticky bit operation robust against repository changes
c9c2cca
pool: Throw IllegalArgumentException on rep set sticky errors
1cb1c58
spacemanager: Fix listing by pnfs id
f25601b
pool: Extend migration module with -meta-only option
7c053a2
pool: Add option to migration module to filter by cache class
e890b17
pool: Add bulk mode for rep set sticky command
5555f8e
pool: Add migration option to filter by absense of sticky flags
2a85046
pool: Fix persistence of sticky bits
09c1520
cells: Tab complete on distant downstream domains too
0bf50cd
cells: Fix route propagation trigger on non-default topologies
779a55e
ftp,dcap: Fix LoginBrokerRequestTopic subscription
1bb070b
[maven-release-plugin] prepare for next development iteration

Release 2.13.15

Changes affecting multiple services

A request to read a file stored only on tape fails when the user does not have permission to stage files. Likewise, an upload that would require creating a new space reservation fails when the user is not authorised to create such a reservation. With earlier versions of dCache, in both these cases, the door would retry the request until it timed out. This version fixes this issue. All doors are affected.

This release fixes a caching issue where changes to inode metadata (e.g., ownership or permissions) for / (the root directory of Chimera) are not visible until the service is restarted. This affects NFS doors and pnfsmanager service.

nfs

Fix the NFS door so it no longer logs Post-processing failed: No such file or directory if a file is deleted while still open.

Changelog 2.13.14..2.13.15

ac0e317
[maven-release-plugin] prepare release 2.13.15
3ae0c31
transfers: fail request if pool/space manager reject request
8e3361e
chimera: Prevent filling of stat cache of root inode
309ac32
nfs: ingore file_not_found on close
89f0312
[maven-release-plugin] prepare for next development iteration

Release 2.13.14

Changes affecting multiple services

The chimera library, used by PnfsManager and NFS, contains possible race conditions that can lead to a NullPointerException. These are updated so that Chimera gives the correct error message under these circumstances.

The chimera library, used by the pnfsmanager and nfs services, contains a bug where two near-simultaneous attempts to delete a hitherto empty directory and write a file into the same directory will both succeed but leave an orphaned file: it exists in the t_dirs table but the parent does not exist in t_inodes table. This seems to be triggered when an ftp door fails with no write pool configured. This release fixes this problem.

pnfsmanager

Fix the error message (logged by the domain hosting pnfsmanager) if an attempt to finalise an SRM upload fails within pnfsmanager, or if an attempt to cancel an SRM upload fails within pnfsmanager.

pool

Fix the NFS mover so that, when the client is reading a file and reaches the end of file, the EOF flag is set correctly. This bugfix applies both to files read through pNFS and those read through the door.

poolmanager

When reporting an alarm that a pool is declared DOWN, the message mistakenly omitted the name of the pool. This is now fixed.

webdav

Update to the latest version of milton.

xrootd

Update the alice-token plugin to allow the host name check to succeed on dual-stack (IPv4 and IPv6) machines.

Fix door so that it no longer logs: An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.

Log initial response (usually that an asynchronous reply is forthcoming) when client attempts to open a file.

When authenticating with the GSI plugin, provide an informative message if the host private key file cannot be understood.

Changelog 2.13.13..2.13.14

01946df
[maven-release-plugin] prepare release 2.13.14
07539db
xrootd: Update alice token plugin to fix IPv6 compatibility
f83fb31
xrootd: Upgrade to xrootd4j 2.1.3
a48d8d8
chimera: Detect races in directory deletion
4801355
chimera: Detect races during move
1169c85
xrootd: Fix logging of Netty exceptions
9c2f60c
webdav: update to latest milton
d758409
PnfsManager: remove copy-n-paste error in error message
42cd4bc
(2.13) pool manager: add pool name to alarm
b54b976
system-test: Fix grid-security settings
6b383c9
nfs: set EOF flag when channel indicated EOF
692cd29
rpm: enforce SL5 compatibility when building RPM packages
b8e5edb
[maven-release-plugin] prepare for next development iteration

Release 2.13.13

Changes affecting multiple services

Fix shutdown of the nfs, srm and xrootd doors; previously, shutting down any of these doors would block for two seconds.

Eliminate race condition that can lead to a NullPointerException if a cell does not shut down cleanly.

The dcache script no longer checks whether the hostkey.pem file is in PKCS#8 format when dcache stop is invoked. Previously this could lead to orphaned dCache domains; for example, when upgrading the dCache RPM.

admin

Fix potential IndexOutOfBoundsException should the response from the acm cell be malformed.

alarms

Update the text format of logged and emailed alarms so that messages include the type of alarm rather than simply the phrase ALARM [ ALARM_TYPE ]. Note that the alarms.email.encoding-pattern and alarms.history.encoding-pattern properties have been updated; sites that have modified either of these properties must review their configuration to take advantage of this update.

pool

Fix an intended pool-to-pool transfer optimisation: the receiving pool failed to reuse a delayed mover, should the pool-to-pool request time out and be retried.

Fix pools so that they do not log NullPointerException if a pool receives a request to restore a file from an HSM to which it has no access.

Scripts

Fix how the writedata command in the chimera shell accepted data: the command-line argument was ignored if supplied and data was taken from stdin; if no argument was supplied, the command would fail with a NullPointerException. Note: this command does NOT write data into dCache, but into Chimera.

srm

Fix the context information included when logging failures to write job information to the database.

Changelog 2.13.12..2.13.13

16b3dab
[maven-release-plugin] prepare release 2.13.13
79c4f58
srm-client, dcache: fixed passing incompatible arguments to functions
4708a79
dcache-nfs, dcache: removed unecessary use of non-short-circuit logic
634513b
(2.13) alarms: modify logback encoding to show actual alarm type
e21d8a6
pool: Fix NPE when restoring file
0352f4f
scripts: do not check for PKCS#8 formatted hostkey.pem on shutdown
c68c762
chimera: Null value passed to non-null parameter in org.dcache.chimera.cli.Shell$WriteCommand.call()
92c8316
Fix shutdown of bounded executor
1a70878
srm: Use correct logging context when saving jobs
3a48e14
cells: Fix NPE during shutdown
b53db95
[maven-release-plugin] prepare for next development iteration

Release 2.13.12

pool

In earlier versions of dCache, a pool would announce itself (in disabled state) to poolmanager before it could handle any queries. This had the potential to result in such queries timing out. With this release, a pool announces its availability only when it is able to handle basic queries.

This release updates how dCache configures the Berkeley DB when used for storing pool metadata. In addition, dCache will no longer disable the pool when suffering a Berkeley DB-related problem if the Berkeley DB environment is still valid. Combined, these two changes should greatly reduce the occurrences of pools disabling themselves when under heavy IO load.

srm

Both versions of the SRM protocol (v1.1 and v2.2) support 3rd-party copying of data. In both cases, the client polls the server to discover what progress has been made. When queried, the server provides a hint when the client should next query the status of the transfer. The SRM v2 code hinted the client should retry only after a very long delay. This is now fixed.

Changelog 2.13.11..2.13.12

b1a8234
[maven-release-plugin] prepare release 2.13.12
683aaf8
pool: Refine Berkeley DB failure handling
942ba41
pool: Do not announce cell until the cell is running
604b82a
srm: Report correct estimated wait time for SRM 2.2 copy requests
f023360
[maven-release-plugin] prepare for next development iteration

Release 2.13.11

Changes affecting multiple services

The ftp, webdav and xrootd doors will delete the target file if an upload was unsuccessful. The copy manager (part of the transfermanagers service) has a similar behaviour if an internal copy is unsuccessful. If this delete was unsuccessful (e.g., the client deleted the file itself) previous dCache versions would log this at ERROR level. With this dCache version, such occurrences are logged at DEBUG level.

Previous versions of dCache would leak resources, potentially leading to slow-down and risking out-of-memory exceptions. Affected services are: dcap (if gsi authentication is used), gplazma (if the voms or xacml plugins are used) and srm.

nfs

One challenge with the NFS protocol is that the server provides the client a directory listing in chunks, with the client controlling when the next chunk is delivered; however, there is no mechanism for the client to say the directory listing is no longer needed. Since directory listings cannot be retained indefinitely, dCache NFS server stores a listing for a maximum of 10 minutes and a maximum of 512 listings are retained (with an LRU policy). Previously, dCache returned an error if the client requests further content from a directory listing that is no longer stored. Although the protocol allows this, the Linux client reacts badly. With this version, dCache tries harder to continue a directory listing: clients that request more content when that listing is no longer stored but where the directory has not changed will succeed with NFS re-reading that directory; only if the content has changed is an error returned.
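
The retention policy described above (at most 512 listings, each kept for at most 10 minutes, with LRU eviction) can be illustrated with the following Python sketch. The class and method names are hypothetical and this is not the nfs4j or dCache implementation; only the two limits are taken from the text.

```python
from collections import OrderedDict
import time


class ListingCache:
    """Hypothetical cache of directory listings with a size limit and a TTL."""

    def __init__(self, max_entries=512, ttl_seconds=600, clock=time.monotonic):
        self._max_entries = max_entries
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = OrderedDict()  # key -> (expiry time, listing)

    def put(self, key, listing):
        self._entries.pop(key, None)
        self._entries[key] = (self._clock() + self._ttl, listing)
        # Evict least recently used listings once the size limit is exceeded.
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)

    def get(self, key):
        """Return the stored listing, or None if it was evicted or has expired."""
        item = self._entries.get(key)
        if item is None:
            return None
        expires_at, listing = item
        if self._clock() >= expires_at:
            del self._entries[key]
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return listing
```

A None result corresponds to the situation in which the server no longer holds the listing and must either re-read the directory or report an error to the client.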

This release of dCache fixes the error returned to the client if it requests more of a directory’s contents and the NFS server cannot generate that response (e.g., the directory’s contents have changed). Previously BAD_COOKIE (10003) was returned; with this release, NOT_SAME (10027) is returned instead.

Fix how mask names are logged when a GETATTR request is processed.

Fix race condition that can occur when a pool is first accepting pNFS transfers if multiple requests are processed almost simultaneously.

pool

Fix regression introduced in 2.13.10 where the -uri argument(s) for the HSM script were mistakenly omitted when staging a file (the script’s get operation).

srm

Previous versions of dCache failed to honour the srm.authn.capath and srm.authn.vomsdir configuration properties.

webdav

Adds support for the Robots Exclusion Standard (/robots.txt). The default advice is that web-crawlers should avoid indexing all content in dCache.

The webdav door has separate configuration allowing the admin to configure the door-local path that contains site-local files and the URI prefix to access those files. Earlier versions of dCache mistakenly used the former for the latter, which this release fixes.

Changelog 2.13.10..2.13.11

6377960
[maven-release-plugin] prepare release 2.13.11
a256b3d
voms: Fix resource leak regression
062b051
ScriptNearlineStorage: Fix missing -uri parameter for fetch
746dc14
libx: update to nfs4j-0.10.7
3ad1b20
webdav: Respect webdav.static-content.uri property
983a3c0
doors: Do not log failure to delete absent files on upload failures:
3b98e53
nfs4: fix race in request processing
1cc3ecf
webdav: Add robots.txt
290f199
[maven-release-plugin] prepare for next development iteration

Release 2.13.10

Changes affecting multiple services

In earlier versions of dCache, the code-base would always establish a node’s FQDN through a DNS query on start-up. For some services and for dCache scripts (for example, the chimera script), this information is not used. With this release, dCache only makes the DNS query when it is necessary, so domains hosting services that do not need this information and scripts will start faster.

dcap

Doors describe their root path to SRM so it can calculate appropriate TURLs. Previous versions of dCache had dcap doors register incorrect paths, which this release fixes.

dCache configuration allows an admin to control if certain ciphers are allowed. In particular, this allows sites to remove support for problematic ciphers or hashing algorithms. This release fixes a problem where the GSI-dcap door failed to honour such settings.

ftp

A problem within the FTP door could be triggered if the door was proxying a transfer that completes while, at the same time, another component within dCache queries the current status of this transfer. This can have a knock-on effect; for example, causing the System cell to be unresponsive, so preventing tab-completion in the admin shell. This problem is now fixed.

A problem existed since 2.13.0 where FTP transfers, especially those involving data proxying, can hang if the transfer fails. This is fixed with this release.

gplazma

The xacml plugin is primarily used with GUMS servers to discover under which username a user should be logged in. This decision is influenced by the user’s VOMS group-membership, which the xacml plugin extracts. In previous versions of dCache, these group-membership attributes were not checked (as the GUMS server does this). With this release, this is updated so the VOMS attributes are checked. As a consequence, the gplazma.xacml.vomsdir.dir and gplazma.xacml.vomsdir.ca properties are introduced, which take default values from dcache.authn.vomsdir and dcache.authn.capath respectively.

pool

The replica-manager periodically requests a list of file replicas that a pool is hosting. In previous versions of dCache, if the pool found a broken file then an error was returned to replica-manager. The replica-manager then considered the entire pool as being offline. With this version of dCache, such errors are logged on the pool. The replica-manager will not consider the pool as hosting that file’s data, but will otherwise consider the pool online.

The different HSM operations (flush, stage and remove) have internal timeouts after which the pool considers the request as failed. In previous versions of dCache, the default pool setup included a four hour timeout for flush and stage but neglected to set a default for delete. This omission caused delete operations to time out very quickly. With this release, delete operations also have a default of four hours.

Fix the Ruby implementation of the hsmcp script (hsmcp.rb) so it can parse new command-line arguments that include concurrency options.

The concurrency for active HSM operations is configurable and may be adjusted dynamically. In earlier versions of dCache, decreasing the concurrency only became effective when that operation started to idle. This has been fixed so the limits start to have an effect as operations complete.

Each mover has one of three priorities (LOW, MEDIUM, HIGH) and a selection discipline (FIFO or LIFO). The documented behaviour was for queues with names that start with - to have LIFO discipline and those that start with any other character to have FIFO discipline. Due to a bug, the order was wrong, with the priorities inverted and the two disciplines swapped, so LOW priority movers were started before MEDIUM priority movers and MEDIUM before HIGH. This release fixes the priorities so that HIGH priority movers are selected in preference to MEDIUM, and MEDIUM priority movers in preference to LOW; however, it was decided to keep the disciplines as in previous versions and to update the documentation accordingly. There are several reasons for this: first, there is no difference between LIFO and FIFO when movers are not queued; second, neither discipline will help if the pool is persistently overloaded; third, LIFO discipline (although unfair) is documented as providing a better overall throughput during a time-limited overload; fourth, by default dCache has been running with LIFO discipline since v1.9.11 (released 2011-01-13) without any apparent problems.
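The corrected selection order can be illustrated with a small sketch. The MoverQueue class below is hypothetical and is not the pool's scheduler; it only shows priority taking precedence, with the discipline deciding the order among movers of equal priority.

```python
from collections import deque
from enum import IntEnum


class Priority(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2


class MoverQueue:
    """Hypothetical queue: highest priority first, then LIFO or FIFO."""

    def __init__(self, lifo):
        self._lifo = lifo
        self._pending = {priority: deque() for priority in Priority}

    def add(self, mover, priority):
        self._pending[priority].append(mover)

    def next(self):
        """Return the next mover to start, or None if nothing is queued."""
        for priority in sorted(Priority, reverse=True):  # HIGH, MEDIUM, LOW
            queue = self._pending[priority]
            if queue:
                # LIFO takes the newest queued mover, FIFO the oldest.
                return queue.pop() if self._lifo else queue.popleft()
        return None
```

With nothing queued the two disciplines are indistinguishable, which is the first of the reasons given above for leaving the discipline unchanged.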

In earlier versions of dCache, if one of the HSM script arguments contained any whitespace then the script would be invoked with incorrectly formatted arguments. Such whitespace is most likely to come from the storage-info. This is now fixed: the arguments are as expected even when they contain whitespace.
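The difference between the two ways of invoking a script can be shown with a short sketch. It is not the pool's code: the stand-in script simply echoes its arguments, and the argument values (including the -si= prefix) are made up for illustration.

```python
import subprocess
import sys

# Stand-in for an HSM script: prints each argument it receives on its own line.
ECHO_ARGS = "import sys; print('\\n'.join(sys.argv[1:]))"


def invoke(args):
    """Run the stand-in script with the arguments passed as an array."""
    completed = subprocess.run(
        [sys.executable, "-c", ECHO_ARGS, *args],
        capture_output=True, text=True, check=True,
    )
    return completed.stdout.splitlines()


def split_on_whitespace(args):
    """The faulty approach: join into one string, then split on whitespace."""
    return " ".join(args).split()


if __name__ == "__main__":
    # Hypothetical arguments; the last one contains a space.
    args = ["get", "0000ABCD", "-si=path=/data/my file"]
    print(invoke(args))               # the script sees three arguments
    print(split_on_whitespace(args))  # the last argument is broken in two
```

Passing the arguments as an array means no quoting or re-parsing step exists in which embedded whitespace could be misread.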

HTTP third-party transfers report back if there was a problem verifying that the transfer was successful. One possible problem is that the remote server failed to supply checksum information. Reporting of such situations is now fixed.

webadmin

The dCache alarms page in webadmin can be slow to respond when first visited. This is because the page attempts to load all alarm events when showing this page. With this version of dCache, the initial visit is limited to the most recent 100 events. The auto-refresh of this page has become an option shown within the page and is switched off by default.

webdav

In previous versions of dCache, should a user cancel a transfer shortly after a mover is created then there was a risk that the mover is abandoned. This is fixed with this release.

xrootd

Fix race-condition that can lead to a stack-trace like:

Uncaught exception in thread xrootd-net-13 java.util.ConcurrentModificationException: null
        at java.util.HashMap$HashIterator.nextNode(HashMap.java:1429) ~[na:1.8.0_60]
        at java.util.HashMap$KeyIterator.next(HashMap.java:1453) ~[na:1.8.0_60]
        at org.dcache.xrootd.door.XrootdRedirectHandler.channelInactive(XrootdRedirectHandler.java:147) ~[dcache-xrootd-2.13.9.jar:2.13.9]

In earlier versions of dCache, if the xrootd door times out for a write request while waiting for the mover to send the redirection information then the mover is abandoned. This is fixed with this release.

Changelog 2.13.9..2.13.10

2c8b8d4
[maven-release-plugin] prepare release 2.13.10
2dc54e0
hsmcp: update to match new HSM interface.
d1dee38
(2.13) webadmin: improve user-friendliness of alarms page on load
e304879
(2.13) old replica manager: prevent pool being listed as offline when there are files with corrupt metadata
5fb1472
webdav: Kill abandoned movers
afa03c0
xrootd: Kill mover on aborted write
b8dca9e
pool: Fix transfer prioritization
0e11022
pool: Add nearline storage default timeouts
268e141
xrootd: Fix race condition in request cancelation
942c55e
pool: Let script nearline storage provider scale down when lowering limits
56d4737
HSM Script: pass arguments in array
01f5e69
dcap: Fix broken argument parsing
c896635
xacml: fix voms attribute validation
583d58a
dcap: Fix socket factory argument parsing
175d440
Fix build and startup regression
300cc5d
pool: Fix error reporting in remote HTTP mover
267e91e
ftp: Don’t hold locking during proxy shut down
2514775
ftp: Fix deadlock triggered by failures
ad4d205
prepare for next development iteration

Release 2.13.9

srm

Fix security vulnerability in srm EGI-SVG-2015-9495 (restricted).

Release 2.13.8

Changes affecting multiple services

In many cases, poolmanager would timeout after ten seconds when asked which pool to use for a transfer. This behaviour was not intended. The consequence of this bug is protocol specific: for some protocols, the door retries internally while other doors propagate this error to the client. Another consequence was the increased risk of the domain hosting poolmanager running out of memory, particularly when staging files. This release fixes the underlying problem. It is recommended that all doors be upgraded.

Fix a performance regression when deleting directories; the fix affects the pnfsmanager and nfs services.

Doors announce themselves with tags that identify for which purposes they should be used. These tags are comma-separated and the srm tag indicates that the door should be made available for SRM transfers. In earlier versions of dCache, a problem was found where the comma-separated list was not split correctly, preventing the SRM from recognising the door is available. This is fixed with this release.

Fix that the configured ca directory and vomsdir were ignored by the srm and dcap doors.

dcap

Fix how dcap doors are published so the SRM door no longer selects plain (unauthenticated) dcap doors for writing when the dcap.authz.anonymous-operations configuration property states that they are READONLY, and never selects the door if the property value is NONE. The dcap.authz.readonly configuration property continues to be honoured when dcap.authz.anonymous-operations is FULL, but is ignored otherwise.
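The rule above can be written as a small decision function. This is a sketch of the described behaviour only; the function name and the "read"/"write" labels are hypothetical and do not correspond to dCache code.

```python
def published_operations(anonymous_operations, readonly):
    """Operations for which an SRM may select a plain (unauthenticated) dcap door.

    anonymous_operations: value of dcap.authz.anonymous-operations
    readonly:             value of dcap.authz.readonly
    """
    if anonymous_operations == "NONE":
        return set()                  # the door is never selected
    if anonymous_operations == "READONLY":
        return {"read"}               # dcap.authz.readonly is ignored
    if anonymous_operations == "FULL":
        # Only here is dcap.authz.readonly taken into account.
        return {"read"} if readonly else {"read", "write"}
    raise ValueError(f"unexpected value: {anonymous_operations!r}")
```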

srm

Fix performance regression due to reloading of trusted CA certificates with each request.

Fix support for credential delegation if the credential’s certificate is sent over multiple SSL frames. This most likely happens when the certificate exceeds the maximum frame size.

Changelog 2.13.7..2.13.8

922156f
[maven-release-plugin] prepare release 2.13.8
8a0cb80
ftp: Add missing import
7c5c3a1
srm: Cache the vomsdir and ca stores
7f1ab21
chimera: Resolve performance regression in directory deletion
98f87c1
doors: Fix timeout handling during retries
6b4ab0c
Fix timeout math to avoid overflow
28bc521
srm: Fix credential delegation
81e5d2a
doors: Fix string parsing for door tags and paths
bc62652
dcap: Respect dcap.authz.anonymous-operations when publishing login information
633308b
srm,dcap: Respect cadir and vomsdir settings
0940132
prepare for next development iteration

Release 2.13.7

ftp

Fix security vulnerability in gsi and kerberos authenticated ftp.

Release 2.13.6

Changes affecting multiple services

The System cell of each domain contains a version command that allows discovery of which dCache version is running. This release fixes this command. Note: there is no problem with the dcache script’s version command.

The internal message handling within dCache allows for online changes to how messages are routed. This release fixes the command for removing existing default and dumpster routes.

httpd

Doors may retry creating a mover should the initial creation fail. The web-admin is not prepared for this eventuality, leading to a stack-trace with Multiple entries with same key being logged. This release fixes this.

nfs

The NFS protocol provides access to additional information through dot commands. This release fixes the nameof and pathof commands for non-ASCII filenames.

pnfsmanager

Fix dCache so that it respects the setgid bit on a parent directory when the user uploads a file via the SRM protocol. Important: the srm node should be updated at the same time.

srm

Enforce authorisation of requests to finalise or cancel an upload. When initiating an upload, the user’s uid and primary gid are taken as the request owner-uid and owner-gid respectively. Only users that have the same uid as the request’s owner-uid or are a member of the request’s owner-gid are allowed to cancel or finalise an upload. Important: the srm must be updated if the pnfsmanager is updated.
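The ownership check described above amounts to the following. The User and UploadRequest types are hypothetical and exist only to make the rule concrete; they are not the srm service's classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    uid: int
    primary_gid: int
    gids: frozenset   # all groups of the user, including the primary gid


@dataclass(frozen=True)
class UploadRequest:
    owner_uid: int
    owner_gid: int


def start_upload(user):
    """The initiating user's uid and primary gid become the request owner."""
    return UploadRequest(owner_uid=user.uid, owner_gid=user.primary_gid)


def may_finalise_or_cancel(user, request):
    """Allowed for the owning uid or for any member of the owning gid."""
    return user.uid == request.owner_uid or request.owner_gid in user.gids
```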

webadmin

This release fixes a bug with the periodic building of billing plots. Previously, if the billing service took too long to reply then there would be no further updates to the billing plots.

Changelog 2.13.5..2.13.6

ccbae45
[maven-release-plugin] prepare release 2.13.6
bab3e9d
rpm: remove “commented out” macros lines from spec file
6ae156c
cells: Fix dumpster and default route removal
d4bc920
chimera: fix nameof and pathof for paths containing unicode
ffff2a3
httpd: Fix transfer collection for duplicate movers
ce293c9
chimera: Let SRM respect setgid on upload
7d98567
srm: Add authorization to put done and abort requests
808f6bf
module: cells
4ea1765
(2.13) dcache-webadmin: add TimeoutCacheException to catch clause in billing service
6ee6d97
[maven-release-plugin] prepare for next development iteration

Release 2.13.5

Changes affecting multiple services

Specifying the DISABLE_BROKEN_DH flag in the dcache.authn.ciphers configuration property disables all Diffie-Hellman ciphers if Java 7 is used; if dCache is run with Java 8 then this flag has no effect. Disabling DH ciphers is necessary because Java 7 contains a broken implementation of Diffie-Hellman, which was fixed with the release of Java 7 update 51. This dCache release updates the behaviour of the DISABLE_BROKEN_DH flag to allow Diffie-Hellman ciphers if dCache is run within Java 7 update 51 or later.
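The updated behaviour of the flag can be sketched as below. The helper is hypothetical, and it assumes the JVM reports its version in the 1.7.0_45 style, which is also the form seen in stack traces elsewhere in these notes.

```python
import re

# Matches version strings such as "1.7.0_45" or "1.8.0_60" (assumed format).
_JAVA_VERSION = re.compile(r"1\.(\d+)\.\d+(?:_(\d+))?")


def dh_ciphers_disabled(java_version, flag_set):
    """Whether DISABLE_BROKEN_DH removes the Diffie-Hellman ciphers."""
    if not flag_set:
        return False
    match = _JAVA_VERSION.match(java_version)
    if match is None:
        raise ValueError(f"unrecognised Java version: {java_version!r}")
    major = int(match.group(1))
    update = int(match.group(2) or 0)
    # Only Java 7 before update 51 has the broken implementation.
    return major == 7 and update < 51
```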

The descriptions of the DISABLE_EC and DISABLE_RC4 flags have been expanded and updated.

This release updates advice on how the dcache.net.listen configuration property should be set when clients should use a DNS alias FQDN (a CNAME record) rather than the FQDN of one of the door’s public interfaces.

nfs

Add support for the TEST_STATEID and FREE_STATEID RPC methods. These are used by the Linux kernel during its recovery procedure. The previous lack of support for these methods could lead to the leaking of stateids, which in turn could lead to NFS4ERR_RESOURCE : Too many states being logged.

pool

Improve the error message (logged by the pool and in billing) should the FTP mover fail to connect to the client.

Changelog 2.13.4..2.13.5

12fcef1
[maven-release-plugin] prepare release 2.13.5
c94ac77
pool: fix ftp mover to provide better logging when failing to connect
77b16f0
properties: update description of dcache.net.listen
6682ce9
crypto: refine handling of broken ciphers
a093f64
libs: update to nfs4j-0.10.6
af19ef8
[maven-release-plugin] prepare for next development iteration

Release 2.13.4

Changes affecting multiple services

The event logger records when messages are received and sent by cells. Some cell messages send string commands; if so, the log entry contains that string. In previous releases, such string commands were mistakenly double-escaped. This is fixed with this release.

Doors publish their root path as part of the information that the srm service uses when determining TURLs. With this release, the published root path is included in the door’s info output.

admin

The admin cell features tab-completion for cell names; names without an @ expand using a list of well-known cells. In earlier releases, this did not work if both the admin and the well-known cell are not located in the broker domain (typically dCacheDomain). This is now fixed.

ftp

Fix debug output to include the flavour of GSS implementation; for example, GssFtpDoorV1::secure_reply: going to authorize using k5

gplazma

Earlier 2.13 releases contained a performance regression that resulted in the voms plugin requiring large amounts of CPU; using this plugin would result in gPlazma being very slow. This is now fixed.

pnfsmanager

When a user attempts to delete a symbolic link using a non-NFS door, previous versions of dCache would resolve the symbolic link to determine whether the user is allowed to delete it: only if the user was allowed to delete the symbolic link’s target would the symbolic link be removed. With this release, the check verifies whether the user is allowed to delete the symbolic link itself.

pool

This release updates the JVM command-line to make it explicit that compressed object references are in use. This allows the Berkeley DB library to calculate a more accurate cache size, potentially improving pool performance.

In previous releases, any attempt to query a pool’s info (e.g., via the admin interface) while the pool was initialising would block until the initialisation had completed. This had a knock-on effect of blocking all subsequent messages. With this release, requesting information about a pool will not block during initialisation.

Fix high memory usage during pool initialisation if pool has any precious files.

srm

Fix SRM to allow downloads via read-only doors.

xrootd

The access log for the domain hosting an xrootd door would contain incomplete information about xrootd interactions: certain responses were logged without the corresponding xrootd session. This is now fixed.

Earlier releases would record a stack-trace if xrootd received a malformed request. This is now fixed.

Changelog 2.13.3..2.13.4

7ee11e5
[maven-release-plugin] prepare release 2.13.4
324f9dc
doors: Include root path in login broker info output
925de9e
admin: Fix cell name tab completion
a9706c8
srm: Fix read path check for srmPrepareToGet
2ebb724
cell: Don’t quote string command in event logger
053de63
pool: Explicitly enable compressed oops to calculate correct cache size
995342b
pool: Fix locking bug causing high memory usage during pool initialization
0a3e8a2
chimera: Fix path resolution on delete
9477e52
xrootd: Fix ‘xrootd logs stack-trace on malformed request’
bbf75b8
move execution of the superclass method before any concrete class initializations
2f4c55e
gplazma: Fix VOMS plugin regressions
160361b
pool: do not list a repository during initialization
03141eb
xrootd: Add session to access log
edc28b7
[maven-release-plugin] prepare for next development iteration

Release 2.13.3

gplazma

This release fixes a regression where using the xacml gPlazma plugin would prevent gsiftp-based transfers from succeeding.

pool

The Berkeley DB based metadata storage can sometimes fail. Should this happen, the pool must be restarted. In previous releases, such problems were logged with an unclear message and a stack-trace; the pool would continue to operate but would fail all subsequent transfers; nothing made it clear the pool must be restarted. With this release, such Berkeley DB problems will be logged with a concise error message and the pool will be disabled, making the restart requirement explicit.

Fix regression in the migration module that could cause more replicas to be created than was desired and could cause replication to fail with a No targets error message.

srm

In previous versions of dCache, the ls operation in SRM occasionally returned incomplete or inconsistent results. This is now fixed.

xrootd

Fixes GSI authenticated xrootd.

Changelog 2.13.2..2.13.3

63af410
[maven-release-plugin] prepare release 2.13.3
2ea8320
pool: Fix race condition in migration module
fa76bc9
srm: fix race condition in ls response
42249c5
xrootd: Fix GSI authentication
393f687
pool: Disable pool on meta data failures
7fb2c9f
srm: Fix wrong Spring factory bean type
3696470
(2.13) gplazma-xacml: add GlobusPrincipal (DN) to identified principals
62eebd5
[maven-release-plugin] prepare for next development iteration

Release 2.13.2

Changes affecting multiple services

In previous releases, dCache required a layout file be present, even if that file was empty. This had two negative impacts: the dcache stop command (typically invoked automatically when updating a package) would not work, and nodes where only scripts (e.g., info-provider) are used would require unnecessary configuration. With this release, a missing file generates a warning but does not prevent the dcache stop command or scripts from working. This warning may be suppressed by setting dcache.layout.uri to an empty string.

httpd

The tape related queues (flush and restore) have no maximum limit, yet both the old web information and new webadmin show a dummy maximum value for these queues. This meaningless maximum value is no longer shown.

infoprovider

It is possible to run multiple SRM endpoints in dCache, provided certain restrictions are upheld. With this release, the info-provider publishes multiple SRM endpoints correctly.

nfs

Some shells, when attempting to overwrite a tag’s content using NFS, do so in a way that Chimera previously failed to support. This failure was reported back to the user as a remote I/O error. This release fixes this problem.

Log NFS clients that provide a stale client-ID.

When configured to support NFS v3, the nfs door will attempt to start the embedded portmap service if it cannot find the system portmap service. Previous releases of dCache would log a stack-trace if starting the embedded portmap service fails. With this release such failures are logged but no stack-trace is generated.

pool

Reduce latency when a pool processes a request to start a read mover. This improves dCache responsiveness when a client opens a file for reading.

The NFS mover uses the file’s size when processing read and write requests. For read operations, the file size cannot change. This release takes advantage of this to reduce the load on the underlying filesystem.

Reduce memory churn on pool when processing NFS requests. With this release, fewer small objects are created, resulting in less pressure on Java’s memory garbage-collector and consequently reduced likelihood of latency spikes.

poolmanager

Poolmanager may attempt to create additional copies of a file, only to discover such attempts fail because of other constraints. This leads to the log file containing entries like P2P denied: already too many copies and P2P denied: No pool candidates available/configured/left for p2p or file already everywhere. With this release, such entries are logged at info level: they no longer appear in the log file, but are available via the poolmanager’s pinboard.

srm

This release overhauls the srm service’s queuing behaviour. To reduce the likelihood of bulk operations overwhelming dCache, the srm limits the number of requests it processes concurrently to some configurable maximum. It places any additional schedulable activity (LS, GET, PUT, BRING-ONLINE and RESERVE-SPACE) into queues; the next request from the queue is processed when a processing slot becomes available. Previously, the process of selecting which queued request to process next was both unfair and inefficient: as the queue size increased, the srm node consumed considerably more CPU effort and individuals experienced increased latency.

This release sees the introduction of a new queuing mechanism designed to overcome these problems. The queuing strategy is both configurable and pluggable, allowing the admin to control the desired characteristics. Some of the queuing strategies shipped with dCache are designed to share the available processing slots fairly between users or between groups of users; this grouping of users is also both configurable and pluggable, with dCache shipping with many useful choices. See the srm default configuration properties file for further details.
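To make the idea of sharing processing slots fairly concrete, here is a minimal round-robin sketch. It is not one of the strategies shipped with dCache; the FairShareQueue class and the group_of callable (which stands in for the pluggable grouping of users) are hypothetical.

```python
from collections import OrderedDict, deque


class FairShareQueue:
    """Hypothetical queue that serves groups of requests in round-robin order."""

    def __init__(self, group_of=lambda request: request["user"]):
        self._group_of = group_of      # maps a request to its user or group
        self._queues = OrderedDict()   # group -> deque of waiting requests

    def put(self, request):
        group = self._group_of(request)
        self._queues.setdefault(group, deque()).append(request)

    def take(self):
        """Called when a processing slot frees up; returns None if idle."""
        if not self._queues:
            return None
        group, queue = next(iter(self._queues.items()))
        request = queue.popleft()
        del self._queues[group]
        if queue:
            self._queues[group] = queue  # the group moves to the back
        return request
```

Both put and take do a constant amount of work regardless of how many requests are waiting, and a user who submits a large batch cannot delay another user's single request by more than one slot.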

Changelog 2.13.1..2.13.2

c4fa8eb
[maven-release-plugin] prepare release 2.13.2
09a85b9
systemtest: fix install command in credentials command
82e37f6
chimera: throw FileExistChimeraException if tag already exists
52bb9ff
infoprovider: remove single SRM instance limitation
9d9398f
(2.13) webadmin: do not display numerical value for max restores or stores
6e88ad3
boot: Don’t fail on missing layout file
bd06463
srm: Refactor scheduler to avoid unused queues and per user counters
5af2eb9
srm: Reimplement scheduling policy to resolve scalability issues
2d786f3
httpd: Do now show maximum for restore and flush queues
b6d4827
poolmanager: Reduce log level of p2p denial
9b6e708
libs: update to nfs4j-0.10.5
d51c9a5
pool: simplify duplicate request handling
66a69ca
pool: do not create new stateid in NfsMover#getStateId()
d6873c3
[maven-release-plugin] prepare for next development iteration
ac724f6
pool: reduce load on back-end file system

Release 2.13.1

Changes affecting multiple services

A new security configuration option allows the dCache admin to ban all SSL/TLS cipher suites that use the RC4 cipher. RFC 7465 states services MUST NOT accept an RC4-based cipher suite. Adding the DISABLE_RC4 option to the dcache.authn.ciphers property makes dCache compliant with RFC 7465. This option is not enabled by default to avoid possible regression with clients that require the RC4 cipher. This property affects the dcap (with GSI), ftp (with GSI), srm, webadmin, webdav (with SSL/TLS) and xrootd (GSI plugin) services.

pnfsmanager

Fixed ACL inheritance when uploading data through SRM. In earlier versions of dCache, a file uploaded through SRM failed to inherit any inheritable ACEs from the parent directory.

This release brings some modest performance improvements when creating upload directories. This improvement is available automatically only to sites that have not yet upgraded to 2.10 (or later). Sites already running 2.10 or later can enjoy the same improvements by deleting the upload directory (/upload by default) to allow dCache to recreate it. Important: deleting the upload directory will fail any current SRM uploads; it is recommended to do this during down-time.

pool

The pool’s migration module may be invoked with different pool selection modes: the -select option. The random selection option (-select=random) excludes pools that are full, but mistakenly considers replicas that could be deleted (i.e., non-sticky cached replicas) as part of the used space; this treats a pool as full even when the pool has removable files. With this release, pools that are full but contain some cached files are potential targets for random pool selection.
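The corrected notion of a full pool can be sketched as follows. The Pool type and its field names are hypothetical; the sketch only shows removable (non-sticky cached) replicas no longer counting towards the used space.

```python
import random
from dataclasses import dataclass


@dataclass
class Pool:
    name: str
    free: int        # bytes not occupied by any replica
    removable: int   # bytes held by non-sticky cached replicas


def is_full(pool):
    """A pool is full only if neither free nor removable space remains."""
    return pool.free + pool.removable <= 0


def select_random(pools, rng=random):
    """Pick a target at random from the pools that are not full."""
    eligible = [pool for pool in pools if not is_full(pool)]
    if not eligible:
        raise LookupError("no eligible pool")
    return rng.choice(eligible)
```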

In earlier releases of dCache, the save command failed to record the stage, flush and remove timeouts for nearline storage (rh set timeout, st set timeout, rm set timeout respectively). This is now fixed.

The pool’s NFS mover will remember the previous TCP port on which it listened and will attempt to reuse it after restarting.

Scripts

Fix JAR selection when a short-lived java process is started; in particular, this fixes a problem with the chimera utility when operating on ACLs. In earlier versions, such operations fail with a java.lang.NoClassDefFoundError: org/dcache/acl/parser/ACEParser message.

spacemanager

Fix writing into a reservation when using a protocol that does not provide a username or FQAN; for example, when writing into dCache using NFS and with the WriteToken directory tag set. Previously writing would fail with a Message processing failed: null group message.

srm

The srm service can generate RemoteException stack traces when dCache is behaving correctly. These are now logged at debug level and without a stack trace.

Changelog 2.13.0..2.13.1

7ca949e
[maven-release-plugin] prepare release 2.13.1
73a4dee
pool: introduce unique port number for nfs mover
6dad279
pool: Store nearline storage timeouts to pool setup file
a9613dc
pnfsmanager: Create base upload directories without tags and acls
a9260c9
pnfsmanager: Inherit ACLs on upload with SRM
249b290
chimera: Add ACL insert triggers for HSQLDB
9a6b5ce
Fix limited class path generation
16739e8
pool: Let random pool selection select pools with removable files
6c493af
crypto: allow banning of RC4 cipher suites
b4903bf
spacemanager: Allow unowned files in reservations
b8a0c86
srm: Don’t log erroneous stack trace
770d73f
[maven-release-plugin] prepare for next development iteration

Release 2.13.0

Admin shell got major face lift

The admin shell has been reworked entirely. We have taken several design cues from the PostgreSQL shell, psql, that most dCache admins are familiar with.

Most notably, the command set is syntactically divided into built-in commands affecting the shell itself and commands sent to connected dCache cells. The former are all prefixed with a backslash while the latter are not. This syntactic separation means that the shell commands are available at any time, even when connected to a cell. E.g. the cd command got replaced by \c, allowing one to connect to a different cell without first having to disconnect from the current cell. Thus there is no need for a .. command.

The shell provides built-in help for the shell commands using \?. This help is separate from the regular help output of the connected cell. The latter is available using the new \h command (the classic help command still works, but will not provide color highlighting).

All built-in shell commands provide full tab completion, including for the arguments. E.g. \c provides tab completion on the cell name; it will complete on all well-known cells, all local cells of the domain of the connected cell, and on domain names of fully qualified addresses.

The new \l command lists cells. By default it lists well known cells and cells of the domain of the connected cell. Wildcards for fully qualified addresses however match all cells, e.g. \l p*@* lists all cells starting with the letter p (\l p*@ is a shorthand for the same); and \l *@fooBar lists all cells of domain fooBar (with the shorthand \l @fooBar).
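The wildcard shorthands can be illustrated with a short sketch. It is not the admin shell's code: it assumes the globs behave like shell-style patterns as implemented by Python's fnmatch, it ignores source-routed paths, and the cell and domain names in the usage example are made up.

```python
from fnmatch import fnmatchcase


def expand(pattern):
    """Expand the shorthands: 'cell@' -> 'cell@*', '@domain' -> '*@domain', '@' -> '*@*'."""
    cell, sep, domain = pattern.partition("@")
    if not sep:
        return pattern                      # no domain part: well-known cells only
    return f"{cell or '*'}@{domain or '*'}"


def matching_cells(pattern, cells, well_known):
    """List the addresses matching pattern.

    cells:      iterable of (cell, domain) pairs known to the system
    well_known: set of well-known cell names
    """
    cell_glob, sep, domain_glob = expand(pattern).partition("@")
    if not sep:
        return sorted(f"{c}@{d}" for c, d in cells
                      if c in well_known and fnmatchcase(c, cell_glob))
    return sorted(f"{c}@{d}" for c, d in cells
                  if fnmatchcase(c, cell_glob) and fnmatchcase(d, domain_glob))


if __name__ == "__main__":
    cells = [("PoolManager", "dCacheDomain"), ("pool_1", "poolDomain"),
             ("System", "poolDomain")]
    print(matching_cells("p*@", cells, {"PoolManager", "pool_1"}))
    print(matching_cells("@poolDomain", cells, {"PoolManager", "pool_1"}))
```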

As part of the face lift, both the shell prompt and the welcome message got updated. The prompt is partially configurable using the new admin.prompt property.

The old shell provided several utility commands at the “root” level. These commands are no longer supported, but most of them can be mimicked using some of the new commands introduced.

We provide a brief summary of the new commands in the tables below, as well as a comparison of the old and new commands.

Cell address wildcards
Pattern Meaning
cell@domain Fully qualified cell address.
cell Well known cell.
GLOB@domain All cells matching GLOB in domain.
cell@GLOB All cells called cell in any domain matching GLOB.
cell@ Short for cell@*, i.e. any cell called cell in any domain.
@domain Short for *@domain, i.e. any cell in the domain called domain.
@ Short for *@*, i.e. any cell in the entire system.
*@domain1:cell@domain2 Source routed cell path. The asterix is not a wildcard!
Admin shell commands illustrated
Command Meaning
\? Show a summary of shell commands.
\? CMD Show help of shell command CMD.
\h Show a summary of commands of connected cell.
\h CMD Show help of command CMD of connected cell.
\c CELL Connect to well-known cell.
\c CELL@DOMAIN Connect to fully qualified cell.
\c CELL USER Connect to cell as user.
\l List all well-known cells and cells local to the currently connected domain.
\l GLOB List all cells matching the cell wildcard.
\q Quit shell.
\s CELL CMD Send CMD to cell.
\s CELL1,CELL2 CMD Send CMD to cell1 and cell2.
\s GLOB CMD Send CMD to all cells matching the cell wildcard.
\sn CMD Send CMD to pnfs manager (name space).
\sp CMD Send CMD to pool manager.
\sl PATH CMD Send CMD to all pools containing the file described by PATH.
\sl PNFSID CMD Send CMD to all pools containing the file described by PATH.
Keyboard shortcuts
Key Meaning
^C Interrupt currently running command.
^D Quit shell.
^R Search history.
TAB Complete command or argument.
Tricks
Command What it does
\s System@ kill System Restart all domains.
\sl /path/to/file rep ls $1 List all replicas of the file.
\sl /path/to/file rep rm $1 Remove all removable replicas of the file.
\sl /path/to/file rep set sticky -o=me -l=60000 $1 Add a 60s sticky bit owned by me to all replicas of the file.
\sn cacheinfoof /path/to/file Get cache locations of the file.
Replacements for legacy commands
Old New Remark
flags ls file \sn flags ls file Output format is different.
flags remove file key \sn flags remove file -key
flags set file key value \sn flags set file -key=value
getpoolbylink
modify poolmode enable pool[,pool…] \s pool[,pool…] pool enable
modify poolmode disable pool[,pool…][code [message]] [options] \s pool[,pool…] pool disable [code [message]][options]
p2p pnfsid source target \s source migration cache -pnfsid=pnfsid target
p2p pnfsid \sp replicate pnfsid localhost The new form is non-blocking.
pnfs map path \sn pnfsidof path
quota query
repinfoof file \sl file rep ls $1 Output format is different.
set deletable pnfsid \sn flags set pnfsid -d=true
\sl pnfsid rep set cached $1
Note that neither form touches the sticky bit.
set sticky file \sl file rep set sticky $1 on
set sticky pnfsid -target=pool \s pool rep set sticky pnfsid on No equivalent command when using path.
set unsticky file \sl file rep set sticky $1 off
set unsticky pnfsid -target=pool \s pool rep set sticky pnfsid off No equivalent command when using path.
uncache file \sl file rep rm -force $1
uncache pnfsid -target=pool \s pool rep rm -force pnfsid No equivalent command when using path.

Admin shell provides better scripting support

The new admin shell detects whether it is connected to a pseudo terminal. If it isn’t, this typically means that input and output are redirected, such as when connecting from a script. In this case the admin shell disables all styling: no welcome message, no prompt, no echo, no color highlighting, no command history. This makes it much easier to script the admin shell.
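A script might drive the shell along the following lines. This is a minimal sketch, not a supported client: the host name is a placeholder, the port 22224 and the admin user are assumptions about the local setup, and the helper names are hypothetical. The ssh option -T disables pseudo terminal allocation, so the shell sees a non-interactive connection.

```python
import subprocess


def build_ssh_argv(host, user="admin", port=22224):
    """Build the ssh command line for a non-interactive admin session.

    Assumption: port 22224 and user 'admin' match the site's admin door.
    """
    return ["ssh", "-T", "-p", str(port), "-l", user, host]


def build_script(commands):
    """Join commands one per line and finish with \\q to quit the shell."""
    return "\n".join(list(commands) + ["\\q"]) + "\n"


def run_admin_commands(host, commands, **ssh_options):
    """Pipe the commands into the admin shell and return its raw output."""
    result = subprocess.run(
        build_ssh_argv(host, **ssh_options),
        input=build_script(commands),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Since styling is disabled on such a connection, the returned text should contain only command output, which keeps any parsing on the script side simple.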

Admin shell automatically creates history file

Admin shell has had support for persistent command history for a long time. This feature was however only enabled if one manually created the file used to store the command history. This file is now created automatically if it doesn’t exist.

New legacy subsystem provides compatibility with old scripts

To support old scripts, the old shell is available as the legacy ssh subsystem. One can connect to this subsystem by adding the -s legacy option to the ssh command line.

We do however recommend that scripts are updated to use the new shell.

Admin no longer hard-codes cell addresses

Several new properties allow communication details to be configured:

admin.service.poolmanager=${dcache.service.poolmanager}
admin.service.poolmanager.timeout=30000
(one-of?MILLISECONDS|SECONDS|MINUTES|HOURS|DAYS)admin.service.poolmanager.timeout.unit=MILLISECONDS

admin.service.spacemanager=${dcache.service.spacemanager}
admin.service.spacemanager.timeout=30000
(one-of?MILLISECONDS|SECONDS|MINUTES|HOURS|DAYS)admin.service.spacemanager.timeout.unit=MILLISECONDS

admin.service.pnfsmanager=${dcache.service.pnfsmanager}
admin.service.pnfsmanager.timeout=30000
(one-of?MILLISECONDS|SECONDS|MINUTES|HOURS|DAYS)admin.service.pnfsmanager.timeout.unit=MILLISECONDS

admin.service.acm=${dcache.service.acm}
admin.service.acm.timeout=30000
(one-of?MILLISECONDS|SECONDS|MINUTES|HOURS|DAYS)admin.service.acm.timeout.unit=MILLISECONDS

Many admin shell commands now accept hex and octal values

Many admin shell commands now accept hexadecimal and octal numeric arguments. Hexadecimal arguments must be prefixed by 0x and octal arguments by 0.
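The prefix rule can be illustrated with a short Python sketch. It mirrors the rule as stated here and is not the parser used by dCache; the function name and the handling of a leading minus sign are assumptions.

```python
def parse_numeric(text):
    """Parse a decimal, hexadecimal (0x prefix) or octal (0 prefix) argument."""
    sign = 1
    digits = text
    if digits.startswith("-"):
        # Assumption: a leading minus sign is allowed in front of the prefix.
        sign, digits = -1, digits[1:]
    if digits.lower().startswith("0x"):
        value = int(digits[2:], 16)
    elif len(digits) > 1 and digits.startswith("0"):
        value = int(digits[1:], 8)
    else:
        value = int(digits, 10)
    return sign * value
```

Note that under this rule a value such as 010 is read as octal 8, not decimal 10, which is worth keeping in mind when zero-padding arguments in scripts.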

Improved pcells compatibility

Several changes have been made to simplify maintaining compatibility with the pcells GUI. As part of these changes, the admin service now redirects pcells requests to the appropriate dCache cell, allowing pcells to be used even if pnfsmanager, poolmanager or spacemanager use non-default names.

Improved cell message communication

The internal cell communication has been heavily restructured. Most of these changes are internal and - except for maybe minor latency improvements - are invisible to the user. There are however also several bug fixes to corner cases, as well as improvements in logging and routing failure handling. One may observe more routing errors being logged; such errors happened in previous versions too, but were not always logged.

Several changes to the message routing logic were made. These changes mostly affect non-default topologies, enabling use cases not possible in previous versions. When matured, these use cases will be documented separately.

Route maintenance logic has been updated. The most immediately observable changes are the new output format of the route command in the System cell, the fact that route deletion requires the full route to be specified, and that duplicate routes are allowed (messages are routed through the first route that matches).

Admin ping command provides RTT

The ping command in the System cell is now blocking and measures the round trip time.

New defaults for database names and owners

The defaults for database names are now such that each logical database is stored separately. Except for Chimera, this means that each RDBMS enabled service uses its own database. The default name is that of the service, e.g. srm stores its data in a database called srm, and the spacemanager service stores its data in a database called spacemanager.

The default owner has been changed to dcache for all databases, including Chimera.

New common configuration properties control the defaults for all databases, but per service overrides are of course possible:

#  ---- Default host of RDBMS used by various services.
#
#   Various services need an RDBMS. Each service is recommended to use its
#   own logical database, but these may or may not be hosted in the same RDBMS
#   instance. This setting defines the host of the RDBMS used.
#
dcache.db.host = localhost

#  ---- Default RDBMS user account by various services.
#
dcache.db.user = dcache

#  ---- Default password of RDBMS used by various services.
#
dcache.db.password =

#  ---- Default password file for RDBMS used by various services.
#
#   Follows the format of the PostgreSQL pgpass file format. Avoids putting
#   the passwords in the dCache configuration files.
#
dcache.db.password.file =

Upon upgrade, existing sites should update the settings to match the previous setup:

srm.db.name = dcache
srm.db.user = srmdcache

spacemanager.db.name = dcache
spacemanager.db.user = srmdcache

pinmanager.db.name = dcache
pinmanager.db.user = srmdcache

transfermanagers.db.name = dcachet
transfermanagers.db.user = srmdcache

replica.db.name = replicas
replica.db.user = srmdcache

chimera.db.name = chimerac
chimera.db.user = chimera

billing.db.name = billing
billing.db.user = srmdcache

alarms.db.name = alarms
alarms.db.user = srmdcache
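How the new defaults and the per-service overrides interact can be sketched in Python as follows. This is an illustration of the lookup order only, not dCache's property resolution code. The function name is hypothetical, and Chimera is left out because its default database name does not follow the service-name rule.

```python
# Global defaults as listed above.
GLOBAL_DEFAULTS = {"dcache.db.host": "localhost", "dcache.db.user": "dcache"}


def effective_db_settings(service, configured):
    """Resolve database name, owner and host for a service.

    'configured' holds the properties set by the site. A per-service
    property wins over the dcache.db.* default; the database name
    defaults to the service name.
    """
    def common(key):
        full_key = "dcache.db." + key
        return configured.get(full_key, GLOBAL_DEFAULTS[full_key])

    return {
        "name": configured.get(service + ".db.name", service),
        "user": configured.get(service + ".db.user", common("user")),
        "host": configured.get(service + ".db.host", common("host")),
    }
```

With no properties set, srm resolves to database srm owned by dcache; with the compatibility settings above, it resolves to database dcache owned by srmdcache, as in earlier releases.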

Broadcast service replaced by publish-subscribe messaging

The broadcast service is no longer supported and must be removed from the layout file on upgrade.

The functionality of the broadcast service has been replaced by an integrated multicast ability in the cell messaging system. Sending cells publish messages on a common topic while receiving cells subscribe to that topic. The topic is a regular cell address and is analogous to a multicast group in networking. This cell address is not bound to any particular cell. Instead the new TOPIC route directs messages published to a topic to all subscribers.

The system is transparent and does not require any configuration.
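Conceptually, a topic behaves like the following Python sketch. It is a simplified in-process model, not the cell messaging code: real topics are routed between domains, and the class and method names here are hypothetical.

```python
from collections import defaultdict


class TopicRouter:
    """Toy model of topic routes: one topic address, many subscribers."""

    def __init__(self):
        # topic address -> list of delivery callbacks
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, deliver):
        self._subscribers[topic].append(deliver)

    def publish(self, topic, message):
        """Deliver the message to every subscriber; return how many got it."""
        receivers = list(self._subscribers.get(topic, ()))
        for deliver in receivers:
            deliver(message)
        return len(receivers)
```

The publisher only needs to know the topic address; it does not need to know which cells, if any, are currently subscribed.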

Login broker service replaced by publish-subscribe messaging

The loginbroker service is no longer supported. This service acted as a directory service for doors and was used by srm, info, and httpd to find doors. The functionality has been replaced by publish-subscribe messaging.

To control which doors are used by srm, httpd and the info provider, a new tag system has been introduced. Doors may be tagged by setting the *.loginbroker.tags property, and the consuming services may in turn filter doors by those tags. For example, srm will only use doors that have at least one of the tags defined by srm.protocols.tags.

By default, the srm service only uses doors tagged as srm and the info-provider only publishes doors tagged as glue. By default, all doors are tagged as both srm and glue.
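The filtering rule amounts to a set intersection, as the following Python sketch shows. The door names and the internal tag are made up for illustration, and the function is not part of dCache.

```python
def select_doors(doors, wanted_tags):
    """Return the doors that carry at least one of the wanted tags.

    'doors' maps a door name to the tags it publishes.
    """
    wanted = set(wanted_tags)
    return [name for name, tags in doors.items() if wanted & set(tags)]


# Hypothetical doors: one with default tags, one retagged, one glue-only.
doors = {
    "WebDAV-a": ["srm", "glue"],
    "FTP-internal": ["internal"],
    "DCap-b": ["glue"],
}
srm_doors = select_doors(doors, ["srm"])
published_doors = select_doors(doors, ["glue"])
```

In this example only WebDAV-a would be used by srm, while the info provider would publish WebDAV-a and DCap-b.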

As part of this change, both the publishing and consuming services have been extended with several new lb commands. Doors can dynamically change the tags they publish, as well as stop publishing read or write capabilities (thus preventing SRM from using those doors). Similarly, consuming services like srm have been extended with commands to list collected information about doors:

lb disable [-read] [-write]  # suspend publishing capabilities
lb enable [-read] [-write]  # resume publishing capabilities
lb ls [-l] [-protocol=<string>]<...>  # list collected login broker information
lb set tags <tags><...>  # set published tags
lb set threshold <load>  # set threshold load for OOB updates
lb set update <seconds>  # set login broker update frequency
lb update # refresh login broker information

DCAP directory service is auto instantiated

The dir service used by DCAP to provide directory listing has been removed. The corresponding cell is now automatically instantiated whenever a DCAP door is created. Upon upgrade, the dir service must be removed from the layout file.

Support for JMS removed

Several years ago, an alternative messaging system based on the Java Messaging Services specification was introduced in dCache. This allowed external message brokers like ActiveMQ and OpenMQ to be used with dCache. This feature never caught on and we have removed all support for JMS.

Support for Terracotta removed

The srm service has supported a load-balancing system called Terracotta. This system supported running multiple SRM instances in a load-balancing setup. To our knowledge, this system has never been utilized by any site, and our internal evaluation showed that the overhead was relatively large. To avoid wasting developer resources on an unused feature, all support for Terracotta has been removed.

Support for multiple httpd instances

With the improved pcells decoupling from dCache, it is now possible to have several httpd services in the same dCache instance. One needs to give each a unique name, but that’s the only configuration tweak needed.

httpd uses JGlobus for certificate handling

When hosting the webadmin service, httpd supports HTTP over TLS. dCache now uses JGlobus for TLS, or more specifically, for the certificate handling required for TLS. The immediate consequence is that one no longer has to import host and CA certificates into a Java Keystore file. The httpd service can now read those directly from your /etc/grid-security directory.

dcache import command removed

Since certificates no longer need to be imported into Java Keystore files, the dcache import commands have been removed from the dcache script.

Robust delete notification for spacemanager, replica, and pinmanager

The spacemanager, replica, and pinmanager services rely on receiving notifications when files are deleted. Previously, these were sent by both pnfsmanager and cleaner, but both used unreliable notifications. This in particular affected spacemanager: if the spacemanager was shut down while pnfsmanager or cleaner were running, the services could quickly get out of sync, resulting in leaked files in spacemanager.

This notification has now been replaced by a robust notification from cleaner. The deleted file is not removed from the trash table until the receiving services have been notified and have confirmed the notification. Whether the notification is sent is controlled by the new cleaner.enable.space-reservation and cleaner.enable.replica settings. These settings must be in sync with whether space management or replica management is used in this dCache instance.
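The confirm-before-remove idea can be sketched in Python as follows. This models only the rule described above, not the cleaner's actual code: the function name, the in-memory trash list and the callback signature are assumptions.

```python
def process_trash(trash, receivers):
    """Notify receivers about deleted files and return the entries to keep.

    'trash' is a list of identifiers of deleted files. 'receivers' maps a
    service name to a callable that returns True once that service has
    confirmed the notification. An entry leaves the trash only when every
    receiver has confirmed; otherwise it is kept and retried later, so
    receivers must tolerate repeated notifications.
    """
    remaining = []
    for pnfsid in trash:
        confirmed = all(notify(pnfsid) for notify in receivers.values())
        if not confirmed:
            remaining.append(pnfsid)
    return remaining
```

If a receiving service is down, its entries simply stay in the trash and are delivered once it is back, which is what prevents the services from drifting out of sync.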

It is assumed that every dCache instance has a pinmanager. If you don’t have one, the easiest solution is to add this service upon upgrade.

Robust flush notification for spacemanager

Similar to the delete notification, the unreliable flush notification previously sent by pools was replaced by a reliable notification from pnfsmanager. The new pnfsmanager.enable.space-reservation property controls whether these notifications are sent. One consequence is that if spacemanager is down, flush to tape will halt.

dCache expects spacemanager by default

The defaults have changed such that it is assumed that space reservations are supported unless explicitly disabled. In other words, the property dcache.enable.space-reservation now defaults to true.

This change has been made because it is a safer default. In a system without space management, forgetting to set the above property to false simply results in a non-functioning system. In a system with space management, forgetting to set this property to true results in lost notifications, causing silent database inconsistencies. Thus true is the safer default.

Sites without space manager will have to set dcache.enable.space-reservation to false on all doors and other head nodes.

As before, one has to have a spacemanager instance in the layout file to support space reservations. The change in defaults is merely whether the rest of dCache expects space manager to be running or not.

Info provider supports configurable service names

The names of the pinmanager and poolmanager services used by the info provider are now configurable.

Info provider generates timestamp attributes

Info provider now adds a GLUE2EntityCreationTime attribute to all the GLUE 2 entities it creates.

Pinmanager provides better error messages

Pin manager now distinguishes between several different causes for failing to pin a file.

Pool migration module maintains last access time

The new -atime option to the migration commands allows migrated files to keep the original access time. This allows the garbage collection order to be maintained.

Pool garbage collector implements strict LRU

Previous versions of the sweeper implemented an approximation of least-recently-used (LRU). Starting with version 2.13 it implements real LRU.
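Why preserving the access time matters for a strict LRU sweeper can be seen in a small Python sketch. It is a toy model with hypothetical names: access times are plain numbers, and a pool is a dictionary from file identifier to last access time.

```python
def eviction_order(last_access):
    """Return file identifiers in strict LRU order, oldest access first."""
    return sorted(last_access, key=lambda pnfsid: last_access[pnfsid])


def migrate(source, target, pnfsid, now, keep_atime):
    """Copy a replica's entry to the target pool.

    With keep_atime the original access time is carried over; without it
    the replica looks freshly accessed at time 'now'.
    """
    target[pnfsid] = source[pnfsid] if keep_atime else now


source = {"A": 100, "B": 300}
with_atime = {"C": 200}
without_atime = {"C": 200}
migrate(source, with_atime, "A", now=400, keep_atime=True)
migrate(source, without_atime, "A", now=400, keep_atime=False)
```

With the access time preserved, file A is still the first candidate for removal on the target pool; without it, the older file A would outlive the more recently used file C.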

New pool safeguards in case of repository errors

Additional safeguards to automatically disable a pool in case of fatal errors have been added. A pool will now also stop flushing files to tape if it has been marked dead.

Better HTTP error handling in pools

Previous versions would disconnect the client on every error. Starting with version 2.13, the pool will try to keep the connection alive.

Unified mover timeout handling in doors

Doors have several timeouts related to starting a mover on a pool; however, the precise semantics were not unified. With this version, all doors use common timeout concepts for pool selection, mover creation and mover queuing.

Space manager has been renamed

The default space manager name has been changed to SpaceManager. In previous versions, the default was SrmSpaceManager, but since the space manager service is independent of the SRM service, this name was misleading. For compatibility, a temporary alias was added to allow the admin service to use both names.

Space manager supports less intrusive link group update command

The space manager update link groups command no longer blocks the space manager while the link groups are updated. The command returns once the updated information is being used.

Improved shutdown logic in space manager

If space manager discards requests during shutdown, the requesting service is now notified about the discarded messages. This allows for faster recovery after a space manager service restart.

Link group authorization by GID

The link group authorization file now supports authorization by GID. This allows space manager to be used with anonymous protocols like plain DCAP, plain xrootd and plain NFS.

Space manager provides better select failures

Space manager now provides better error messages when unable to find a suitable link group or write pool. It also provides better error codes to distinguish permission and configuration errors from other errors. Doors like DCAP, NFS and FTP make use of these error codes to decide whether to retry requests or which error codes to send to clients.

Asynchronous pool selection

Pool selection requests in doors may now be asynchronous, meaning the door does not need to maintain a thread for the duration of pool selection. This is currently only utilized by the FTP door.

Pool HSM commands provide helpful error messages

Thanks to Onno Zweers from SARA for contributing a patch that allows the hsm set and hsm unset commands to provide more helpful error messages.

Pool Berkeley DB meta data format change

The Berkeley DB library used by pools for meta data has been upgraded. The update is forward compatible in that the new version can read the old meta data. However once upgraded, the pool cannot be downgraded to previous versions. If a downgrade is required, the meta data needs to first be converted to the file backend before the downgrade, and then back to the db backend after the downgrade.

Chimera cleans old leaked tag inodes upon upgrade

A bug in Chimera caused tag inodes to be leaked upon directory deletion. The bug was fixed in 2.10, however to reduce the impact of the fix, already leaked inodes were not deleted. When upgrading to 2.13, such inodes are automatically removed. Depending on the number of tag inodes, this procedure may take a while (from minutes to hours).

As always, if an nfs service has been placed ahead of the pnfsmanager service in a domain, the database schema update has to be triggered manually by running dcache database update before starting dCache.

Chimera relaxes permission checks on stat

In compliance with POSIX, Chimera no longer requires execute permissions for the containing directory to stat a file.

Timing gauges support asynchronous processing

The srm provides gauges for measuring the execution time of various requests. Some of these requests are asynchronous and as a result appeared to be much faster than they really are. The gauges have now been extended to support asynchronous processing and the measured time now reflects the true processing time of such requests.
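The difference between timing the call and timing the result can be sketched in Python with a future. This is an illustration of the idea only, not the gauge implementation in dCache; the class and function names are hypothetical, and the clock is passed in so the behaviour is easy to follow.

```python
from concurrent.futures import Future


class Gauge:
    """Collects elapsed-time samples."""

    def __init__(self):
        self.samples = []

    def record(self, elapsed):
        self.samples.append(elapsed)


def timed_async(gauge, clock, start_request):
    """Start an asynchronous request and time it until its result arrives.

    The sample is recorded when the future completes, not when
    start_request() returns, so the asynchronous part is included.
    """
    started = clock()
    future = start_request()
    future.add_done_callback(lambda _: gauge.record(clock() - started))
    return future
```

A gauge that stopped the clock as soon as start_request() returned would record close to zero for such requests, which is the misleading behaviour described above.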

Increase precision of timing gauges

The precision of timing gauges used by services like srm and pnfsmanager has been improved.

Name space service provides timing gauges for Chimera calls

The info command of the pnfsmanager service has been extended to provide timing information for Chimera calls.

New admin commands for NFS service

Previous versions of the NFS door showed information about all transfers, pools and clients as part of the info output. This tended to clutter the output and also caused a lot of unnecessary information to be collected by the info service.

With this release, NFS doors no longer show this information in the info output. Instead that information can be queried using several new admin commands:

show clients [<host>]  # show active NFSv4 clients
show pools [<pool>]  # show pools to pNFS device mapping
show proxyio # show proxy-io transfers
show transfers # show active transfers

New thread pool settings for NFS service

The NFS service provides the following new settings for controlling the message thread pool size:

nfs.cell.limits.message.threads.max=8
nfs.cell.limits.message.queue.max=1000

srmGetSpaceTokens results are cached by SRM service

Clients such as FTS query the available space tokens prior to every upload. This used to cause a query message to be sent to the spacemanager service every time, with spacemanager querying its database. Since the list rarely changes, this release introduces a short-lived cache for this information.
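A short-lived cache of this kind can be sketched in Python as follows. It shows the general technique only: the class name, the cache key and the lifetime used in the example are assumptions and do not reflect the settings of the srm service.

```python
import time


class ShortLivedCache:
    """Cache that reloads a value once it is older than ttl_seconds."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader      # called on a miss, e.g. a space manager query
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}         # key -> (expiry time, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = self._loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value
```

Repeated lookups within the lifetime are answered from memory, so a burst of uploads causes one query to the space manager instead of one per upload. The price is that changes to the token list become visible only after the entry expires.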

Log FTS-specific information

FTS, since v3.2.34, will send the Job-ID, File-ID and retry-count to dCache. Here is an example of this information: job-id=bb62f96e-23da-48c1-bd6f-e0737588733b;file-id=996117;retry=0.

How FTS provides this information varies depending on the protocol. For SRM and WebDAV, a special HTTP header is used; for FTP the information is part of the CLIENTINFO command. In all cases, the information is available in the access log (<domain>.access file).

The following shows FTS sending an SRM prepareToPut command:

level=INFO ts=2015-06-15T09:13:18.035+0200
event=org.dcache.srm.request [...] request.method=srmPrepareToPut
[..] request.token=-2147481808 status.code=SRM_SUCCESS
client-info=job-id=bb62f96e-23da-48c1-bd6f-e0737588733b;file-id=996117;retry=0
user-agent="fts_url_copy/3.2.34 gfal2/2.10.0 srm-ifce/1.23.1
gSOAP/2.7"

The following shows the same information in an FTP session:

level=INFO ts=2015-06-15T09:13:19.769+0200
event=org.dcache.ftp.response
session=door:GFTP-prometheus-AAUYiTEsOCg command="ENC{SITE
CLIENTINFO scheme=gsiftp;appname=\"fts_url_copy\";appver=\"3.2.34
(gfal2
2.10.0)\";job-id=bb62f96e-23da-48c1-bd6f-e0737588733b;file-id=996117;retry=0}"
reply="ENC{250 OK}"

The FTP session identifier (door:GFTP-prometheus-AAUYiTEsOCg in the above example) allows discovery of all commands and corresponding responses for this attempt to upload a file.

The inclusion of this information allows correlation of dCache activity against FTS activity, which may prove useful when diagnosing problems.
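When correlating logs, the client-info value shown above is easy to split into its fields. The following Python sketch assumes the semicolon-separated key=value layout of the examples in this section; the function name is hypothetical and the parser is not part of dCache or FTS.

```python
def parse_client_info(value):
    """Split a 'key=value;key=value' client-info string into a dict.

    Surrounding double quotes, as used in the FTP CLIENTINFO form,
    are stripped from values. Items without '=' are ignored.
    """
    fields = {}
    for item in value.split(";"):
        key, separator, field_value = item.partition("=")
        if separator:
            fields[key.strip()] = field_value.strip().strip('"')
    return fields


info = parse_client_info(
    "job-id=bb62f96e-23da-48c1-bd6f-e0737588733b;file-id=996117;retry=0"
)
```

The resulting job-id and file-id values can then be matched against the corresponding entries on the FTS side.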

Asynchronous open for xrootd

Pool selection in dCache is unbounded in that it may trigger staging from tape or internal replication of files. Yet many xrootd clients have very short timeouts. To prevent clients from timing out during staging, dCache xrootd doors now use the asynchronous reply format of the xrootd protocol. In this format, the server immediately responds to a kXR_open request, telling the client that the actual response will be provided asynchronously at some point in the future.

Support for custom HTTP response headers

Support has been added to the webdav door and the pool so that custom HTTP response headers may be configured. To demonstrate this, the Server response header now has the value dCache/<version> where <version> is the installed dCache version.

Changelog from 2.12.0 to 2.13.0

6ffebac
[maven-release-plugin] prepare release 2.13.0
7072d29
nfs: include read/write information to transfers
c992a89
Revert “pool: reduce load on back-end file system”
8153214
nfs: add missing CR into getInfo()
90349c4
libs: update to voms-api–3.0.5
4965b76
pool: repository based unit test
0254056
pool: fix broken commit d1c977d
d1c977d
pool: reduce load on back-end file system
fc73901
dcache bash_completion: add property completion
a539214
webdav: add support for custom headers
d55d876
replica manager: demote warn to debug on truncation failure
8ee7fa5
libs: Update to voms-api-java 2.0.10.1
b117727
info: remove (almost) all hard-coded cell names
99dd7eb
spacemanager: better handling of implicit reservation failures
d900319
srm: Add cache for srmGetSpaceTokens
102557d
nfs: show mover info when proxy io is displayed
4f0b204
nfs-proxy: update proxy-io adapter interface to expose open-stateid
c15a133
srm: Fix cache invalidation of space meta data
8aa1126
chimera: Optimize deletion of directories without tags
259ff82
libs: use bouncycastle–1.46
ed08a85
libs: update jglobus to 2.0.6.9d
c817b78
chimera: Fix single command invocation of chimera utility
0b80ed8
pnfsmanager: Resolve upload directory leak caused by missing reply flag
65b3ca8
nfs: door should have a thread pool for message processing
8408007
replica manager: fix table truncation
247c14d
pool: report IO error if we cant find NFS mover
4de6fd1
pool: dedicated port range for nfs
77fa00b
common: Sort counters and gauges
4b3a7e4
pnfsmanager: Inject wrapped FileSystemProvider into ExtendedInode
668eb76
common: Increase precision of gauges
94f2199
pnfsmanager: Wrap Chimera provider with counters and gauges
e1d0bae
pnfsmanager: Minor refactoring of name space provider
9dbb81a
common: Make gauge robust against arithemtic overflow
7c56155
common: Extend execution time gauge proxy to asynchronous results
eec2fbc
common: Fix division by zero regression in gauges
4e65ce2
common: Fix rounding error in request gauge
5d24d9b
Revert “common: Extend execution time gauge proxy to asynchronous results”
1fab04c
Revert “common: Fix rounding error in request gauge”
268afa5
libs: Upgrade Hikari
e3ba1b6
pnfsmanager: Discard upload path creation request on TTL expiration
c3ebccd
common: Extend execution time gauge proxy to asynchronous results
a2fe675
common: Fix rounding error in request gauge
387e7ed
libs: Role back maven resource plugin
c352880
nfs-proxy: introduce an exponential backoff when connecting to pools
0b4a327
nfs-proxy: close RPC client if we fail to connect
2bab7d1
dir: Auto instantiate dir service with dcap doors
1fc7db1
libs: Update third party libraries
ce17c38
libs: Final round of Maven plugin updates
f5666c0
pnfsmanager: Drop cache location provider
e1a6b33
pnfsmanager: Use per-thread upload directory to reduce lock contention
4b9313f
nfs: add new commands to admin interface
cb13d35
replicamanager: update admin command help
8985286
srm: Fix erroneous message queue overflow issues
3cbc590
srm: Further optimize SRM scheduler
041fb50
srm: Abort pinning when cancelling bring-online requests
bd8dd02
system-test: Enable MVCC and logging for HSQLDB
0f87591
Downgrade logback.
fcb66c7
srm: Optimize scheduler performance
81786cd
srm-client: Respect server estimates for bring-online
53829a4
srm-client: Fix ArrayIndexOutOfBounds failure
61c2b65
srm: Fix sync to async mode timeout
7451599
pinmanager: Fix listing by id
a2aab5f
pool: Add HSM options to hsm script remove callout
d280311
srm: Fix queue size reporting
1aa2b29
pool: Fix timeout behavior of HSM requests
19e3d75
chimera: Add changeset to clean leaked tag inodes
932125b
xrootd: Use asynchronous replies for kXR_open
4d933ea
common-cli: understand hex/octal notation of numeric arguments
9019d9b
cells: update log message to describe why enqueing a message failed.
3c4a520
Fix database properties
bd403b9
spacemanager: Fix typo in db.host property
2215d33
info-provider: add support for publishing only tagged doors
5a2a367
Unify database host and user settings
6052cd5
srm: log ClientInfo HTTP request header
d2249a7
doors: Avoid locking login broker publisher while resolving addresses
ff97a00
webdav,srm: Decouple credential service from login broker
0ec700d
spacemanager: Update help text of admin commands
a4ce639
spacemanager: Allow spaces to become unowned
c8fa056
dcache: Replace loginbroker service by publish-subscribe topic
c890f74
cells: Various performance related refactoring
9146a74
cells: Make hash code of CellPath consistent with equals
856e26e
cells: Take additivity into account in logback cells filter
e8e34f2
cells: Refactor logback turbo filter thresholds
33c0f13
cli: remove stray separator in predefined alarms printout
13fb54e
libs: update nfs4j to 0.10.4
b97a41f
srm: Drop Terracotta support
f7033ea
cells: Do not export and subscribe until started
1972590
cells: Add CellAddressCore constructor to CellMessage
a705340
dcache: Delete unused MessageEventTimer
fa41f2c
cleaner: Add missing annotation
251a4ec
srm: Remove duplicate SURLs in bringonline and get requests
bc7da55
chimera: Provide SIZE attribute if a file got any locations
e4d92a5
chimera: Add cause to exceptions
78f447b
dcap: fix stack-trace when shutting down pool waiting for connection
fdae5e5
spacemanager: allow reservation authorisation based on gid
e3c8db6
system-test: add missing dCache disposible CA certificate
8c80b62
all: Exclude no route to cell logging for topics
0ae1bf0
all: Refactor ‘no-route-to-cell’ logging
f65215b
system-test: add regenerated host and user credentials
989135a
ChecksumChannelTest: refactor and active ignored tests
768880b
acl: fix compatibility with linux ace
ecf2fff
nfs: make use of chained exceptions
c95fe70
chimera: Delete unreferences tag inodes
cbf9aa3
libs: update to nfs4j–0.10.3
2cb3c2b
replicamanager: add an admin command to re-fetch resilient pool group
733024b
admin: Restore pcells compatibility with loginrbroker
f15818c
all: Move more vehicles to dcache-vehicles
10ae6b0
admin: Avoid endless exception loop upon client disconnect
ae4d0e6
replicamanager: do not fetch list of all pools
f4de2d8
chimera-provider: allow to set acl even if ACL is not enabled.
ef5f733
acl: do not require parent dir attributes on can{GET|SET}Attributes
1f8f6b6
pool: update scrubber messages to be less ambigous
69e6298
dcap: fileAttributesNotAvailable must set pass ENOENT to the client
c00015d
ChecksumChannel: fix int overrun bug on filling up gaps > Integer.MAX_VALUE
e7be891
doors: Make Transfer#selectPoolAndStartMover asynchronous
6afce77
srm: Minor refactoring of job loading
6ffa6be
replica: Fix regression in file deletion
a766de3
all: Fix several NPEs when submitting billing messages
54d532c
nfs-proxy: do not block forever on proxy requests
ba9d6f1
srmclient: consider copyjobfile when deciding whether to delegate
295e150
srm: Fix scheduler counter initialization on restart
01b5130
Introduce CacheLocationTopic and drop broadcast service
c077960
replica,poolmanager,broadcast: Use topic to notify replica manager about pool status
bfdc58d
pool,spacemanager,pnfsmanager: Move flush notification to pnfsmanager
a463837
doors: Log real path in billing
ca0b961
replica: Fix HSQLDB compatibility
a203249
srm: Allow the SRM to take action upon cleaned requests
f9e32e5
srm: Fix loading of jobs during restart
a25248f
srm: Force save jobs when adding information needed for cancellation
de73907
srm: Force save when job becomes RQUEUED
e24a738
cells: Introduce publish-subscribe messaging
42fb4fe
all: Update logback to 1.1.3
49a04bd
cells: Clean up no route to cell errors
49bbe3b
cells: More protection against cyclic routing
f93a60c
cells: Drop routes and event listeners on cell shutdown
19487b8
cells: Advertise and install routes to downstream domains
e4f6267
cells: Refactor cloning of CellMessage and CellPath
dc7c283
cells: Allow cyclic default routes
ebeddf3
cleaner: Allow list of cells to notify on delete to be empty
5ce5dbe
pinmanager: Make use of new delete notification
f88ecf0
chimera: update trigger for ACL population
86e7494
pools: fix parsing error in HsmRunSystem
a59ebeb
acl: accept AUDIT and ALARM ace types
bdbcd2d
httpd: Fix password based login
c36fb7a
alarms: Refactor alarms cell’s send command
f6e0c58
alarms: Rewrite submission client
7271794
javatunnel: use connected socket to discover local inet address
6582055
dcache: Remove dcache import commands
079e3a1
httpd: Use JGlobus for certificate handling
62fbd45
srm: Fix inverted use of certificate refresh periods
759f779
all: Minor upgrades of various third party libraries
12aaacb
cleaner: Java 8 and related refactoring
ce569d0
cleaner,pnfsmanager,replica,spacemanager: Rework delete notification
9da1035
all: Downgrade maven resources plugin due to maven incompatibility
a788510
all: Upgrade maven plugins
19c1d69
spacemanager: Set default of dcache.enable.space-reservation to true
1290c6d
admin: Avoid line breaks when not connection from a TTY
085c335
admin: Avoid division by zero when the client reports a zero sized terminal
1ee6afe
admin: Fix connection shutdown
230a57f
spacemanager: Fix race condition leading to leaked files
4a3fca7
spacemanager: Optimize space record deletion
37be23f
cells: Clean up routing table printing
050a1da
cells: Fix forwarding of GetAllDomainsRequest
e233696
cells: Reorder default route and tunnel connect operations
943b6b1
Get rid of more CVS/SVN revision references
a2c1bf5
doors: Clean up mover timeout interpretation
b052229
dcache: Make cell communication use the correct timeout
214c231
cells: make ping command blocking and measure elapsed time
f39c212
cells: Allow route duplication by key
f0801a4
dcache: Rename and extends PnfsHandler request methods
f80f15e
httpd,admin: Fix NPE in transfer collectors
33e81db
doors: Fix race condition that causes NPE in webadmin
0c58280
info/info-provider: publish valid IPv6 addresses
df0770a
info: rewrite StateMaintainer unit tests
942ab09
loginbroker: strip zone off published interface name
87631f4
srm: log the broken TURL to help diagnose the problem
1047a23
Restore compile time compatibility with pcells
0242c4f
nfs: share mover for the same client
42d0c59
nfs: use NFSv4MoverHandler instead of Map in embedded NFS server
7f698cf
httpd: Fix ever increasing source address path
fd0b93b
admin: Various minor help related improvements
f3e0fab
cells: Fix NPE when NoRouteToCellException bounces
ff236c8
Update HsmSet.java
698d964
webadmin: remove empty-string destinations in cell stub injection
ac25162
cells: Drop digest implementations
ca4958b
cells: Drop cipher implementations
00f21ef
cells: Remove unused network classes
5095c86
cells: Drop unused and unfinished code versioning support
822e479
cells: Drop unused utility classes
796bf12
cells: Drop CollectionFactory
ade7c66
cells: Drop custom producer/consumer buffer pool
2c46dff
cells: Drop custom database implementations
af89b03
cells: Drop ssh1 implementation
1133c27
dcache: Drop old scripts and configuration files
39060a8
cells: Remove more old GUI code
349cd36
cells: Remove legacy GUI
3314757
acl: Move vehicles to acl-vehicles module
5d279c9
cells: Move message encoding/decoding to CellGlue
abc6e3b
cells: Fix event queue counting bug
94dade7
cells: Only use RoutedMessageEvents when sending to routers
5c0e21e
cells: Move message encoding to CellNucleus
342118e
cells: Drop old tunnel implementations
042421a
cells: Refactor routing loop
68be9b3
spacemanager: Do not block space manager when manually updating link groups
71d6bee
cells: Avoid slash in cell names
2f6aba2
pool-selection-unit: refactor commands to delegate to a provider
23078cf
pinmanager: Don’t log stack trace when unable to fetch pool monitor
1b3e3af
cells: Refactor stripping of local domain address
f5da9d4
cells: Minor refactoring of CellPath
05e9bd9
cells: Detect end of destination path
143fa59
cells: Fix route loop detection
249bf06
cells: Simplify routing code
831a5ae
cells: Add protection against bouncing delivery failures
0748bfa
all: Simplify handling of cell routing failures
1b7237c
cells: Automatic code reformatting of CellGlue
905795d
pinmanager: Split out different cases for why a file cannot be pinned
4922f33
pool: Fix meta data reconstruction
dd39a95
nfs: remove redundant registration at rpcbind
bedd485
admin: Fix command completion
829aa73
cells: Restore CellExceptionMessage encoding
2f2dcab
cells: Enable concurrent processing in routing table
b447d98
cells: Automatic code formatting for CellRoutingTable
6f084c2
cells: Remove testing code from routing table
ebf9cf0
cells: Clean up CellRoute class
b675b65
cells: Prevent removal of wrong route
01e48e1
info: rationalise logging
a040502
Get rid of CVS revisions
bdc8253
pool: Mark pool dead if repository state update fails
0c79bc8
pool: Stop flushing when pool is dead
d5da412
admin: Make pool manager and pnfs manager addresses configurable for pcells
17a0181
admin: Do not hard code space manager and login broker cell addresses
4b2e5b1
httpd: Enable multiple instances of httpd service
43d5ecb
admin: Isolate pcells from httpd
979b4a1
httpd: Refactor transfer collection
5749145
httpd: Fix pcells compatibility
9bd3bf8
pool: Add -atime option to migration module
a36fb73
info: rationalise braces in class and method declaration
b86e89c
Revert “Bugfix: Only free space if the removed file was the last link”
fb1563e
info: whitespace only changes
07c68a0
Bugfix: Only free space if the removed file was the last link
6a36e41
infoprovider: add GLUE2EntityCreationTime
54a81ea
access log: better fix for logging secondary gids
5b3c25d
rpm: include timestamp in snapshot builds
b4aeea8
admin: Propagate NoRouteToCellException to pcells
59adbc0
srm: Remove easily avoidable singletons
6be7339
pool: Fix spelling and punctuation errors
eaaca36
pool: Fix file deletion
52880d8
srmclient: add support for userRequestDescription
3ad0147
srm: log more SRM bugs
a2f0b17
srm: suppress the Unable to find required classes message on startup
7ec4fde
admin: Restore pcells compatibility
eb41f79
Revert “pool: Make EntryState#DESTROYED deprecated”
6db9956
admin: Restore timeout semantics for pcells compatibility
80ddd56
loginbroker: Restore compatibility with pcells
e6feacf
spacemanager: Move messages to dcache-vehicles
1c2beb7
cleaner: Fix class cast exception
da69dda
srm: Use more sensible default values
695c433
srm: Introduce max pool period property
8a8b3dd
srm: Fix ConcurrentModificationException in Axis
48c88b9
pool: Fix repository statistics collection
16262b6
cells: remove obsolete code
255a5b7
core: use java.util.Base64
51ff104
pool: Fix pool entry reconstruction
f11b13a
nfs-proxy: modify ProxyIOAdapter interface to provide forEach method
bd49037
nfs: cleanup CDC related mess
54e33d1
pool: small java8 cleanup in IO scheduler
451c567
pool: Fix and align pool meta data recovery with current pnfs manager
8f7e08c
Fix serialization of cache class in FileAttributes
a0275b4
pool: Use Netty buffer allocator for http and xrootd movers
ab2d895
admin: Reset completer when connecting to another cell
b0a733d
spacemanager: Rename to SpaceManager
0f94277
pool: Refactor http mover to simplify error and keep-alive handling
9fbffb7
pool: Follow Netty best practice and use context rather than channel
346724e
pool: Use Java 8 date API for sweeper
438afde
spacemanager: Add missing nullable annotation to shut up IntelliJ
d4c120b
spacemanager: Auto reformat code
bbd03ef
spacemanager: Return discarded messages
ddefa01
dcache: Undecorate runnables in CDCExecutorServiceDecorator
6b964ab
access log: fix logging of secondary gids
f1f765d
chimera: do not maintain time-based cached value of FsStat
78a06ec
pool: ignore duplicated mover kill requests
9602858
pool: Make sweeper implement true LRU
b4cdd8c
spacemanager: Make logging and handling of transient errors more robust
b896fd5
pool: Fix read corruption in HTTP mover
0a10474
nfs: Pop elements of the NDC when leaving its scope
7bdb7c5
pool: Fix HTTP mover for multiple requests
0452769
libs: update to nfs4j-0.10.2
62bdfa0
pool, xrootd: Upgrade to Netty 4.0.26
e0dc3ae
pool: Fix NPE in xrootd mover
55bd039
LocationManager: Use fqdn instead of hostname
8402006
acl: fix test case introduced in f626125
f626125
acl: introduce parser for Linux nfs4_acl tool formatted strings
5e56f38
spacemanager: Controlled shutdown
dd9a0b3
spacemanager: Make request processing more robust
2e1ab5e
spacemanager: Reduce log level on various transient DB errors
78dda92
admin: Create history file if it doesn’t exist
70edb7f
info-provider: remove hard-coded service names
c26885b
libs: update to nfs4j-0.10.1
12c8143
chimera: fix updating of directory tags
4217f3f
spacemanager: Minor simplification to link group updates
dd4be9c
spacemanager: Don’t log stack-trace on AL/RP/Reservation conflict
8c0cd4e
spacemanager: Lower default for number of threads
c44802c
admin: Automatic reformatting of code
493a22e
admin: Drop legacy commands
8c6c664
admin: Fix help of the modify poolmode command
4ccbd0f
pnfsmanager: Extend flags command
464481c
admin: Introduce a dedicated ssh command for the non-terminal case
02446ee
admin: Introduce \timeout command
6c710c6
admin: Introduce \exception command
b60af0b
admin: Add \sl to send command to pools containing a file
b001e25
admin: Remove hard coded service names
d2c00b1
admin: Add completion for \l command
1608af6
common: Add space and dash back to escapeable args characters
855b7c1
admin: Add tab completion for \s command
325b486
admin: Refactor tab completion
b3e01f5
admin: Don’t indent single destination replies for \s
46a7e29
admin: Let \s print partial result on interrupt
ad37b9f
admin: Execute commands concurrently with \s
47c2ed8
admin: Add tab completion for \h and \?
cdd008d
admin: Add local cells to \c tab completion
662cece
admin: Add completion for \c command
d0bff4a
admin: Add partial tab completion for shell commands
f607cda
admin: Add glob support to \s command
4ea5af1
admin: Resolve domain name when connecting to cell
64c7c80
admin: Add \l command
a5dae4e
admin: Add \s command
7317920
admin: Make \? and \h use Ansi formatting if supported
6ff9ccf
admin: Update the welcome message
5d250c0
admin: Drop cd, su, logoff, help, ..
fda13c7
admin: Add \q command to quit
16d0321
admin: Add \sn and \sp commands
02ae6f9
admin: Add \? and \h commands
49ec2da
admin: Add \c command to connect to cell
2d0c188
admin: Fork legacy shell from regular shell
99e1c10
dcache-jms: Drop JMS support
f2ad4d6
pool: Update xrootd properties to reflect changes to Netty 4
44904a4
pnfsmanager: Fix help of storageinfoof and metadataof
5b714f5
ssh2: do not create an extra list when searching for public key
60dd094
ssh2: close file stream when reading authorized_keys file
6a7006c
javatunnel: fix array size to decode
f096b76
nfs4: log abandoned movers with WARN
6fc4132
admin: Fix compilation error
351096a
admin: Detect allocation of pseudo TTY
7703d69
nfs-proxy: modify ProxyIOAdapter interface
79fdcba
releases: update dCache version to v2.13
27a3bd6
align remaining default DB settings
ca87674
change default config for a split DB setup