Highlights

File transfers using GridFTP can now reuse TCP connections, which makes them faster and improves compatibility with the Globus transfer service.

This release of dCache introduces a new approach to external messaging. With the introduction of Apache Kafka as a message transport, for now in Billing, we start switching to a more modern and efficient messaging system. The goal for the Billing service is easier integration with systems like Elasticsearch: instead of adding messages to a textual log and having an external component parse that log, a Kafka-aware ingester can import them directly into the remote system.
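As a rough illustration of the ingester idea, the sketch below turns one billing event, as it might arrive from a Kafka topic, into the action/document pair used by Elasticsearch's bulk API. The event fields (msgType, pnfsid), the index name and the assumption that events are JSON-encoded are placeholders for illustration and are not taken from dCache; consuming from Kafka itself is left out.

```python
import json


def to_bulk_lines(raw_event: bytes, index: str) -> str:
    """Convert one JSON-encoded event into an Elasticsearch bulk 'index' action."""
    doc = json.loads(raw_event)  # assumption: the event payload is JSON
    action = {"index": {"_index": index}}
    # The bulk API expects newline-delimited JSON: action line, then document line.
    return json.dumps(action) + "\n" + json.dumps(doc) + "\n"


# Hypothetical event; the field names are placeholders, not dCache's schema.
event = b'{"msgType": "transfer", "pnfsid": "0000ABCD"}'
bulk_body = to_bulk_lines(event, "billing")
```

The point of the design is that no log-file parsing step is needed: the structured event is forwarded as it was received.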

Various administrative tasks are now possible for users with the ‘admin’ role using frontend.

The frontend REST API is documented with Swagger, allowing the auto-generation of clients.

The frontend also now comes with a built-in web-browser based Swagger client.

The configuration of several components has been simplified. For example, creating LDAP-only setups for gplazma now only requires the configuration of a single plugin, ldap.

Incompatibilities

  • Starting from version 4.1, dCache head nodes are incompatible with pools older than 3.0.
  • NFSv4.1/pNFS is not compatible with older Linux kernels when used with the flexfile layout. As a result, starting with dCache 4.1 a new export option can be used to enforce the layout driver. If nothing is specified, only nfsv4_1_files is offered to clients. For Linux clients with kernel version 4.14 or later, as well as RHEL 7.5, the flexfile layout is recommended, for example:

    / moder_client_host(rw,lt=nfsv4_1_files:flex_files)

    The nfs.pnfs.layouts property is obsolete in favour of the lt= export option.
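To make the shape of the lt= option concrete, the following sketch splits an export entry of the form shown above into its client host, its options and its list of layout types. The parser is hypothetical and handles only the single-entry syntax from the example; it is not dCache's exports parser.

```python
import re


def parse_export_entry(entry: str):
    """Return (host, options, layout_types) for 'host(opt1,opt2,lt=a:b)'."""
    match = re.fullmatch(r"(?P<host>[^()\s]+)\((?P<opts>[^()]*)\)", entry.strip())
    if match is None:
        raise ValueError(f"unrecognised export entry: {entry!r}")
    options = [o for o in match.group("opts").split(",") if o]
    layouts = []
    for opt in options:
        if opt.startswith("lt="):
            # Layout types are colon-separated, as in lt=nfsv4_1_files:flex_files
            layouts = opt[len("lt="):].split(":")
    return match.group("host"), options, layouts


host, options, layouts = parse_export_entry(
    "moder_client_host(rw,lt=nfsv4_1_files:flex_files)"
)
```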

Acknowledgements

Once again, we have received important code contributions from our users.

Thanks to Christoph Anton Mitterer for his contributions.

We would like to thank the students of HTW Berlin, Alexandra Konovalova, Hagen Wittlich and Marcus Kopp, for their contributions during the HTW course.

Release 4.1.43

pool

Attempting to start a full checksum scan (with csm check *) while an existing scan is still running is no longer reported as a bug.

Changelog 4.1.42..4.1.43

f1020142ba
[maven-release-plugin] prepare release 4.1.43
6a1ea2f940
pool: avoid IllegalStateException in ‘csm check *’ command
d39b791553
[maven-release-plugin] prepare for next development iteration

Release 4.1.42

alarms

To ease troubleshooting, the POOL_DEAD alarm message now includes the pool name.

pinmanager

A bug was fixed where PinManager’s bulk ls admin command yielded a NullPointerException if the optional argument was omitted.

A typo prevented the error message “Remote connection failure while unpinning…” from appearing completely and correctly in the logs. The error message string now contains the message string of the underlying Exception, hopefully providing helpful details for troubleshooting.

pool

A regression that prevented a replica’s last access time from being updated was fixed.

A regression that prevented a replica’s position in the LRU queue for garbage collection from being updated was fixed.

webdav

Users asserting the “admin” role would occasionally receive NullPointerExceptions when trying to transfer files through WebDAV. This release fixes that issue.

Changelog 4.1.41..4.1.42

c3dd241ee0
[maven-release-plugin] prepare release 4.1.42
f0492bed39
UnpinProcessor: fix assumed typo {)
8d47367472
webdav: allow transfers as user with role ‘admin’
1c04c3376d
pinmanager: avoid NPE if no argument given for ‘bulk ls’ command
53d12c8e3b
alarms: add pool name to POOL_DEAD alarm
18eff2bf7b
pool: fix reordering of removable replicas on access
c7309cd722
pool: fix storage of replica last access time
e7fff466cb
[maven-release-plugin] prepare for next development iteration

Release 4.1.41

Changes affecting multiple services

Stage requests from unknown locations resulted in an NPE in the dcap and pinmanager services. This is now fixed, and using dccp to stage a file should work even if the location is unknown.

resilience

This release fixes a race condition on replica state, so inaccessible-file errors no longer occur for a newly written file.

Changelog 4.1.40..4.1.41

50c89ff
[maven-release-plugin] prepare release 4.1.41
b1bd482
dcap/pinmanager: stage request for unknown location results in NPE
cfe24ef
dcache-resilience (stable branches): fix race condition on replica state
f948fb1
[maven-release-plugin] prepare for next development iteration
3803f65
resilience: adjust synchronization of file operation removal from map

Release 4.1.40

pool

An unhelpful error message “Parameter directory is not a directory” is replaced with one that provides information on which directory is missing.

Error messages (previously “Could not create mover”) now provide more information about why the mover could not be created.

The pool no longer logs configuration or deployment problems that prevent the pool from creating a mover as if that problem was a bug.

Certain error cases in which a pool is unable to create a mover are no longer logged as a bug in dCache.

transfermanager

This release fixes an NPE that occurred if a transfer was cancelled.

Changelog 4.1.39..4.1.40

a81d695
[maven-release-plugin] prepare release 4.1.40
1907425
pool: avoid throwing a RuntimeException for non-bugs
8bfd05c
pool: avoid log-and-throw anti-pattern
7661f70
transfermanager: avoid NPE on shutdown
354f955
pool: throw exception with meaningful error message
e6ac348
pool: update error messages to make them distinct
668b305
[maven-release-plugin] prepare for next development iteration

Release 4.1.39

pool

This release improves time formatting for JSON mover info.

Changelog 4.1.38..4.1.39

66c2ab3
[maven-release-plugin] prepare release 4.1.39
ff956d5
pools: JSON mover info timeInSeconds should be timeInMilliseconds
32b21a5
[maven-release-plugin] prepare for next development iteration

Release 4.1.38

dcache

Jetty version is updated to 9.4.12.v20180830.

resilience

It is now possible to record resilience activity (on the receiving end), which may prove useful in understanding behaviour.

In rare circumstances, dark removes could result in data loss by removing all replicas of a given file. This release fixes the issue.

Pool operations can now be successfully restarted from the command line after they have been shut down, without restarting resilience.

Changelog 4.1.37..4.1.38

e28f2f2
[maven-release-plugin] prepare release 4.1.38
4ef1910
resilience: update state on pool operations when restarted from admin command
96d6fc7
chimera-shell: fix class cast of extractor in constructor
24db9e6
libs: update jetty version to 9.4.12.v20180830
485ef68
resilience: do simple existence check of replica on pool to avoid dark removes
85a15c7
Fixed method name error (getPool() -> getPoolName()) compilation error
b5d405a
[maven-release-plugin] prepare for next development iteration
5928fb7
resilience: add ability to log resilience activity (incoming)
c302851
resilience: add ability to log resilience activity

Release 4.1.37

webdav

This release fixes an error in the resource name for the door root.

Changelog 4.1.36..4.1.37

f756896
[maven-release-plugin] prepare release 4.1.37
10c5a76
webdav: fix resource name for door root
788f7f5
[maven-release-plugin] prepare for next development iteration

Release 4.1.36

ftp

Clients can now request the checksum value of a file that they do not own and for which dCache does not already know the checksum value.

pool

The current release fixed some logging on the pool where messages were recorded against an arbitrary context (i.e., the bit in square brackets), resulting in misleading information.

Changelog 4.1.35..4.1.36

015c55b
[maven-release-plugin] prepare release 4.1.36
bd1cbc9
pool: fix CDC for repository listener notification
044f0fe
ftp: store calculated checksum using root privileges
e76c9c1
[maven-release-plugin] prepare for next development iteration

Release 4.1.35

webdav

When users request a macaroon via an HTTP POST request targeting a specific path, a caveat is created that restricts the macaroon to that path (requests to / result in a non-limited macaroon).

Commit 99c726e3 resulted in users getting back a non-limited macaroon for every request. This issue was fixed with this release.
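The intended mapping from the POST target to a caveat can be sketched as follows. This is a minimal illustration, not dCache's implementation: the function name and the textual caveat form path:<absolute path> are assumptions.

```python
from typing import Optional


def path_caveat_for(request_path: str) -> Optional[str]:
    """Return a path caveat for the POST target, or None for the root."""
    # Normalise repeated and trailing slashes so '/data//' and '/data' agree.
    parts = [p for p in request_path.split("/") if p]
    if not parts:
        return None  # a request to '/' yields a macaroon without a path limit
    # Assumption: the caveat is written as 'path:<absolute path>'.
    return "path:/" + "/".join(parts)
```

The regression described above corresponds to this function returning None for every input.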

Changelog 4.1.34..4.1.35

2aec4f2f34
[maven-release-plugin] prepare release 4.1.35
ebbe61de6a
systemtest: fix OpenSSL DN format change
4562cf2ef1
webdav: fix path-to-caveat for macaroon minting endpoint
fd04861a37
[maven-release-plugin] prepare for next development iteration

Release 4.1.34

alarms

Pool errors involving a fatal repository fault, for instance, can now be sent as an email alarm without having to send all pool disabled alarms.

dcache-view

Problems were reported when using Firefox and/or Safari to browse dcache-view. These are now fixed.

gplazma

The JAAS gplazma plugin no longer logs a stacktrace on bad configuration.

pool

This release makes error messages clearer by avoiding the use of the same error message in multiple places.

srm

Stack-traces are no longer logged for concurrent updates in pin-manager and similar (expected) failures.

transfermanager

Third-party transfers now fail if the client requests to copy a file from dCache that has not been fully uploaded.

webdav

Disabling basic authentication no longer disables macaroons. This release fixes webdav.authn.basic and frontend.authn.basic so that setting these configuration properties to false no longer blocks macaroons from being accepted in the HTTP Authorization header.

This release improves error messages for unauthenticated requests.

An IllegalArgumentException is fixed: attempts by a client to copy a file that has not been fully uploaded now result in a clear error response.
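The corrected behaviour for the Authorization header can be pictured with the sketch below. It assumes that macaroons are presented with the Bearer scheme; the function name and the decision logic are illustrative and do not reproduce dCache's code.

```python
def accept_authorization(header: str, basic_enabled: bool) -> bool:
    """Decide whether an Authorization header's scheme is acceptable."""
    scheme, _, credential = header.partition(" ")
    if not credential.strip():
        return False  # no credential supplied
    scheme = scheme.lower()
    if scheme == "basic":
        return basic_enabled  # username/password only when basic authn is on
    if scheme == "bearer":
        return True  # macaroons are accepted regardless of the basic setting
    return False
```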

Changelog 4.1.33..4.1.34

5204213
[maven-release-plugin] prepare release 4.1.34
a870620
webdav/frontend: disabling basic authn should not disable macaroons
70698bc
srm: do not log a stack-trace on expected Exception errors
710892a
transfermanager: fail third-party copy if the file is still being uploaded
0f41262
webdav: fail COPY early if file is currently being uploaded
429daff
transfermanager: abort transfer if there is a bug
356ff03
gplazma: JAAS plugin logs a stack-trace on misconfiguration
1ecc08c
pool: avoid using the same error message in multiple places
cf2fde5
alarms: add pool dead alarm
bdf0fc9
[maven-release-plugin] prepare for next development iteration
e3c1ffb
pool: don’t update atime on flush
8dd48e4
scripts: fix ‘dcache pool yaml’ command
d54104f
webdav: 401 for unauthenticated requests; message in status line
0c8045e
dcache, frontend: release dcache-view version 1.4.8

Release 4.1.33

dcache-frontend

The current release added documentation concerning restores.

ftp

This release fixes the MLSC command for non-small directories; Globus is now able to list directories that contain more than 100 directories.

Changelog 4.1.32..4.1.33

26e32b0
[maven-release-plugin] prepare release 4.1.33
12ea15a
ftp: fix MLSC command for non-small directories
534595a
dcache-frontend: add documentation concerning restores
90310c1
[maven-release-plugin] prepare for next development iteration

Release 4.1.32

billing

Loss of the database connection is now reported for billing.

webdav

This release fixes a problem where, if multiple concurrent PUT requests had directories in the path that did not already exist, all but one of the requests failed.
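The underlying race is a general one: several writers notice that a parent directory is missing and all try to create it. The sketch below shows the idea on a local filesystem, where losing the race is treated as success. It is an analogy only; it does not use the Milton API and is not the fix applied in dCache.

```python
import os
import tempfile
import threading


def ensure_parents(file_path: str) -> None:
    """Create missing parent directories, tolerating concurrent creators."""
    # exist_ok=True makes a lost race ("someone else created it first") a no-op.
    os.makedirs(os.path.dirname(file_path), exist_ok=True)


def upload(file_path: str, data: bytes) -> None:
    ensure_parents(file_path)
    with open(file_path, "wb") as f:
        f.write(data)


# Eight concurrent "uploads" into the same, initially missing, directory tree.
root = tempfile.mkdtemp()
targets = [os.path.join(root, "a", "b", f"file{i}") for i in range(8)]
threads = [threading.Thread(target=upload, args=(t, b"x")) for t in targets]
for t in threads:
    t.start()
for t in threads:
    t.join()
```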

Changelog 4.1.31..4.1.32

65742b3
[maven-release-plugin] prepare release 4.1.32
b0f6314
dcache: wrap billing data source with AlarmEnabledDataSource
06de91f
common: fix random data generation in TimeseriesHistogram unit test
dccb333
webdav: work-around Milton racy API for creating collections
789d99d
webdav: fix name of root
649d857
[maven-release-plugin] prepare for next development iteration

Release 4.1.31

dcache-view

The following new functionality has been added to dCache View: it is now possible to use macaroons for file sharing. Files can be shared by sending the generated link, QR code or macaroon to the person you want to give access to your files. Gravatar requests are now optional, and images are now stored more efficiently to reduce the number of requests made.

gplazma

Since the update to newer BC and voms-java-api libraries, sites have reported VOMS certificate validation errors. This is now fixed.

srm

The dcache ports command now includes the srm’s TLS/SSL interface.

Changelog 4.1.30..4.1.31

bd315b0
[maven-release-plugin] prepare release 4.1.31
ba68f63
gplazma voms plugin: add trust anchor refresh paramater
b369668
srm: include TLS/SSL port in ‘dcache ports’ command
995f688
dcache/frontend: release dcache-view version 1.4.7
c276c9e
[maven-release-plugin] prepare for next development iteration

Release 4.1.30

Changes affecting multiple services

The current release corrected the properties for access-log.

Changelog 4.1.29..4.1.30

1a071a3
[maven-release-plugin] prepare release 4.1.30
a84db5d
correct the properties for access-log
7ce9e23
[maven-release-plugin] prepare for next development iteration

Release 4.1.29

Changes affecting multiple services

If a client specified a checksum value with either a WebDAV or FTP upload, a “Restriction check by-passed due to missing path” warning was occasionally logged. This is now fixed, ensuring that restrictions are always applied.

pool

Space reservations on pools that are connected to tape showed a problem with failing restore requests: If a restore failed, the space that was reserved to hold the file that was supposed to come in from tape was not freed again but kept in the ‘sticky’ state. This resulted in lots of unusable space on pools that could only be reclaimed through a restart.

With the current release, this issue is fixed and space is freed as soon as possible after a failed restore request.

resilience

A very rare race-condition is fixed where a failed upload results in resilience recording a stack-trace.

webdav

An issue with the Milton WebDAV library prevented partial (or vector-read) GET requests from succeeding. This is now fixed through both an update of the dependency and a local patch while we wait for the proposed fix to be included upstream.

Changelog 4.1.28..4.1.29

69b1a98226
[maven-release-plugin] prepare release 4.1.29
f285e97109
webdav: fix proxied partial (vector-read) GET requests
239d9c38ea
pool: fix pool space accounting on failed restores
d6f3b6d1e1
resilience: fix NPE if file unlinked when resilience processes a broken file
c399f14ebe
ftp/webdav: fix bypass of restrictions
ed5b2f2167
[maven-release-plugin] prepare for next development iteration

Release 4.1.28

alarms

An internal issue with the alarms configuration was fixed, which should prevent a rare NullPointerException from occurring.

dcap

When a file or directory was created using the DCAP protocol with a URL as parameter, the file permissions were not set correctly.

With the current release, this was corrected, and such files use the client-supplied file permissions. If none are provided, the default modes 0700 (for directories) and 0600 (for files) are used.
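The rule for choosing the permissions can be written down in a few lines. The function and constant names below are hypothetical; only the default values 0700 and 0600 come from the text above.

```python
from typing import Optional

DEFAULT_DIR_MODE = 0o700
DEFAULT_FILE_MODE = 0o600


def effective_mode(client_mode: Optional[int], is_directory: bool) -> int:
    """Use the client-supplied mode if present, else the documented default."""
    if client_mode is not None:
        return client_mode
    return DEFAULT_DIR_MODE if is_directory else DEFAULT_FILE_MODE
```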

xrootd

An uncaught exception in xrootd doors was fixed.

Changelog 4.1.27..4.1.28

f4a8299da1
[maven-release-plugin] prepare release 4.1.28
10cd61f23d
alarms: fix persistence.xml configuration
2f7260b151
dcap: fix permission propagation with DCAP
19a502af5f
dcache-xrootd: handle possible race condition in directory listing
25aefdd9b5
[maven-release-plugin] prepare for next development iteration

Release 4.1.27

statistics

Metadata merge was using max when it should have used min; this is now fixed.

Changelog 4.1.26..4.1.27

1fa662a
[maven-release-plugin] prepare release 4.1.27
4fc7fba
common: fix histogram metadata merge
a66a9e3
[maven-release-plugin] prepare for next development iteration

Release 4.1.26

dcache-view

This release adds several bug fixes and improvements to dCache View.

Changelog 4.1.25..4.1.26

b68e311
[maven-release-plugin] prepare release 4.1.26
51ec9b6
[maven-release-plugin] prepare for next development iteration
113d961
dcache: release dcache-view version 1.4.6

Release 4.1.25

ftp

This release provides better protection against leaking proxy/data TCP sockets if a client aborts a proxied transfer.

pool

This release fixes a stopwatch error; IO-statistics collection is now more robust, avoiding stack-traces with the message “This stopwatch is already stopped”.

srm

Clients that use the gridsite protocol, such as davix, can now delegate their credential.

Changelog 4.1.24..4.1.25

1d819ee
[maven-release-plugin] prepare release 4.1.25
ef535ac
ftp: make shutdown more robust
f303edf
pool: fix stopwatch error
72c7bb3
common: fix bug in CountingHistogram index computation
290a335
[maven-release-plugin] prepare for next development iteration
19150ba
srm: gridsite fix querying validity of delegated credential

Release 4.1.24

ftp

The behaviour of FTP transfers was made more robust in cases where a client disconnects from the control channel prematurely.

The performance markers that dCache sends back to the client in FTP transfers are now more robust against bugs.

nfs

When transient errors in pools cause NFS transfers to have to wait and retry, the system’s behaviour is now more robust and no StackOverflowErrors should be logged any more.

scripts

Maven’s findbugs plugin is now granted more working memory in order to make builds, especially on our continuous integration system, more robust.

srm

Certificate lifetime considerations for VOMS proxy certificates are improved in this release: if a client delegates a credential where the VOMS AC expires before the X.509 proxies, dCache now will not use the credential beyond the AC expiry time. This avoids unnecessary authentication errors.
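In other words, the usable lifetime of a delegated credential is the earliest expiry among the proxy certificates and the VOMS AC. A minimal sketch, with hypothetical function and parameter names:

```python
from datetime import datetime, timezone
from typing import Iterable, Optional


def usable_until(proxy_not_after: Iterable[datetime],
                 ac_not_after: Optional[datetime]) -> datetime:
    """Latest time at which a delegated credential should still be used."""
    # The chain is only valid while every certificate in it is valid.
    expiry = min(proxy_not_after)
    if ac_not_after is not None:
        # Never use the credential beyond the VOMS AC's expiry.
        expiry = min(expiry, ac_not_after)
    return expiry
```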

webdav

When the WebDAV door is considering an HTTP third-party-copy request that uses grid-site delegation, there is a minimum 20 minute validity that any existing delegated credential must satisfy. If this is not satisfied then dCache will request a fresh delegated credential.

Until now, if the client failed to delegate a fresh certificate then the subsequent COPY request was rejected. This release changes that behaviour and enables such transfers.
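The minimum-validity test described above amounts to a simple comparison. In the sketch below the names are hypothetical, and treating a credential with exactly 20 minutes left as acceptable is an assumption.

```python
from datetime import datetime, timedelta, timezone

MINIMUM_VALIDITY = timedelta(minutes=20)


def needs_fresh_delegation(expires_at, now):
    """True if no credential exists or it has less than 20 minutes left."""
    if expires_at is None:
        return True
    return expires_at - now < MINIMUM_VALIDITY
```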

Changelog 4.1.23..4.1.24

62456f1976
[maven-release-plugin] prepare release 4.1.24
e8ffe81e2c
scripts: Avoid findbugs memory errors
4f45c29f8e
nfs: increase request retry delay when selecting/starting pool or mover
9abaac003c
webdav: adjust minimum validity after requesting delegation
4e7a6646c9
srmmanager/webdav: consider VOMS AC validity of delegated credential
648e13e9cd
ftp: make performance marker task robust.
650542c1b6
ftp: avoid NullPointerException if adapter is not connected
4bb1eb2cee
[maven-release-plugin] prepare for next development iteration

Release 4.1.23

pool

Diagnostic logging for failed HTTP third-party transfers was improved.

Billing records for failed transfers now show more detailed information.

The handling of cancelled flush requests for nearline media was rewritten to be more efficient. This resolves issues where pools report “Flush of 0000… failed with: CacheException” followed by “Pool restart required: Internal repository error”.

Compatibility with DPM was improved by increasing HTTP GET requests’ timeouts. This should allow more transfers to succeed.

poolmanager

Supplying poolmanager with an unresolvable hostname as the target will now result in an UnknownHostException instead of the previous behaviour where an (unnecessary) NullPointerException was thrown.

scripts

The format and content of the Storage Description JSON file have been updated according to WLCG suggestions:

  • ‘capacity_id’ field is renamed to ‘name’
  • ‘total_space’ and ‘used_space’ renamed to ‘totalsize’ and ‘usedsize’ respectively.
  • ‘timestamp’ field added
  • ‘vos’ field added
  • ‘assignedendpoints’ added. Currently hardcoded to “all”.
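For consumers of the file, the renaming listed above can be expressed as a small mapping. The sketch is illustrative only: the function name, the sample values and the value types chosen for the new fields (an epoch timestamp, a list of VO names, a plain string for assignedendpoints) are assumptions rather than the definitive file layout.

```python
import time


def upgrade_share(old: dict, vos: list, now: int) -> dict:
    """Map an old-style storage share record onto the renamed fields."""
    return {
        "name": old["capacity_id"],
        "totalsize": old["total_space"],
        "usedsize": old["used_space"],
        "timestamp": now,            # assumption: seconds since the epoch
        "vos": vos,                  # assumption: a list of VO names
        "assignedendpoints": "all",  # fixed value; exact JSON type is assumed
    }


# Hypothetical sample record using the old field names.
old_record = {"capacity_id": "atlas-disk", "total_space": 1000, "used_space": 250}
new_record = upgrade_share(old_record, vos=["atlas"], now=int(time.time()))
```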

srm

Logging of errors in the SRM credential store was improved.

webdav

If a non-resolvable host name is given as the source or destination of a third-party copy request, WebDAV will now fail the transfer immediately instead of waiting for a Poolmanager timeout.

Diagnostic logging for failed HTTP third-party transfers was improved.

xrootd

dCache allows xrootd clients to specify a query/opaque string in a kXR_mv request’s source path.
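Conceptually, the query/opaque part is removed from the source path before the move is carried out. A minimal sketch of that step, with a hypothetical function name:

```python
def strip_opaque(path: str) -> str:
    """Drop the query/opaque part ('?...') from an xrootd path, if present."""
    return path.split("?", 1)[0]
```

For example, a source of /data/file?key=value (the parameter is made up) is treated as /data/file.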

Changelog 4.1.22..4.1.23

c5596cbe10
[maven-release-plugin] prepare release 4.1.23
412078eaee
pool: HTTP TPC rework exception logging
932b9e838c
pool: increase TPC socket timeout for GET requests
0df6f3862e
srm: fix credential store logging
acc5fee6c4
pool: update log status using exception class name if no message
45067402ee
storagedescriptor: update information based on WLCG feedback
0459c95bea
xrootd: strip off query part from kXR_mv source
4b91884fee
webdav: fail TPC request early on unknown hostname
a5dac05b31
nearline-provider: do not propagate thread interrupt flag
0a5ff9e2fd
poolmanager: fix NPE on unknown host
267bbc9938
webdav: improve logging of TPC requests
cb4b1bde42
[maven-release-plugin] prepare for next development iteration

Release 4.1.22

Changes affecting multiple services

In order to more easily identify a rejected macaroon in the logs, its ID is now included in the log message.

An irrelevant stacktrace was logged on unexpected CacheExceptions. This was removed, leading to less clutter in the logs.

Different macaroons that were issued against the same secret are now discernible in the logs.

Users now get more information about the reasons why an invalid macaroon was rejected: HTTP requests that are made with an invalid macaroon have a 401 HTTP response with the status-line explanation phrase that describes why the macaroon is invalid.
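A client that wants to surface this explanation can read it from the status line of the response. The helper names below are hypothetical, and the sketch works on the raw status line rather than through a particular HTTP library.

```python
def parse_status_line(line: str):
    """Split an HTTP/1.x status line into (version, status code, reason phrase)."""
    parts = line.rstrip("\r\n").split(" ", 2)
    if len(parts) < 2:
        raise ValueError(f"malformed status line: {line!r}")
    reason = parts[2] if len(parts) == 3 else ""
    return parts[0], int(parts[1]), reason


def rejection_reason(line: str):
    """Return the reason phrase of a 401 response, else None."""
    _, code, reason = parse_status_line(line)
    return reason if code == 401 else None
```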

The access log file also logs why a macaroon was rejected.

core

A library dependency was updated to avoid CVE-2018-11771. This patch introduces no user-visible changes.

frontend

GitHub issue #4242 was resolved; cell information can now be gathered using the REST interface without specifying domains.

gplazma

Invalid macaroon logins no longer “spam” gPlazma.

pnfsmanager

When creating a macaroon to allow uploading of data, the desired path may not already exist. Without restrictions, WebDAV will auto-create parent directory items that are missing, or the client can create these directory elements explicitly with MKCOL.

With restrictions (such as from a macaroon) such directory creation currently requires the MANAGE activity, which allows other actions beyond the scope of this scenario. With this release, the behaviour was changed so that a user with a macaroon that authorises them to upload data into a particular directory will be able to create parent directories to achieve uploading the data.

pool

A regression caused pools that had their size only specified in a layout file to report a size of 8 Exabytes. This issue was fixed.

dCache now supports a DPM-specific HTTP extension that indicates the checksum calculation is not yet complete, avoiding potential data corruption with third-party copies: if DPM is calculating a checksum, then any RFC 3230 (i.e., with a ‘Want-Digest’ header) GET or HEAD request returns a ‘202 Accepted’ response status line and an HTML page as the response entity. Since dCache considered any 2xx response as success, the HTML page was previously accepted as the file’s contents, resulting in data corruption.
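The essential change is that 202 is singled out from the other 2xx codes and leads to a retry rather than to the body being stored. A minimal sketch, with a hypothetical function name and return values:

```python
def classify_tpc_response(status: int) -> str:
    """Classify a GET/HEAD response from the remote server in a pull copy."""
    if status == 202:
        # Remote side is still computing the checksum: the body is not the file.
        return "retry"
    if 200 <= status < 300:
        return "success"
    return "failure"
```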

dCache pools no longer log a stack-trace for non-bug P2P failures.

srm

The domain ‘.access’ log file now contains log information for grid-site delegation activity, which facilitates debugging of http third-party-copying issues.

transfermanagers

The “restriction check by-passed” warning for each WebDAV-initiated third-party transfer is fixed.

webdav

A user may request a macaroon by making an HTTP POST request to the WebDAV door. The access log entry for such a request now also records the ID and type of the macaroon.

A previous patch needed a bit of an update to ensure that X.509-with-FQAN authenticated third-party transfers with macaroons work under all circumstances. This is now ensured.

xrootd

The --zip option of xrootd clients is now supported.

Changelog 4.1.21..4.1.22

0f2fc8770e
[maven-release-plugin] prepare release 4.1.22
fe89b26094
xrootd: add support for kXR_stat on open files
430bf8a4b0
pool: P2P failures trigger stack-trace
4781474673
webdav: obtain FQAN from X.509 credential for gridsite
f7e445ba77
core: avoid sending bad macaroons to gplazma
aa0ba138ad
webdav: update access log to record macaroon request details
27347cf925
transfermanager: fix missing path
aaa65d8ec9
libs: update to commons-compress-1.18
8e5163de42
macaroons: include macaroon id in error message
fc8cf467bf
pool: fix pool’s runtime configured size regression (b70b0d9)
a641fa61cc
core: provide better feedback and logging if a macaroon is rejected
aeb4cee6ce
dcache-frontend: fix array out of bounds exception in cell info service
edc17a507f
pool: update HTTP TPC to support retrying GET and HEAD requests for DPM
4aee9f201b
srm: add gridsite delegation interface access-log
d1c13e24ea
macaroons: fix logged id
ea79980dc0
core: avoid stacktrace on arbitrary CacheException
0c1f3ae879
[maven-release-plugin] prepare for next development iteration
30cd18cdd1
pnfsmanager: allow restricted user with UPLOAD to create parent directories

Release 4.1.21

frontend

When a user does not have the permission to read a file (or is simply not logged in), dCache would previously report a 500 Internal Server Error. This error reporting was improved, reporting 401 Unauthorized or 403 Forbidden as appropriate.

ftp

If the ftp client requests a proxied passive transfer with a different IP family from the control channel (i.e., the client connects using IPv6 and requests an IPv4 data channel, or vice versa) the ftp server must select which IP address it should return to the client.

As pointed out by Francesco Prelz (thanks!), the door currently selects the first address from the same interface that has the desired IP family. However, this may not be accessible by the client.

This release updates address selection so that only usable addresses will be returned to the clients.

info

Clients querying the info service (such as info-provider and storage-report) are now informed of the number of files stored in a space reservation.

The info service now displays the time at which the information it displays was recorded.

poolmanager

This release increases responsiveness for users that are not allowed to stage files, and for NFS users who access offline files. In cases where such a user issued a read request at the same time that Pool Manager handled a staging request, the first request would block for the duration of the staging – potentially quite a while. From now on, users that are not allowed to stage receive appropriate error messages as soon as possible, without having to wait for anyone else.

xrootd

Support for xrootd mkdir was improved.

Changelog 4.1.20..4.1.21

97108af956
[maven-release-plugin] prepare release 4.1.21
0da2eb62e7
xrootd: update to xrootd4j dependency to 3.2.3
c1eafbd213
dcache-frontend: fix error message for IdResource
af3d35159e
ftp: better address selection for cross-family passive proxied transfers
0919b25eab
info/space-manager: monitor number of files in reservation
afe43d0f0c
info: display the timestamps when metrics were collected
4ea99a4997
poolmanager: do not squash request if state is not allowed
664469b785
[maven-release-plugin] prepare for next development iteration

Release 4.1.20

systemd

Systemd did not inherit the system-wide limits and completely ignored /etc/security/limits.d/92-dcache.conf. This is now fixed and the limits are loaded and enabled as expected.

vehicles

This release fixes a serialization regression in FileCorruptedCacheException.

Changelog 4.1.19..4.1.20

de4acc9
[maven-release-plugin] prepare release 4.1.20
9faf17e
vehicles: fix serialization regression in FileCorruptedCacheException
883e2fd
[maven-release-plugin] prepare for next development iteration
8c67519
systemd: Add /etc/security/limits.d/92-dcache.conf in the dcache systemd unit and generator.

Release 4.1.19

Changes affecting multiple services

This release fixes an issue with WebDAV 3rd-party-copy requests that are authorized using a macaroon that is only valid for writing a specific file.

NOTE: both the webdav door and transfermanagers must be updated before the fix is effective.

The timeout used by dCache when attempting to send a Kafka event is now adjustable via the configuration properties dcache.kafka.maximum-block and dcache.kafka.maximum-block.unit.

The default timeout for pools, and the xrootd, nfs and webdav doors is now non-zero. This should fix the problem of Kafka events being lost under normal operational conditions.
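The two properties form a value/unit pair. The sketch below shows how such a pair could be turned into a millisecond value such as the Kafka producer setting max.block.ms expects. That dCache maps the properties onto this particular setting, and that the unit uses java.util.concurrent.TimeUnit names, are assumptions; the values in the usage example are made up.

```python
# Milliseconds per unit; unit names follow java.util.concurrent.TimeUnit
# (assumption: the .unit property accepts these names).
_MILLIS_PER_UNIT = {
    "MILLISECONDS": 1,
    "SECONDS": 1_000,
    "MINUTES": 60_000,
    "HOURS": 3_600_000,
}


def max_block_ms(properties: dict) -> int:
    """Derive a millisecond timeout from the value and unit properties."""
    value = int(properties["dcache.kafka.maximum-block"])
    unit = properties["dcache.kafka.maximum-block.unit"].upper()
    return value * _MILLIS_PER_UNIT[unit]


# Hypothetical configuration values.
timeout_ms = max_block_ms({
    "dcache.kafka.maximum-block": "1",
    "dcache.kafka.maximum-block.unit": "SECONDS",
})
```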

pool

In order to help with debugging issues with partial FTP transfers, dCache pools now are able to log considerable information about failed FTP transfers.

This is controlled by the new property pool.mover.ftp.enable.log-aborted-transfers.

webdav

dCache can now transfer data with a remote site, authenticating with that remote site using a delegated X.509 credential, but authenticating locally with a macaroon.

xrootd

This release updates xrootd4j, which should help fix occasional “pad block corrupted” issues with older clients.

Changelog 4.1.18..4.1.19

5d7936f906
[maven-release-plugin] prepare release 4.1.19
f78d25b72b
pom.xml: update xrootd4j dependency to 3.2.3
0b1cb2522a
webdav: use TLS credential directly for gridsite
00f7be99b4
pool: instrument ftp mover to show partial transfers
1703a800ed
dcache: add configuration for the Kafka producer timeout
9e8499a872
webdav+transfermanagers: support TPC pull with targeted macaroons
6cae12248b
[maven-release-plugin] prepare for next development iteration

Release 4.1.18

frontend

This release fixes broken directory QoS reporting. Frontend now describes the QoS of directories more accurately; i.e., the QoS that newly written files will receive when written into this directory, assuming none of the targeted pools are volatile.

webdav

Macaroon creation with multiple path restrictions failed with an HTTP 500 error. This is now fixed, and macaroon creation succeeds when multiple path restrictions are defined.

This release improves error handling for PROPFIND requests.

Changelog 4.1.17..4.1.18

c905521
[maven-release-plugin] prepare release 4.1.18
377285e
frontend: fix broken directory qos reporting
c10aec8
webdav: avoid throwing any exception when listing a directory for PROPFIND
55d9429
webdav/macaroon: Fix macaroon creation with multiple path restrictions.
c6c3b7b
[maven-release-plugin] prepare for next development iteration

Release 4.1.17

dcache-view

Several fixes have been implemented for dcache-view: OpenID Connect redirect handling and file download are fixed.

ftp

dCache now has the ability to log the current status of a transfer at the point the client decided to abort an FTP transfer. This should support a post mortem investigation on why a transfer was cancelled.

nfs

With this release, the timeout of pnfshandler is configurable and the nfs door recovers more quickly from situations in which a PnfsManager is not available.

Changelog 4.1.16..4.1.17

4b873db
[maven-release-plugin] prepare release 4.1.17
abf6008
ftp: add ability to log client-aborted transfers
a6b0293
nfs: make timeout of pnfshandler configurable
d1bc061
dcache: release dcache-view version 1.4.5
384ce15
[maven-release-plugin] prepare for next development iteration

Release 4.1.16

NFS

When two clients A and B operated on a file in quick succession, A opening the file and B deleting it before LAYOUTGET was called, dCache put the transfer into the list of active transfers and returned NFS4ERR_NOENT. If a client tried to optimize the corresponding CLOSE call away, as some do, the entries were never removed from the list, effectively creating a leak.

This problem was fixed. Clients now receive an NFS4ERR_STALE message in those cases.

core

Certain transfer failures, such as attempting to use a space-reservation that has insufficient capacity, resulted in the door eventually reporting a time-out problem to the client.

A typical error message would resemble

Request to [>SpaceManager@local ... ] timed out.

This problem was traced to an internal misconfiguration of a messaging component and is fixed from this release onwards.

frontend

The reporting of a file’s QoS status in frontend was improved. Files that are being scheduled for moving to tape are now reported as ‘tape’ instead of ‘disk’.

pool

In some cases, storage info data was not included in messages issued during a pool flush. This caused an irrelevant NPE to be logged.

This problem was solved, and as a side effect of the fix, billing records now have the correct format when reporting on flushes:

08.24 15:24:41 [pool:dcache-lab002-A@dcache-lab002Domain:store] [00006BD12E8925744156AAE87641D4AF73BB,1362] [/] 100013 2 {10006:"Flush was cancelled."}

vs.

8.24 15:51:07 [pool:dcache-lab002-A@dcache-lab002Domain:store] [00005C1387649DD74E0491DFE9A98D97DC39,1362] [/] data:sla2@osm 100015 1 {10006:"Flush was cancelled."}

A bug was fixed that occasionally caused problems with the pools’ Berkeley DB. This could, for example, be triggered by removing files which were in a flush queue.

A typical error message was, e.g.

27 Aug 2018 12:09:33 (cat2_lhcbtape) [Frontend-dcacheview PoolDataRequest] Fault occurred in repository: Internal repository error. Pool restart required: : CacheException(rc=204;msg=Meta data lookup failed and a pool restart is required: (JE 7.3.7) Environment must be closed, caused by: com.sleepycat.je.ThreadInterruptedException: Environment invalid because of previous exception: (JE 7.3.7) /space/lhcb/tape/pool/meta java.lang.InterruptedException THREAD_INTERRUPTED: InterruptedException may cause incorrect internal state, unable to continue. Environment is invalid and must be closed.)
27 Aug 2018 12:09:33 (cat2_lhcbtape) [Frontend-dcacheview PoolDataRequest] Pool mode changed to disabled(fetch,store,stage,p2p-client,p2p-server,dead): Pool restart required: Internal repository error

webdav

Web clients (such as web-browsers) make OPTIONS pre-flight requests to discover what they are allowed to do, according to the CORS standard.

Unfortunately, some web-browsers make the OPTIONS request without presenting any credentials. If the resource is within a protected directory, dCache previously failed the OPTIONS request.

This release introduces a new behaviour where such requests will always succeed, so that browser pre-flight requests are not hampered.

Changelog 4.1.15..4.1.16

3117e40e20
[maven-release-plugin] prepare release 4.1.16
51316cf59d
nearline-provides: do not interrupt processing thread on cancel
c903be8c6c
nfs41: invalidate open-state on layoutget if file is removed
610802bae9
pool: fix NPE on flush
09f8da57cd
webdav: always respond to OPTIONS request
3addbcf0cb
core: ensure pool/poolmanager communication receives errors
302edf9436
frontend: add targetQoS for not-yet-flushed tape files
c07f1e2a8c
[maven-release-plugin] prepare for next development iteration
cb86e6f1ad
dcache: release dcache-view version 1.4.3

Release 4.1.15

gplazma

The OidcAuthPlugin plugin was updated so that users whose OP (OpenID provider) supplies neither the name claim nor the given_name and family_name claims can still use dCache.

pool

Bad input to the following pool admin commands no longer results in a stack-trace being logged:

  • queue activate
  • queue activate class
  • queue remove class
  • queue suspend class
  • queue resume class
  • queue remove pnfsid

poolmanager

A NullPointerException that occurred when staging files back from tape while poolmanager.enable.cache-hit-message was set to true is fixed.

webdav

This release updates the default credential delegation for third-party copy: requesting a third-party copy using a macaroon no longer triggers a failed attempt at OpenID-Connect delegation.

Changelog 4.1.14..4.1.15

d01d6ec
[maven-release-plugin] prepare release 4.1.15
69d3818
poolmanager: fix NullPointerException when staging files and reporting hits
ccca569
gplazma: oidc fix FullNamePrincipal creation
a3a2904
libs: update jetty to version 9.4.11
0d8ef70
pool: ‘queue’ admin commands not the log stack-trace on bad arguments
fd3e041
webdav: update default credential delegation for third-party copy
50b839d
[maven-release-plugin] prepare for next development iteration

Release 4.1.14

history

This release fixes a bug that could cause startup errors in the history service in the face of network errors.

many

Remote pool monitor would occasionally log stack traces from exceptions when a domain shut down due to an interrupt. This has been fixed, reducing the number of irrelevant log entries in such situations.

Changelog 4.1.13..4.1.14

e5285aa7f7
[maven-release-plugin] prepare release 4.1.14
d56f0715a1
dcache-history: handle Gson syntax errors explicitly
98c303c423
cells: add handling of RemoteProxyFailureException nested InterruptedException to UncaughtException handler
8a629cf44a
[maven-release-plugin] prepare for next development iteration

Release 4.1.13

nfs

dCache 4.1 now uses nfs4j version 0.16.2, which includes a bugfix that should help avoid some rarely observed deadlocks with current Linux clients.

Changelog 4.1.12..4.1.13

4b67640b3c
[maven-release-plugin] prepare release 4.1.13
71b2c7023f
pom: nfs4j–0.16.2 bugfix version
2664c4a163
[maven-release-plugin] prepare for next development iteration

Release 4.1.12

PNFS

The handling of digest names in the get file checksum command was fixed.

pool

HTTP responses now contain more meaningful messages along with the HTTP response codes, instead of only showing stock messages like “400 Bad request”.

webdav

When the WebDAV door proxies a transfer and a transfer failure occurs, the door previously always just reported “500 Internal Error”. This reporting is now improved, with any more detailed error messages from (possibly) other services taking precedence. For example, if a pool returns a 400 error code, thus complaining about the client’s request, this code is reported instead, which should help with diagnosing the error’s cause.

Changelog 4.1.11..4.1.12

39ef1bad70
[maven-release-plugin] prepare release 4.1.12
f16cdb0dd9
webdav: pass on status message phrase to client
441f07ddd7
pool: update HTTP mover to report errors as HTTP status message phrase
97d62e96f9
pnfsmanager: fix digest name handling in get file checksum command
7cbfb3ba88
[maven-release-plugin] prepare for next development iteration

Release 4.1.11

resilience

Resilience suffered from a bug that would lead to a NoSuchElementException when a pool name no longer mapped to a location known to the Resilience service. This issue has been fixed.

When multiple pools go offline it is possible that all replicas for a given resilient file become unreadable. If the file is not CUSTODIAL, and thus cannot be restored from tape, the discovery of such a file during scanning will generate an error in the ‘history errors’ listing, in the resilience domain .resilience log, and will also raise a general alarm concerning the pool.

There currently exists a command, ‘inaccessible’, which generates a listing of the pnfsids on a given pool which in the current state of dCache have no readable replicas. However, this command takes a while to complete (asynchronously), and the output is written to a file which must be viewed by logging in.

This release introduces ‘refering pool’ information to the error output so that grepping the resilience log for a given pool becomes easier, and adds options to the command to check further details.

Changelog 4.1.10..4.1.11

20b2619adc
[maven-release-plugin] prepare release 4.1.11
7f752ec2d8
dcache-resilience: improve inaccessible file accounting
d50e97549c
dcache-resilience: skip invalid cancel filters
da6366ad2d
[maven-release-plugin] prepare for next development iteration

Release 4.1.10

scripts

A regression in the dcache pool convert command was fixed; the command works again.

The instructions that are printed out once dcache pool convert completes successfully now correctly point to the property that needs to be updated, namely pool.plugins.meta.

Changelog 4.1.9..4.1.10

5607cf0272
[maven-release-plugin] prepare release 4.1.10
5544fb58cd
pool: fix ‘dcache pool convert’ command
88e9615ac8
scripts: update reference to configuration property
cff48234a4
pool: fix metadata migration tool to use Path
2278e32aa2
[maven-release-plugin] prepare for next development iteration

Release 4.1.9

frontend

The frontend now correctly handles situations where a transfer that is already completed is killed. Be aware that in order to make use of the bugfix, both the pools and the head nodes need to be updated to at least 4.1.9.

ftp

Since commit eefb964, 3rd-party-transfers using ftp between two dCache endpoints had an issue where connections were not reliably closed. This release fixes the problem.

pool

This release improves dCache’s robustness against network errors: if registering a file with the PNFS manager fails due to a timeout, the request is retried transparently.

Changelog 4.1.8..4.1.9

508e23b42e
[maven-release-plugin] prepare release 4.1.9
1e26192be7
ftp: always close proxied data connection if client closes their half
1281d9b1b6
vehicles: fail-fast on invalid path
2403b788a8
pool: retry request to pnfs manager if timed out
e4d8bf9857
dcache-frontend: invalidate transfer when killed mover not found
588777f26f
[maven-release-plugin] prepare for next development iteration
51d4b79234
dcache-frontend: add “Requires admin role” to alarms methods (Swagger)

Release 4.1.8

dcache-view

QoS changes are now updated live. This release also fixes the upload toast so that it shows the current status, a regression in the processing of click events on vaadin-grid tables, and the path request in view-file.

door

Doors can now advertise multiple hostnames or IP addresses, i.e. multiple interfaces, including DNS aliases.

webdav

Milton’s OPTIONS handler returned a 404 error if an OPTIONS request targeted an entity that did not exist. This behaviour deviated from that of the Apache httpd server and resulted in failed uploads for dcache-view. This release fixes the problem.

Changelog 4.1.7..4.1.8

a3c1971
[maven-release-plugin] prepare release 4.1.8
db227cd
gplazma.properties: hint to enable roles
768e3f7
doors: support advertising multiple addresses in LoginBroker
e467725
webdav: do not return 404 for OPTIONS request targeting absent entity
7fc66fa
dcache: release dcache-view version 1.4.2
08eeef4
[maven-release-plugin] prepare for next development iteration

Release 4.1.7

pool

There was a regression resulting in the pool logging somewhat useless information during start-up.

This is now fixed: during the pool’s start-up, the replica store’s location is logged.

Changelog 4.1.6..4.1.7

7f9212b
[maven-release-plugin] prepare release 4.1.7
89eb811
[maven-release-plugin] prepare for next development iteration
677cc2e
pool: fix logging of replica-store on start-up

Release 4.1.6

dcache

This release updates dcache-view to version 1.4.1, which includes the following fixes and improvements: a redesigned rename component, a fix for a memory leak in view-file, a fix for a rendering and lazy-loading issue in view-file, billing records and pool activity now using the new provider, and a modified internal design of the list-row element.

Changelog 4.1.5..4.1.6

25dd502
[maven-release-plugin] prepare release 4.1.6
888012f
[maven-release-plugin] prepare for next development iteration
bfcaa5c
dcache: release dcache-view version 1.4.1

Release 4.1.5

dcache-resilience

There was a small regression in the way resilience computes the number of operations necessary to adjust copies when a storage unit definition changes.

This release fixes the computation of the operation count when storage requirements change.

ftp

To aid in diagnosing problems with lost FTP responses, dCache now logs failures to wrap/encrypt responses.

webdav

The current release improved error handling for client authentication with OpenID-Connect. A more complete set of information is now logged if OIDC delegation fails, supporting the ability to discover why the delegation failed.

Changelog 4.1.4..4.1.5

f5cfa0c
[maven-release-plugin] prepare release 4.1.5
fdf6d26
scripts: add support for parsing ZooKeeper transaction logs
19113dc
ftp: log failures to wrap/encrypt responses
56086ff
dcache-resilience: fix computation of operation count when storage requirements change
d2acb6c
webdav: log errors if OIDC delegation fails
ab951e9
[maven-release-plugin] prepare for next development iteration

Release 4.1.4

ftp

Error reporting in the FTP service has been improved: in some mixed IPv4 / IPv6 scenarios, only unclear error messages were reported.

Changelog 4.1.3..4.1.4

98193cf1b8
[maven-release-plugin] prepare release 4.1.4
62553a82cd
ftp: returned error is too vague for meaningful investigation
715b6966d4
[maven-release-plugin] prepare for next development iteration

Release 4.1.3

frontend

This release improves the Swagger documentation of the REST API by describing which aspects of dCache are affected by related operations.

nfs

If the selection process was left incomplete, for example due to a PoolManager restart, there was no way to recover manually by triggering a new selection. This release adds two new commands to the NFS door: transfer retry and transfer forget.

The first command manually re-activates an existing transfer by retrying the selection process. The second should be used to completely ‘forget’ the stale transfer and let the client trigger a fresh selection process.

Changelog 4.1.2..4.1.3

ee31290
[maven-release-plugin] prepare release 4.1.3
9fca38c
frontend: add swagger Tag descriptions
97a6e1e
nfs: add commands to reactivate stale transfers
07f5f7c
[maven-release-plugin] prepare for next development iteration

Release 4.1.2

ftp

A NullPointerException could occasionally occur in ftp. This issue was fixed in this release.

The ftp door now provides far more detailed proxy information (including ASCII-art diagrams of connection status).

FTP has received extra debug-level logging to ease debugging.

gplazma

gplazma now supports a Fermilab-specific authorization data source in JSON format.

pool

During active ftp transfers, connection problems would lead to the rather unhelpful error message “451 General problem”. This error reporting was refactored, so that diagnosis of the cause is now greatly facilitated.

poolmanager

A potential NullPointerException (that was not observed in real-world usage until now) was fixed in Pool Manager.

resilience

Resilience’s error handling was corrected so that it no longer reports on non-resilient (but corrupted) files.

spacemanager

In order to facilitate debugging, Space Manager now logs link-group related content in greater detail.

Changelog 4.1.1..4.1.2

b2ee4ccbe3
[maven-release-plugin] prepare release 4.1.2
102f38bbe7
poolmanager: fix migration command if named pool is removed
95053574d6
ftp: improve proxy logging to facilitate debugging
c33c4390d4
dcache-resilience: repair over-aggressive handling of broken file messages
11d215e903
ftp: add detailed information about proxy status
1d09af9b53
pool: fix error message for failed active FTP transfers
d7704546c8
fix the project version in pom.xml
2a5f421a08
gplazma-fermi: fix last modified check in junit test
791f03fde5
spacemanager: add remote pool monitor debug logging
029bcc8817
ftp: avoid NPE if connection is closed.
d2ae92d1dc
gplazma-fermi: add mapping plugin to support VO group and username from file
85989a2471
[maven-release-plugin] prepare for next development iteration

Release 4.1.1

Changes affecting multiple services

This version removes the (by now unused) directory /var/lib/cell-info from a default installation. The directory was previously used to store cell info data. With the introduction of the history service in dCache 3.2.1, this became obsolete.

A small bug-fix corrects the permissions of the directory /var/lib/dcache/pool-history in tar and Debian packages.

The deprecated properties for configuring alarms have now finally been made unavailable.

chimera

An internal update enables chimera to use PostgreSQL 10.

nfs

A modification in IP address handling greatly increases the speed of NFS client access for Linux clients in mixed IPv4/IPv6 environments.

Prior to Linux 4.12, Linux clients with only an IPv4 address would wait for (timeout * retry) seconds when connecting to pools with both a v4 and v6 address. This was fixed upstream in Linux 4.12, but that fix was not backported to e.g. RHEL 7 yet.
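
The idea behind this change can be sketched as follows. This is an illustrative Python sketch only, not dCache's actual implementation; the function name filter_data_server_addresses and the sample addresses are hypothetical. When the client connected over IPv4, IPv6 data-server addresses are dropped from the list offered to it, so the client never waits on connection attempts it cannot complete.

```python
import ipaddress


def filter_data_server_addresses(client_addr, ds_addrs):
    """Return the data-server addresses to offer to a client.

    Hypothetical helper: if the client connected over IPv4, IPv6
    addresses are filtered out; otherwise the list is left unchanged.
    """
    client = ipaddress.ip_address(client_addr)
    if client.version != 4:
        return list(ds_addrs)
    return [a for a in ds_addrs if ipaddress.ip_address(a).version == 4]


# Sample addresses taken from the documentation ranges (assumption).
offered = filter_data_server_addresses(
    "192.0.2.10", ["192.0.2.20", "2001:db8::20"])
print(offered)
```

A client that connected over IPv6 would still be offered both addresses in this sketch.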

Changelog 4.1.0..4.1.1

986bde1cce
[maven-release-plugin] prepare release 4.1.1
96fef30a55
nfs: filter out IPv6 DS addresses if client connected with v4
0d5d837a66
chimera: adjust postgres driver provider to new version schema
684780b296
[maven-release-plugin] prepare for next development iteration
6e4e69c8a8
skel: remove extraneous cell-info dir
cfeb7b1e48
packaging: add missing chown and chmod on pool-history
bc834d029f
skel: make deprecated alarms properties forbidden

Release 4.1.0

Billing

A newly-introduced configuration property dcache.enable.kafka controls whether the Kafka messaging system is used for message delivery.

Two new global options to control publishing via Kafka have been added:

dcache.enable.kafka = true|false
dcache.kafka.bootstrap-servers= host1:port,host2:port

The kafka topic is billing and is currently not configurable. The records are written as JSON objects, like:

{
    "msgType": "request",
    "mappedUID": 0,
    "date": "Fri Mar 02 10:48:41 CET 2018",
    "cellName": "NFS-dcache-lab007",
    "VERSION": "1.0",
    "session": "door:NFS-dcache-lab007@core-dcache-lab007:AAVmauGezqg:1519984121401001",
    "fileSize": 969,
    "status": {
        "msg": "",
        "code": 0
    },
    "storageInfo": "test:public@osm",
    "cellType": "door",
    "cellDomain": "core-dcache-lab007",
    "mappedGID": 0,
    "billingPath": "/",
    "sessionDuration": 9,
    "queuingTime": 0,
    "pnfsid": "0000328CE70CBAF24579BF5AD373C05AECC0",
    "transferPath": "/",
    "@timestamp": "2018-03-02T09:48:41.412Z",
    "clientChain": "131.169.240.87",
    "type": "dcache_billing",
    "client": "131.169.240.87",
    "subject": [
        "UidPrincipal[0]",
        "GidPrincipal[0,primary]",
        "GidPrincipal[0]",
        "Origin[131.169.240.87]"
    ]
}
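
An ingester that receives such a record from the billing topic can parse it with any JSON library. The following is a minimal Python sketch using only the standard library; it deliberately leaves out the Kafka consumer itself. The helper name summarize_billing_record and the abbreviated sample record are hypothetical, and treating status code 0 as success is an assumption.

```python
import json

# Abbreviated sample record; field names follow the example record above.
RAW = '''{
  "msgType": "request",
  "cellName": "NFS-dcache-lab007",
  "cellType": "door",
  "fileSize": 969,
  "status": {"msg": "", "code": 0},
  "storageInfo": "test:public@osm",
  "pnfsid": "0000328CE70CBAF24579BF5AD373C05AECC0",
  "@timestamp": "2018-03-02T09:48:41.412Z"
}'''


def summarize_billing_record(raw):
    """Parse one billing record and keep a few fields (hypothetical helper)."""
    record = json.loads(raw)
    return {
        "pnfsid": record["pnfsid"],
        "type": record["msgType"],
        "size": record.get("fileSize"),
        # Assumption: a status code of 0 means the operation succeeded.
        "ok": record["status"]["code"] == 0,
        "timestamp": record["@timestamp"],
    }


summary = summarize_billing_record(RAW)
print(summary["pnfsid"], summary["size"], summary["ok"])
```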

Frontend

The 4.0 release saw the introduction of a range of RESTful services which provide an administrative overview of the system. There have been a few minor changes in the intervening releases. The API is now stable with 4.1.

To summarize: these services provide overviews of file data, poolmanager configuration, service state, pool configuration and activity, and system alarms. One can obtain full information on individual files by pnfsid; billing records for reads, writes, p2ps, stores, restores and removes of a given file are also available if your system is configured to use the billing database. With the database also comes histogram data for tracking reads, writes, p2ps, stores and restores over intervals of 24 hours, 1 week, 1 month and 1 year.

Histogram data on pool queues and file lifetime is similarly aggregated over pools and pool groups. This data collection is actually done by the history service (introduced in 3.2), with which the frontend communicates, so it is only available if the former has been deployed.

Finally, information corresponding roughly to the poolmanager.conf file is furnished by the selection service API, while the space management API provides link group and space token information.

With 4.1 also comes integration with Swagger. All RESTful services have been provided with basic annotations in order to generate API documentation automatically. A convenient web interface which allows exploration and testing of the API, describing paths, parameters, error codes and JSON output, now runs at:

https://[host]:3880/api/v1.

Consulting this page is the easiest way to familiarize yourself with the API.

A number of properties have been added to the defaults for the frontend, most of which have to do with communication with other services (timeouts, endpoints, etc.). Consult the frontend.properties file for details.

FTP

GridFTP defines a set of different transfer modes, among them streaming mode (MODE S) and extended block mode (MODE E). MODE E transfers now support keeping the data connection alive in between uploads, resulting in significantly higher throughput and providing better compatibility with the Globus transfer service.

gplazma

The LDAP plugin has been updated to natively support the auth phase. As a result, there is no need for an extra mutator plugin to use LDAP-based authentication:

 auth     optional   ldap
 map      sufficient ldap

Note that, although using an ldaps:// URL is not required, it is strongly advised.

The configuration of LDAP-only-setups, which are becoming more important, is now much easier. Previously, the plugins jaas, mutator and ldap were necessary for such a setup. Now, the ldap module is sufficient. An example of a new-style gplazma configuration file would be:

 auth     optional   ldap
 map      optional   ldap
 session  optional   ldap
 identity optional   ldap

This functionality relies on using either OpenJDK or Oracle JDK.

NFS

A new export option, lt=, has been added to control layout types. The supported values are:

  • flex_files
  • nfsv4_1_files

While the flex_files layout type is recommended, only modern kernel versions (and RHEL 7.5) support it. If the kernel client capabilities are unknown, use nfsv4_1_files, which is the default behaviour.
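
As a rough illustration of how a site might pick the lt= value for a client, here is a small Python sketch. The helper names choose_layout_type and export_line, the sample path and the sample host are hypothetical. The 4.14 kernel threshold is taken from the Incompatibilities section, and the export line format follows the example given there. The sketch only looks at the upstream version number, so it does not detect distribution backports such as RHEL 7.5.

```python
import re


def choose_layout_type(kernel_version=None):
    """Pick a value for the lt= export option (hypothetical helper).

    Falls back to nfsv4_1_files when the client kernel is unknown or
    older than 4.14. Backports (e.g. RHEL 7.5) are not detected.
    """
    if kernel_version is None:
        return "nfsv4_1_files"
    match = re.match(r"(\d+)\.(\d+)", kernel_version)
    if match is None:
        return "nfsv4_1_files"
    major, minor = int(match.group(1)), int(match.group(2))
    return "flex_files" if (major, minor) >= (4, 14) else "nfsv4_1_files"


def export_line(path, host, layout):
    """Format one export entry using the lt= option."""
    return f"{path} {host}(rw,lt={layout})"


print(export_line("/data", "client.example.org", choose_layout_type("4.18.0")))
```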

Pool

A configuration option was added to publish billing records to an Apache Kafka queue:

pool.enable.kafka = true|false
pool.kafka.bootstrap-servers= host1:port,host2:port

IO statistics were added to monitor storage performance. For read and write operations, billing can now log the mean bandwidth of each IO request and the time the disk system spent handling requests and idling. Note that the dCache configuration that describes the billing log format must be updated before these values are actually recorded.

Resilience

In the various 4.0 releases, resilience was improved in a number of ways. We summarize them here.

First, a number of fixes have made it more robust: uncaught exceptions should no longer allow tasks to get stuck in the queue, and the logic concerning available file sources when handling clear cache location messages has been fixed. In addition, we have fixed the way fatal error alarms are generated so as to avoid spamming the alarm service and domain logs (there is now a special resilience log for recording these errors on a file-by-file basis, configured as usual via properties and logback.xml).

Two features have been added to resilience which enhance its usability. The first involves a redefinition of what pools it considers to be accessible as opposed to usable (writable) for making copies. With this change it is now possible to close off a resilient pool from external writes while still allowing resilience to use it, i.e., by setting

\s resilient-pool-1 pool disable -store

Resilience no longer considers such a pool ‘inaccessible’. The rule is now: a pool is readable (by external clients) unless its disabled flags include ‘fetch’, and is writable (by resilience) unless its disabled flags include ‘p2p-client’.
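
The rule can be expressed as two simple predicates. This is only a Python sketch of the rule as stated here, not the resilience service's actual code; the function names are hypothetical, and the flag names are taken from the pool mode output shown elsewhere in these notes.

```python
def is_readable(disabled_flags):
    """Readable by external clients unless 'fetch' is disabled."""
    return "fetch" not in disabled_flags


def is_writable_by_resilience(disabled_flags):
    """Usable by resilience for new copies unless 'p2p-client' is disabled."""
    return "p2p-client" not in disabled_flags


# A pool closed to external writes with 'pool disable -store'.
store_disabled = {"store"}
# A pool disabled after an internal repository error.
dead = {"fetch", "store", "stage", "p2p-client", "p2p-server", "dead"}

print(is_readable(store_disabled), is_writable_by_resilience(store_disabled))
print(is_readable(dead), is_writable_by_resilience(dead))
```

Under this rule, a pool with only ‘store’ disabled remains both readable and usable by resilience, whereas a fully disabled pool is neither.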

The second new feature is integration of staging into resilience. Files which are CUSTODIAL ONLINE (i.e., resilient but also stored on tape) will now be automatically restored from tape if all their current replicas are not accessible (e.g., because all those pools have gone offline). Since this is achieved via communication with the Pool Manager, we have made sure that a resilient group whose pools are connected to tape (viz. for stores) but for which staging has been disabled will nonetheless receive the first restored replica via p2p from some other eligible staging pool selected by the Pool Manager.

The default sorting order for the admin history and history errors commands has been changed from descending to ascending (more user friendly).

Resilience has now been running for almost a year on a number of systems at DESY and elsewhere with reliable and performant results.

Zookeeper

The external ZooKeeper libraries were updated to version 3.4.11.

Changelog from 4.0.0 to 4.1.0

684780b
[maven-release-plugin] prepare for next development iteration
95875a4
[maven-release-plugin] prepare release 4.1.0
bbfc504
pom: use nfs4j–0.16.1
87c2340
dcache-frontend: revise Swagger annotations
2411ac7
dcache: release dcache-view version 1.4.0
30422a8
webdav: support multiple RFC 3230 ‘Want-Digest’ headers
a1bc255
ftp: ensure half-closed connections are fully closed on return
81cc9df
ftp: close pool connection with MODE-E proxy
1f96951
dcache-frontend: avoid null dereferencing for incomplete pool history data
9a24522
dcache: avoid NPE from initialization race in RestoreRequestsReceiver in HttpPooMgrEngineV3
4ae42e8
[maven-release-plugin] prepare branch 4.1
d9e07f3
ftp: complete removal of ancient transaction log
d01b05f
dcache-frontend: fix predicates in spacemanager resource
7679b60
zookeeper: silence warning and log lifecycle state events.
259ec4f
alarms: fix broken path
ad12325
Revert “cells: remove custom implementation of curator’s PathChildrenCache”
af4bfbf
config: unify kafka configuration
edc9822
dcache: Removed FTPTransactionLog class
b477cf4
ftp: Removed uses of FTPTransactionLog class
eb3159a
TransferManagerHandler: Removed FTPTransactionlog
94e2a06
migration: Remove org.dcache.pool.migration.ReverseOrder
69729b5
dcache-resilience: avoid spamming alarms with abort messages
6f0b99c
Revert “dcache-resilience: avoid spamming alarms with abort messages”
7294198
dcache-resilience: avoid spamming alarms with abort messages
f49cd17
ftp: update ProxyAdapter API
3f30968
dcache-history: test git hub test
f529653
dcache-frontend: add RESTful space service (spacemanager info)
b69d55d
fir broken commit 96beb4184c
a411e2e
libs: revert “libs: update to liquibase–3.5.4”
cc0cbb1
cells: remove compatibility with pre 3.0 releases
db2a8b3
pool: fix formating for StorageInfo data
7554be3
nfs: fix wrong imports
75d03b3
dcache-resilience: handle properly RuntimeExceptions from tasks
076edee
libs: update to liquibase–3.5.4
5f97c5a
nfs: add embedded Kafka producer
eefb964
ftp: support keeping data connections between proxied MODE-E uploads.
3f8fe3e
ftp: update JavaDoc for adapter classes
e31531f
gplazma2-ldap: implement GPlazmaAuthenticationPlugin
3564ec6
gplazma2-ldap: re-write ldap plugin in Java
cc7c87f
dcache-resilience: restore proper log level for eviction message
8534173
ftp: ensure adapter is closed
96beb41
system-test: enable only nfs41 file layout
3b71edb
dcache-resilience: integrate support for staging files when no valid replicas accessible
1fa0e4b
dcache: add filter to pool manager to select a read pool from within a specified pool group
e7c186b
idcache:exclude hidden dependency to master
2445198
ftp: remove rare NullPointerException when proxying data
886dbf0
httpd: do not log an exception if info cell not running
cdfbdfc
poolmanager: silence NoRouteToCell for stage queue topic
deb1fac
poolmanager: fix poolmanager startup with certain poolmanager.conf content
b243c5d
dcache-frontend: change all JSON representations of pnfsid to String type
37ff3a3
dcache-resilience: repair faulty commit
aab9309
pool: bug-fixfor commit@74880c32a97…
74880c3
pool: add embedded kafka producer
de97ecb
core: remove getDatabaseId from PnfsId
ba972b3
dcache-resilience: define non-writable pool to mean p2p-client is disabled
ea0318b
dcache: fix remote pool monitor wait bug
e7773a1
cells: wait for zookeeper node before update the content
a352dfc
dcache-resilience: repair handling of broken files*
5b37a84
packaging: check ‘services.sh’ after old rpm removed
2139302
core: remove support for ‘*’ PnfsIds.
413e4ad
core: remove support for ‘domain’ in PNFS-style PnfsIds
fdd0731
utils: do not iterate over certificates twice during validation
1e3e96f
core: tidy up unit-tests
3c30323
dcache-resilience: fix bug in source handling with Clear Cache Location messages
4ff42f7
src: remove unused diskCacheV111.util.UptimeParser
5fec809
remove dead code from PnfsId
90bab98
pool: use interface rather than concrete class
d40d999
dcache-resilience: fix wrong assumption about error type in Message
1d2f7ee
pool: add IO statistics to monitor storage performance
ecc2c61
cells: fix broken commit 0be9a678
0be9a67
cells: add explicit ZooKeeper/Curator monitoring
33d603a
frontend: Map requests with bad JSON to HTTP 400 Bad Request status code
0d8ac41
pool-manager: use java8 streams to get Active or Enabled pools
abc6bbb
pom: update zookeeper to latest stable release 3.4.11
cfd0c9c
pom: update apache curator version to 2.12.0
1274b30
login-broker: replace LoadProvider interface with DoubleSupplier
5c8562e
src: fix broken test after commit 4d564d2534
4d564d2
cells: remove null checks for CuratorFramework
47b7d51
cells: remove historical unused code
e71f1fc
cells: remove custom implementation of curator’s PathChildrenCache
d0aa2c6
pool: use NFS internal reboot verifier for WRITE and COMMIT
fbda58a
dcache-resilience: change default sorting orders to ASC on admin history commands
1ccc2cb
gplazma: update multimap to support primary-agnostic matching
86f6d1b
pool: remove embedded nfs v3 server for felxfile layout type support
4022844
libs: switch to nfs4j–0.16.x
5ad6fff
nfs: return LAYOUTUNAVAILABLE for DOT files
fe50695
frontend: add remaining swagger annotations
bd6a2fa
star: fix printing exception error message
81178fb
storage-descriptor: add new binary to RPM manifest
cf1cf65
storage-descriptor: add support for generating new format
d157b92
star: update documentation to provide better description of script
8a60269
frontend: fix broken commit
16739c5
frontend: add qos swagger annotations
a56f887
frontend: add swagger namespace API annotations
d101fc6
frontend: add global swagger annotations
3b54f8e
frontend: add swagger-ui support
551ae71
frontend: add swagger annotation scanning
40a43c4
frontend: convert tabs to spaces
298f858
pool: refactor IO statistics
4a05834
Revert “zookeeper: remove work-around for ZOOKEEPER–2809”
0cad361
zookeeper: remove work-around for ZOOKEEPER–2809
77d82c0
nfs: fix ServerFault on FileNotFoundHimeraFsException
646da1f
ftp: support non-ASCII filenames in LIST output
0de9090
ftp: update LIST output to be more like ‘ls -l’
1576d5e
scripts: update reference to JDK to avoid mentioning specific java version
efed757
info: allow admin to control whether non-VO / non-FQAN identities are shown
0010216
chimera: fix deadlock in Postgres driver
77c85a1
gplazma: update multimap for generic match and add FQAN support
4598f02
authentication: update multimap to cooperate in generating primary gids
c538064
pools: fix NPE from info command at startup
7887ba9
systemd: stop all dCache service units in Debian prerm
279c72f
chimera: correct previous attempt to fix ‘lost+found’ directory permission
f31dfa1
[maven-release-plugin] prepare for next development iteration
4bb2574
maven: fix dcache.service permissions