What’s new in dCache 4.1
Release notes
Highlights
File transfers using GridFTP can now reuse TCP connections, which makes them faster and improves compatibility with the Globus transfer service.
This release of dCache introduces a new approach to external messaging. With the introduction of Apache Kafka as a message transport, for now in Billing, we start switching to a more modern and efficient messaging system. The goal for the Billing service is easier integration with systems like Elasticsearch: instead of adding messages to a textual log and having an external component parse that log, a Kafka-aware ingester can import them directly into the remote system.
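As a rough illustration of what such an ingester might do with one event, the sketch below turns a billing message into the two-line form used by the Elasticsearch bulk API. It assumes each event arrives as a JSON-encoded string, which these notes do not state; the index name and the sample field names are hypothetical, and consuming from Kafka and sending the result to Elasticsearch are left out.

```python
import json


def billing_event_to_bulk_lines(message: str, index: str = "dcache-billing") -> str:
    """Convert one JSON billing event into an Elasticsearch bulk-index entry.

    The bulk format is newline-delimited JSON: an action line followed by
    the document itself. The default index name is a hypothetical choice.
    """
    event = json.loads(message)
    action = {"index": {"_index": index}}
    return json.dumps(action) + "\n" + json.dumps(event, sort_keys=True) + "\n"


# Hypothetical event: the field names are illustrative only.
sample = '{"msgType": "transfer", "pnfsid": "0000ABCD", "transferSize": 1024}'
print(billing_event_to_bulk_lines(sample), end="")
```

No log parsing is involved at any point: the event is already structured when it reaches the ingester.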
Various administrative tasks are now possible for users with the ‘admin’ role using frontend.
The frontend REST API is documented with Swagger, allowing the auto-generation of clients.
The frontend also now comes with a built-in web-browser based Swagger client.
The configuration of several components has been simplified. For example, creating LDAP-only setups for gplazma now only requires the configuration of a single plugin, ldap.
Incompatibilities
- Starting from version 4.1, dCache head nodes are incompatible with pools older than 3.0.
- NFSv4.1/pNFS is not compatible with older Linux kernels when used with the flexfile layout.
As a result, dCache 4.1 introduces a new export option that can be used to enforce the layout
driver. If nothing is specified, only nfsv4_1_files is offered to clients. For
Linux clients with kernel version 4.14 or later, as well as RHEL 7.5, the flexfile layout
is recommended.
/ moder_client_host(rw,lt=nfsv4_1_files:flex_files)
The nfs.pnfs.layouts property is obsolete in favour of the lt= export option.
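To make the default explicit, here is a minimal sketch of how the lt= option in an export line could be interpreted. It is not dCache's parser: the function name is hypothetical, and it only assumes what the text above states, namely that layout names are colon-separated and that nfsv4_1_files is offered when lt= is absent.

```python
from typing import List

# Layout offered when no lt= option is present, as stated in the notes.
DEFAULT_LAYOUTS = ["nfsv4_1_files"]


def layouts_from_export_options(options: str) -> List[str]:
    """Return the layout names listed in lt=, or the default if lt= is absent."""
    for option in options.split(","):
        key, sep, value = option.strip().partition("=")
        if key == "lt" and sep:
            return value.split(":")
    return list(DEFAULT_LAYOUTS)


print(layouts_from_export_options("rw,lt=nfsv4_1_files:flex_files"))
print(layouts_from_export_options("rw"))
```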
Acknowledgements
Once again, we have received important code contributions from our users.
Thanks to Christoph Anton Mitterer for his contributions.
We would like to thank the students of HTW Berlin, Alexandra Konovalova, Hagen Wittlich and Marcus Kopp, for their contributions during the HTW course.
Release 4.1.43
pool
Attempting to start a full checksum scan (with csm check *) while an existing scan is still running is no longer reported as a bug.
Changelog 4.1.42..4.1.43
- f1020142ba
- [maven-release-plugin] prepare release 4.1.43
- 6a1ea2f940
- pool: avoid IllegalStateException in ‘csm check *’ command
- d39b791553
- [maven-release-plugin] prepare for next development iteration
Release 4.1.42
alarms
To ease troubleshooting, the POOL_DEAD alarm message now includes the pool name.
pinmanager
A bug was fixed where PinManager’s bulk ls admin command yielded a NullPointerException if the optional argument was omitted.
A typo prevented the error message “Remote connection failure while unpinning…” from appearing completely and correctly in the logs. The error message string now contains the message string of the underlying Exception, hopefully providing helpful details for troubleshooting.
pool
A regression that prevented a replica’s last access time from being updated was fixed.
A regression that prevented a replica’s position in the LRU queue for garbage collection from being updated was fixed.
webdav
Users asserting the “admin” role would occasionally receive NullPointerExceptions when trying to transfer files through WebDAV. This release fixes that issue.
Changelog 4.1.41..4.1.42
- c3dd241ee0
- [maven-release-plugin] prepare release 4.1.42
- f0492bed39
- UnpinProcessor: fix assumed typo {)
- 8d47367472
- webdav: allow transfers as user with role ‘admin’
- 1c04c3376d
- pinmanager: avoid NPE if no argument given for ‘bulk ls’ command
- 53d12c8e3b
- alarms: add pool name to POOL_DEAD alarm
- 18eff2bf7b
- pool: fix reordering of removable replicas on access
- c7309cd722
- pool: fix storage of replica last access time
- e7fff466cb
- [maven-release-plugin] prepare for next development iteration
Release 4.1.41
Changes affecting multiple services
Stage requests from unknown locations resulted in an NPE in the dcap and pinmanager services.
This is now fixed, and using dccp to stage a file should work even if the location is unknown.
resilience
This release fixes a race condition on replica state, so inaccessible-file errors no longer occur for a newly written file.
Changelog 4.1.40..4.1.41
- 50c89ff
- [maven-release-plugin] prepare release 4.1.41
- b1bd482
- dcap/pinmanager: stage request for unknown location results in NPE
- cfe24ef
- dcache-resilience (stable branches): fix race condition on replica state
- f948fb1
- [maven-release-plugin] prepare for next development iteration
- 3803f65
- resilience: adjust synchronization of file operation removal from map
Release 4.1.40
pool
An unhelpful error message “Parameter directory is not a directory” is replaced with one that provides information on which directory is missing.
Error messages (previously “Could not create mover”) now provide more information about why the mover could not be created.
The pool no longer logs configuration or deployment problems that prevent the pool from creating a mover as if that problem was a bug.
Certain error cases in which a pool is unable to create a mover are no longer logged as a bug in dCache.
transfermanager
This release fixes an NPE that occurred if a transfer was cancelled.
Changelog 4.1.39..4.1.40
- a81d695
- [maven-release-plugin] prepare release 4.1.40
- 1907425
- pool: avoid throwing a RuntimeException for non-bugs
- 8bfd05c
- pool: avoid log-and-throw anti-pattern
- 7661f70
- transfermanager: avoid NPE on shutdown
- 354f955
- pool: throw exception with meaningful error message
- e6ac348
- pool: update error messages to make them distinct
- 668b305
- [maven-release-plugin] prepare for next development iteration
Release 4.1.39
pool
This release improves time formatting in the JSON mover info.
Changelog 4.1.38..4.1.39
- 66c2ab3
- [maven-release-plugin] prepare release 4.1.39
- ff956d5
- pools: JSON mover info timeInSeconds should be timeInMilliseconds
- 32b21a5
- [maven-release-plugin] prepare for next development iteration
Release 4.1.38
dcache
Jetty version is updated to 9.4.12.v20180830.
resilience
It is now possible to record resilience activity (on the receiving end), which may prove useful in understanding behaviour.
In rare circumstances, dark removes could result in data loss by removing all replicas of a given file. This release fixes the issue.
Pool operations can now be restarted successfully from the command line after they have been shut down, without restarting resilience.
Changelog 4.1.37..4.1.38
- e28f2f2
- [maven-release-plugin] prepare release 4.1.38
- 4ef1910
- resilience: update state on pool operations when restarted from admin command
- 96d6fc7
- chimera-shell: fix class cast of extractor in constructor
- 24db9e6
- libs: update jetty version to 9.4.12.v20180830
- 485ef68
- resilience: do simple existence check of replica on pool to avoid dark removes
- 85a15c7
- Fixed method name error (getPool() -> getPoolName()) compilation error
- b5d405a
- [maven-release-plugin] prepare for next development iteration
- 5928fb7
- resilience: add ability to log resilience activity (incoming)
- c302851
- resilience: add ability to log resilience activity
Release 4.1.37
webdav
This release fixes the resource name used for the door root.
Changelog 4.1.36..4.1.37
- f756896
- [maven-release-plugin] prepare release 4.1.37
- 10c5a76
- webdav: fix resource name for door root
- 788f7f5
- [maven-release-plugin] prepare for next development iteration
Release 4.1.36
ftp
Clients can now request the checksum value of a file that the requesting user does not own, even when dCache does not already know the checksum value.
pool
This release fixes some pool logging where messages were recorded against an arbitrary context (i.e., the bit in square brackets), resulting in misleading information.
Changelog 4.1.35..4.1.36
- 015c55b
- [maven-release-plugin] prepare release 4.1.36
- bd1cbc9
- pool: fix CDC for repository listener notification
- 044f0fe
- ftp: store calculated checksum using root privileges
- e76c9c1
- [maven-release-plugin] prepare for next development iteration
Release 4.1.35
webdav
When users request a macaroon via an HTTP POST request targeting a specific path, a caveat is created that restricts the macaroon to that path (requests to / result in a non-limited macaroon).
Commit 99c726e3 resulted in users getting back a non-limited macaroon for every request. This issue was fixed with this release.
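The intended mapping from request path to caveat can be sketched as follows. This is an illustration only: the function name and the `path:` caveat spelling are hypothetical, and the sketch covers just the rule described above (a POST to / yields no path caveat, any other path yields a caveat for that path).

```python
from typing import List


def path_caveats(request_path: str) -> List[str]:
    """Return the path caveats for a macaroon requested at request_path."""
    parts = [part for part in request_path.split("/") if part]
    if not parts:
        # A request targeting the root yields a macaroon without a path limit.
        return []
    # Hypothetical caveat syntax; the real encoding is not given in these notes.
    return ["path:/" + "/".join(parts)]


print(path_caveats("/"))
print(path_caveats("/data/experiment"))
```

The regression amounted to every request being treated like the root case.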
Changelog 4.1.34..4.1.35
- 2aec4f2f34
- [maven-release-plugin] prepare release 4.1.35
- ebbe61de6a
- systemtest: fix OpenSSL DN format change
- 4562cf2ef1
- webdav: fix path-to-caveat for macaroon minting endpoint
- fd04861a37
- [maven-release-plugin] prepare for next development iteration
Release 4.1.34
alarms
Pool errors involving a fatal repository fault, for instance, can now be sent as an email alarm without having to send all pool disabled alarms.
dcache-view
Problems were reported when using Firefox and/or Safari to browse dcache-view. These are now fixed.
gplazma
The JAAS gplazma plugin no longer logs a stacktrace on bad configuration.
pool
This release makes error messages clearer by avoiding the use of the same error message in multiple places.
srm
Stack-traces are no longer logged for concurrent updates in pin-manager and similar (expected) failures.
transfermanager
Third-party transfers now fail if the client requests to copy a file from dCache that has not been fully uploaded.
webdav
Disabling basic authentication no longer disables macaroons. This release fixes webdav.authn.basic and frontend.authn.basic so that setting these configuration properties to false no longer blocks macaroons from being accepted in the HTTP Authorization header.
This release improves error messaging for unauthenticated requests.
An IllegalArgumentException is fixed: attempts by a client to copy a file that has not been fully uploaded now result in a clear error response.
Changelog 4.1.33..4.1.34
- 5204213
- [maven-release-plugin] prepare release 4.1.34
- a870620
- webdav/frontend: disabling basic authn should not disable macaroons
- 70698bc
- srm: do not log a stack-trace on expected Exception errors
- 710892a
- transfermanager: fail third-party copy if the file is still being uploaded
- 0f41262
- webdav: fail COPY early if file is currently being uploaded
- 429daff
- transfermanager: abort transfer if there is a bug
- 356ff03
- gplazma: JAAS plugin logs a stack-trace on misconfiguration
- 1ecc08c
- pool: avoid using the same error message in multiple places
- cf2fde5
- alarms: add pool dead alarm
- bdf0fc9
- [maven-release-plugin] prepare for next development iteration
- e3c1ffb
- pool: don’t update atime on flush
- 8dd48e4
- scripts: fix ‘dcache pool yaml’ command
- d54104f
- webdav: 401 for unauthenticated requests; message in status line
- 0c8045e
- dcache, frontend: release dcache-view version 1.4.8
Release 4.1.33
dcache-frontend
The current release added documentation concerning restores.
ftp
This release fixes the MLSC command for non-small directories; Globus is now able to list directories containing more than 100 directories.
Changelog 4.1.32..4.1.33
- 26e32b0
- [maven-release-plugin] prepare release 4.1.33
- 12ea15a
- ftp: fix MLSC command for non-small directories
- 534595a
- dcache-frontend: add documentation concerning restores
- 90310c1
- [maven-release-plugin] prepare for next development iteration
Release 4.1.32
billing
Loss of the database connection is now reported for billing.
webdav
This release fixes a problem where, if multiple concurrent PUT requests had directories in their path that did not already exist, all but one of the requests failed.
Changelog 4.1.31..4.1.32
- 65742b3
- [maven-release-plugin] prepare release 4.1.32
- b0f6314
- dcache: wrap billing data source with AlarmEnabledDataSource
- 06de91f
- common: fix random data generation in TimeseriesHistogram unit test
- dccb333
- webdav: work-around Milton racy API for creating collections
- 789d99d
- webdav: fix name of root
- 649d857
- [maven-release-plugin] prepare for next development iteration
Release 4.1.31
dcache-view
The following new functionality has been added to dCache View: macaroons can now be used for file sharing. Files can be shared by sending the generated link, QR code or macaroon for the files to the person you want to give access to your files. Gravatar requests are now optional, and images are stored more efficiently to reduce the number of requests made.
gplazma
Since the update to newer BC and voms-java-api libraries, sites reported VOMS certificate validation errors. This is now fixed.
srm
The dcache ports command now includes the srm’s TLS/SSL interface.
Changelog 4.1.30..4.1.31
- bd315b0
- [maven-release-plugin] prepare release 4.1.31
- ba68f63
- gplazma voms plugin: add trust anchor refresh paramater
- b369668
- srm: include TLS/SSL port in ‘dcache ports’ command
- 995f688
- dcache/frontend: release dcache-view version 1.4.7
- c276c9e
- [maven-release-plugin] prepare for next development iteration
Release 4.1.30
Changes affecting multiple services
The current release corrected the properties for access-log.
Changelog 4.1.29..4.1.30
- 1a071a3
- [maven-release-plugin] prepare release 4.1.30
- a84db5d
- correct the properties for access-log
- 7ce9e23
- [maven-release-plugin] prepare for next development iteration
Release 4.1.29
Changes affecting multiple services
If a client specified a checksum value with either a WebDAV or FTP upload, a “Restriction check by-passed due to missing path” warning was occasionally logged.
This is now fixed, ensuring that restrictions are always applied.
pool
Space reservations on pools that are connected to tape showed a problem with failing restore requests: If a restore failed, the space that was reserved to hold the file that was supposed to come in from tape was not freed again but kept in the ‘sticky’ state. This resulted in lots of unusable space on pools that could only be reclaimed through a restart.
With the current release, this issue is fixed and space is freed as soon as possible after a failed restore request.
resilience
A very rare race-condition is fixed where a failed upload results in resilience recording a stack-trace.
webdav
An issue with the Milton WebDAV library prevented Partial (or vector-read) GET requests from succeeding. This was fixed now through both an update of the dependency and a local patch while we wait for the proposed fix to be included upstream.
Changelog 4.1.28..4.1.29
- 69b1a98226
- [maven-release-plugin] prepare release 4.1.29
- f285e97109
- webdav: fix proxied partial (vector-read) GET requests
- 239d9c38ea
- pool: fix pool space accounting on failed restores
- d6f3b6d1e1
- resilience: fix NPE if file unlinked when resilience processes a broken file
- c399f14ebe
- ftp/webdav: fix bypass of restrictions
- ed5b2f2167
- [maven-release-plugin] prepare for next development iteration
Release 4.1.28
alarms
An internal issue with the alarms configuration was fixed, which should prevent a rare NullPointerException from occurring.
dcap
When creating a file or directory using the DCAP protocol with a URL as parameter, the file permissions were not set correctly.
With the current release, this was corrected, and such files use the client-supplied file permissions. If none are provided, the default modes 0700 (for directories) and 0600 (for files) are used.
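The rule described above can be written down as a short sketch. The function name is hypothetical and this is not the door's actual code; only the two default values come from the text.

```python
from typing import Optional

# Defaults quoted in the release note.
DEFAULT_DIR_MODE = 0o700
DEFAULT_FILE_MODE = 0o600


def effective_mode(client_mode: Optional[int], is_directory: bool) -> int:
    """Use the client-supplied mode if present, else the type-specific default."""
    if client_mode is not None:
        return client_mode
    return DEFAULT_DIR_MODE if is_directory else DEFAULT_FILE_MODE


print(oct(effective_mode(None, is_directory=True)))
print(oct(effective_mode(0o644, is_directory=False)))
```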
xrootd
An uncaught exception in xrootd doors was fixed.
Changelog 4.1.27..4.1.28
- f4a8299da1
- [maven-release-plugin] prepare release 4.1.28
- 10cd61f23d
- alarms: fix persistence.xml configuration
- 2f7260b151
- dcap: fix permission propagation with DCAP
- 19a502af5f
- dcache-xrootd: handle possible race condition in directory listing
- 25aefdd9b5
- [maven-release-plugin] prepare for next development iteration
Release 4.1.27
statistics
The metadata merge was using max when it should have used min; this is now fixed.
Changelog 4.1.26..4.1.27
- 1fa662a
- [maven-release-plugin] prepare release 4.1.27
- 4fc7fba
- common: fix histogram metadata merge
- a66a9e3
- [maven-release-plugin] prepare for next development iteration
Release 4.1.26
dcache-view
This release includes several bug fixes and improvements to dCache View.
Changelog 4.1.25..4.1.26
- b68e311
- [maven-release-plugin] prepare release 4.1.26
- 51ec9b6
- [maven-release-plugin] prepare for next development iteration
- 113d961
- dcache: release dcache-view version 1.4.6
Release 4.1.25
ftp
This release provides better protection against leaking proxy/data TCP sockets if a client aborts a proxied transfer.
pool
This release fixes a stopwatch error: IO-statistics collection is now more robust, avoiding stack-traces with the message “This stopwatch is already stopped”.
srm
Clients that use the gridsite protocol, such as davix, can now delegate their credential.
Changelog 4.1.24..4.1.25
- 1d819ee
- [maven-release-plugin] prepare release 4.1.25
- ef535ac
- ftp: make shutdown more robust
- f303edf
- pool: fix stopwatch error
- 72c7bb3
- common: fix bug in CountingHistogram index computation
- 290a335
- [maven-release-plugin] prepare for next development iteration
- 19150ba
- srm: gridsite fix querying validity of delegated credential
Release 4.1.24
ftp
The behaviour of FTP transfers was made more robust in cases where a client disconnects from the control channel prematurely.
The performance markers that dCache sends back to the client in FTP transfers are now more robust against bugs.
nfs
When transient errors in pools cause NFS transfers to have to wait and retry, the system’s behaviour is now more robust and no StackOverflowErrors should be logged any more.
scripts
Maven’s findbugs plugin is now granted more working memory in order to make builds, especially on our continuous integration system, more robust.
srm
Certificate lifetime considerations for VOMS proxy certificates are improved in this release: if a client delegates a credential where the VOMS AC expires before the X.509 proxies, dCache now will not use the credential beyond the AC expiry time. This avoids unnecessary authentication errors.
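In other words, the usable lifetime of a delegated credential is bounded by whichever of the two expiry times comes first. A minimal sketch of that rule, with a hypothetical function name and assuming both expiry times are available as timezone-aware datetimes:

```python
from datetime import datetime, timezone


def usable_until(proxy_not_after: datetime, voms_ac_not_after: datetime) -> datetime:
    """Return the latest time at which the delegated credential may be used."""
    return min(proxy_not_after, voms_ac_not_after)


proxy_expiry = datetime(2018, 9, 1, 12, 0, tzinfo=timezone.utc)
ac_expiry = datetime(2018, 9, 1, 6, 0, tzinfo=timezone.utc)
print(usable_until(proxy_expiry, ac_expiry).isoformat())
```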
webdav
When the WebDAV door is considering an HTTP third-party-copy request that uses grid-site delegation, there is a minimum 20 minute validity that any existing delegated credential must satisfy. If this is not satisfied then dCache will request a fresh delegated credential.
Until now, if the client failed to delegate a fresh certificate then the subsequent COPY request was rejected. This release changes that behaviour and enables such transfers.
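The validity check that triggers a fresh delegation request can be sketched as below. The names are hypothetical, and treating a credential with exactly 20 minutes left as acceptable is an assumption; the notes only give the threshold itself.

```python
from datetime import timedelta

# Minimum remaining validity quoted in the release note.
MINIMUM_VALIDITY = timedelta(minutes=20)


def needs_fresh_delegation(remaining: timedelta) -> bool:
    """True if the stored credential is too short-lived to be reused."""
    return remaining < MINIMUM_VALIDITY


print(needs_fresh_delegation(timedelta(minutes=5)))
print(needs_fresh_delegation(timedelta(hours=1)))
```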
Changelog 4.1.23..4.1.24
- 62456f1976
- [maven-release-plugin] prepare release 4.1.24
- e8ffe81e2c
- scripts: Avoid findbugs memory errors
- 4f45c29f8e
- nfs: increase request retry delay when selecting/starting pool or mover
- 9abaac003c
- webdav: adjust minimum validity after requesting delegation
- 4e7a6646c9
- srmmanager/webdav: consider VOMS AC validity of delegated credential
- 648e13e9cd
- ftp: make performance marker task robust.
- 650542c1b6
- ftp: avoid NullPointerException if adapter is not connected
- 4bb1eb2cee
- [maven-release-plugin] prepare for next development iteration
Release 4.1.23
pool
Diagnostic logging for failed HTTP third-party transfers was improved.
Billing records for failed transfers now show more detailed information.
The handling of cancelled flush requests for nearline media was rewritten to be more efficient. This resolves issues where pools report “Flush of 0000… failed with: CacheException” followed by “Pool restart required: Internal repository error”.
Compatibility with DPM was improved by increasing HTTP GET requests’ timeouts. This should allow more transfers to succeed.
poolmanager
Supplying poolmanager with an unresolvable hostname as the target will now result in an UnknownHostException instead of the previous behaviour where an (unnecessary) NullPointerException was thrown.
scripts
The format and content of the Storage Description JSON file have been updated according to WLCG suggestions:
- ‘capacity_id’ field is renamed to ‘name’
- ‘total_space’ and ‘used_space’ renamed to ‘totalsize’ and ‘usedsize’ respectively.
- ‘timestamp’ field added
- ‘vos’ field added
- ‘assignedendpoints’ added. Currently hardcoded to “all”.
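For consumers of the old file, the field changes listed above amount to a small mapping. The sketch below applies it to one flat record. The function name is hypothetical, the nesting of these fields in the real file is not described in these notes, and representing the hardcoded “all” as a one-element list is an assumption.

```python
from typing import Any, Dict, List

# Old field name -> new field name, as listed in the release note.
RENAMED_FIELDS = {
    "capacity_id": "name",
    "total_space": "totalsize",
    "used_space": "usedsize",
}


def upgrade_record(old: Dict[str, Any], timestamp: int, vos: List[str]) -> Dict[str, Any]:
    """Rename the changed fields and add the newly introduced ones."""
    new = {RENAMED_FIELDS.get(key, key): value for key, value in old.items()}
    new["timestamp"] = timestamp
    new["vos"] = list(vos)
    new["assignedendpoints"] = ["all"]  # assumed shape of the hardcoded value
    return new


old_record = {"capacity_id": "atlas-disk", "total_space": 100, "used_space": 40}
print(upgrade_record(old_record, timestamp=1535000000, vos=["atlas"]))
```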
srm
Logging of errors in the SRM credential store was improved.
webdav
If a non-resolvable host name is given as the source or destination of a third-party copy request, WebDAV will now fail the transfer immediately instead of waiting for a Poolmanager timeout.
Diagnostic logging for failed HTTP third-party transfers was improved.
xrootd
dCache allows xrootd clients to specify a query/opaque string in a kXR_mv request’s source path.
Changelog 4.1.22..4.1.23
- c5596cbe10
- [maven-release-plugin] prepare release 4.1.23
- 412078eaee
- pool: HTTP TPC rework exception logging
- 932b9e838c
- pool: increase TPC socket timeout for GET requests
- 0df6f3862e
- srm: fix credential store logging
- acc5fee6c4
- pool: update log status using exception class name if no message
- 45067402ee
- storagedescriptor: update information based on WLCG feedback
- 0459c95bea
- xrootd: strip off query part from kXR_mv source
- 4b91884fee
- webdav: fail TPC request early on unknown hostname
- a5dac05b31
- nearline-provider: do not propagate thread interrupt flag
- 0a5ff9e2fd
- poolmanager: fix NPE on unknown host
- 267bbc9938
- webdav: improve logging of TPC requests
- cb4b1bde42
- [maven-release-plugin] prepare for next development iteration
Release 4.1.22
Changes affecting multiple services
In order to more easily identify a rejected macaroon in the logs, its ID is now included in the log message.
An irrelevant stacktrace was logged on unexpected CacheExceptions. This was removed, leading to less clutter in the logs.
Different macaroons that were issued against the same secret are now discernible in the logs.
Users now get more information about the reasons why an invalid macaroon was rejected: HTTP requests that are made with an invalid macaroon have a 401 HTTP response with the status-line explanation phrase that describes why the macaroon is invalid.
The access log file also logs why a macaroon was rejected.
core
A library dependency was updated to avoid CVE-2018-11771. This patch introduces no user-visible changes.
frontend
GitHub issue #4242 was resolved; cell information can now be gathered using the REST interface without specifying domains.
gplazma
Invalid macaroon logins no longer “spam” gPlazma.
pnfsmanager
When creating a macaroon to allow uploading of data, the desired path may not already exist. Without restrictions, WebDAV will auto-create parent directory items that are missing, or the client can create these directory elements explicitly with MKCOL.
With restrictions (such as from a macaroon) such directory creation currently requires the MANAGE activity, which allows other actions beyond the scope of this scenario. With this release, the behaviour was changed so that a user with a macaroon that authorises them to upload data into a particular directory will be able to create parent directories to achieve uploading the data.
pool
A regression caused pools that had their size only specified in a layout file to report a size of 8 Exabytes. This issue was fixed.
dCache now supports a DPM-specific HTTP extension that indicates the checksum calculation is not yet complete, avoiding potential data corruption with third-party copies: if DPM is calculating a checksum, then any RFC 3230 (i.e., with a ‘Want-Digest’ header) GET or HEAD request returns a ‘202 Accepted’ response status line and an HTML page as the response entity. Since dCache considered any 2xx response as success, the HTML page was previously accepted as the file’s contents, resulting in data corruption.
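The change in how response codes are interpreted can be summarised in a few lines. This is a sketch with a hypothetical function name, not the pool's implementation; it assumes, in line with the changelog entry about retrying GET and HEAD requests for DPM, that a 202 response leads to a retry.

```python
def classify_response(status: int) -> str:
    """Classify the status code of a GET or HEAD sent with a Want-Digest header."""
    if status == 202:
        # DPM is still computing the checksum; the body is not the file.
        return "retry"
    if 200 <= status < 300:
        return "success"
    return "failure"


for code in (200, 202, 404):
    print(code, classify_response(code))
```

Checking for 202 before the generic 2xx test is the essential point: with the order reversed, the HTML page would again be taken as file data.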
dCache pools no longer log a stack-trace for non-bug P2P failures.
srm
The domain ‘.access’ log file now contains log information for grid-site delegation activity, which facilitates debugging of http third-party-copying issues.
transfermanagers
The “restriction check by-passed” warning for each WebDAV-initiated third-party transfer is fixed.
webdav
A user may request a macaroon by making an HTTP POST request to the WebDAV door. The access log entry for such a request now also records the ID and type of the macaroon.
A previous patch needed a bit of an update to ensure that X.509-with-FQAN authenticated third-party transfers with macaroons work under all circumstances. This is now ensured.
xrootd
The --zip option of xrootd clients is now supported.
Changelog 4.1.21..4.1.22
- 0f2fc8770e
- [maven-release-plugin] prepare release 4.1.22
- fe89b26094
- xrootd: add support for kXR_stat on open files
- 430bf8a4b0
- pool: P2P failures trigger stack-trace
- 4781474673
- webdav: obtain FQAN from X.509 credential for gridsite
- f7e445ba77
- core: avoid sending bad macaroons to gplazma
- aa0ba138ad
- webdav: update access log to record macaroon request details
- 27347cf925
- transfermanager: fix missing path
- aaa65d8ec9
- libs: update to commons-compress-1.18
- 8e5163de42
- macaroons: include macaroon id in error message
- fc8cf467bf
- pool: fix pool’s runtime configured size regression (b70b0d9)
- a641fa61cc
- core: provide better feedback and logging if a macaroon is rejected
- aeb4cee6ce
- dcache-frontend: fix array out of bounds exception in cell info service
- edc17a507f
- pool: update HTTP TPC to support retrying GET and HEAD requests for DPM
- 4aee9f201b
- srm: add gridsite delegation interface access-log
- d1c13e24ea
- macaroons: fix logged id
- ea79980dc0
- core: avoid stacktrace on arbitrary CacheException
- 0c1f3ae879
- [maven-release-plugin] prepare for next development iteration
- 30cd18cdd1
- pnfsmanager: allow restricted user with UPLOAD to create parent directories
Release 4.1.21
frontend
When a user does not have the permission to read a file (or is simply not logged in), dCache would previously report a 500 Internal Server Error. This error reporting was improved, reporting 401 Unauthorized or 403 Forbidden as appropriate.
ftp
If the ftp client requests a proxied passive transfer with a different IP family from the control channel (i.e., the client connects using IPv6 and requests an IPv4 data channel, or vice versa) the ftp server must select which IP address it should return to the client.
As pointed out by Francesco Prelz (thanks!), the door currently selects the first address from the same interface that has the desired IP family. However, this may not be accessible by the client.
This release updates address selection so that only usable addresses will be returned to the clients.
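A rough sketch of family-aware address selection follows. It is not the door's algorithm: the function name is hypothetical, and treating loopback and link-local addresses as unusable is an assumption, since the notes do not define what makes an address usable.

```python
import ipaddress
from typing import Iterable, Optional


def select_address(candidates: Iterable[str], wanted_version: int) -> Optional[str]:
    """Return the first candidate of the wanted IP version that looks usable."""
    for text in candidates:
        address = ipaddress.ip_address(text)
        if address.version != wanted_version:
            continue
        # Assumed definition of "not usable": loopback or link-local.
        if address.is_loopback or address.is_link_local:
            continue
        return text
    return None


interface_addresses = ["fe80::1", "2001:db8::10", "192.0.2.5"]
print(select_address(interface_addresses, 6))
print(select_address(interface_addresses, 4))
```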
info
Clients querying the info service (such as info-provider and storage-report) are now informed of the number of files stored in a space reservation.
The info service now displays the time at which the information it displays was recorded.
poolmanager
This release increases responsiveness for users that are not allowed to stage files, and for NFS users who access offline files. In cases where such a user issued a read request at the same time that Pool Manager handled a staging request, the first request would block for the duration of the staging – potentially quite a while. From now on, users that are not allowed to stage receive appropriate error messages as soon as possible, without having to wait for anyone else.
xrootd
Support for xrootd mkdir was improved.
Changelog 4.1.20..4.1.21
- 97108af956
- [maven-release-plugin] prepare release 4.1.21
- 0da2eb62e7
- xrootd: update to xrootd4j dependency to 3.2.3
- c1eafbd213
- dcache-frontend: fix error message for IdResource
- af3d35159e
- ftp: better address selection for cross-family passive proxied transfers
- 0919b25eab
- info/space-manager: monitor number of files in reservation
- afe43d0f0c
- info: display the timestamps when metrics were collected
- 4ea99a4997
- poolmanager: do not squash request if state is not allowed
- 664469b785
- [maven-release-plugin] prepare for next development iteration
Release 4.1.20
systemd
Systemd did not inherit the system-wide limits and completely ignored /etc/security/limits.d/92-dcache.conf.
This is now fixed, and the limits are loaded and applied as expected.
vehicles
This release fixes a serialization regression in FileCorruptedCacheException.
Changelog 4.1.19..4.1.20
- de4acc9
- [maven-release-plugin] prepare release 4.1.20
- 9faf17e
- vehicles: fix serialization regression in FileCorruptedCacheException
- 883e2fd
- [maven-release-plugin] prepare for next development iteration
- 8c67519
- systemd: Add /etc/security/limits.d/92-dcache.conf in the dcache systemd unit and generator.
Release 4.1.19
Changes affecting multiple services
This release fixes an issue with WebDAV 3rd-party-copy requests that are authorized using a macaroon that is only valid for writing a specific file.
NOTE: both the webdav door and transfermanagers must be updated before the fix is effective.
The timeout used by dCache when attempting to send a Kafka event is now adjustable via the configuration properties dcache.kafka.maximum-block and dcache.kafka.maximum-block.unit.
The default timeout for pools, and the xrootd, nfs and webdav doors is now non-zero. This should fix the problem of Kafka events being lost under normal operational conditions.
pool
In order to help with debugging issues with partial FTP transfers, dCache pools now are able to log considerable information about failed FTP transfers.
This is controlled by the new property pool.mover.ftp.enable.log-aborted-transfers.
webdav
dCache can now transfer data with a remote site, authenticating with that remote site using a delegated X.509 credential, but authenticating locally with a macaroon.
xrootd
This release updates xrootd4j, which should help fix occasional “pad block corrupted” issues with older clients.
Changelog 4.1.18..4.1.19
- 5d7936f906
- [maven-release-plugin] prepare release 4.1.19
- f78d25b72b
- pom.xml: update xrootd4j dependency to 3.2.3
- 0b1cb2522a
- webdav: use TLS credential directly for gridsite
- 00f7be99b4
- pool: instrument ftp mover to show partial transfers
- 1703a800ed
- dcache: add configuration for the Kafka producer timeout
- 9e8499a872
- webdav+transfermanagers: support TPC pull with targeted macaroons
- 6cae12248b
- [maven-release-plugin] prepare for next development iteration
Release 4.1.18
frontend
This release fixes broken directory QoS reporting. The frontend now more accurately describes the QoS of directories; i.e., the QoS that newly written files will receive when written into this directory, assuming none of the targeted pools are volatile.
webdav
Macaroon creation with multiple path restrictions failed with an HTTP 500 error. This is now fixed, and macaroon creation succeeds when multiple path restrictions are defined.
This release improves error handling for PROPFIND requests.
Changelog 4.1.17..4.1.18
- c905521
- [maven-release-plugin] prepare release 4.1.18
- 377285e
- frontend: fix broken directory qos reporting
- c10aec8
- webdav: avoid throwing any exception when listing a directory for PROPFIND
- 55d9429
- webdav/macaroon: Fix macaroon creation with multiple path restrictions.
- c6c3b7b
- [maven-release-plugin] prepare for next development iteration
Release 4.1.17
dcache-view
Several fixes have been implemented for dcache-view: OpenID Connect redirect handling and file download are fixed.
ftp
dCache now has the ability to log the current status of a transfer at the point the client decided to abort an FTP transfer. This should support a post mortem investigation on why a transfer was cancelled.
nfs
With this release, the timeout of pnfshandler is configurable, and the NFS door recovers more quickly from situations in which PnfsManager is not available.
Changelog 4.1.16..4.1.17
- 4b873db
- [maven-release-plugin] prepare release 4.1.17
- abf6008
- ftp: add ability to log client-aborted transfers
- a6b0293
- nfs: make timeout of pnfshandler configurable
- d1bc061
- dcache: release dcache-view version 1.4.5
- 384ce15
- [maven-release-plugin] prepare for next development iteration
Release 4.1.16
NFS
When two clients A and B operated on a file in quick succession, with A opening the file and B deleting it before LAYOUTGET was called, dCache put the transfer into the list of active transfers and returned NFS4ERR_NOENT. If a client optimized the corresponding CLOSE call away, as some do, the entries were never removed from the list, effectively creating a leak.
This problem was fixed. Clients now receive an NFS4ERR_STALE message in those cases.
core
Certain transfer failures, such as attempting to use a space-reservation that has insufficient capacity, resulted in the door eventually reporting a time-out problem to the client.
A typical error message would resemble
Request to [>SpaceManager@local ... ] timed out.
This problem was traced to an internal misconfiguration of a messaging component and is fixed from this release onwards.
frontend
The reporting of a file’s QoS status in frontend was improved. Files that are being scheduled for moving to tape are now reported as ‘tape’ instead of ‘disk’.
pool
In some cases, storage info data was not included in messages issued during a pool flush. This caused an irrelevant NPE to be logged.
This problem was solved, and as a side effect of the fix, billing records now have the correct format when reporting on flushes:
08.24 15:24:41 [pool:dcache-lab002-A@dcache-lab002Domain:store] [00006BD12E8925744156AAE87641D4AF73BB,1362] [/] 100013 2 {10006:"Flush was cancelled."}
vs.
8.24 15:51:07 [pool:dcache-lab002-A@dcache-lab002Domain:store] [00005C1387649DD74E0491DFE9A98D97DC39,1362] [/] data:sla2@osm 100015 1 {10006:"Flush was cancelled."}
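For anyone parsing these records, the difference between the two samples can be checked mechanically. The following is a minimal Python sketch, not part of dCache: the regular expression is inferred only from the two sample lines above, and the group names (`cell`, `storage_info`, `num1`, `num2`, and so on) are illustrative assumptions rather than official field names.

```python
import re

# Pattern inferred from the two sample lines above; the group names are
# illustrative assumptions, not official dCache field names.
RECORD = re.compile(
    r'^(?P<date>\d{1,2}\.\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) '
    r'\[(?P<cell>[^\]]+)\] '
    r'\[(?P<pnfsid>[0-9A-F]+),(?P<size>\d+)\] '
    r'\[(?P<path>[^\]]*)\] '
    r'(?:(?P<storage_info>\S+@\S+) )?'   # optional token such as data:sla2@osm
    r'(?P<num1>\d+) (?P<num2>\d+) '
    r'\{(?P<code>\d+):"(?P<message>.*)"\}$'
)

def parse_flush_record(line):
    """Return the record's fields as a dict, or None if the line does not match."""
    match = RECORD.match(line.strip())
    return match.groupdict() if match else None

first = ('08.24 15:24:41 [pool:dcache-lab002-A@dcache-lab002Domain:store] '
         '[00006BD12E8925744156AAE87641D4AF73BB,1362] [/] 100013 2 '
         '{10006:"Flush was cancelled."}')
second = ('8.24 15:51:07 [pool:dcache-lab002-A@dcache-lab002Domain:store] '
          '[00005C1387649DD74E0491DFE9A98D97DC39,1362] [/] data:sla2@osm 100015 1 '
          '{10006:"Flush was cancelled."}')

# The first sample has no storage-info token; the second one does.
print(parse_flush_record(first)['storage_info'])
print(parse_flush_record(second)['storage_info'])
```

The storage-info group is optional so that the same pattern accepts both variants of the record.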
A bug was fixed that occasionally caused problems with the pools’ Berkeley DB. This could, for example, be triggered by removing files which were in a flush queue.
A typical error message was, e.g.
27 Aug 2018 12:09:33 (cat2_lhcbtape) [Frontend-dcacheview PoolDataRequest] Fault occurred in repository: Internal repository error. Pool restart required: : CacheException(rc=204;msg=Meta data lookup failed and a pool restart is required: (JE 7.3.7) Environment must be closed, caused by: com.sleepycat.je.ThreadInterruptedException: Environment invalid because of previous exception: (JE 7.3.7) /space/lhcb/tape/pool/meta java.lang.InterruptedException THREAD_INTERRUPTED: InterruptedException may cause incorrect internal state, unable to continue. Environment is invalid and must be closed.)
27 Aug 2018 12:09:33 (cat2_lhcbtape) [Frontend-dcacheview PoolDataRequest] Pool mode changed to disabled(fetch,store,stage,p2p-client,p2p-server,dead): Pool restart required: Internal repository error
webdav
Web clients (such as web-browsers) make OPTIONS pre-flight requests to discover what they are allowed to do, according to the CORS standard.
Unfortunately, some web-browsers make the OPTIONS request without presenting any credentials. If the resource is within a protected directory, dCache previously failed the OPTIONS request.
This release introduces a new behaviour where such requests will always succeed, so that browser pre-flight requests are not hampered.
Changelog 4.1.15..4.1.16
- 3117e40e20
- [maven-release-plugin] prepare release 4.1.16
- 51316cf59d
- nearline-provides: do not interrupt processing thread on cancel
- c903be8c6c
- nfs41: invalidate open-state on layoutget if file is removed
- 610802bae9
- pool: fix NPE on flush
- 09f8da57cd
- webdav: always respond to OPTIONS request
- 3addbcf0cb
- core: ensure pool/poolmanager communication receives errors
- 302edf9436
- frontend: add targetQoS for not-yet-flushed tape files
- c07f1e2a8c
- [maven-release-plugin] prepare for next development iteration
- cb86e6f1ad
- dcache: release dcache-view version 1.4.3
Release 4.1.15
gplazma
The OidcAuthPlugin plugin was updated so that users whose OP does not supply the name claim, and supplies neither the given_name nor the family_name claim, can use dCache.
pool
This release fixes stack-trace logging for the queue admin commands: bad input to the following admin commands no longer results in a stack-trace being logged:
- queue activate
- queue activate class
- queue remove class
- queue suspend class
- queue resume class
- queue remove pnfsid
poolmanager
A NullPointerException that occurred when staging files back from tape with poolmanager.enable.cache-hit-message set to true has been fixed.
webdav
This release updates the default credential delegation for third-party copy, so that requesting a third-party copy using a macaroon no longer triggers a failed attempt at OpenID-Connect delegation.
Changelog 4.1.14..4.1.15
- d01d6ec
- [maven-release-plugin] prepare release 4.1.15
- 69d3818
- poolmanager: fix NullPointerException when staging files and reporting hits
- ccca569
- gplazma: oidc fix FullNamePrincipal creation
- a3a2904
- libs: update jetty to version 9.4.11
- 0d8ef70
- pool: ‘queue’ admin commands not the log stack-trace on bad arguments
- fd3e041
- webdav: update default credential delegation for third-party copy
- 50b839d
- [maven-release-plugin] prepare for next development iteration
Release 4.1.14
history
This release fixes a bug that could cause startup errors in the history service in the face of network errors.
many
Remote pool monitor would occasionally log stack traces from exceptions
when a domain shut down due to an interrupt. This has been fixed, reducing
the number of irrelevant log entries in such situations.
Changelog 4.1.13..4.1.14
- e5285aa7f7
- [maven-release-plugin] prepare release 4.1.14
- d56f0715a1
- dcache-history: handle Gson syntax errors explicitly
- 98c303c423
- cells: add handling of RemoteProxyFailureException nested InterruptedException to UncaughtException handler
- 8a629cf44a
- [maven-release-plugin] prepare for next development iteration
Release 4.1.13
nfs
dCache 4.1 now uses nfs4j version 0.16.2, which includes a bugfix that should help avoid some rarely observed deadlocks with current Linux clients.
Changelog 4.1.12..4.1.13
- 4b67640b3c
- [maven-release-plugin] prepare release 4.1.13
- 71b2c7023f
- pom: nfs4j–0.16.2 bugfix version
- 2664c4a163
- [maven-release-plugin] prepare for next development iteration
Release 4.1.12
PNFS
pool
HTTP responses now contain more meaningful messages along with the HTTP response codes, instead of only just showing stock messages like “400 Bad request”.
webdav
When the WebDAV door proxies a transfer and a transfer failure occurs, the door previously always just reported “500 Internal Error”. This reporting is now improved, with any more detailed error messages from (possibly) other services taking precedence. For example, if a pool returns a 400 error code, thus complaining about the client’s request, this code is reported instead, which should help with diagnosing the error’s cause.
Changelog 4.1.11..4.1.12
- 39ef1bad70
- [maven-release-plugin] prepare release 4.1.12
- f16cdb0dd9
- webdav: pass on status message phrase to client
- 441f07ddd7
- pool: update HTTP mover to report errors as HTTP status message phrase
- 97d62e96f9
- pnfsmanager: fix digest name handling in get file checksum command
- 7cbfb3ba88
- [maven-release-plugin] prepare for next development iteration
Release 4.1.11
resilience
Resilience suffered from a bug that would lead to a NoSuchElementException when a pool name no longer mapped to a location known to the Resilience service. This issue has been fixed.
When multiple pools go offline it is possible that all replicas for a given resilient file become unreadable. If the file is not CUSTODIAL, and thus cannot be restored from tape, the discovery of such a file during scanning will generate an error in the ‘history errors’ listing, in the resilience domain .resilience log, and will also raise a general alarm concerning the pool.
There currently exists a command, ‘inaccessible’, which generates a listing of the pnfsids on a given pool which in the current state of dCache have no readable replicas. However, this command takes a while to complete (asynchronously), and the output is written to a file which must be viewed by logging in.
This release adds ‘referring pool’ information to the error output so that grepping the resilience log for a given pool becomes easier, and adds options to the command to check further details.
Changelog 4.1.10..4.1.11
- 20b2619adc
- [maven-release-plugin] prepare release 4.1.11
- 7f752ec2d8
- dcache-resilience: improve inaccessible file accounting
- d50e97549c
- dcache-resilience: skip invalid cancel filters
- da6366ad2d
- [maven-release-plugin] prepare for next development iteration
Release 4.1.10
scripts
A regression in the dcache pool convert command was fixed; the command works again.
scripts
The instructions that are printed out once dcache pool convert completes successfully now correctly point to the property that needs to be updated, namely pool.plugins.meta.
Changelog 4.1.9..4.1.10
- 5607cf0272
- [maven-release-plugin] prepare release 4.1.10
- 5544fb58cd
- pool: fix ‘dcache pool convert’ command
- 88e9615ac8
- scripts: update reference to configuration property
- cff48234a4
- pool: fix metadata migration tool to use Path
- 2278e32aa2
- [maven-release-plugin] prepare for next development iteration
Release 4.1.9
frontend
The frontend now correctly handles situations where a transfer that is already completed is killed. Be aware that in order to make use of the bugfix, both the pools and the head nodes need to be updated to at least 4.1.9.
ftp
Since commit eefb964, 3rd-party-transfers using ftp between two dCache endpoints had an issue where connections were not reliably closed. This release fixes the problem.
pool
This release improves dCache’s robustness against network errors: In case registering a file with PNFS manager fails due to a timeout, the request is retried transparently.
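The idea of transparently retrying after a timeout can be sketched generically. This is a minimal Python illustration of the pattern only. It is not dCache's implementation, and `call_with_retry`, `flaky_register`, the attempt count and the delay are all hypothetical names and values chosen for the example.

```python
import time

def call_with_retry(request, attempts=3, delay=0.0):
    """Call request(); on TimeoutError, retry until `attempts` calls were made."""
    for attempt in range(1, attempts + 1):
        try:
            return request()
        except TimeoutError:
            if attempt == attempts:
                raise  # give up: propagate the last timeout to the caller
            time.sleep(delay)

# Hypothetical stand-in for a registration call that times out twice
# before it finally succeeds.
calls = {"count": 0}

def flaky_register():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("no reply")
    return "registered"

result = call_with_retry(flaky_register)
print(result, "after", calls["count"], "calls")
```

Only timeouts are retried in this sketch; any other error is passed straight to the caller.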
Changelog 4.1.8..4.1.9
- 508e23b42e
- [maven-release-plugin] prepare release 4.1.9
- 1e26192be7
- ftp: always close proxied data connection if client closes their half
- 1281d9b1b6
- vehicles: fail-fast on invalid path
- 2403b788a8
- pool: retry request to pnfs manager if timed out
- e4d8bf9857
- dcache-frontend: invalidate transfer when killed mover not found
- 588777f26f
- [maven-release-plugin] prepare for next development iteration
- 51d4b79234
- dcache-frontend: add “Requires admin role” to alarms methods (Swagger)
Release 4.1.8
dcache-view
QoS is now updated live when it changes. The current release also includes the following fixes: the upload toast now shows the current status, a regression in the processing of click events on vaadin-grid tables is fixed, and the path request in view-file is fixed.
door
The current release added support for a door advertising multiple hostnames or IP addresses. dCache doors can now advertise multiple interfaces, including DNS aliases.
webdav
Milton’s OPTIONS handler returned a 404 error if an OPTIONS request targeted an entity that did not exist. This behaviour deviated from that of the Apache httpd server and resulted in failed uploads for dcache-view. The current release fixes this.
Changelog 4.1.7..4.1.8
- a3c1971
- [maven-release-plugin] prepare release 4.1.8
- db227cd
- gplazma.properties: hint to enable roles
- 768e3f7
- doors: support advertising multiple addresses in LoginBroker
- e467725
- webdav: do not return 404 for OPTIONS request targeting absent entity
- 7fc66fa
- dcache: release dcache-view version 1.4.2
- 08eeef4
- [maven-release-plugin] prepare for next development iteration
Release 4.1.7
pool
There was a regression that resulted in the pool logging somewhat useless information during start-up.
This is now fixed: during the pool’s start-up, the replica store’s locations are logged.
Changelog 4.1.6..4.1.7
- 7f9212b
- [maven-release-plugin] prepare release 4.1.7
- 89eb811
- [maven-release-plugin] prepare for next development iteration
- 677cc2e
- pool: fix logging of replica-store on start-up
Release 4.1.6
dcache
The current release updates dcache-view to version 1.4.1, which includes the following fixes and improvements: redesigned rename component; fixed memory leak in view-file; fixed rendering and lazy-loading issue in view-file; billing records and pool activity changed to use the new provider; modified internal design of the list-row element.
Changelog 4.1.5..4.1.6
- 25dd502
- [maven-release-plugin] prepare release 4.1.6
- 888012f
- [maven-release-plugin] prepare for next development iteration
- bfcaa5c
- dcache: release dcache-view version 1.4.1
Release 4.1.5
dcache-resilience
There was a small regression in the way resilience computes the number of operations necessary to adjust copies when a storage unit definition changes.
The current release fixes the computation of the operation count when storage requirements change.
ftp
To aid diagnosing problems where an FTP response is lost, dCache now logs failures to wrap/encrypt responses.
webdav
The current release improved error handling for client authentication with OpenID-Connect. A more complete set of information is now logged if OIDC delegation fails, supporting the ability to discover why the delegation failed.
Changelog 4.1.4..4.1.5
- f5cfa0c
- [maven-release-plugin] prepare release 4.1.5
- fdf6d26
- scripts: add support for parsing ZooKeeper transaction logs
- 19113dc
- ftp: log failures to wrap/encrypt responses
- 56086ff
- dcache-resilience: fix computation of operation count when storage requirements change
- d2acb6c
- webdav: log errors if OIDC delegation fails
- ab951e9
- [maven-release-plugin] prepare for next development iteration
Release 4.1.4
ftp
Error reporting in the FTP service has been improved: in some mixed IPv4 / IPv6 scenarios, only unclear error messages were reported.
Changelog 4.1.3..4.1.4
- 98193cf1b8
- [maven-release-plugin] prepare release 4.1.4
- 62553a82cd
- ftp: returned error is too vague for meaningful investigation
- 715b6966d4
- [maven-release-plugin] prepare for next development iteration
Release 4.1.3
frontend
The current release improves the Swagger documentation of the REST API, which now describes which aspects of dCache are affected by related operations.
nfs
Situations in which the selection process was left incomplete could not be recovered manually. When the selection process was incomplete, for example due to a PoolManager restart, there was no way to trigger a new selection.
The current release adds two new commands to the nfs door: transfer retry and transfer forget.
The first command manually re-activates an existing transfer by retrying the selection process. The second one should be used to completely ‘forget’ the stale transfer and let the client trigger a fresh selection process.
Changelog 4.1.2..4.1.3
- ee31290
- [maven-release-plugin] prepare release 4.1.3
- 9fca38c
- frontend: add swagger Tag descriptions
- 97a6e1e
- nfs: add commands to reactivate stale transfers
- 07f5f7c
- [maven-release-plugin] prepare for next development iteration
Release 4.1.2
ftp
A NullPointerException could occasionally occur in ftp. This issue was fixed in this release.
The ftp door now provides far more detailed proxy information (including ASCII-art diagrams of connection status).
FTP has received extra debug-level logging to ease debugging.
gplazma
gplazma now supports a Fermilab-specific authorization data source in JSON format.
pool
During active ftp transfers, connection problems would lead to the rather unhelpful error message “451 General problem”. This error reporting was refactored, so that diagnosis of the cause is now greatly facilitated.
poolmanager
A potential NullPointerException (that was not observed in real-world usage until now) was fixed in Pool Manager.
resilience
A correction to resilience’s error handling results in no more reports on non-resilient (but corrupted) files.
spacemanager
In order to facilitate debugging, Space Manager now logs link-group related content in greater detail.
Changelog 4.1.1..4.1.2
- b2ee4ccbe3
- [maven-release-plugin] prepare release 4.1.2
- 102f38bbe7
- poolmanager: fix migration command if named pool is removed
- 95053574d6
- ftp: improve proxy logging to facilitate debugging
- c33c4390d4
- dcache-resilience: repair over-aggressive handling of broken file messages
- 11d215e903
- ftp: add detailed information about proxy status
- 1d09af9b53
- pool: fix error message for failed active FTP transfers
- d7704546c8
- fix the project version in pom.xml
- 2a5f421a08
- gplazma-fermi: fix last modified check in junit test
- 791f03fde5
- spacemanager: add remote pool monitor debug logging
- 029bcc8817
- ftp: avoid NPE if connection is closed.
- d2ae92d1dc
- gplazma-fermi: add mapping plugin to support VO group and username from file
- 85989a2471
- [maven-release-plugin] prepare for next development iteration
Release 4.1.1
Changes affecting multiple services
This version removes the (by now unused) directory /var/lib/cell-info from a default installation. The directory was previously used to store cell info data. With the introduction of the history service in dCache 3.2.1, this became obsolete.
A small bug-fix addresses wrong directory permissions on tar or Debian packages, where the directory /var/lib/dcache/pool-history had the wrong permissions.
The deprecated properties for configuring alarms have now finally been made unavailable.
chimera
An internal update enables chimera to use PostgreSQL 10.
nfs
A modification in IP address handling greatly increases the speed of NFS client access for Linux clients in mixed IPv4/IPv6 environments.
Prior to Linux 4.12, Linux clients with only an IPv4 address would wait for (timeout * retry) seconds when connecting to pools with both a v4 and v6 address. This was fixed upstream in Linux 4.12, but that fix was not backported to e.g. RHEL 7 yet.
Changelog 4.1.0..4.1.1
- 986bde1cce
- [maven-release-plugin] prepare release 4.1.1
- 96fef30a55
- nfs: filter out IPv6 DS addresses if client connected with v4
- 0d5d837a66
- chimera: adjust postgres driver provider to new version schema
- 684780b296
- [maven-release-plugin] prepare for next development iteration
- 6e4e69c8a8
- skel: remove extraneous cell-info dir
- cfeb7b1e48
- packaging: add missing chown and chmod on pool-history
- bc834d029f
- skel: make deprecated alarms properties forbidden
Release 4.1.0
Billing
A newly-introduced configuration property dcache.enable.kafka controls whether the Kafka messaging system is used for message delivery.
Two new global options to control publishing via Kafka have been added:
dcache.enable.kafka = true|false
dcache.kafka.bootstrap-servers= host1:port,host2:port
The Kafka topic is billing and is currently not configurable. The records are written as JSON objects, for example:
{
"msgType": "request",
"mappedUID": 0,
"date": "Fri Mar 02 10:48:41 CET 2018",
"cellName": "NFS-dcache-lab007",
"VERSION": "1.0",
"session": "door:NFS-dcache-lab007@core-dcache-lab007:AAVmauGezqg:1519984121401001",
"fileSize": 969,
"status": {
"msg": "",
"code": 0
},
"storageInfo": "test:public@osm",
"cellType": "door",
"cellDomain": "core-dcache-lab007",
"mappedGID": 0,
"billingPath": "/",
"sessionDuration": 9,
"queuingTime": 0,
"pnfsid": "0000328CE70CBAF24579BF5AD373C05AECC0",
"transferPath": "/",
"@timestamp": "2018-03-02T09:48:41.412Z",
"clientChain": "131.169.240.87",
"type": "dcache_billing",
"client": "131.169.240.87",
"subject": [
"UidPrincipal[0]",
"GidPrincipal[0,primary]",
"GidPrincipal[0]",
"Origin[131.169.240.87]"
]
}
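Because each record is a self-describing JSON object, a consumer can read fields directly instead of parsing a textual log. The following is a minimal Python sketch using only the standard library. It works on an abbreviated copy of the sample record above rather than on a live Kafka topic, the helper name `summarize` is hypothetical, and treating a status code of 0 as success is an assumption.

```python
import json
from datetime import datetime, timezone

# Abbreviated copy of the sample record above (only a few fields kept).
raw = '''{
  "msgType": "request",
  "cellName": "NFS-dcache-lab007",
  "cellType": "door",
  "fileSize": 969,
  "status": {"msg": "", "code": 0},
  "storageInfo": "test:public@osm",
  "pnfsid": "0000328CE70CBAF24579BF5AD373C05AECC0",
  "@timestamp": "2018-03-02T09:48:41.412Z"
}'''

def summarize(record_json):
    """Extract a few fields from one billing record given as a JSON string."""
    record = json.loads(record_json)
    # The trailing 'Z' marks UTC; parse it explicitly and attach the zone.
    when = datetime.strptime(record["@timestamp"], "%Y-%m-%dT%H:%M:%S.%fZ")
    return {
        "pnfsid": record["pnfsid"],
        "type": record["msgType"],
        "ok": record["status"]["code"] == 0,  # assumption: 0 means success
        "size": record["fileSize"],
        "time": when.replace(tzinfo=timezone.utc),
    }

summary = summarize(raw)
print(summary["pnfsid"], summary["type"], summary["ok"], summary["size"])
```

In a real deployment the JSON string would be the value of a message read from the billing topic by whichever Kafka client library is in use.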
Frontend
The 4.0 release saw the introduction of a range of RESTful services which provide an administrative overview of the system. There have been a few minor changes in the intervening releases. The API is now stable with 4.1.
To summarize: these services provide overviews of file data, poolmanager configuration, service state, pool configuration and activity, and system alarms. One can obtain full information on individual files by pnfsid; billing records for reads, writes, p2ps, stores, restores and removes of a given file are also available if your system is configured to use the billing database. With the database also comes histogram data for tracking reads, writes, p2ps, stores and restores over intervals of 24 hours, 1 week, 1 month and 1 year.
Histogram data on pool queues and file lifetime is similarly aggregated over pools and pool groups. This data collection is actually done by the history service (introduced in 3.2), with which the frontend communicates, so it is only available if the former has been deployed.
Finally, information corresponding roughly to the poolmanager.conf file is furnished by the selection service API, while the space management API provides link group and space token information.
With 4.1 also comes integration with Swagger. All RESTful services have been provided with basic annotations in order automatically to generate API documentation. A convenient web interface which allows exploration and testing of the API, describing paths, parameters, error codes and JSON output, now runs at:
https://[host]:3880/api/v1.
Consulting this page is the easiest way to familiarize yourself with the API.
A number of properties have been added to the defaults for the frontend, most of which have to do with communication with other services (timeouts, endpoints, etc.). Consult the frontend.properties file for details.
FTP
GridFTP defines a set of different transfer modes, among them streaming mode (MODE S) and extended block mode (MODE E). MODE E transfers now support keeping the data connection alive in between uploads, resulting in significantly higher throughput and providing better compatibility with the Globus transfer service.
gplazma
The LDAP plugin has been updated to natively support the auth phase. As a result, there is no need for an extra mutator plugin to use LDAP-based authentication:
auth optional ldap
map sufficient ldap
NOTE: although using an ldaps:// URL is not required, it is STRONGLY advised!
The configuration of LDAP-only-setups, which are becoming more important, is now much easier.
Previously, the plugins jaas, mutator and ldap were necessary for such a setup. Now, the ldap module is sufficient. An example of a new-style gplazma configuration file would be:
auth optional ldap
map optional ldap
session optional ldap
identity optional ldap
This functionality relies on using either OpenJDK or Oracle JDK.
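To check that a configuration really is LDAP-only, the phase/control/plugin triples of the example above can be read mechanically. This is a minimal Python sketch, not a dCache tool: `parse_gplazma_conf` is a hypothetical helper, and skipping `#` comments and blank lines is an assumption about the file format.

```python
def parse_gplazma_conf(text):
    """Return (phase, control, plugin) tuples, skipping blanks and # comments."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        parts = line.split()
        if len(parts) < 3:
            continue  # blank or incomplete line
        phase, control, plugin = parts[:3]
        entries.append((phase, control, plugin))
    return entries

conf = """
auth     optional ldap
map      optional ldap
session  optional ldap
identity optional ldap
"""

entries = parse_gplazma_conf(conf)
phases_using_ldap = {phase for phase, _, plugin in entries if plugin == "ldap"}
other_plugins = {plugin for _, _, plugin in entries if plugin != "ldap"}

print(sorted(phases_using_ldap))
print("LDAP-only:", not other_plugins)
```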
NFS
A new export option, lt=, has been added to control layout types. The supported values are:
- flex_files
- nfsv4_1_files
While it is recommended to use the flex_files layout type, only modern kernel versions (and RHEL 7.5) support it. If the client’s kernel capabilities are unknown, use nfsv4_1_files, which is the default behaviour.
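How the lt= option and its default interact can be sketched in a few lines of Python. This is an illustration only, not dCache code: `layout_types` is a hypothetical helper, and the colon-separated value form follows the export example given in the Incompatibilities section of these notes.

```python
DEFAULT_LAYOUTS = ["nfsv4_1_files"]  # offered when no lt= option is given

def layout_types(export_options):
    """Return the layout types named by the lt= option of an export entry."""
    for option in export_options.split(","):
        key, _, value = option.strip().partition("=")
        if key == "lt":
            return value.split(":")
    return list(DEFAULT_LAYOUTS)

print(layout_types("rw,lt=nfsv4_1_files:flex_files"))
print(layout_types("rw"))
```

The second call shows the fallback: with no lt= option present, only nfsv4_1_files is offered.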
Pool
Added configuration option to publish billing records into apache-kafka queue:
pool.enable.kafka = true|false
pool.kafka.bootstrap-servers= host1:port,host2:port
Added IO statistics to monitor storage performance. Billing can now log, for read and write operations, the mean bandwidth of each IO request and time that the disk system was handling requests and idling. It is important to note that the dCache configuration that describes the billing log format must be updated before these values are actually recorded.
Resilience
In the various 4.0 releases, resilience was improved in a number of ways. We summarize them here.
First, a number of fixes have made it more robust: uncaught exceptions should no longer allow tasks to get stuck in the queue, and the logic concerning available file sources when handling clear cache location messages has been fixed. In addition, we have fixed the way fatal error alarms are generated so as to avoid spamming the alarm service and domain logs (there is now a special resilience log for recording these errors on a file-by-file basis, configured as usual via properties and logback.xml).
Two features have been added to resilience which enhance its usability. The first involves a redefinition of what pools it considers to be accessible as opposed to usable (writable) for making copies. With this change it is now possible to close off a resilient pool from external writes while still allowing resilience to use it, i.e., by setting
\s resilient-pool-1 pool disable -store
Resilience no longer considers such a pool ‘inaccessible’. The rule is now: a pool is readable (by external clients) unless its disabled flags include ‘fetch’, and is writable (by resilience) unless its disabled flags include ‘p2p-client’.
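The rule can be written down as two small predicates. This Python sketch only restates the rule as described here and is not taken from the resilience code. The function names are hypothetical, and the flag names follow the pool mode strings that appear elsewhere in these notes.

```python
def is_readable(disabled_flags):
    """Readable by external clients unless 'fetch' is disabled."""
    return "fetch" not in disabled_flags

def is_writable_by_resilience(disabled_flags):
    """Usable by resilience for new copies unless 'p2p-client' is disabled."""
    return "p2p-client" not in disabled_flags

# A pool closed to external writes only, as with 'pool disable -store'.
store_disabled = {"store"}
# A pool whose mode is disabled(fetch,store,stage,p2p-client,p2p-server,dead).
dead = {"fetch", "store", "stage", "p2p-client", "p2p-server", "dead"}

for name, flags in (("store_disabled", store_disabled), ("dead", dead)):
    print(name, is_readable(flags), is_writable_by_resilience(flags))
```

A pool with only ‘store’ disabled therefore remains both readable and usable by resilience, which is the case the new definition was introduced for.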
The second new feature is integration of staging into resilience. Files which are CUSTODIAL ONLINE (i.e., resilient but also stored on tape) will now be automatically restored from tape if all their current replicas are not accessible (e.g., because all those pools have gone offline). Since this is achieved via communication with the Pool Manager, we have made sure that a resilient group whose pools are connected to tape (viz. for stores) but for which staging has been disabled will nonetheless receive the first restored replica via p2p from some other eligible staging pool selected by the Pool Manager.
The default sorting order for the admin history and history errors commands has been changed from descending to ascending (more user friendly).
Resilience has now been running for almost a year on a number of systems at DESY and elsewhere with reliable and performant results.
Zookeeper
Updated external zookeeper libraries to version 3.4.11
Changelog from 4.0.0 to 4.1.0
- 684780b
- [maven-release-plugin] prepare for next development iteration
- 95875a4
- [maven-release-plugin] prepare release 4.1.0
- bbfc504
- pom: use nfs4j–0.16.1
- 87c2340
- dcache-frontend: revise Swagger annotations
- 2411ac7
- dcache: release dcache-view version 1.4.0
- 30422a8
- webdav: support multiple RFC 3230 ‘Want-Digest’ headers
- a1bc255
- ftp: ensure half-closed connections are fully closed on return
- 81cc9df
- ftp: close pool connection with MODE-E proxy
- 1f96951
- dcache-frontend: avoid null dereferencing for incomplete pool history data
- 9a24522
- dcache: avoid NPE from initialization race in RestoreRequestsReceiver in HttpPooMgrEngineV3
- 4ae42e8
- [maven-release-plugin] prepare branch 4.1
- d9e07f3
- ftp: complete removal of ancient transaction log
- d01b05f
- dcache-frontend: fix predicates in spacemanager resource
- 7679b60
- zookeeper: silence warning and log lifecycle state events.
- 259ec4f
- alarms: fix broken path
- ad12325
- Revert “cells: remove custom implementation of curator’s PathChildrenCache”
- af4bfbf
- config: unify kafka configuration
- edc9822
- dcache: Removed FTPTransactionLog class
- b477cf4
- ftp: Removed uses of FTPTransactionLog class
- eb3159a
- TransferManagerHandler: Removed FTPTransactionlog
- 94e2a06
- migration: Remove org.dcache.pool.migration.ReverseOrder
- 69729b5
- dcache-resilience: avoid spamming alarms with abort messages
- 6f0b99c
- Revert “dcache-resilience: avoid spamming alarms with abort messages”
- 7294198
- dcache-resilience: avoid spamming alarms with abort messages
- f49cd17
- ftp: update ProxyAdapter API
- 3f30968
- dcache-history: test git hub test
- f529653
- dcache-frontend: add RESTful space service (spacemanager info)
- b69d55d
- fir broken commit 96beb4184c
- a411e2e
- libs: revert “libs: update to liquibase–3.5.4”
- cc0cbb1
- cells: remove compatibility with pre 3.0 releases
- db2a8b3
- pool: fix formating for StorageInfo data
- 7554be3
- nfs: fix wrong imports
- 75d03b3
- dcache-resilience: handle properly RuntimeExceptions from tasks
- 076edee
- libs: update to liquibase–3.5.4
- 5f97c5a
- nfs: add embedded Kafka producer
- eefb964
- ftp: support keeping data connections between proxied MODE-E uploads.
- 3f8fe3e
- ftp: update JavaDoc for adapter classes
- e31531f
- gplazma2-ldap: implement GPlazmaAuthenticationPlugin
- 3564ec6
- gplazma2-ldap: re-write ldap plugin in Java
- cc7c87f
- dcache-resilience: restore proper log level for eviction message
- 8534173
- ftp: ensure adapter is closed
- 96beb41
- system-test: enable only nfs41 file layout
- 3b71edb
- dcache-resilience: integrate support for staging files when no valid replicas accessible
- 1fa0e4b
- dcache: add filter to pool manager to select a read pool from within a specified pool group
- e7c186b
- idcache:exclude hidden dependency to master
- 2445198
- ftp: remove rare NullPointerException when proxying data
- 886dbf0
- httpd: do not log an exception if info cell not running
- cdfbdfc
- poolmanager: silence NoRouteToCell for stage queue topic
- deb1fac
- poolmanager: fix poolmanager startup with certain poolmanager.conf content
- b243c5d
- dcache-frontend: change all JSON representations of pnfsid to String type
- 37ff3a3
- dcache-resilience: repair faulty commit
- aab9309
- pool: bug-fixfor commit@74880c32a97…
- 74880c3
- pool: add embedded kafka producer
- de97ecb
- core: remove getDatabaseId from PnfsId
- ba972b3
- dcache-resilience: define non-writable pool to mean p2p-client is disabled
- ea0318b
- dcache: fix remote pool monitor wait bug
- e7773a1
- cells: wait for zookeeper node before update the content
- a352dfc
- dcache-resilience: repair handling of broken files*
- 5b37a84
- packaging: check ‘services.sh’ after old rpm removed
- 2139302
- core: remove support for ‘*’ PnfsIds.
- 413e4ad
- core: remove support for ‘domain’ in PNFS-style PnfsIds
- fdd0731
- utils: do not iterate over certificates twice during validation
- 1e3e96f
- core: tidy up unit-tests
- 3c30323
- dcache-resilience: fix bug in source handling with Clear Cache Location messages
- 4ff42f7
- src: remove unused diskCacheV111.util.UptimeParser
- 5fec809
- remove dead code from PnfsId
- 90bab98
- pool: use interface rather than concrete class
- d40d999
- dcache-resilience: fix wrong assumption about error type in Message
- 1d2f7ee
- pool: add IO statistics to monitor storage performance
- ecc2c61
- cells: fix broken commit 0be9a678
- 0be9a67
- cells: add explicit ZooKeeper/Curator monitoring
- 33d603a
- frontend: Map requests with bad JSON to HTTP 400 Bad Request status code
- 0d8ac41
- pool-manager: use java8 streams to get Active or Enabled pools
- abc6bbb
- pom: update zookeeper to latest stable release 3.4.11
- cfd0c9c
- pom: update apache curator version to 2.12.0
- 1274b30
- login-broker: replace LoadProvider interface with DoubleSupplier
- 5c8562e
- src: fix broken test after commit 4d564d2534
- 4d564d2
- cells: remove null checks for CuratorFramework
- 47b7d51
- cells: remove historical unused code
- e71f1fc
- cells: remove custom implementation of curator’s PathChildrenCache
- d0aa2c6
- pool: use NFS internal reboot verifier for WRITE and COMMIT
- fbda58a
- dcache-resilience: change default sorting orders to ASC on admin history commands
- 1ccc2cb
- gplazma: update multimap to support primary-agnostic matching
- 86f6d1b
- pool: remove embedded nfs v3 server for felxfile layout type support
- 4022844
- libs: switch to nfs4j–0.16.x
- 5ad6fff
- nfs: return LAYOUTUNAVAILABLE for DOT files
- fe50695
- frontend: add remaining swagger annotations
- bd6a2fa
- star: fix printing exception error message
- 81178fb
- storage-descriptor: add new binary to RPM manifest
- cf1cf65
- storage-descriptor: add support for generating new format
- d157b92
- star: update documentation to provide better description of script
- 8a60269
- frontend: fix broken commit
- 16739c5
- frontend: add qos swagger annotations
- a56f887
- frontend: add swagger namespace API annotations
- d101fc6
- frontend: add global swagger annotations
- 3b54f8e
- frontend: add swagger-ui support
- 551ae71
- frontend: add swagger annotation scanning
- 40a43c4
- frontend: convert tabs to spaces
- 298f858
- pool: refactor IO statistics
- 4a05834
- Revert “zookeeper: remove work-around for ZOOKEEPER–2809”
- 0cad361
- zookeeper: remove work-around for ZOOKEEPER–2809
- 77d82c0
- nfs: fix ServerFault on FileNotFoundHimeraFsException
- 646da1f
- ftp: support non-ASCII filenames in LIST output
- 0de9090
- ftp: update LIST output to be more like ‘ls -l’
- 1576d5e
- scripts: update reference to JDK to avoid mentioning specific java version
- efed757
- info: allow admin to control whether non-VO / non-FQAN identities are shown
- 0010216
- chimera: fix deadlock in Postgres driver
- 77c85a1
- gplazma: update multimap for generic match and add FQAN support
- 4598f02
- authentication: update multimap to cooperate in generating primary gids
- c538064
- pools: fix NPE from info command at startup
- 7887ba9
- systemd: stop all dCache service units in Debian prerm
- 279c72f
- chimera: correct previous attempt to fix ‘lost+found’ directory permission
- f31dfa1
- [maven-release-plugin] prepare for next development iteration
- 4bb2574
- maven: fix dcache.service permissions