dCache 2.5 Release Notes
Highlights:
- Support for certificates with SHA-2 checksum
- Switch to JGlobus2
- Added external service notification on missing files
- Alarms service with web page
Upgrade Instructions
Starting with version 2.4.0, dCache requires Java 7.
Compatibility
If NFS4.1 and HTTP PUT redirect are not used, then 2.5 doors and head nodes can be mixed with pools of releases 2.2 or newer. Doors and head nodes have to be updated at the same time.
If server-side SRM copy is used, dCache 2.5 is not compatible with pools of dCache 2.2.8 or earlier. Please upgrade those pools to dCache 2.2.9 in this case.
Please note that, due to a regression introduced in dCache 2.5.0, dCache 2.5.0 and dCache 2.5.1 are not compatible with dCache 2.2 to 2.4. This has been fixed in version 2.5.2; as a side effect, dCache 2.5.2 is not compatible with dCache 2.5.0 or dCache 2.5.1.
dCache 2.5.2
Chimera
Fixed compatibility issues with uberftp and list operations on the root directory.
Service: pool
Fixed a bug that was causing the number of movers shown in the info page and on the httpd pages to be negative.
Fixed a regression in the migration module that broke the migration info command.
Service: poolmanager
Fixed glob matching for psu commands.
Service: pnfsmanager
Fixed compatibility issues with uberftp and list operations on the root directory.
doors
Fixed compatibility issues with uberftp and list operations on the root directory.
Miscellaneous
Added a producer of StAR accounting records. The StAR producer works with EGI's Apel SSM v2. Sites publishing StAR records for EGI must update their SSM RPM to at least v2.0.1.
When the OS does not allow dCache to create more threads, dCache attempts to restart the affected service. Fixed a bug that prevented dCache from stopping the service in this case.
Fixed an infinite loop and the false positive logging of a stack trace in message handling.
Fixed the incompatibility of dCache 2.5 with dCache 2.2 to 2.4. As a side effect 2.5.2 is incompatible with 2.5.0 and 2.5.1.
Improved compatibility with Java 7.
Fixed a problem with DNs containing double slashes, as in http://.
Changelog 2.5.1 to 2.5.2
- [r18175] star: add producer (Merge of r18105, r18142)
- [r18222] libs: upgrade Parboiled to 1.1.4
- [r18223] pool: Fix problem with negative number of active movers (Merge of r18207)
- [r18225] pool: Fix migration module regression (Merge of r18203)
- [r18226] cells: Preallocate killer thread
- [r18230] cells: Fix retry loop and logging in sendAndWaitToPermanent (Merge of r18210)
- [r18232] system-test: fix webadmin gid (Merge of r18202)
- [r18233] wicket: Exclude cglib dependency (Merge of r18190)
- [r18234] poolmanager: Fix glob matching for psu commands (Merge of r18174)
- [9284cbc] pom: scm should point to github
- [081e3e9] doors,pnfsmanager,chimera: Add checks for root directory
- [ed2beee] [fix] Add serialVersionUID field to legacy principals
- [50c70b7] pom: use custom jglobus2 libraries
dCache 2.5.1
Chimera
Updated output from chimera-cli ls command to match ls -l.
Service: pool
Fixed a timeout bug in the migration module.
Service: pinmanager
Fixed a regression that was introduced in dCache 1.9.13 and prevented pinmanager from retrying stage failures.
Service: spacemanager
Fixed a bug that produced a stack-trace if spacemanager operations were attempted when the spacemanager service was disabled.
Service: httpd
Fixed a problem with usage charts on the pool and pool groups pages not being shown when using certain settings for system locales.
Fixed a webadmin problem whereby the user was redirected back to an unencrypted connection after logging in to an admin page.
Made webadmin login more intuitive and friendly. Now, a valid certificate which is mapped to an admin user allows immediate access to the page without having to go through the login page.
Service: billing
Fixed a problem where the billing service would spontaneously restart, due to running out of memory when computing the 24-hour histograms from fine-grained data tables.
Service: alarms
Modified the initialization process of the alarms service to make it easier to use an RDBMS instead of the XML file to store the alarms. Just create the alarms database with
createdb -U srmdcache alarms
and set the property alarms.store.db.type=rdbms in the dcache.conf file.
Added support for regex group tags in the include-in-key field of an alarm definition.
As an example we show the definition of the checksum alarm:
<alarmType>
    logger:org.dcache.pool.classic.ChecksumScanner,
    regex:"Checksum mismatch detected for (.+) - marking as BROKEN",
    type:CHECKSUM,
    level:ERROR,
    severity:MODERATE,
    include-in-key:group1 type host service domain
</alarmType>
Here the tag group1 extracts the pnfsid from the message and includes only that portion of the message string as an identifier. The tag must be written as the word "group" immediately followed by the group number, without any whitespace (e.g. group1). group0 is identical to "message".
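The effect of the group1 tag can be illustrated with a short Python sketch. It applies the regex from the checksum alarm definition to a sample log message and returns the requested capture group. The sample message, the PNFS ID in it, and the helper name key_fragment are made up for illustration; how dCache itself builds the alarm key is not described in these notes and may differ.

```python
import re

# Regex copied from the checksum alarm definition in these notes.
CHECKSUM_REGEX = r"Checksum mismatch detected for (.+) - marking as BROKEN"


def key_fragment(message, tag):
    """Return the part of `message` selected by a tag such as 'group1'.

    Hypothetical helper: 'group' followed by a number selects that regex
    capture group; 'group0' is the entire match.
    """
    if not tag.startswith("group") or not tag[5:].isdigit():
        raise ValueError("tag must look like group<number>, e.g. group1")
    match = re.search(CHECKSUM_REGEX, message)
    if match is None:
        return None  # the message does not match the alarm regex
    return match.group(int(tag[5:]))


# Illustrative log line; the PNFS ID is invented.
sample = "Checksum mismatch detected for 0000ABCDEF0123456789 - marking as BROKEN"
print(key_fragment(sample, "group1"))  # only the captured PNFS ID
print(key_fragment(sample, "group0"))  # the whole matched message
```

With group1 in the key, repeated checksum errors for the same file would share the same identifier fragment, while errors for different files would not.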
Defined the following alarms:
TYPE                            [SEVERITY]
------------------------------------------
SERVICE_CREATION_FAILURE        [CRITICAL]
DB_OUT_OF_CONNECTIONS           [CRITICAL]
DB_UNAVAILABLE                  [CRITICAL]
JVM_OUT_OF_MEMORY               [CRITICAL]
OUT_OF_FILE_DESCRIPTORS         [CRITICAL]
IO_ERROR                        [HIGH]
HSM_READ_FAILURE                [HIGH]
HSM_WRITE_FAILURE               [HIGH]
LOCATION_MANAGER_UNAVAILABLE    [HIGH]
POOL_MANAGER_UNAVAILABLE        [HIGH]
POOL_DISABLED                   [MODERATE]
CHECKSUM                        [MODERATE]
Added the property alarms.server.host to configure the host on which the alarms server runs.
Service: srm
Fixed a bug that prevented clean shutdown of the SRM.
Fixed credential delegation for srmCopy transfers which was broken due to the jGlobus 2 update.
dcache script
Fixed a bug that led to an error message on attempts to run the dcache command:
# dcache status
/usr/share/dcache/lib/loadConfig.sh: line 89: getProperty: command not found
/usr/share/dcache/lib/loadConfig.sh: line 90: getProperty: command not found
/usr/bin/dcache: line 370: getProperty: command not found
This problem occurred when the partition storing the configuration cache ran out of space. Since the cache is stored as /var/lib/dcache/config/cache by default, this can happen if log files use up all available space and the dCache configuration has changed since the last dcache command.
New properties:
- alarms.server.host
Miscellaneous
Fixed the incompatibility of dCache 2.5.0 pools with pools older than 2.5.
Added the ability to catch system start-up exceptions for a given domain and process them as alarms.
Increased speed of system shutdown by fixing several small problems.
Changelog 2.5.0 to 2.5.1
- [r18077] Fix billing plot scalability issue (Merge of r18014)
- [r18079] pool: Fix timeout bug in migration module (Merge of r18029)
- [r18080] space-manager: fail requests when disabled (Merge of r18035)
- [r18081] srm: Use correct scheduler for space reservation requests (Merge of r18062)
- [r18082] httpd: Don't use system locale when generating HTML (Merge of r18063)
- [r18083] config: fix error message for obsolete properties (Merge of r17999)
- [r18084] scripts: fix dcache cmd for zero-length cache file (Merge of r17969)
- [r18086] chimera-cli: fix output of 'ls' cmd (Merge of r17963)
- [r18128] add legacy GlobusPrincipal class to restore 2.2 pool compatibility (Merge of r18112)
- [r18129] bootloader: report command execution exceptions through logging system (Merge of r18120)
- [r18130] core: Cleanup shutdown code
- [r18132] alarms: webadmin redirects to http after admin page login (Merge of r18127)
- [r18136] pinmanager: Retry stage failures (Merge of r18092)
- [r18137] alarms: improvements
- [r18138] alarms: add key regex group (Merge of r18110)
- [r18139] alarms: add a standard set logback.xml (Merge of r18126)
- [r18155] alarms: fix minor configuration issues (Merge of r18141)
- [r18156] webadmin: login more intuitive and friendly (Merge of r18143)
- [r18157] alarms: add missing datanucleus.properties
dCache 2.5.0
Service: pool
Introduced the new property destroyOrphanReplicaOnFlush. It defaults to true, which preserves the previous behaviour. After a failure such as a power outage, a pool may hold files without corresponding namespace entries. If destroyOrphanReplicaOnFlush is set to true, these files are destroyed on flush. Set the property to false to avoid this.
Enabled the migration module to work even if the pool it wants to read from is disabled.
Service: poolmanager
Added the ability to see clients in the request container. With the option -l, the command rc ls lists not only the number of requests but also the actual clients.
Example:
[example.org] (PoolManager) admin > rc ls -l
00002ECA2067CC954F8CBEBD872D20D04B2B@0.0.0.0/0-*/* m=2 r=0 [<unknown>] \
    [Suspended (pool unavailable) 11.29 16:19:03] {0,}
    DCap-3.0,131.169.185.68:35150
    NFS4-4.1:example.desy.de/131.169.185.68:714
Service: alarms
Added the new service alarms.
Any logging event can be defined as an alarm. Administrators can thus be directly notified of problems which need immediate attention and rectification.
To enable the alarms service, it is recommended to add it to a new domain which needs to be on the same node as the httpd service, e.g.:
[alarmserverDomain]
[alarmserverDomain/alarms]
To use the alarms webpage, you need to be able to log in to the webadmin.
Service: gplazma2
Restored auth dcap functionality.
Made the location of the VOMS directory configurable for the VOMS and XACML plugins. It remains fixed (as it was originally) at the standard location for the GsiTunnel and the SRMAuthorizer.
Service: srm
Added the missing 'export SRM_PATH' statement to srmrmdir.
Added timeout arguments to GridFtpClient. The timeouts -first_byte_timeout and -next_byte_timeout (in seconds) can be specified via the srmcp client. When transferring large files (~10 GB), it is advisable to increase -next_byte_timeout, e.g. to 1200 seconds.
Service: nfs
Fixed last access time calculation in the mover.
Service: nfsv4.1
Added support for Linux clients with numeric idmappings. A typical NFSv4 installation requires an LDAP or NIS server for user identity management. To allow NFSv3-style mapping based on numeric ids, enable the legacy mode in the dcache.conf file or in the layout file:
nfs.idmap.legacy=true
Service: dcap
Fixed GSS error message.
Modified dcap to not request GSI delegation.
Introduced the two new properties dcapAnonymousAccessLevel and dcapReadOnly to restore the read-only dcap functionality, which was lost at some point between 1.9.5 and 2.2.
The property dcapAnonymousAccessLevel controls the access level of anonymous users and is set to READONLY by default. This is the level of access granted when authenticated login fails (e.g. for Kerberos or GSI dcap). The plain dcap door provides unauthenticated, and therefore anonymous, access by definition. To enable writes via plain dcap doors, this property must be set to FULL.
The property dcapReadOnly enables or disables write access to any dcap door, regardless of whether access is anonymous or authenticated. By default it is set to false.
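How the two properties combine can be sketched in a few lines of Python. This is only a reading of the description above, not dCache's implementation: the function name write_allowed and its parameters are hypothetical, only the READONLY and FULL levels named in these notes are modelled, and the usual namespace permissions would still apply on top of this decision.

```python
def write_allowed(authenticated, anonymous_access_level="READONLY", read_only=False):
    """Decide whether a dcap door would accept a write at all.

    Hypothetical model of the two properties:
      read_only              ~ dcapReadOnly (default false)
      anonymous_access_level ~ dcapAnonymousAccessLevel (default READONLY)
    """
    if read_only:
        # dcapReadOnly blocks writes for everyone, authenticated or not.
        return False
    if authenticated:
        # Authenticated users are not limited by the anonymous access level.
        return True
    # Anonymous access (plain dcap, or a failed authenticated login)
    # may write only when the level is FULL.
    return anonymous_access_level == "FULL"


# Defaults: anonymous users cannot write, authenticated users can.
print(write_allowed(authenticated=False))
print(write_allowed(authenticated=True))
# Plain dcap door opened up for writes.
print(write_allowed(authenticated=False, anonymous_access_level="FULL"))
```

Under this reading, a site that wants writable plain dcap doors changes only dcapAnonymousAccessLevel, while dcapReadOnly acts as an overall switch for a door.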
Renamed the authenticated dcap door cell name from DCap-${host.name} to DCap-auth-${host.name} and introduced the new property dCapAuthPort with default value 22129 for its port.
Service: missing-files
Introduced this new service. This is an optional, pluggable component that allows dCache to respond to missing files. This central service instructs the door to either fail the request or retry (which makes sense only if the file has been fetched from some external source).
Initially, the missing-files service can be enabled for the WebDAV door only. The other doors will follow. To enable it, write something like the following in your layout file:
[someDoors/webdav]
missing-files.enabled = true
We currently provide a single plugin, semsg, which uses an external program to send a notification when a user tries to read a file that doesn't exist. Its default configuration is in the missingfiles-semsg.properties file. Sites may write their own plugins and add them to the comma-separated list missing-files.plugin.list, for example:
[someDoors/webdav]
missing-files.enabled=true
missing-files.plugin.list=plugin1,plugin2,plugin3
These plugins are used to determine how dCache should react when a user attempts to read a missing file. Each plugin is asked in turn what to do until a plugin replies with a terminating answer or the list of plugins is exhausted. A plugin replies by saying to fail the request, to retry the request, or to ask the next plugin in the chain. If the last plugin defers the request, then the missing-files service instructs the door to fail the request.
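The way the plugin chain reaches a decision can be sketched in Python. This is an illustration of the rule described above, not the actual plugin interface: the Reply enum, the decide function, and the two example plugins are hypothetical names, and plugins are modelled as plain functions for brevity.

```python
from enum import Enum


class Reply(Enum):
    FAIL = "fail"    # terminating: tell the door to fail the request
    RETRY = "retry"  # terminating: tell the door to retry the request
    DEFER = "defer"  # non-terminating: ask the next plugin in the chain


def decide(plugins, path):
    """Ask each plugin in turn; the first terminating reply wins."""
    for plugin in plugins:
        reply = plugin(path)
        if reply is not Reply.DEFER:
            return reply
    # Every plugin deferred (or the list was empty): fail the request.
    return Reply.FAIL


# Hypothetical plugins.
def notify_only(path):
    # Would send a notification somewhere, then let the next plugin decide.
    return Reply.DEFER


def retry_if_restored(path):
    # Pretends that files under /restored/ have been fetched externally.
    return Reply.RETRY if path.startswith("/restored/") else Reply.DEFER


chain = [notify_only, retry_if_restored]
print(decide(chain, "/restored/data.root"))
print(decide(chain, "/other/data.root"))
```

The order of entries in missing-files.plugin.list matters under this scheme, since a terminating answer from an earlier plugin means later plugins are never consulted.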
Note that populating dCache with files from a remote storage system is currently not possible since the plugin interface does not allow triggering 3rd-party copies.
Properties
New properties:
- missing-files.name
- missing-files.timeout
- missing-files.enabled
- missing-files.plugin.list
- missing-files.thread-count
- missing-files.plugins.semsg.broker.host
- missing-files.plugins.semsg.broker.port
- missing-files.plugins.semsg.auth-dn
- missing-files.plugins.semsg.max-files
- missing-files.plugins.semsg.max-delay
- missing-files.plugins.semsg.certificate
- missing-files.plugins.semsg.private-key
- missing-files.plugins.semsg.command
- missing-files.plugins.semsg.broker.topic
- missing-files.plugins.semsg.broker.wire-format
- missing-files.plugins.semsg.broker.endpoint
- destroyOrphanReplicaOnFlush
- dCapAuthPort
- dcapAnonymousAccessLevel
- dcapReadOnly
- alarms.server.port
- alarms.server.config
- alarms.store.path
- webadmin.alarm.cleaner.enabled
- webadmin.alarm.cleaner.timeout
- webadmin.alarm.cleaner.delete-threshold
- webadmin.alarm.dao.properties
Miscellaneous
Fixed incompatibility of JAIDA libraries with changes in the java.awt package in Java 1.7, which prevented the generation of the billing histograms. The fix is backward compatible with Java 1.6.
Because of the imminent transition to digital signatures that use SHA-2 (Secure Hash Algorithm 2), we had to migrate to the jglobus-2 libraries to support SHA-2 signed certificates. Unlike the previously used cog-jglobus libraries, jglobus-2 supports both SHA-1 and SHA-2 signed certificates.
Changelog 2.4.0 to 2.5.0
Crossed out entries have been merged into the 2.4 branch.
- [r17779] bump trunk version number
- [r17781] export SRM_PATH prior to calling underlying srm script
- [r17782] SRM: move org.dcache.srm.scheduler.Job to org.dcache.srm.request.Job
- [r17783] SRM: move org.dcache.srm.scheduler.Job to org.dcache.srm.request.Job
- [r17798] fix broken merge trunk@17797
- [r17802] X509/VOMS Utility refactoring
- [r17803] Logging-based alarms, part 1 (logback API)
- [r17804] Logging-based alarms, part 2 (storage/DAO)
- [r17805] Logging-based alarms, part 3 (front-end)
- [r17806] Removing duplicate pom.xml
- [r17807] Logging-based alarms, part 5 (server, embedded)
- [r17813] Alarms - findbugs fixes
- [r17814] alarms findbugs (2)
- [r17816] added timeout arguments to GridFtpClient
- [r17819] Use Guava LoadingCache for log filter
- [r17820] Be explicit about requiring serializability
- [r17822] cells: Refactor CommandInterpreter
- [r17823] Use Java 5 style foreach loops
- [r17824] Use Java 7 switch statement when possible
- [r17825] boot: add support for python properties oracle
- [r17828] libdcap: use correct status variable to print gss errors
- [r17829] libdcap: do not request GSI delegation
- [r17830] missingfiles: add support
- [r17831] Use Java 7 try-with-resource
- [r17832] Replace Collections.EMPTY_* with type safe variant
- [r17833] Use modern loop constructs
- [r17846] Use Java 7 diamond operator
- [r17849] Collapse identical catch blocks
- [r17850] 1) change human readable part of MLST reply from
- [r17851] Billing: remove zero-length files
- [r17852] Logging-based alarms, part 4 (tests)
- [r17853] Missing getters on Alarms Test utility class
- [r17854] cells: Add annotation based cell command support
- [r17856] restore "r/o dcap doors" (equivalent of http://rb.dcache.org/r/4936
- [r17857] follow up on 17856
- [r17860] cells: Fix command scanner for inherited commands
- [r17861] pool: Port MigrationModule to new annotated command infrastructure
- [r17862] pinmanager: Port CLI to annotated commands
- [r17864] missingfiles: add file mistakenly omitted from previous commit
- [r17868] common: Avoid use of raw types
- [r17870] Move plot-related functionality out of the BillingDatabase cell class
- [r17871] nfs: fix last access time calculation in the mover
- [r17878] add option destroyOrphanReplicaOnFlush
- [r17879] reverted the revision 17878
- [r17881] chimera: fix FsInode.exists() function by throwing exception
- [r17890] Remove instance-bound PKIVerifiers; replace with static map
- [r17893] pool: fix mover start time in a JobInfo
- [r17898] PKIVerifier initialization bug
- [r17899] make dele, mkd, rmd function replies RFC compliant
- [r17900] Handle case when UserNamePrincipal is specified in dcache.kpwd
- [r17905] fix typo
- [r17906] properties: handle authenticated dcap door by assigning a specific
- [r17917] Allow for dynamic creation of XML storage file for alarms
- [r17918] reverting to 17906
- [r17919] Whitespace only fix
- [r17920] fix typo
- [r17926] JAIDA-Freehep Java 1.6/1.7 compatibility
- [r17927] pool: Add flag to migration {copy/move/cache} to override the disable state of the source pool
- [r17928] nfs: support linux clients with numeric idmappings
- [r17932] missingfiles-semsg: Delete file erronously commited in r17849
- [r17937] Alarms modifications for system-tests compatibility and easier/automatic path configuration
- [r17938] WebAdmin: adding missing billingPlots CSS
- [r17939] Fix configuration cache
- [r17941] poolmanager: add a possibility to see clients in request container
- [r17943] nfsv4: initialize portmap as spring bean
- [r17944] add RemoteNameSpaceProvider
- [r17948] migrate to jglobus2.0
- [r17949] gplazma: allow plugins to be 'dCache aware'
- [r17950] pom: suppress dependency on cog-jglobus jars
- [r17951] Regex utilities (alarms) refactored for reusability
- [r17977] gplazma: add missing directory-listing-handler (Merge of r17862)
- [r17978] defaults: fix double-reported TCP port for http (Merge of r17976)
- [r18034] cells: Fix cell shell regression (Merge of r18033)