How to get from 1.9.12 to 2.2
By Gerd Behrmann <behrmann@nordu.net>
dCache 2.2 is the third long term support release (aka golden release). It is the result of 12 months of development since the release of dCache 1.9.12, the previous long term support release. During these 12 months 4 feature releases (1.9.13, 2.0, 2.1, and 2.2) were made at regular intervals. This document compiles the most important information from these four feature releases.
dCache 2.2 will be supported with regular bug fix releases at least until April 2014. dCache 1.9.12 will be supported until April 2013; however, one should expect the release frequency to drop off, as only the most critical bugs and security issues will be fixed. While the upgrade path from 1.9.12 to 2.2 is easy, it should be expected that no direct upgrade path will be provided from releases prior to 2.2 to a future (fourth) long term support release.
Many things have changed between 1.9.12 and 2.2 and this document does not attempt to describe every little change. The focus is on changes that affect the upgrade process and on new features. Minor feature changes and bug fixes are often excluded. There is more information scattered in the release notes of each individual release.
The last section of this document contains useful reference material that should be consulted while reading this document. The reference material also includes a proposed checklist that may be used while planning an upgrade to 2.2.
The filesystem hierarchy standard (FHS) provides a set of requirements and guidelines for file and directory placement under UNIX-like operating systems.
dCache has traditionally been installed in /opt/d-cache. Although this directory is specified by the FHS, dCache did not follow the recommendations for the installation in /opt.
dCache is now distributed in two different layouts: the classic layout rooted in /opt/d-cache and a new FHS-compliant layout.
The FHS packages install the bulk of the static files in /usr/share/dcache, with a few files in /usr/bin, /usr/sbin and /usr/share/man. Configuration files are stored under /etc/dcache and non-static files are stored under /var/lib/dcache with log files in /var/log/dcache.
The FHS packages automatically create a user account (dcache) during installation, and dCache will drop to this account during startup. An init script and a log rotation configuration are installed automatically, and SSH keys for the admin door are created during installation.
Both layouts are distributed via www.dcache.org, but the FHS packages should be preferred. It should be expected that future feature releases will only use the FHS layout.
Migration from the classic layout to the FHS layout is possible and recommended, however the procedure is manual and not covered by this document. This process is described in a separate document.
We recommend and expect that all users will transition to the FHS packages. Users that wish to continue installing in /opt may do so using the FHS tarball. This tarball uses an internal layout similar to the other FHS packages, but can be installed in any directory. One loses the convenience of the package manager, but gains flexibility in where to install dCache.
Veteran dCache administrators will note the change in version number scheme. Starting with the release of dCache 2.0.0 we use the second digit to distinguish feature releases and the third digit to distinguish bug fix releases.
No direct upgrade path is provided from releases earlier than 1.9.12. Users of earlier releases should upgrade to 1.9.12 first, verify that everything works, and subsequently upgrade to 2.2.
We strongly recommend upgrading to the latest release of 1.9.12 before upgrading to 2.2. At the time of writing this is 1.9.12-17.
Important: Before upgrading to 2.2, the pool manager's configuration file has to be regenerated by issuing the save command in the pool manager's admin interface. This has to be done using at least version 1.9.12-9 or 1.9.13-3. Failing to do so will prevent the pool manager from starting after the upgrade, and the configuration file will have to be rewritten by hand.
Head nodes of 2.2 are compatible with pools 1.9.12-11 and newer, 1.9.13-4 and newer, 2.0.0 and newer, and 2.1.0 and newer, up to and including 2.2. Pools from any of these releases can be mixed. The exception to this rule is if NFS 4 is used; in that case all pools and head nodes have to be upgraded to 2.2 together.
Beginning with the release of 2.3, head nodes will only be compatible with pools belonging to release 2.2 and newer.
Assuming that NFS 4 is not used, a staged upgrade can be performed by first updating all head nodes (PNFS manager, pool manager, pin manager, space manager, SRM, all doors, monitoring services, etc) while leaving pools on the earlier release. Once the head nodes are online again and confirmed to be working, pools can be upgraded one by one while dCache is online. Obviously the service will be slightly degraded, as files on a pool that is being upgraded will be briefly unavailable. Please note that a staged upgrade is not possible if pool nodes run any other dCache service (such as doors) besides the pool service.
The alternative to a staged upgrade is to shutdown all dCache services and upgrade all nodes.
In either case an in-place upgrade is possible and recommended.
Lots of components have been modified to improve consistency, robustness, latency, and add new features. In several cases this has affected the semantics of common operations in dCache. We recommend reading through the following sections, paying attention to issues like authorization, file ownership, multihoming, obsolete and forbidden configuration properties, and init scripts.
ACLs are now tightly integrated into Chimera. Consequently, ACL support for PNFS has been removed. We recommend upgrading to Chimera if ACL support is needed.
The ACL command line tools chimera-get-acl and chimera-set-acl have been replaced by the getfacl and setfacl subcommands of the Chimera command line tool chimera-cli. E.g., rather than chimera-get-acl one now uses chimera-cli getfacl. The arguments of the setfacl command have changed to no longer rely on an explicit index to order the ACEs. The following is an example of using the setfacl command:
$ chimera-cli setfacl /pnfs/desy.de/data USER:123:lfx:A GROUP:123:lfx:D
The ACLs can now also be queried and updated through NFS 4. Assuming an NFS 4 file system is mounted, one can query and update ACLs like so:
$ nfs4_getfacl /pnfs/desy.de/data/generated/acl-test
$ nfs4_setfacl -a A::tigran@desy.afs:arw
The admin shell commands previously provided by the acl service, that is, setfacl and getfacl, are now integrated into the pnfsmanager service. The acl service is obsolete and should be removed from layout files.
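For example, in a layout that previously declared the acl service in the name space domain (the domain name below is illustrative), the acl line should simply be deleted:

[namespaceDomain]
[namespaceDomain/pnfsmanager]
# The following line must be removed when upgrading to 2.2:
[namespaceDomain/acl]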
By default, dCache does not implement the correct POSIX semantics for lookup permissions: Only lookup permissions of the parent directory are enforced. This was traditionally done to improve performance with the PNFS backend, but is now only kept to maintain backwards compatibility. The default behaviour is unchanged, however setting the new pnfsmanager configuration property pnfsVerifyAllLookups to true will enable POSIX semantics. The property is only supported for Chimera.
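For example, a Chimera site that wants full POSIX lookup semantics could add the following to /etc/dcache/dcache.conf (the property is described above; weigh the additional permission checks against the added load on the name space):

pnfsVerifyAllLookups=true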
For Chimera, the authorization checks have been optimized to reduce the number of database queries involved. This reduces latency on name space operations and improves Chimera throughput.
gPlazma 1 is no longer supported as a standalone service. The gplazma.version property is obsolete. Support for legacy authentication and mapping schemes is provided through the new gplazma1 plugin for gPlazma 2. This plugin uses the legacy /etc/dcache/dcachesrm-gplazma.policy configuration file. The default gPlazma 2 configuration in /etc/dcache/gplazma.conf loads the gplazma1 plugin, which means that existing users of gPlazma 1 should not have to make any modifications when upgrading to dCache 2.2. We do however recommend that users migrate away from the gplazma1 plugin as soon as possible. Henceforth we will no longer refer to specific versions of gPlazma.
The gplazma1 plugin supports the legacy /etc/dcache/dcachesrm-gplazma.policy configuration file. It should be used like this:
auth requisite gplazma1
map requisite gplazma1
session requisite gplazma1
Although mixing the gplazma1 plugin with other plugins is possible, we recommend migrating away from this plugin as soon as possible.
gPlazma now supports password authentication through the kpwd plugin. The kpwd plugin is not new, however the support for password based authentication is. The dcache kpwd subcommand of the dcache script allows kpwd files to be manipulated.
The kpwd plugin should be used like this:
auth sufficient kpwd
map sufficient kpwd
account sufficient kpwd
session sufficient kpwd
The new jaas authentication plugin for gPlazma delegates password authentication to the Java Authentication and Authorization Services (JAAS). A valid JAAS configuration has to be provided in /etc/dcache/jgss.conf. JAAS has traditionally been used to support Kerberos authentication in dCache; however, the jaas plugin is not limited to the Kerberos use case. Successful authentication results in a user name principal, which can be further mapped using one of the mapping plugins.
The jaas plugin should be used like this:
auth sufficient jaas gplazma.jaas.name=gplazma
with a /etc/dcache/jgss.conf containing something like:
gplazma {
    com.sun.security.auth.module.JndiLoginModule required
        user.provider.url="nis://NISServerHostName/NISDomain/user"
        group.provider.url="nis://NISServerHostName/NISDomain/system/group";
};
This would cause JAAS to use an external directory service to verify the password (the JndiLoginModule also supports LDAP).
Note that the gPlazma jaas plugin only supports the auth step. It cannot associate the session with a UID, GID or other information provided by JAAS.
There are lots of third party JAAS login modules available, allowing you to easily use external password validation services with dCache.
The new krb5 mapping plugin for gPlazma is to be used in conjunction with the nfsv41 service for Kerberos authentication (see Kerberos authentication). The nfsv41 service submits KerberosPrincipals of the form user@example.org to gPlazma. The krb5 plugin strips the domain suffix, leaving only the user name in a user name principal. Other plugins (e.g. nsswitch, nis, authzdb) can be used to map the user name to UID and GID.
Use the plugin like this:
map optional krb5
The new nsswitch mapping, identity, and session plugin for gPlazma allows the system's native name service switch to be used for mapping user name principals to UID and GID.
The new nis mapping, identity, and session plugin for gPlazma allows the Network Information System to be used to map user name principals to UID, GID and home directory.
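As a sketch, a site that keeps its account information in NIS could use the plugin for both mapping and session information in /etc/dcache/gplazma.conf; whether optional or requisite is the right choice depends on which other plugins are configured:

map optional nis
session optional nis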
The ftp service now supports gPlazma for password authentication. Use the useGPlazmaAuthorizationModule and useGPlazmaAuthorizationCell configuration properties to control whether gPlazma is used or not. Note that by default gPlazma is used. Existing deployments will either have to update their gPlazma configuration to support password authentication or explicitly disable the use of gPlazma for the ftp service.
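For example, a site that is not ready to serve FTP passwords through gPlazma might disable it explicitly for the ftp door in the layout file (domain and section names below are illustrative; both properties are mentioned above):

[ftpDomain]
[ftpDomain/ftp]
useGPlazmaAuthorizationModule=false
useGPlazmaAuthorizationCell=false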
The webdav service has been updated to support HTTP Basic authentication. Password verification is done through gPlazma. Please note that HTTP Basic authentication over an unencrypted channel is vulnerable to man-in-the-middle attacks. We strongly recommend only using HTTP Basic authentication over HTTPS.
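As a sketch, password authentication for a WebDAV door can be switched on with the webdavBasicAuthentication property in the door's layout section (section names are illustrative); the door should additionally be configured for HTTPS as recommended above:

[webdavDomain]
[webdavDomain/webdav]
webdavBasicAuthentication=true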
Two new commands were added to the gPlazma admin interface.
test login replaces the existing get mapping command. It shows the result of a login attempt but is more flexible when describing which principals have been identified for the user.
explain login uses the same format as test login, but provides detailed information on how the login was processed. The result of each processed plugin is explained in a tree structure.
Pool manager is used by doors to perform pool selection. Essentially, pool manager routes transfer to pools, controls staging from tape, and coordinates pool to pool transfers. Some of the biggest changes between 1.9.12 and 2.2 happened in the pool manager and how it is used by doors and pin manager.
In previous versions the retry logic in case of pool selection failures was placed inside pool manager. The consequence of that design decision was that doors and pin manager would never know what was happening inside pool manager: Was a file being staged or copied, or was the transfer suspended because the pool with the file was offline? Another consequence was that pool manager needed logic to query file meta data from PNFS manager. The query logic replicated similar logic already present in doors and would add latency to the pool selection process.
This behaviour was changed such that pool manager never retries requests internally. Instead, a pool selection failure causes the request to fail and be sent back to the door or pin manager. It is at the discretion of the requester to query PNFS manager for updated meta data and to retry the request. A consequence is that pool selection latency is reduced and that the retry logic can be tuned for every type of door. For instance, xrootd doors can rely on clients retrying requests and the door thus propagates a failure all the way back to the client. The SRM door on the other hand may return SRM_FILE_UNAVAILABLE, letting the client know that the pool with the file is offline. An FTP door will retry the pool selection internally.
The logic for suspending requests has not changed. A request that repeatedly fails will eventually get suspended. As before, doors will wait for a suspended request to be unsuspended.
The pool selection process consists of two parts: The first part uses an admin configured rule engine. This is called the pool selection unit and controls to and from which pools particular files may be written, read or staged. Once a set of candidate pools has been determined, the second step chooses one of those based on criteria such as free space and load.
The second step is now pluggable. This means that the admin may choose among several selection algorithms, and that third party developers may write custom plugins to further tweak and tune dCache.
To support such pluggable selection algorithms, the partition manager was rewritten. Partition manager allows several sets of parameters (partitions) to be defined and associated with different links. This mechanism has now been extended such that the pool selection logic itself is part of a partition. The partition is pluggable such that different links may use different pool selection strategies.
As part of the rewrite, several partition manager related commands have changed. In particular pm create now takes a plugin type parameter and the output format of pm ls has changed.
Important: Before upgrading, PoolManager.conf MUST be regenerated by using the save command using either 1.9.12-9, 1.9.13-3 or newer. dCache will fail to start if PoolManager.conf contains any of the obsolete commands. If third party scripts are used to generate PoolManager.conf, then these scripts will likely have to be updated (see the release notes of dCache 2.0 for details).
As before, a partition named default is hardcoded into the system. It is used by all links that do not explicitly define the partition to use. The default partition can, however, be recreated/overwritten using the pm create command. Doing so allows the partition type to be set.
Each partition has its own set of parameters and different types of partitions may support different sets of parameters. All partitions inherit from a global set of parameters. This set is modified using pm set -option=value (ie without specifying a partition name). Note: For legacy installations in which the default partition has not been explicitly recreated, the parameters of the default partition and the set of parameters inherited by other partitions are identical. This is done to ensure backwards compatibility with old pool manager configurations. Once pm create default is used this coupling is removed. We recommend that you create the default partition explicitly and regenerate the pool manager configuration. Support for legacy configurations will be removed in a future update.
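For illustration, the first command below changes the inherited (global) value of the space cost factor, while the second overrides it for a partition named special (the partition name and values are illustrative):

(PoolManager) admin > pm set -spacecostfactor=1.0
(PoolManager) admin > pm set special -spacecostfactor=2.0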
The following partition types are supported:
Third party plugins providing additional partition types can be installed.
wass is the new default pool selection algorithm for new installations. Existing installations will continue to use the classic algorithm, however we encourage sites to transition to the wass algorithm.
The internals of the wass algorithm are much more complicated than those of the classic algorithm; however, tuning it should be considerably easier, with less request clumping and more uniform filling of pools.
How to switch to WASS
First, inspect the existing partitions and parameters using pm ls -l. Keep the output for reference. The output will also show the partition type of each partition. To change the type the partition has to be recreated using the pm create command, eg:
(PoolManager) admin > pm create -type=wass default

Note that this will reset the parameters of the partition. If you have partition specific parameters, such as a replication threshold, these need to be set again using pm set. In either case, it is a good idea to reset the cpucostfactor and spacecostfactor to their default values, e.g.:
(PoolManager) admin > pm set default -spacecostfactor=1.0 -cpucostfactor=1.0
The read pool selection is identical to the classic algorithm: In essence, the set of pools able to serve the file is computed, and the pool with the lowest performance cost is selected. Idle cost cuts, fall back cost cuts, etc are processed as before. It should be noted that space cost factor and cpu cost factor have no influence on read pool selection.
The crucial difference in wass is the write pool selection step: Essentially, pools are selected randomly with a weight computed from the available space of the pool. The weight is however adjusted to take the current write load and the garbage collectible files into account. The higher the current write load (number of write movers), the less likely a pool is selected. The more garbage collectible files and the older the last access time of those files, the more likely a pool is selected.
WASS parameters
The WASS algorithm can be tuned by using the following parameters:
- breakeven
- Set per pool. The value must be between 0.0 and 1.0. High values of breakeven mean that old files retain their value for a long time. Low values mean that old files quickly become worthless. Pools holding worthless files are more likely to be selected.
- mover cost factor and cpu cost factor
- The mover cost factor is set per pool, while the cpu cost factor is set per partition in the pool manager. The product of these two factors allows the aggressiveness of the write load feedback signal to be adjusted. A low value means we expect pools to scale well with load. A value of 0.0 means that load information is ignored completely; no feedback is used. A negative value would mean that a busy pool becomes more attractive; hardly a useful configuration. The intuitive meaning of the product is that for a value of f, the probability of choosing a pool is halved for every 1/f concurrent writers.
- space cost factor
- Set per partition in pool manager. Intuitively, the larger the value the more we prefer pools with free space. For a value of 1.0 the probability of selecting a pool is proportional with available space. With smaller values the role of available space drops, until at 0.0 available space no longer influences pool selection. For negative values the algorithm will give higher preference to pools with less free space, but that is hardly a useful configuration. At values higher than 1.0 we give additional preference to pools with free space.
The following table lists the useful range, the default value, and special values for all four parameters.
Parameter | Useful range | Default | Special values |
---|---|---|---|
mover cost factor | Non-negative | 0.5 | 0.0 means that write load has no influence on pool selection for this pool. The useful range of the product of mover cost factor and cpu cost factor is between 0.0 and 1.0. |
cpu cost factor | Non-negative | 1.0 | 0.0 means that write load has no influence on pool selection for this partition. The useful range of the product of mover cost factor and cpu cost factor is between 0.0 and 1.0. |
space cost factor | Non-negative | 1.0 | 0.0 means that free or garbage collectable space has no influence on pool selection. 1.0 means that pools are selected with a probability proportional to free space. |
breakeven | [0;1] | 0.7 | 0.0 means that garbage collectible files are considered as free space. 1.0 means that garbage collectible files are considered as used space. |
It is unlikely that large values for any of the above parameters lead to useful results.
Except for the mover cost factor, all parameters exist for the classic algorithm too. They serve similar roles, and increasing or decreasing either parameter has similar effects. However, the details of how these parameters are used have changed significantly, and we strongly recommend starting out with the default values. The exact mathematical meaning is unimportant at this stage. In our experience the default values are pretty good for most cases. We expect the tuning process to be iterative, with small changes to the above four parameters being applied, followed by an observation phase.
Here are some general tips for tuning:

- If you want pools with more free space to fill more quickly, increase the space cost factor.
- If you want pools with free space to attract fewer transfers, reduce the space cost factor.
- If you want unaccessed files to be garbage collected more aggressively, reduce the breakeven parameter.
- If you want unaccessed files to be kept longer, increase the breakeven value.
- If you want the write load to have a higher impact on write pool selection, increase either the cpu cost factor or the mover cost factor (depending on whether the effect should apply to individual pools or to all pools). Conversely, reduce either factor to reduce the effect write load has on pool selection.

Remember that these two factors are multiplied, so setting either to zero means write load is not taken into account. An example of applying such a change is shown below.
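For example, to make pools with more free space fill somewhat faster on the default partition and persist the change (the value 1.5 is purely illustrative):

(PoolManager) admin > pm set default -spacecostfactor=1.5
(PoolManager) admin > save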
One final point to note compared to the classic algorithm is that read movers have no direct influence on write pool selection. This is on purpose and prevents popular files on particular pools from leading to increased write clumping (which in turn leads to additional read clumping in the future, resulting in a self-reinforcing feedback loop). If pool performance is significantly degraded by read access patterns, then write movers will eventually accumulate and result in a lower probability for the pool to be selected for further writes.
Several psu commands now accept wildcards (glob patterns). Use help psu to see which commands accept wildcards.
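For instance, assuming pools named pool_0, pool_1 and so on, something like the following should list them all; consult help psu to verify which commands accept patterns:

(PoolManager) admin > psu ls pool pool_*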
In previous releases pool manager would initiate a stage for any file if a disk copy was not online. It did so even for files for which no tape location was known. Starting with dCache 2.1, pool manager will only generate a stage request for files with a known tape location.
Sites that rely on the previous behaviour to import data stored to tape without dCache should contact support@dcache.org.
Pin manager is used by SRM and DCAP to trigger staging from tape and to ensure that the file is not garbage collected for a certain amount of time. It does this by placing a sticky flag (a pin) on the file on one of the pools.
In previous versions pin manager would unconditionally delegate pool selection to pool manager. Now, pin manager will handle some cases without delegating pool selection to pool manager. This is the case when a file is already online, or when a disk only file is offline. In other cases, eg when a pool to pool transfer or a stage from tape is required, pin manager continues to delegate pool selection to pool manager.
The benefit of running the pool selection algorithm in pin manager is that it reduces latency for the common cases that don't require any internal transfers. It also reduces load on pool manager.
Pool selection in pin manager is implemented by periodically exporting a snapshot of the configuration and pool status information from pool manager. Changes to the pool manager configuration may take up to 30 seconds to propagate to pin manager.
The stored PostgreSQL procedures used by Chimera have been updated. During upgrade, the SQL script to create/update the stored procedures has to be applied:
$ psql -U postgres -f /usr/share/dcache/chimera/sql/pgsql-procedures.sql chimera
The mode, owner and group of directory tags can now be changed (using the regular chmod, chown and chgrp utilities).
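As a sketch, assuming the Chimera name space is mounted and a directory carries a tag named OSMTemplate, the tag appears as a magic ".(tag)(name)" file and can be changed with the usual tools:

$ cd /pnfs/example.org/data/mydir
$ chgrp 1000 ".(tag)(OSMTemplate)"
$ chmod 0644 ".(tag)(OSMTemplate)"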
The command chimera-cli checksum was added to query the checksum of a file.
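Usage is straightforward (the path is illustrative; see the command's help output for any optional arguments):

$ chimera-cli checksum /pnfs/example.org/data/myfile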
The checksum scanner has been extended with configurable continuous background checksumming (scrubbing). Any checksum errors are logged, and affected files are marked as broken and will not be available for download. The new -scrub option of the csm set policy command enables the feature. E.g.:
(pool_0) admin > csm set policy -scrub -limit=2 -period=720
Consult the help output of that command for information about setting throughput limits and scan frequency.
Pool to pool transfers used to use DCAP; they now use HTTP. This change should be transparent, and a WebDAV door is not required. The primary observable change is that the TCP connection between source and destination pool is now created from the destination pool to the source pool; in previous versions the direction was reversed. Since HTTP is classified as a WAN protocol, the port used by the source pool to listen for the TCP connection will be allocated from the WAN port range. The pool CLI commands pp set port and pp set listen are deprecated and replaced by the command pp interface. The pp set listen command, however, calls through to the pp interface command, meaning that old pool configurations will work as is. Important: Due to this change, new pools are only compatible with pools of releases 1.9.12-11, 1.9.13-4 and newer.
Parts of the info-provider configuration that used to be in /etc/dcache/info-provider.xml were moved to /etc/dcache/dcache.conf. Therefore, you will need to edit dcache.conf so that it contains your configuration. See the defaults file /usr/share/dcache/defaults/info-provider.properties for the list of affected properties. You may want to recreate /etc/dcache/info-provider.xml to get rid of the excess configuration.
Note that, if you are using the default value for an info-provider property then you do not need to configure that property in dcache.conf: the default value will be used automatically.
GLUE2 compliance has been improved. Be sure you have at least v2.0.8 of glue-schema RPM installed on the node running the info provider.
The admin shell has received numerous improvements. Support for version 2 of the SSH protocol has been added and is available on port 22224. Support for version 1 still exists and is needed for the dCache GUI. Although allowed by the protocol, we have not been able to implement support for both protocol versions on the same TCP port. Support for version 1 will eventually be removed.
Color highlighting as well as limited tab completion has been added.
The output format of messages written to billing files is now configurable. Have a look at /usr/share/dcache/defaults/billing.properties for details about available formats.
The billing database schema has changed. The schema is automatically updated during upgrade. Downgrade is not possible once upgraded.
When using the billing database, the httpd service is able to generate plots from the information in the database. Support is enabled by setting the billingToDb property to yes for the httpd service. The plots are available under http://admin.example.org:2288/billingHistory/.
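A minimal layout sketch enabling the plots (domain name illustrative); depending on the deployment, the billing service must also be configured to write to the database:

[httpdDomain]
[httpdDomain/httpd]
billingToDb=yes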
The SRM list operation provides information about file locality, among other things. In previous versions the SRM door would query pool manager to compute the file locality for each file being listed. dCache now computes the file locality internally in the SRM. The effect is that latency is reduced. The algorithm relies on a periodic snapshot of the pool manager configuration and pool state being transferred from pool manager to the SRM door (similar to how it is now done in pin manager).
The new srmPinOnlineFiles property controls whether dCache pins files that have ONLINE access-latency. If set to false then dCache will refrain from pinning ONLINE files; dCache still ensures that the file is available on a read pool before returning the transfer URL to the client, but no guarantee is made that the file will not be garbage collected before the transfer URL expires.
Pinning ONLINE files
In previous versions of dCache, when an SRM client asked dCache to prepare a file for download, the SRM door would always ask the pin manager to pin the file. This was to ensure that the file is indeed online, that the file's data is available on a pool the user may read from, and that the data will not be garbage collected during the transfer URL's lifetime. A correct implementation of the SRM protocol must provide these three guarantees, so for the general case pinning is required even when the access latency is ONLINE.
The disadvantage of always pinning ONLINE files is that it introduces latency that, in many cases, is unnecessary; for example, if a file is permanently available on a pool that the end user can read from then pinning the file is unnecessary.
Some dCache deployments only store files on pools that are readable: they have no pools dedicated for writing or staging. Pinning ONLINE files isn't required for such deployments as dCache already makes the necessary guarantees.
Other sites may know that the risk of a replicated file becoming garbage-collected during the lifetime of the transfer URL is small. If it is garbage-collected then opening the file will still succeed, but will incur a delay. The site-admin may know that their user community will accept this small risk in exchange for improved throughput, in which case pinning ONLINE files is unnecessary.
A side effect of disabling srmPinOnlineFiles is that it becomes possible to setup a tapeless system without pin manager. The default access latency in dCache is however NEARLINE, even when no HSM system is attached. The access latency has to be changed to ONLINE if dCache is to run without a pin manager (the system wide default is controlled through the DefaultAccessLatency property in PNFS manager).
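A minimal dcache.conf sketch for such a disk-only instance, using only the two properties discussed above (verify the remaining defaults for your release):

srmPinOnlineFiles=false
DefaultAccessLatency=ONLINE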
Due to the changes to pin manager, SRM can now report SRM_FILE_UNAVAILABLE if files are offline, that is, when the pools holding the file are down and no tape copy is available.
The SRM protocol allows the client to request that existing files be overwritten upon upload. By default dCache does not honor this option and always refuses to overwrite a file.
The configuration property overwriteEnabled allowed this behaviour to be changed. When set to true the SRM would respect the overwrite request of the client. The way this was implemented, however, meant that the option had to be set in all doors that the SRM could redirect to. Thus one was faced with the choice of either not honoring the client's request or enabling overwrite by default for all other protocols too.
The handling of the overwrite flag in the SRM has changed. When overwriteEnabled is set to true and the client requests to overwrite a file, the SRM will delete the file before redirecting to another door. This means that an SRM door can now honor the client's request even when all other doors are configured not to overwrite existing files.
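To honor client overwrite requests only in SRM, the property can be set in the SRM door's layout section (section name illustrative):

[srmDomain/srm]
overwriteEnabled=true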
srmCopy with GridFTP used to use GSI delegation to transfer credentials from the SRM door to the pool. This was a slow and CPU intensive process. The SRM door and pool have now been updated to transfer the credentials through dCache's internal message passing mechanism.
We assume that the message passing mechanism is secure. Care should be taken to properly firewall access to the message passing system. This should be done whether srmCopy is used or not.
nfsv41 doors now register with LoginBroker using the file:// protocol. This allows SRM to produce TURLs for this protocol.
Support for running the srm service so that it directly uses a custom kpwd file (independent of the rest of dCache) has been removed. All SRM authentication and authorisation activity must now go through gPlazma. Note that it is possible to configure SRM to use a different gPlazma configuration from the rest of dCache by 'embedding' gPlazma (see useGPlazmaAuthorizationModule and useGPlazmaAuthorizationCell options). These options, along with gPlazma's kpwd plugin, allow for an equivalent configuration.
The srm service now listens on two ports. These are, by default, 8443 (as before) and 8445. Port 8443 continues to be for SRM clients that use GSI-based communication and the new port is for SRM clients that use SSL. While the SSL port is configurable, it is recommended to use the default, as this is an agreed-upon port number for SSL-based SRM traffic.
Support for RPCSEC_GSS security was added to the NFS 4 door. To enable it, the following configuration is required:
nfs.rpcsec_gss=true
kerberos.realm=EXAMPLE.ORG
kerberos.jaas.config=/etc/dcache/gss.conf
kerberos.key-distribution-center-list=your.kdc.server
The referenced JAAS configuration file (/etc/dcache/gss.conf above) should contain something like:

com.sun.security.jgss.accept {
    com.sun.security.auth.module.Krb5LoginModule required
        doNotPrompt=true
        useKeyTab=true
        keyTab="${/}etc${/}krb5.keytab"
        debug=false
        storeKey=true
        principal="nfs/nfs-door.example.org@EXAMPLE.ORG";
};
The mapping is typically a two step process, with a mapping from KerberosPrincipal to UserNamePrincipal using either the krb5 plugin (which simply strips the domain suffix, i.e. it maps user@example.org to user) or the gridmap plugin with a mapping like
"user@EXAMPLE.ORG" some-user
followed by a mapping from UserNamePrincipal to UID and GID using the authzdb plugin or the new nsswitch plugin.
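Putting the pieces together, the mapping part of gplazma.conf for Kerberos NFS access could look like the following sketch; a site may use nsswitch or nis instead of authzdb:

map optional krb5
map requisite authzdb
session requisite authzdb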
The kpwd plugin can also be used to achieve both mappings in one plugin. The dCache NFS implementation supports the following RPCSEC_GSS QOPs (quality of protection): authentication only, integrity protection, and privacy protection.
These correspond to the krb5, krb5i and krb5p mount options, for example:
# mount -o krb5i server:/export /local/path
Note that all data access with NFS 4.1 uses the same QOP as was specified at mount time; e.g., if privacy was requested at mount time, then all NFS traffic, including data coming from pools, will be encrypted.
ACLs can now be queried and updated through a mounted NFS 4.1 file system. No special configuration is required. Eg:
$ nfs4_getfacl /pnfs/desy.de/data/generated/acl-test
$ nfs4_setfacl -a A::tigran@desy.afs:arw
FTP doors now support renaming of files. This is provided through the RNFR and RNTO commands defined by RFC 959.
To use this functionality you need a client that supports renaming; UberFTP is one such client.
A service definition for hopping manager was added. The name of the new service is hopping.
Tab completion for the Bash shell was added. The FHS RPM and DEB packages automatically install the dcache.bash-completion script.
The subcommand dcache ports lists all used TCP and UDP ports and port ranges of configured services. Use the command like this:
$ dcache ports
DOMAIN           CELL                SERVICE  PROTO  PORT
dCacheDomain     -                   -        TCP    11112
dCacheDomain     -                   -        TCP    11113
dCacheDomain     -                   -        TCP    11111
dCacheDomain     -                   -        UDP    11111
dCacheDomain     -                   -        UDP    0
dCacheDomain     httpd               httpd    TCP    2288
dCacheDomain     info                info     TCP    22112
dCacheDomain     DCap-gsi-dcache-vm  gsidcap  TCP    22128
dCacheDomain     SRM-dcache-vm       srm      TCP    8443
dCacheDomain     Xrootd-dcache-vm    xrootd   TCP    1094
dCacheDomain     WebDAV-dcache-vm    webdav   TCP    2880
adminDomain      -                   -        UDP    0
adminDomain      alm                 admin    TCP    22223
namespaceDomain  -                   -        UDP    0
namespaceDomain  NFSv3-dcache-vm     nfsv3    TCP    (111)
namespaceDomain  NFSv3-dcache-vm     nfsv3    TCP    2049
namespaceDomain  NFSv3-dcache-vm     nfsv3    UDP    (111)
namespaceDomain  NFSv3-dcache-vm     nfsv3    UDP    2049
pool             -                   -        UDP    0
pool             pool_0              pool     TCP    20000-25000
pool             pool_0              pool     TCP    33115-33145
pool             pool_1              pool     TCP    20000-25000
pool             pool_1              pool     TCP    33115-33145
pool             pool_2              pool     TCP    20000-25000
pool             pool_2              pool     TCP    33115-33145
gridftp-Domain   -                   -        UDP    0
gridftp-Domain   GFTP-dcache-vm      gridftp  TCP    2811
gridftp-Domain   GFTP-dcache-vm      gridftp  TCP    20000-25000
testDomain       -                   -        UDP    0
testDomain       pool10              pool     TCP    20000-25000
testDomain       pool10              pool     TCP    33115-33145

Ports with '-' under the CELL and SERVICE columns provide inter-domain communication for dCache. They are established independently of any service in the layout file and are configured by the broker.* family of properties. Entries where the port number is zero indicate that a random port number is chosen. The chosen port is guaranteed not to conflict with already open ports.
The dcache pool convert command replaces the existing procedure for converting between pool meta data repository formats. The subcommand supports conversion from file to db and from db to file. The metaDataRepositoryImport configuration property is no longer supported. Use the command like this:
$ dcache pool convert pool_0 db
INFO - Copying 000097E9203C0F264F8380C3014BCF405783 (1 of 540)
INFO - Copying 0000F881F830AF3E471C9263D8752D5A4BA2 (2 of 540)
...
INFO - Copying 0000DBFEDCB237C24DFA92E48ED9FCD6782D (539 of 540)
INFO - Copying 00009AAC974918F4452AB7E08DF97DF477B3 (540 of 540)
The pool meta data database of 'pool_0' was converted from type org.dcache.pool.repository.meta.file.FileMetaDataRepository to type org.dcache.pool.repository.meta.db.BerkeleyDBMetaDataRepository.

Note that to use the new meta data store, the pool configuration must be updated by adjusting the metaDataRepository property, e.g. in the layout file:

metaDataRepository=org.dcache.pool.repository.meta.db.BerkeleyDBMetaDataRepository
The dcache pool yaml subcommand replaces the meta2yaml utility. The command dumps the pool meta data repository data to YAML format. Both the db and file pool backends are supported. Use the command like this:
$ dcache pool yaml pool_0
000097E9203C0F264F8380C3014BCF405783:
  state: CACHED
  sticky:
    system: -1
  storageclass: myStore:STRING
  cacheclass: null
  bitfileid: <Unknown>
  locations:
  hsm: osm
  filesize: 954896
  map:
    uid: -1
    StoreName: myStore
    gid: -1
    path: /pnfs/dcache-vm/data/test-1314688795-15
    SpaceToken: 18090188
  retentionpolicy: REPLICA
  accesslatency: ONLINE
0000F881F830AF3E471C9263D8752D5A4BA2:
  state: CACHED
  sticky:
    system: -1
  storageclass: myStore:STRING
  cacheclass: null
  bitfileid: <Unknown>
  locations:
  hsm: osm
  filesize: 954896
  map:
    uid: -1
    StoreName: myStore
    gid: -1
    path: /pnfs/dcache-vm/data/test-1314817040-7
    SpaceToken: 18100092
  retentionpolicy: REPLICA
  accesslatency: ONLINE
...
Parsers for YAML are available for all major scripting languages. The dcache pool yaml command is the preferred mechanism for direct access (ie without using the pool) to the meta data of a dCache pool.
Two minor additions to the configuration language have been made. Both allow more misconfigurations to be caught and provide added documentation value.
The immutable annotation means that the value of a property cannot be modified. We use this annotation to mark properties for internal use.
The one-of annotation limits properties to have one of a limited set of values. We typically use this annotation for boolean properties or enumerations.
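As an illustration only (the authoritative syntax can be seen in the files under /usr/share/dcache/defaults), an annotated property declaration looks roughly like the following; the second property name is a made-up placeholder:

(one-of?true|false)nfs.rpcsec_gss = false
(immutable)dcache.example.internal = some-value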
Use this checklist to plan the upgrade to 2.2. The checklist covers a manual upgrade of a single node. Upgrades involving installation scripts like dCacheConfigure, YAIM, or site specific deployment scripts are not covered by this process.
(local) admin > cd PoolManager
(PoolManager) admin > save
$ psql -U postgres -f /usr/share/dcache/chimera/sql/pgsql-procedures.sql chimera

You may have to provide a user name or password depending on your PostgreSQL configuration.
Term | Description |
---|---|
cell | A component of dCache. dCache consists of many cells. A cell must have a name which is unique within the domain hosting the cell. |
domain | A container hosting one or more dCache cells. A domain runs within its own process. A domain must have a name which is unique throughout the dCache instance. |
service | An abstraction used in dCache configuration files to describe atomic units to add to a domain. A service is typically implemented through one or more cells. |
layout | Definition of domains and services on a given host. The layout is specified in the layout file. The layout file may contain both domain- and service- specific configuration values. |
pool | A service providing physical data storage. |
This section lists all supported services. Those marked with a * are services that dCache requires to function correctly.
Name | Description |
---|---|
broadcast∗ | Internal message broadcast service. |
cleaner∗ | Service to remove files from pools and tapes when the name space entry is deleted. |
cns | Cell naming service used in conjunction with JMS for well known name lookup. |
dir | Directory listing support for DCAP. |
gplazma | Authorization cell |
hopping | Internal file transfer orchestration. |
loginbroker | Central registry of all doors. Provides data to SRM for load balancing. |
pinmanager | Pinning and staging support for SRM and DCAP. |
pnfsmanager∗ | Gateway to name space (either PNFS or Chimera). |
pool∗ | Provides physical data storage. |
poolmanager∗ | Central registry of all pools. Routes transfers to pools, triggers staging from tape, performs hot spot detection. |
replica | Manages file replication for Resilient dCache. |
spacemanager | Space reservation support for SRM. |
srm-loginbroker | Central registry of all SRM doors. |
Name | Description |
---|---|
admin | SSH based admin shell. |
billing∗ | Service for logging to billing files or the billing database. |
httpd | Legacy monitoring portal. Depends on: loginbroker, topo. |
info | Info service that collects information about the dCache instance. Recommends: httpd |
statistics | Collects usage statistics from all pools and generates reports in HTML. |
topo | Builds a topology map of all domains and cells in the dCache instance. |
webadmin | Web admin portal. Depends on: info |
Name | Description |
---|---|
authdcap | Authenticated DCAP door. Depends on: dir. Recommends: pinmanager. |
dcap | dCap door. Depends on: dir. Recommends: pinmanager. |
gsidcap | GSI dCap door. Depends on: dir. Recommends: pinmanager. |
kerberosdcap | Kerberized dCap door. Depends on: dir. Recommends: pinmanager. |
ftp | Regular FTP door without strong authentication. |
gridftp | GridFTP door. |
kerberosftp | Kerberized FTP door. |
nfsv3 | NFS 3 name space export (only works with Chimera). |
nfsv41 | NFS 4.1 door (only works with Chimera). |
srm | SRM door. Depends on: pinmanager, loginbroker, srm-loginbroker. Recommends: transfermanagers, spacemanager. |
transfermanagers | Server side srmCopy support for SRM. |
webdav | HTTP and WebDAV door. |
xrootd | XROOT door. |
Name | Reason |
---|---|
acl | Integrated into pnfsmanager service. |
dummy-prestager | DCAP uses pinmanager for staging. |
The following gPlazma 2 plugins ship with dCache and can be used in gplazma.conf. Note that several plugins implement more than one type. Usually such plugins should be added to all phases supported by the plugin.
Name | Type | Description |
---|---|---|
gplazma1 | auth | Legacy support for dcachesrm-gplazma.policy configuration. |
jaas | auth | Implements password authentication through the Java Authentication and Authorization Services (JAAS). A valid JAAS setup for password verification has to be defined in /etc/dcache/jgss.conf. Fails if no password credential is provided or if JAAS denies the login. A user name principal is generated upon success. |
kpwd | auth | Implements password authentication using the kpwd file. Fails if no password credential is provided, if the username is not defined in the kpwd file, if the password is invalid, or if the entry has been disabled in the kpwd file. |
voms | auth | Validates any VOMS attributes in an X.509 certificate and extracts all valid FQANs. Requires that a vomsdir is configured. Fails if no valid FQAN is found. |
x509 | auth | Extracts the DN from an X.509 certificate. The certificate chain is not validated (it is assumed that the door already validated the chain). The plugin fails if no certificate chain is provided. |
xacml | auth | |
authzdb | map | Maps user and group name principals to UID and GID principals according to a storage authzdb file. The file format does not distinguish between user names and group names and hence each entry in the file maps to both a UID and one or more GIDs. Therefore the UID and the primary GID are determined by the mapping for the primary group name or user name. The name of that mapping is kept as the user name of the login and may be used for a session plugin or for authorization in space manager. Remaining GIDs are collected from other mappings of available group names. |
gplazma1 | map | Legacy support for dcachesrm-gplazma.policy configuration. |
gridmap | map | Maps DN principals to user name principals according to a grid-mapfile. Fails if no DN was provided or no mapping is found. |
kpwd | map | Maps user names, DNs and Kerberos principals according to the kpwd file. Only user names verified by the kpwd auth plugin are mapped. Fails if nothing was mapped or if the kpwd entry has been disabled. Maps to user name, UID and GID principals. |
krb5 | map | Maps Kerberos principals to username principals by stripping the domain suffix. |
nis | map | Maps user name principals to UID and GID through lookup in NIS. |
nsswitch | map | Maps user name to UID and GID according to the system native Name Service Switch. |
vorolemap | map | Maps FQAN principals to group name principals according to a grid-vorolemap file. Each FQAN is mapped to the first entry that is a prefix of the FQAN. The primary FQAN (the first in the certificate) is mapped to the primary group name. Fails if no FQAN was provided or no mapping was found. |
argus | account | |
kpwd | account | Fails if the kpwd entry used during the map has been disabled. |
authzdb | session | Associates a user name with root and home directory and read-only status according to a storage authzdb file. |
gplazma1 | session | Legacy support for dcachesrm-gplazma.policy configuration. |
kpwd | session | Adds home and root directories and read-only status to the session. Only applies to mappings generated by the kpwd map plugin. |
nis | session | Associates a user name with a home directory through NIS lookup. The session's root directory is always set to the file system root and the session is never read-only. |
nsswitch | session | Sets the session home directory and root directory to the file system root, and sets the session's read-only status to false. |
nis | identity | Maps user name principals to UID and group name principals to GID. |
nsswitch | identity | Maps user name principals to UID and group name principals to GID. |
Please consult /usr/share/dcache/defaults/gplazma.properties for details about available configuration properties.
Most configuration properties are unchanged. Some have however been removed or replaced and others have been added. The following tables provide an overview of the properties that may need to be changed when upgrading from dCache 1.9.12 to 2.2.
Property | Alternative | Description |
---|---|---|
gplazmaPolicy | gplazma.legacy.config | Location of legacy gPlazma configuration file. |
Property | Default | Description |
---|---|---|
pnfsVerifyAllLookups | false | Whether to verify lookup permissions for the entire path. |
srmPinOnlineFiles | true | Whether to pin files with ONLINE access latency. |
nfs.port | 2049 | TCP port used by NFS doors. |
nfs.v3 | false | Whether to enable NFS 3 support in NFS 4.1 door. |
nfs.domain | | The local NFSv4 domain name. |
nfs.idmap.cache.size | 512 | Principal cache size of NFS door. |
nfs.idmap.cache.timeout | 30 | Principal cache timeout of NFS door. |
nfs.idmap.cache.timeout.unit | SECONDS | Principal cache timeout unit of NFS door. |
nfs.rpcsec_gss | false | Whether to enable RPCSEC_GSS for NFS 4.1 door. |
info-provider.site-unique-id | EXAMPLESITE-ID | Single words or phrases that describe your site. |
info-provider.se-unique-id | dcache-srm.example.org | Your dCache's Unique ID. |
info-provider.se-name | | A human understandable name for your SE. |
info-provider.glue-se-status | UNDEFINEDVALUE | Current status of dCache |
info-provider.dcache-quality-level | UNDEFINEDVALUE | Maturity of the service in terms of quality of the software components. |
info-provider.dcache-architecture | UNDEFINEDVALUE | the architecture of the storage |
info-provider.dit | resource | |
info-provider.paths.tape-info | /usr/share/dcache/xml/tape-info-empty.xml | Location of tape accounting information. |
collectorTimeout | 5000 | Webadmin timeout for the data collecting cell. |
transfersCollectorUpdate | 60 | Webadmin update time for the data collecting cell. |
webdavBasicAuthentication | false | Whether HTTP Basic authentication is enabled. |
admin.colors.enable | true | Whether to enable color output of admin door. |
sshVersion | both | Which version of the SSH protocol to use for the admin door. |
admin.ssh2AdminPort | 22224 | Port to use for SSH 2. |
admin.authorizedKey2 | /etc/dcache/admin/authorized_keys2 | Authorized keys for SSH 2. |
admin.dsaHostKeyPrivate | /etc/dcache/admin/ssh_host_dsa_key | Location of SSH 2 private key. |
admin.dsaHostKeyPublic | /etc/dcache/admin/ssh_host_dsa_key.pub | Location of SSH 2 public key. |
broker.messaging.port | 11111 | TCP port used for cells messaging. |
broker.client.port | 0 | UDP port for cells messaging client. |
billing.formats.MoverInfoMessage | See /usr/share/dcache/defaults/billing.properties | |
billing.formats.RemoveFileInfoMessage | See /usr/share/dcache/defaults/billing.properties | |
billing.formats.DoorRequestInfoMessage | See /usr/share/dcache/defaults/billing.properties | |
billing.formats.StorageInfoMessage | See /usr/share/dcache/defaults/billing.properties | |
gplazma.nis.server | nisserv.domain.com | NIS server contacted by gPlazma NIS plugin. |
gplazma.nis.domain | domain.com | NIS domain used by gPlazma NIS plugin. |
xrootd.gsi.hostcert.key | /etc/grid-security/hostkey.pem | Host key used by xrootd GSI plugin. |
xrootd.gsi.hostcert.cert | /etc/grid-security/hostcert.pem | Host certificate used by xrootd GSI plugin. |
xrootd.gsi.hostcert.refresh | 43200 | Host certificate reload period used by xrootd GSI plugin. |
xrootd.gsi.hostcert.verify | true | |
xrootd.gsi.ca.path | /etc/grid-security/certificates | CA certificates used by xrootd GSI plugin. |
xrootd.gsi.ca.refresh | 43200 | CA certificate reload period used by xrootd GSI plugin. |
webadmin.warunpackdir | /var/tmp | Place to unpack Webadmin WAR file. |
gplazma.jaas.name | gplazma | JAAS application name. |
ftp.read-only | false (true for weak ftp) | Whether FTP door allows users to modify content. |
srm.ssl.port | 8445 | TCP port for SRM over SSL. |
httpd.static-content.plots | /var/lib/dcache/plots | Where to look for billing plot files. |
httpd.static-content.plots.subdir | /plots | URI path element for billing plot files. |
Name | Description |
---|---|
aclTable | ACLs are now part of Chimera |
aclConnDriver | ACLs are now part of Chimera |
aclConnUrl | ACLs are now part of Chimera |
aclConnUser | ACLs are now part of Chimera |
aclConnPaswd | ACLs are now part of Chimera |
gplazma.version | gPlazma 2 is the only version of gPlazma included. |
srmGssMode | SRM over SSL now has a dedicated port. |
billingDb | Use billingLogsDir. |
Most of the following forbidden properties were marked as obsolete or deprecated in 1.9.12.
Property | Alternative |
---|---|
namespaceProvider | dcache.namespace |
webdav.templates.list | webdav.templates.html |
metaDataRepositoryImport | Use dcache pool convert command. |
SpaceManagerDefaultAccessLatency | DefaultAccessLatencyForSpaceReservation |
keyBase | dcache.paths.ssh-keys |
kerberosScvPrincipal | kerberos.service-principle-name |
gsiftpDefaultStreamsPerClient | Forbidden by GridFTP protocol. |
gPlazmaNumberOfSimutaneousRequests | gPlazmaNumberOfSimultaneousRequests |
srmDbHost | srmDatabaseHost |
srmPnfsManager | pnfsmanager |
srmPoolManager | poolmanager |
srmNumberOfDaysInDatabaseHistory | srmKeepRequestHistoryPeriod |
srmOldRequestRemovalPeriodSeconds | srmExpiredRequestRemovalPeriod |
srmJdbcMonitoringLogEnabled | srmRequestHistoryDatabaseEnabled |
srmJdbcSaveCompletedRequestsOnly | srmStoreCompletedRequestsOnly |
srmJdbcEnabled | srmDatabaseEnabled |
java | Use JAVA_HOME environment variable. |
java_options | dcache.java.options or dcache.java.options.extra |
user | dcache.user |
pidDir | dcache.pid.dir |
logArea | dcache.log.dir |
logMode | dcache.log.mode |
classpath | dcache.java.classpath |
librarypath | dcache.java.library.path |
kerberosRealm | kerberos.realm |
kerberosKdcList | kerberos.key-distribution-center-list |
authLoginConfig | kerberos.jaas.config |
messageBroker | broker.scheme |
serviceLocatorHost | broker.host |
serviceLocatorPort | broker.port |
amqHost | broker.amq.host |
amdPort | broker.amq.port |
amqSSLPort | broker.amq.ssl.port |
amqUrl | broker.amq.url |
ourHomeDir | dcache.home |
portBase | set protocol-specific default ports |
httpPortNumber | webdavPort |
httpRootPath | webdavRootPath |
httpAllowedPaths | webdavAllowedPaths |
webdavContextPath | webdav.static-content.location |
cleanerArchive | cleaner.archive |
cleanerDB | cleaner.book-keeping.dir |
cleanerPoolTimeout | cleaner.pool-reply-timeout |
cleanerProcessFilesPerRun | cleaner.max-files-in-message |
cleanerRecover | cleaner.pool-retry |
cleanerRefresh | cleaner.period |
hsmCleaner | cleaner.hsm |
hsmCleanerFlush | cleaner.hsm.flush.period |
hsmCleanerRecover | cleaner.pool-retry |
hsmCleanerRepository | cleaner.hsm.repository.dir |
hsmCleanerRequest | cleaner.hsm.max-files-in-message |
hsmCleanerScan | cleaner.period |
hsmCleanerTimeout | cleaner.hsm.pool-reply-timeout |
hsmCleanerTrash | cleaner.hsm.trash.dir |
hsmCleanerQueue | cleaner.hsm.max-concurrent-requests |
trash | cleaner.trash.dir |
httpHost | info-provider.http.host |
xsltProcessor | info-provider.processor |
xylophoneConfigurationFile | info-provider.configuration.file |
saxonDir | info-provider.saxon.dir |
xylophoneXSLTDir | info-provider.xylophone.dir |
xylophoneConfigurationDir | info-provider.configuration.dir |
images | httpd.static-content.images |
styles | httpd.static-content.styles |