release notes | Book: 1.9.5, 1.9.12 (opt, FHS), 2.11 (FHS), 2.12 (FHS), 2.13 (FHS), 2.14 (FHS), | Wiki | Q&A black_bg
Web: Multi-page, Single page | PDF: A4-size, Letter-size | eBook: epub black_bg

Migration Module

The migration module is a component of dCache pools introduced in version 1.9.1. The purpose of the component is essentially to copy or move the content of a pool to one or more other pools. The migration module replaces the copy manager found in previous releases of dCache. We advice against using the old copy manager, as it known to have problems.

Typical use cases for the migration module include:

  • Vacating pools, that is, moving all files to other pools before decomisioning the pool.

  • Caching data on other pools, thus distributing the load and increasing availability.

  • As an alternative to the hopping manager.

[return to top]

Overview and Terminology

The migration module runs inside pools and hosts a number of migration jobs. Each job operates on a set of files on the pool on which it is executed and can copy or move those files to other pools. The migration module provides filters for defining the set of the files on which a job operates.

The act of copying or moving a file is called a migration task. A task selects a target pool and asks it to perform a pool to pool transfer from the source pool. The actual transfer is performed by the same component performing other pool to pool transfers. The migration module does not perform the transfer; it only orchestrates it.

The state of the target copy (the target state) as well as the source copy (the source state) can be explicitly defined. For instance, for vacating a pool the target state is set to be the same as the original source state, and the source state is changed to removed; for caching files, the target state is set to cached, and the source state is unmodified.

Sticky flags owned by the pin manager are never touched by a migration job, however the migration module can ask the pin manager to move the pin to the target pool. Care has been taken that unless the pin is moved by the pin manager, the source file is not deleted by a migration job, even if asked to do so. To illustrate this, assume a source file marked precious and with two sticky flags, one owned by foobar and the other by the pin manager. If a migration job is configured to delete the source file, but not to move the pin, the result will be that the file is marked cached, and the sticky flag owned by foobar is removed. The pin remains. Once it expires, the file is eligible for garbage collection.

All operations are idempotent. This means that a migration job can be safely rerun, and as long as everything else is unchanged, files will not be transferred a again. Because jobs are idempotent they do not need to maintain persistent state, which in turns means the migration module becomes simpler and more robust. Should a pool crash during a migration job, the job can be rerun and the remaining files will be transfered.

It is safe to run migration jobs while pools are in use. Once started, migration jobs run to completion and do only operate on those files that matched the selection filters at the time the migration job started. New files that arrive on the pool are not touched. Neither are files that change state after a migration job has been initialized, even though the selection filters would match the new state of the file. The exception to the rule is when files are deleted from the pool or change state such that they do no longer match the selection filter. Such files will be excluded from the migration job, unless the file was already processed. Rerunning a migration job will force it to pick up any new files. Because the job is idempotent, any files copied before are not copied again.

Permanent migration jobs behave differently. Rather than running to completion, permanent jobs keep running until explicitly cancelled. They monitor the pool for any new files or state changes, and dynamically add or remove files from the transfer queue. Permanent jobs are made persistent when the save command is executed and will be recreated on pool restart. The main use case for permanent jobs is as an alternative to using a central hopping manager.

Idempotence is achieved by locating existing copies of a file on any of the target pools. If an existing copy is found, rather than creating a new copy, the state of the existing copy is updated to reflect the target state specified for the migration job. Care is taken to never make a file more volatile than it already is: Sticky flags are added, or existing sticky flags are extended, but never removed or shortened; cached files may be marked precious, but not vice versa. One caveat is when the target pool containing the existing copy is offline. In that case the existence of the copy cannot be verified. Rather than creating a new copy, the task fails and the file is put back into the transfer queue. This behaviour can be modified by marking a migration job as eager. Eager jobs create new copies if an existing copy cannot be immediately verified. As a rule of thumb, permanent jobs should never be marked eager. This is to avoid that a large number of unnecessary copies are created when several pools are restarted simultaneously.

A migration task aborts whenever it runs into a problem. The file will be reinserted at the end of the transfer queue. Consequently, once a migration job terminates, all files have been successfully transferred. If for some reason tasks for particular files keep failing, then the migration job will never terminate by itself as it retries indefinitely.

[return to top]

Command Summary

All commands begin with the string migration, e.g. migration copy. The commands migration copy, migration cache and migration move create new migration jobs. These commands take the same options and only differ in default values. Except for the number of concurrent tasks, transfer parameters of existing jobs cannot be changed. This is by design to ensure idempotency of jobs. The concurrency can be altered through the migration concurrency command.

Jobs are assinged a job ID and are executed in the background. The status of a job may be queried through the migration info command. A list of all jobs can be obtained through migration ls. Jobs stay in the list even after they have terminated. Terminated jobs can be cleared from the list through the migration clear command.

Jobs can be suspended, resumed and cancelled through the migration suspend, migration resume and migration cancel commands. Existing tasks are allowed to finish before a job is suspended or cancelled.

[return to top]

Examples

[return to top]

Vacating a pool

To vacate <sourcePool>, we first mark the pool read-only to avoid that more files are added to the pool, and then move all files to <targetPool>. It is not strictly necessary to mark the pool read-only, however if not done there is no guarantee that the pool is empty when the migration job terminates. The job can be rerun to move remaining files.

(<sourcePool>) admin > pool disable -rdonly
(<sourcePool>) admin > migration move <targetPool>
[1] RUNNING      migration move <targetPool>
(<sourcePool>) admin > migration info 1
Command    : migration move <targetPool>
State      : RUNNING
Queued     : 0
Attempts   : 1
Targets    : <targetPool>
Completed  : 0 files; 0 bytes; 0%
Total      : 830424 bytes
Concurrency: 1
Running tasks:
[0] 0001000000000000000BFAE0: TASK.Copying -> [<targetPool>@local]
(<sourcePool>) admin > migration info 1
Command    : migration move <targetPool>
State      : FINISHED
Queued     : 0
Attempts   : 1
Targets    : <targetPool>
Completed  : 1 files; 830424 bytes
Total      : 830424 bytes
Concurrency: 1
Running tasks:
(<sourcePool>) admin > rep ls
(<sourcePool>) admin >

[return to top]

Caching recently accessed files

Say we want to cache all files belonging to the storage group atlas:default and accessed within the last month on a set of low-cost cache pools defined by pool group cache_pools. We can achieve this through the following command.

(<sourcePool>) admin > migration cache -target=pgroup -accessed=0..2592000 -storage=atlas:default cache_pools
[1] INITIALIZING migration cache -target=pgroup -accessed=0..2592000 -storage=atlas:default cache_pools
(<sourcePool>) admin > migration info 1
Command    : migration cache -target=pgroup -accessed=0..2592000 -storage=atlas:default cache_pools
State      : RUNNING
Queued     : 2577
Attempts   : 2
Targets    : pool group cache_pools, 5 pools
Completed  : 1 files; 830424 bytes; 0%
Total      : 2143621320 bytes
Concurrency: 1
Running tasks:
[72] 00010000000000000000BE10: TASK.Copying -> [pool_2@local]

The files on the source pool will not be altered. Any file copied to one of the target pools will be marked cached.