release notes | Book: 2.10, 2.11, 2.12, 2.13, 2.14 | Wiki | Q&A black_bg
Web: Multi-page, Single page | PDF: A4-size, Letter-size | eBook: epub black_bg

The Basic Setup

It is not necessary to run the alarms service in a separate domain, though depending on the individual system configuration it may still be advisable not to embed the service in a domain already burdened with higher memory requirements. To handle alarms under a relatively high load (100Hz on the server end) requires only about 2% more of the cpu, but about 75-100 MiB of additional space. In order to capture any alarms from other domains at startup, it is also necessary to arrange for the alarm service to start before the other doors and pools.

While there is nothing strictly preventing the use of multiple alarms services, under normal circumstances this should not be necessary. The only constraint on the service set-up has to do with the storage option. Unlike for the RDBMS (relational database) back-end, there is currently no option for remote access of the XML file. Since the alarms storage needs to be written to by the alarms service but also read by the httpd service, it is thus necessary for that file to exist on a shared or mounted partition visible to both. Obviously, no such requirement exists for the RDBMS option.

Add the alarms service to a domain in the layout file:

    [alarmsDomain]
    [alarmsDomain/alarms]
    alarms.db.type=rdbms
    ...
    

OR

    [someotherDomain]
    [someotherDomain/alarms]
    alarms.db.type=rdbms
    [someotherDomain/someotherservice]
    ...
    

Note that the storage type setting alarms.db.type must be defined either in the layout or /etc/dcache/dcache.conf file because its default value is off; this can be set to either xml, or rdbms. In the latter case, the standard set of properties can be used to configure the connection url, db user, and so forth. Before using the rdbms option for the first time, be sure to run:

        createdb -U alarms.db.user alarms
    

to create the database; as usual, the actual schema will be initialized automatically when the service is started.

For the XML option, the storage file is usually found in the shared directory for alarms (corresponding to alarms.dir); the usual path is /var/lib/dcache/store.xml, but the location can be changed by setting alarms.db.xml.path. This will automatically be propagated to alarms.db.url and consequently to httpd.alarms.db.url if the two domains are on the same host; if they are not (and share this file path via a mount, for instance), be sure to set the httpd property in the layout or /etc/dcache/dcache.conf on the httpd host to correspond to the new alarms.db.url.

As a rule of thumb, the choice between XML and RDBMS is dictated by two factors: whether it is feasible to share the XML file between the two services as noted, and how much history is to be preserved. While the XML option is more lightweight and easier to configure, it is limited by performance, experiencing considerable read and write slowdown as the file fills (beyond 1000 entries or so). If you do not need to maintain records of alarms (and either manually delete alarms which have been serviced, or use the built-in cleanup feature – see below), then this option should be sufficient. Otherwise, the extra steps of installing postgreSQL on the appropriate node and creating the alarms database (as above) may be worth the effort.

[return to top]

Configure where the alarms service is Running

The alarms infrastructure is actually a wrapper around the logging layer and makes use of a simple tcp socket logger to transmit logging events to the server. In each domain, the /etc/dcache/logback.xml configuration references the following properties to control remote logging:

        dcache.log.level.remote=off
        dcache.log.server.host=localhost
        dcache.log.server.port=9867
    

As with the alarms service database type, remote logging is turned off by default. Under normal circumstances it should be sufficient to set this to error in order to receive alarms. All internally generated alarms (see below) are in fact guaranteed to be sent at this logging level. Remote transmission of events at lower logging levels is possible, but caution should be taken inasmuch anything below warn significantly increases network traffic and could risk overloading the server or creating a bottleneck. This service was not designed to provide robust centralized debugging.

If all of your dCache domains run on the same host, then the default (localhost value will work. But usually your dCache will not be configured to run on a single node, so each node will need to know the destination of the remote logging events. On all the nodes except where the actual alarms service resides, you will thus need to modify the /etc/dcache/dcache.conf file or the layout file to set the dcache.log.server.host property (and restart dCache if it is already up). The default port should usually not need to be modified; in any case, it needs to correspond to whatever port the service is running on. From inspection of the /usr/share/dcache/alarms.properties file, you can see that the alarms-specific properties mirror the logger properties:

     #  ---- Host on which this service is running
     alarms.net.host=${dcache.log.server.host}
     #  ---- TCP port the alarms service listens on
     alarms.net.port=${dcache.log.server.port}
    

The first property should not need any adjustment, but if alarms.net.port is modified, be sure to modify the dcache.log.server.port property on the other nodes to correspond to it. In general, it is advisable to work directly with the dcache.log.server properties everywhere.

Example:

An example of a dCache which consists of a head node, some door nodes and some pool nodes. Assume that the httpd service and the alarms service are running on the head node. Then you would need to set the property dcache.log.server.host on the pool nodes and on the door nodes to the host on which the alarms service is running.

            dcache.log.server.host=<head-node>
       

[return to top]

Types of Alarms

As stated previously, the dCache alarm system runs on top of the logging system (and more specifically, depends on the ch.qos.logback logging library). It promotes normal logging events to alarm status in one of two ways.

BUILT-IN (MARKED) ALARMS

Some alarms are already coded into dCache. These bear the general logging marker ALARM and also can carry sub-markers for type and uniqueness identifiers. They also carry information indicating the host, domain and service which emits them. All such alarms are logged at the ERROR event level.

SERVER-SIDE (OPTIONAL) ALARMS

Logging events which arrive at the alarm server, but which do not carry a specific alarm type marker (these may be events at any logging level, not just ERROR), can nevertheless be redefined as a specific type of alarm via a set of filters provided by the administrator. These filters or custom alarm definitions reside in a special XML file usually written to the alarms space. Further explanation as to how to create such filters is given in another section below.

[return to top]

Alarm Priority

The notion of alarm or alert carries the implication that this particular error or condition requires user attention/intervention; there may be, however, differences in urgency which permit the ordering of such notices in terms of degree of importance. dCache allows the administrator complete control over this prioritization.

The available priority levels are:

  • CRITICAL
  • HIGH
  • MODERATE
  • LOW

Any alarm can be set to whatever priority level is deemed appropriate. This can be done through the admin interface commands (see below). Without any customization, all alarms (of both types) are given a default priority level. This level can be changed via the value of <variable>alarms.priority-mapping.default</variable>, which by default is critical.

Filtering based on priority is possible both in the webadmin page (see below), and for alarms sent via email (<variable>alarms.email.threshold</variable>; fuller discussion of how to enable email alarms is given in a later section).

Note

There also exists the possibility of filtering out only alarms from the main database into a separate log file. This option is enabled using <variable>alarms.enable.history</variable>, and similarly has a priority threshold, <variable>alarms.history.threshold</variable>. This is particularly useful in tandem with the XML storage option; it allows preservation of a condensed record of the alarms even after their full entries have been deleted from the database.

[return to top]

Working with Alarms: Shell Commands

Some basic alarm commands are available as part of the dCache shell. The following is an abbreviated description; for fuller information, see the dCache man page.

alarm send

Send an arbitrary alarm message to the alarm server. The remote server address is taken from the local values for <variable>dcache.log.server.host</variable> and <variable>dcache.log.server.port</variable>. If the [-t=TYPE] option is used, it must be a predefined (internal) alarm type.

alarm list

Displays a list of all alarm types currently defined in dCache code (i.e., predefined, internal types). Since these types can be modified with any incremental release, a listing in this manual would be of limited value. It is easy enough to check which ones currently are defined using this command, the predefined ls admin command, or the auto-completing Alarm Type combo box on the webadmin alarms page.

alarm [add | modify | remove]

Activates an interpreter for adding, modifying or removing a filter definition. The interpreter walks you through the steps and choices. The results are written to the definitions file given by the local value for <variable>alarms.custom-definitions.path</variable>.

[return to top]

Working with Alarms: Admin Commands

A similar set of commands is available through the admin interface. To see fuller information for each of these, do help [command].

definition add [OPTIONS]

Add a new custom definition; if a definition of this type already exists, the new definition will overwrite it.

definition keywords

Print the list of attribute names whose values can be used as keyword identifiers for the alarm.

definition ls [type]

Print a single definition or sorted list of definitions.

definition reload [path]

Reinitialize the definitions from the saved changes.

definition rm type

Remove the existing alarm definition.

definition save [path]

Save the current definitions to persistent back-up.

definition set type name value

Set the attribute of an existing alarm definition.

definition unset type name

Unset (remove) the attribute of an existing alarm definition.

predefined ls

Print a list of all internally defined alarms.

priority get default

Get the current default alarm priority value.

priority ls [type]

Print a single priority level or sorted list of priority levels for all known alarms.

priority reload [path]

Reinitialize priority mappings from saved changes.

priority restore all

Set all defined alarms to the current default priority value.

priority save [path]

Save the current priority mappings to persistent back-up.

priority set type low|moderate|high|critical

Set the priority of the alarm type.

priority set default low|moderate|high|critical

Set the default alarm priority value.

send [OPTIONS] message

Send an alarm to the alarm service.

Note

Custom definitions and priority mappings are backed by files corresponding to the properties <variable>alarms.custom-definitions.path</variable> and <variable>alarms.priority-mapping.path</variable>, respectively. It is always possible to modify these files directly by hand. These by default are mapped to /var/lib/dcache/alarms.custom-definitions.xml and /var/lib/dcache/alarms-priority.properties. In order for the changes to take effect, either restart the alarms domain, or use the respective reload admin command. It should be understood that when using the admin commands, any modifications are done in memory only and are not flushed automatically to the underlying file, so any permanent changes need to be made via the save command.

Note

It is possible to change the file locations by setting the above-mentioned properties in the layout or /etc/dcache/dcache.conf. As can be seen from the admin commands, it is also possible to specify the path as an option on the respective save and reload commands. Note, however, that this is meant mainly for temporary or back-up purposes, as the path defined in the local dcache configuration will remain unaltered after that command completes and the priority map or definitions will be reloaded from there once again whenever the domain is restarted.

Note

Any changes made via the priority set default command are in-memory only. To change this default permanently, set the <variable>alarms.priority-mapping.default</variable> property in the layout or /etc/dcache/dcache.conf.

[return to top]

Working with Alarms: The Webadmin Alarms Page

The Alarms Web Page is an admin page and thus requires authentication. You must enable HTTPS and set an admin gid (0 by default):

Note

For the authenticated mode you need to generate a pk12 hostcert for SSL. This can be done by running the dcache command: import hostcert [--hostcert=FILE] [--hostkey=FILE] [--out=FILE] [--password=PASSWORD] after obtaining a hostcert and hostkey, which are by default placed in /etc/grid-security.

    [httpdDomain]
        httpd.enable.authn=true
        httpd.authz.admin-gid=<1234>
    [httpdDomain/httpd]
    
  1. The QUERY FILTER form can be used to limit the display of alarms in the table. The underlying query to the database is based on whether the entry has been marked as an alarm (the radio buttons indicating undefined, yes and no, respectively), the time interval in which to search, the alarm type, and the result range; these are marked by 'D' on the example screenshot above. The date referred to in after and before is that of the latest update to that entry, not the timestamp of its original arrival. Each click of the Refresh button will reload the data from the database based on these parameters. The default behavior is ALL ALARMS ONLY (unspecified type or range). Placing a single date in the Beginning box will give you all entries from that date up to today (inclusive); a single date in the Ending box will give all entries up to that date (inclusive). The other options, marked by 'M', all do in-memory filtering.
  2. The Priority choice works like a debugging level, such that choosing MODERATE will expose all alarms of that priority or above, thus including HIGH and CRITICAL, but excluding LOW. Note that non-alarms are unaffected by this setting.
  3. The Match Expression filters by appending all fields to a single string and searching for a matching substring. If the Regular Expression box is checked, the expression to match is compiled as a (Java) regex.
  4. The header of the result table contains two checkboxes which allow you to check or uncheck the respective columns for all displayed items. Checking Delete and then clicking Refresh will actually eliminate the entry from persistent store.
  5. Closed is a way of marking the alarm as having been dealt with while maintaining a record of it. The Show Closed Alarms checkbox allows you to display them (turned off by default).
  6. All column titles appearing in white can be clicked to sort the table by that column. While there is no column indicating alarm priority (as this attribute is external to the alarm schema), alarms are sorted by priority first. Non-alarms follow alarms. If both alarms and non-alarms are displayed, alarms are colored dark red.
  7. Notes is an editable field to be used for any special remarks.

When Refresh is clicked, any updates to Closed and Notes are first saved, then any Deletes are processed, and finally, the table is repopulated using the current query filter. The entire form is set to auto-refresh every 60 seconds.