It is advisable to run the alarms service in a separate domain and to list this domain first in the layout file. That way the alarms service gets booted first and can catch startup errors reported by the other domains. Since both the httpd service and the alarms service will access the storage file generated at /var/lib/dcache/alarms/alarms.xml, the alarms service should be defined on the same host as the httpd service. You can modify where this file is placed by setting the property alarms.store.path to a different location.
Add a domain for the alarms service to the layout file where the httpd service is defined.
[alarmserverDomain]
[alarmserverDomain/alarms]
...
[httpdDomain]
If all of the dCache domains run on the same host, the default setting of alarms.server.host (localhost) will work.
In general, your dCache will not be configured to run on a single node. In this case each node needs to know on which node the alarms service is running. The alarms service and the httpd service will run on one of the nodes. On all the other nodes you need to set the alarms.server.host property, either in the /etc/dcache/dcache.conf file or in the layout file, to the host on which the alarms service is running, and then restart dCache.
Example:
Consider a dCache which consists of a head node, some door nodes and some pool nodes. Assume that the httpd service and the alarms service are running on the head node. Then you would need to set the property alarms.server.host on the pool nodes and on the door nodes to the host on which the alarms service is running.
alarms.server.host=<head-node>
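To check on a given node that the property has actually been set, a short script can read it back from the configuration file. The following Python sketch is only an illustration: the function name read_property is hypothetical, and it assumes the file uses simple key=value lines with # comments, which may not cover every construct allowed in dCache configuration files.

```python
def read_property(path, key):
    """Return the last value assigned to key in a key=value file, or None."""
    value = None
    with open(path, encoding="utf-8") as conf:
        for line in conf:
            line = line.strip()
            # Skip blank lines, comments and section headers such as [domain].
            if not line or line.startswith("#") or line.startswith("["):
                continue
            name, sep, rest = line.partition("=")
            # Later assignments override earlier ones in this sketch.
            if sep and name.strip() == key:
                value = rest.strip()
    return value
```

Calling read_property("/etc/dcache/dcache.conf", "alarms.server.host") on a pool or door node should then return the name of the head node, or None if the property is not set in that file.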
The alarms defined are listed below. There are four different levels of severity: CRITICAL, HIGH, MODERATE and LOW.
CRITICAL
    Alarms: SERVICE_CREATION_FAILURE, DB_OUT_OF_CONNECTIONS, DB_UNAVAILABLE, JVM_OUT_OF_MEMORY, OUT_OF_FILE_DESCRIPTORS
    Description: The affected dCache can’t work (is down).
HIGH
    Alarms: IO_ERROR, HSM_READ_FAILURE, HSM_WRITE_FAILURE, LOCATION_MANAGER_UNAVAILABLE, POOL_MANAGER_UNAVAILABLE
    Description: The affected functions are not working, or not working properly, even though the dCache domain may be running.
MODERATE
    Alarms: POOL_DISABLED, CHECKSUM
    Description: There is an issue which should be taken care of in the interest of performance or usability, but which is not impeding the functioning of the system as a whole.
LOW
    Description: This issue might be worth investigating if it occurs, but is not urgent.
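The grouping above can be captured as a small lookup table, for instance in a monitoring script that decides which alarms deserve immediate attention. The following Python sketch simply transcribes the list above. The names ALARM_SEVERITY, SEVERITY_ORDER, severity_of and is_at_least are hypothetical and not part of dCache.

```python
# Severity of each predefined alarm type, transcribed from the list above.
# All names in this block are hypothetical helpers, not dCache APIs.
ALARM_SEVERITY = {
    "SERVICE_CREATION_FAILURE": "CRITICAL",
    "DB_OUT_OF_CONNECTIONS": "CRITICAL",
    "DB_UNAVAILABLE": "CRITICAL",
    "JVM_OUT_OF_MEMORY": "CRITICAL",
    "OUT_OF_FILE_DESCRIPTORS": "CRITICAL",
    "IO_ERROR": "HIGH",
    "HSM_READ_FAILURE": "HIGH",
    "HSM_WRITE_FAILURE": "HIGH",
    "LOCATION_MANAGER_UNAVAILABLE": "HIGH",
    "POOL_MANAGER_UNAVAILABLE": "HIGH",
    "POOL_DISABLED": "MODERATE",
    "CHECKSUM": "MODERATE",
}

# Ordered from most to least severe.
SEVERITY_ORDER = ["CRITICAL", "HIGH", "MODERATE", "LOW"]


def severity_of(alarm_type):
    """Return the severity of a listed alarm type, or None if it is not listed."""
    return ALARM_SEVERITY.get(alarm_type)


def is_at_least(alarm_type, threshold):
    """True if alarm_type is listed and at least as severe as threshold."""
    severity = severity_of(alarm_type)
    if severity is None:
        return False
    return SEVERITY_ORDER.index(severity) <= SEVERITY_ORDER.index(threshold)
```

For example, is_at_least("IO_ERROR", "HIGH") is true, while is_at_least("POOL_DISABLED", "HIGH") is false because POOL_DISABLED is only MODERATE.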
Once an alarm has been triggered, you will find an entry in the file /var/lib/dcache/alarms/alarms.xml.
As it is not very convenient to read an XML file, the Alarms Web Page can be used to inspect and manage the generated warnings.
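For a quick look from a script, the store file can also be summarised without the web page. The sketch below deliberately makes no assumption about the schema of alarms.xml, which is not described in this section; it only counts how often each XML element tag occurs. The function name count_tags is hypothetical, and the default path is the one given above, so it needs adjusting if alarms.store.path was changed.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Default location named in the text; adjust if alarms.store.path was changed.
DEFAULT_STORE = "/var/lib/dcache/alarms/alarms.xml"


def count_tags(path=DEFAULT_STORE):
    """Count how often each element tag occurs in the XML file at path."""
    tree = ET.parse(path)
    # iter() walks the whole tree, including the root element.
    return Counter(element.tag for element in tree.iter())
```

Calling count_tags() on the host that runs the alarms service returns a Counter keyed by tag name, which gives a rough idea of how many entries the file holds.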