IBM Books

Administration Guide


Monitoring system events

A flexible distributed system monitoring application is provided by Reliable Scalable Cluster Technology (RSCT). The RSCT monitoring application allows you to define conditions on your system that you want to monitor. An event occurs when a monitored condition reaches a threshold that is defined in an event expression. When an event occurs, automated responses to the event take place. Multiple actions can be defined as components of a response. These actions include sending a notification, running a predefined script, or running a user-defined script.

A set of commands is provided for setting up the monitoring application to meet your needs. A set of predefined conditions and responses is also provided. You can use the predefined conditions and responses as they are, or you can copy and then modify them to fit your needs.

System resources that can be monitored include:

The monitoring application, its components, and predefined conditions and responses are described at a high level in Monitoring your system. For the full details, refer to IBM RSCT for Linux: Guide and Reference. RSCT command syntax, descriptions, and examples are available as integrated man pages or in IBM RSCT for Linux: Technical Reference.

CFM ERRM conditions and responses

CSM provides predefined conditions and responses that you can use for monitoring your system. To activate monitoring, you must first associate a condition with the response you want to run when that condition is met. You do this with the RSCT mkcondresp command. After the association has been set up, use the RSCT startcondresp command to start monitoring. For information on using the mkcondresp and startcondresp commands, see the IBM RSCT for Linux: Guide and Reference.

When a condition is met, ERRM runs the cfmupdatenode command in response. The monitored conditions are as follows:

Table 1. ERRM conditions and responses

Condition Name | Condition Description | Corresponding Response | Response Description
CFMRootModTimeChanged | Change in /cfmroot directory | CFMModResp | Runs cfmupdatenode to update all nodes in the cluster.
NodeGroupMembershipChanged | Change to a node group definition that has corresponding CFM files | CFMNodeGroupResp | Runs cfmupdatenode to update all the nodes in the changed node group.

To see a list of the predefined conditions, use the lscondition command. See the lscondition man page or the IBM RSCT for Linux: Technical Reference for more information.
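The setup sequence described in this section can be sketched as a short script. It uses the condition and response names from Table 1 and the RSCT commands named in the text, plus lscondresp to check the result. The run helper and the DRY_RUN variable are illustrative assumptions, not part of RSCT or CSM: by default the script only prints the commands it would issue, so you can review them before setting DRY_RUN=0 on a system where RSCT is installed.

```shell
#!/bin/sh
# Sketch only: associate a predefined CFM condition with its response
# and start monitoring. Names are taken from Table 1.
CONDITION="CFMRootModTimeChanged"
RESPONSE="CFMModResp"

# Hypothetical helper (not an RSCT command): prints the command when
# DRY_RUN is 1 (the default) and executes it when DRY_RUN is 0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# List the predefined conditions.
run lscondition

# Associate the condition with the response.
run mkcondresp "$CONDITION" "$RESPONSE"

# Start monitoring the condition with that response.
run startcondresp "$CONDITION" "$RESPONSE"

# Show the association and its state.
run lscondresp "$CONDITION"
```

The same sequence applies to NodeGroupMembershipChanged and CFMNodeGroupResp. Check the man pages for the exact options supported by your RSCT level.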

In general, you will be most concerned with the CFMRootModTimeChanged condition, which watches for changes to the /cfmroot directory. Any time a file in this directory is updated, or a new file is added, the condition is met. In response, the cfmupdatenode command is run after two minutes, which copies the updated configuration files to the nodes in the cluster. Note that the CFMRootModTimeChanged condition is not met when files are removed.

If you add files to the /cfmroot directory during heavy system use, the cfmupdatenode run that is triggered by the ERRM CFMRootModTimeChanged condition may degrade system performance. As a result, you may want to change the response for CFMRootModTimeChanged to one that has a lower impact on the system (such as EmailEventsToRootAnyTime). Then, when the system is not under such heavy use, you can run the cfmupdatenode command to distribute the new files to the rest of the cluster.
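One possible way to make this change is sketched below, under several assumptions. It assumes CFMModResp is currently active for the condition, that stopping it with the RSCT stopcondresp command is enough to suspend automatic distribution, and that cfmupdatenode accepts -a to update all nodes; verify each of these against the man pages for your installation. The run helper, the DRY_RUN variable, and the two function names are illustrative and not part of RSCT or CSM.

```shell
#!/bin/sh
# Sketch only: replace the automatic cfmupdatenode response with a
# lower-impact notification, then distribute files manually later.
CONDITION="CFMRootModTimeChanged"
HEAVY_RESPONSE="CFMModResp"
LIGHT_RESPONSE="EmailEventsToRootAnyTime"

# Hypothetical helper: prints the command when DRY_RUN is 1 (the
# default) and executes it when DRY_RUN is 0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stop the automatic update and switch to the e-mail notification.
defer_distribution() {
    run stopcondresp "$CONDITION" "$HEAVY_RESPONSE"
    run mkcondresp "$CONDITION" "$LIGHT_RESPONSE"
    run startcondresp "$CONDITION" "$LIGHT_RESPONSE"
}

# Run later, when the system is lightly loaded.
# Assumption: -a selects all nodes in the cluster.
distribute_now() {
    run cfmupdatenode -a
}

defer_distribution
```

Call distribute_now once the load has dropped. To restore automatic distribution, stop the e-mail response and start CFMModResp again with startcondresp.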

For more information on ERRM, see IBM RSCT for Linux: Guide and Reference.

