This chapter describes how to plan for monitoring your system and tracking system events, and how to use and modify the predefined scripts, expressions, commands, and responses packaged with the Monitoring application. Predefined components and how to use them are described in detail in Components Provided for Monitoring.
First, you select conditions to monitor that would have a severe impact on your system. These conditions might include:
When you have determined the resource problems you want to monitor, review the predefined conditions and identify the ones you want to use. Use the lscondition command to view all conditions. If a predefined condition does not quite meet your requirements, you can use it as a template to create a customized condition, or you can create a completely new condition (using the mkcondition command). After you have selected conditions for monitoring, plan one or more responses to be taken for the event and, optionally, for the rearm event.
A set of predefined responses comes installed with your system (see Predefined Responses). Each response has one or more actions associated with it. Each action can be configured to fit your particular work environment and schedule.
The predefined actions are:
You can also write your own commands that correct or mitigate conditions and run them using the Run program option.
You might specify different actions based on when the monitored condition occurs. For example, you could have one set of actions to respond to a condition during working hours and another set to respond to a condition on nights and weekends.
This section describes how to start using the Monitoring application's command line interface to:
See the man pages or the IBM Cluster Systems Management for Linux Technical Reference for detailed usage information.
To associate a response with a condition, use the following commands:
To view events, use the lsaudrec command to open the audit log. To log events to a user-defined file, use the logevent script.
To stop monitoring for a condition, use the stopcondresp command.
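The command sequence described above can be sketched as a dry run that only builds the command lines without executing them. This is a minimal illustration, not part of the Monitoring application: the function name `monitoring_commands` is hypothetical, and the argument order follows the mkcondresp, startcondresp, and stopcondresp examples shown later in this chapter.

```python
import shlex


def monitoring_commands(condition, responses):
    """Return, in order, the command lines for one monitor/inspect/stop cycle.

    Nothing is executed; the strings are only assembled for review.
    """
    return [
        # Associate the responses with the condition.
        shlex.join(["mkcondresp", condition, *responses]),
        # Start monitoring the condition with those responses.
        shlex.join(["startcondresp", condition, *responses]),
        # View events in the audit log.
        shlex.join(["lsaudrec"]),
        # Stop monitoring the condition.
        shlex.join(["stopcondresp", condition]),
    ]


if __name__ == "__main__":
    for line in monitoring_commands("/tmp space used", ["E-mail root anytime"]):
        print(line)
```

Building the lines with `shlex.join` keeps condition and response names that contain spaces quoted as single arguments.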
The following examples show how to monitor your system using the command line interface. See the man pages or the IBM Cluster Systems Management for Linux Technical Reference for detailed usage information.
This example shows the sequence for setting up a new condition and response so that the event and the rearm event are handled accordingly:
The following examples show uses of individual commands and sample command output:
Name                 Monitoring Status
"/tmp space used"    "Not monitored"
"var space used"     "Monitored"
(more conditions listed...)

Name
"Critical notification"
"Warning notification"
"Informational notification"
"Remove unwanted files"
(more responses listed...)

Condition            Response                      State
"/tmp space used"    "Broadcast event on-shift"    Active
"/tmp space used"    "E-mail root anytime"         Not Active
mkcondresp "FileSystem space used" "Broadcast event on-shift" "E-mail root anytime"
startcondresp "/tmp space used" "critical notification" "remove unwanted files"
stopcondresp "/tmp space used"
To stop monitoring the condition "/tmp space used" with a specific response, "critical notification," type:
stopcondresp "/tmp space used" "critical notification"
mkcondition -c "/tmp space used" "my test condition"
lsaudrec -l
mkresponse -n "E-mail root" -d 1+7 \
  -s "/usr/sbin/rsct/bin/notifyevent root" -e b "E-mail root anytime"
For a complete list of predefined commands, scripts, and utilities, see Commands, Scripts, Utilities, and Files.
For information about monitoring events, rearm events, actions, and errors that have occurred, view the audit log using the lsaudrec command. See the lsaudrec man page and Using the Audit Log to Track Monitoring Activity for details.
Audit log records include the following:
The administrator can use the audit log to track activity that may not be visible otherwise because the activity is related to subsystems running in the background. To list audit log records, use the lsaudrec command. To remove audit log records, use the rmaudrec command. For details see the command man pages, or IBM Cluster Systems Management for Linux Technical Reference.
The IBM Cluster Systems Management for Linux Technical Reference contains information about predefined scripts that are provided with the Event Response resource manager (ERRM). The following scripts are provided:
You can also use existing operating system commands and user-written scripts in the definition of an action.
The displayevent, logevent, msgevent, notifyevent, and wallevent scripts are examples of the types of actions that system administrators can use to respond to events.
For complete descriptions of these scripts, see IBM Cluster Systems Management for Linux Technical Reference or the command man pages.
You can use these scripts as-is or treat them as templates by copying and modifying them to create new scripts that suit your needs. For example, to use the wallevent script as a template for a page event command, do the following:
For a command to run in response to an event or a rearm event defined by a condition, the command must be included as an action in an Event Response resource. When an Event Response resource is defined, specify the entire path name for a script that is used within an action. Use the Event Response resource manager commands to set up responses.
Test any scripts or commands that you have created or modified before you use them as actions in a production environment.
After ERRM has subscribed to RMC to monitor a condition and that condition occurs, ERRM runs commands in the user's operating system environment. The Event Response resource contains a list of commands to be run. Before each command is run, the following environment variables are established for the command to use (see Event Response Resource Manager for a detailed description of ERRM):
The following data types are represented with this environment variable as a decimal string: CT_INT32, CT_UINT32, CT_INT64, CT_UINT64, CT_FLOAT32, and CT_FLOAT64.
CT_CHAR_PTR is represented as a string for this environment variable.
CT_BINARY_PTR is represented as a hexadecimal string separated by spaces.
CT_SD_PTR is enclosed in square brackets and has individual entries within the SD that are separated by commas. Arrays within an SD are enclosed within braces {}. For example:
["My Resource Name",{1,5,7},{0,9000,20000},{7000,11000,25000}]
See the definition of ERRM_SD_DATA_TYPES for an explanation of the data types that these values represent.
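The value representations described above could be decoded in a user-written action script roughly as follows. This is a sketch under stated assumptions, not ERRM code: the helper names `decode_value` and `parse_sd` are hypothetical, the data type name and raw value are passed in as plain arguments, strings inside an SD are assumed to contain no embedded double quotes, and each space-separated binary token is assumed to be hexadecimal digits with an optional 0x prefix.

```python
import re

INT_TYPES = {"CT_INT32", "CT_UINT32", "CT_INT64", "CT_UINT64"}
FLOAT_TYPES = {"CT_FLOAT32", "CT_FLOAT64"}

# One SD token: a quoted string, a bracket/brace/comma, or a bare number.
_TOKEN = re.compile(r'"([^"]*)"|([\[\]{},])|([^\s\[\]{},"]+)')


def _number(text):
    """Convert a decimal string to int when possible, otherwise to float."""
    try:
        return int(text)
    except ValueError:
        return float(text)


def parse_sd(text):
    """Parse a '[...]' SD string into a list; '{...}' arrays become nested lists."""
    stack = [[]]  # stack[0] collects the single top-level value
    for quoted, punct, bare in _TOKEN.findall(text):
        if punct in ("[", "{"):
            stack.append([])
        elif punct in ("]", "}"):
            finished = stack.pop()
            stack[-1].append(finished)
        elif punct == ",":
            continue
        elif bare:
            stack[-1].append(_number(bare))
        else:
            stack[-1].append(quoted)
    return stack[0][0]


def decode_value(data_type, raw):
    """Turn the string form of a value into a Python object, based on its data type."""
    if data_type in INT_TYPES:
        return int(raw)
    if data_type in FLOAT_TYPES:
        return float(raw)
    if data_type == "CT_BINARY_PTR":
        # Assumption: tokens are hex digits, optionally prefixed with 0x.
        digits = "".join(
            tok[2:] if tok.lower().startswith("0x") else tok for tok in raw.split()
        )
        return bytes.fromhex(digits)
    if data_type == "CT_SD_PTR":
        return parse_sd(raw)
    return raw  # CT_CHAR_PTR (and anything unrecognized) stays a string


if __name__ == "__main__":
    sd = '["My Resource Name",{1,5,7},{0,9000,20000},{7000,11000,25000}]'
    print(decode_value("CT_SD_PTR", sd))
```

A real script would also need the per-element data types (see ERRM_SD_DATA_TYPES) to interpret each SD entry precisely; the sketch only distinguishes strings from numbers.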
(See "Resource Handle" for a definition and an example of a resource handle.)
The information in this section is for advanced users who want to:
Permissible data types, operators, and operator order of precedence are described below. RMC uses these functions to match a selection string against the persistent attributes of a resource and to implement the evaluation of an event expression or a rearm expression.
An expression is similar to a C language statement or the WHERE clause of
an SQL query. It is composed of variables, operators, and
constants. The C and SQL syntax styles may be intermixed within a
single expression. The following table relates the RMC terminology to
SQL terminology:
RMC | SQL |
---|---|
attribute name | column name |
select string | WHERE clause |
operators | predicates, logical connectives |
resource class | table |
SQL syntax is supported for selection strings, with the following restrictions:
The term variable is used in this context to mean the column name
or attribute name in an expression. Variables and constants in an
expression may be one of the following data types that are supported by the
RMC subsystem:
Symbolic Name | Description |
---|---|
CT_INT32 | Signed 32-bit integer |
CT_UINT32 | Unsigned 32-bit integer |
CT_INT64 | Signed 64-bit integer |
CT_UINT64 | Unsigned 64-bit integer |
CT_FLOAT32 | 32-bit floating point |
CT_FLOAT64 | 64-bit floating point |
CT_CHAR_PTR | Null-terminated string |
CT_BINARY_PTR | Binary data - arbitrary-length block of data |
CT_RSRC_HANDLE_PTR | Resource handle - an identifier for a resource that is unique over space and time (20 bytes) |
In addition to the base data types, aggregates of the base data types may be used as well. The first aggregate data type is similar to a structure in C in that it can contain multiple fields of different data types. This aggregate data type is referred to as structured data (SD). The individual fields in the structured data are referred to as structured data elements, or simply elements. Each element of a structured data type may have a different data type which can be one of the base types in the preceding table or any of the array types discussed in the next section, except for the structured data array.
The second aggregate data type is an array. An array contains zero or more values of the same data type, such as an array of CT_INT32 values. Each of the array types has an associated enumeration value (for example, CT_INT32_ARRAY or CT_UINT32_ARRAY). Structured data may also be defined as an array, but every entry of such an array must have the same elements.
Literal values can be specified for each of the base data types as follows:
"0xabcd 0x01020304050607090a0b0c0d0e0f1011121314"
"0x4018 0x0001 0x00000000 0x0069684c 0x00519686 0xaf7060fc"
Variable names refer to values that are not part of the expression but are accessed while running the expression. For example, when RMC processes an expression, the variable names are replaced by the corresponding persistent or dynamic attributes of each resource.
Entries of an array may be accessed by specifying a subscript, as in the C programming language. The index corresponding to the first element of the array is always 0 (for example, List[2] refers to the third element of the array named List). Only one subscript is allowed. It may be a variable, a constant, or an expression that produces an integer result. A subscripted value may be used wherever the base data type of the array is used. For example, if List is an integer array, then "List[2]+4" produces the sum of 4 and the current value of the third entry of the array.
The elements of a structured data value can be accessed by using the following syntax:
<variable name>.<element name>
For example, a.b
The variable name is the name of the table column or resource attribute, and the element name is the name of the element within the structured data value. Either or both names may be followed by a subscript if the name is an array. For example, a[10].b refers to the element named b of the 11th entry of the structured data array called a. Similarly, a[10].b[3] refers to the fourth element of the array that is an element called b within the same structured data array entry a[10].
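The subscript and element-access rules above can be mirrored in Python for illustration. This is only an analogy with made-up data: RMC arrays are modeled as Python lists, structured data as dictionaries, and the names `List`, `a`, and `b` are the placeholder names used in the text.

```python
# An integer array; subscripts start at zero, as in C.
List = [10, 20, 30]
third_plus_four = List[2] + 4          # RMC expression: List[2]+4

# A structured data array with 11 entries. Each entry has an element
# named b that is itself an integer array of four values.
a = [{"b": [entry * 10 + k for k in range(4)]} for entry in range(11)]

element_b = a[10]["b"]                 # RMC expression: a[10].b
fourth_of_b = a[10]["b"][3]            # RMC expression: a[10].b[3]

if __name__ == "__main__":
    print(third_plus_four, element_b, fourth_of_b)
```

Here `a[10]` is the 11th entry of the structured data array, and `[3]` selects the fourth value of its `b` element.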
Variable names refer to values that are not part of an expression but are accessed while running the expression. When used to select a resource, the variable name is a persistent attribute. When used to generate an event, the variable name is a dynamic attribute. When used to select audit records, the variable name is the name of a field within the audit record.
A variable name is restricted to include only 7-bit ASCII characters that are alphanumeric (a-z, A-Z, 0-9) or the underscore character (_). The name must begin with an alphabetic character. When the expression is used by the RMC subsystem for an event or a rearm event, the name can have a suffix that is the '@' character followed by 'P', which refers to the previous observation.
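The naming rule above can be expressed as a small validity check. This is a sketch, not the RMC parser: the function name `is_valid_variable_name` is hypothetical, and `PercentTotUsed` is used only as an illustrative attribute name.

```python
import re

# A letter first, then letters, digits, or underscores, optionally followed by @P.
_VARIABLE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*(@P)?")


def is_valid_variable_name(name, allow_previous=True):
    """Check a name against the rule; @P is accepted only when allow_previous is True."""
    match = _VARIABLE_NAME.fullmatch(name)
    if match is None:
        return False
    # group(1) is "@P" when the previous-observation suffix is present.
    return allow_previous or match.group(1) is None


if __name__ == "__main__":
    for candidate in ("PercentTotUsed", "PercentTotUsed@P", "_hidden", "9lives"):
        print(candidate, is_valid_variable_name(candidate))
```

Set `allow_previous=False` for contexts other than event and rearm event expressions, where the text does not describe the @P suffix as available.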
Constants and variables may be combined by an operator to produce a result that in turn may be used with another operator. The resulting data type of the expression must be a scalar integer or floating-point value. If the result is zero, the expression is considered to be FALSE; otherwise, it is TRUE.
The set of operators that can be used in strings is summarized in the
following table:
Operator | Description | Left Data Types | Right Data Types | Example | Notes |
---|---|---|---|---|---|
+ | Addition | Integer,float | Integer,float | "1+2" results in 3 | None |
- | Subtraction | Integer,float | Integer,float | "1.0-2.0" results in -1.0 | None |
* | Multiplication | Integer,float | Integer,float | "2*3" results in 6 | None |
/ | Division | Integer,float | Integer,float | "2/3" results in 0 | None |
- | Unary minus | None | Integer,float | "-abc" | None |
+ | Unary plus | None | Integer,float | "+abc" | None |
.. | Range | Integers | Integers | "1..3" results in 1,2,3 | Shorthand for all integers between and including the two values |
% | Modulo | Integers | Integers | "10%2" results in 0 | None |
\| | Bitwise OR | Integers | Integers | "2\|4" results in 6 | None |
& | Bitwise AND | Integers | Integers | "3&2" results in 2 | None |
~ | Bitwise complement | None | Integers | ~0x0000ffff results in 0xffff0000 | None |
^ | Exclusive OR | Integers | Integers | 0x0000aaaa^0x0000ffff results in 0x00005555 | None |
>> | Right shift | Integers | Integers | 0x0fff>>4 results in 0x00ff | None |
<< | Left shift | Integers | Integers | "0x0ffff<<4" results in 0xffff0 | None |
== | Equality | All but SDs | All but SDs | "2==2" results in 1 | Result is true (1) or false (0) |
!= | Inequality | All but SDs | All but SDs | "2!=2" results in 0 | Result is true (1) or false (0) |
> | Greater than | Integer,float | Integer,float | "2>3" results in 0 | Result is true (1) or false (0) |
>= | Greater than or equal | Integer,float | Integer,float | "4>=3" results in 1 | Result is true (1) or false (0) |
< | Less than | Integer,float | Integer,float | "4<3" results in 0 | Result is true (1) or false (0) |
<= | Less than or equal | Integer,float | Integer,float | "2<=3" results in 1 | Result is true (1) or false (0) |
=~ | Pattern match | Strings | Strings | "abc"=~"a.*" results in 1 | Right operand is interpreted as an extended regular expression |
!~ | Not pattern match | Strings | Strings | "abc"!~"a.*" results in 0 | Right operand is interpreted as an extended regular expression |
=? | SQL pattern match | Strings | Strings | "abc"=? "a%" results in 1 | Right operand is interpreted as a SQL pattern |
!? | Not SQL pattern match | Strings | Strings | "abc"!? "a%" results in 0 | Right operand is interpreted as a SQL pattern |
\|< | Contains any | All but SDs | All but SDs | "{1..5}\|<{2,10}" results in 1 | Result is true (1) if left operand contains any value from right operand |
>< | Contains none | All but SDs | All but SDs | "{1..5}><{2,10}" results in 0 | Result is true (1) if left operand contains no value from right operand |
&< | Contains all | All but SDs | All but SDs | "{1..5}&<{2,10}" results in 0 | Result is true (1) if left operand contains all values from right operand |
\|\| | Logical OR | Integers | Integers | "(1<2)\|\|(2>4)" results in 1 | Result is true (1) or false (0) |
&& | Logical AND | Integers | Integers | "(1<2)&&(2>4)" results in 0 | Result is true (1) or false (0) |
! | Logical NOT | None | Integers | "!(2==4)" results in 1 | Result is true (1) or false (0) |
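The range and containment operators in the table can be modeled in Python to make their definitions concrete. This is an illustrative sketch that follows the descriptions in the Notes column, not the RMC implementation; all function names are hypothetical.

```python
def expand(lo, hi):
    """Model of the .. range operator: all integers from lo through hi, inclusive."""
    return list(range(lo, hi + 1))


def contains_any(left, right):
    """Model of |<: 1 if left contains at least one value from right, else 0."""
    return int(any(value in left for value in right))


def contains_none(left, right):
    """Model of ><: 1 if left contains no value from right, else 0."""
    return int(not any(value in left for value in right))


def contains_all(left, right):
    """Model of &<: 1 if left contains every value from right, else 0."""
    return int(all(value in left for value in right))


if __name__ == "__main__":
    left = expand(1, 5)                    # {1..5}
    print(contains_any(left, [2, 10]))     # 2 is in the left operand
    print(contains_all(left, [2, 10]))     # 10 is not in the left operand
    print(contains_none(left, [6, 10]))    # neither 6 nor 10 is in the left operand
```

Like the operators in the table, each function returns 1 for true and 0 for false, so the result can be combined with the logical operators.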
When integers of different signs or size are operands of an operator, standard
C style casting is implicitly performed. When an expression with
multiple operators is evaluated, the operations are performed in the order
defined by the precedence of the operator. The default precedence can
be overridden by enclosing the portion or portions of the expression to be
evaluated first in parentheses (). For example, in the expression
"1+2*3", multiplication is normally performed before addition to produce a
result of 7. To evaluate the addition operator first, use parentheses
as follows: "(1+2)*3". This produces a result of 9. The
default precedence rules are shown in the following table. All
operators in the same table cell have equal precedence.
Operators | Description |
---|---|
. | Structured data element separator |
~ | Bitwise complement |
* | Multiplication |
+ | Addition |
<< | Left shift |
< | Less than |
== | Equality |
& | Bitwise AND |
^ | Bitwise exclusive OR |
\| | Bitwise inclusive OR |
&& | Logical AND |
\|\| | Logical OR |
, | List separator |
Two types of pattern matching are supported: extended regular expressions, and patterns that are compatible with the standard SQL LIKE predicate. An SQL LIKE pattern may include the following special characters:
Some examples of the types of expressions that can be constructed follow:
Name =~'tr.*0' Name LIKE 'tr%0'
IntList|<{1,3,5..7} IntList in (1,3,5..7)
(Name LIKE "tr%0")&&(IntList|<(1,3,5..7)) (Name=~'tr.*0') AND (IntList IN {1,3,5..7})
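The correspondence between the LIKE form and the regular-expression form in the examples above can be sketched by translating one into the other. This sketch assumes the standard SQL LIKE wildcards, where % matches zero or more characters and _ matches exactly one; escape-character handling is omitted, and the function names are hypothetical.

```python
import re


def like_to_regex(pattern):
    """Translate an SQL LIKE pattern into an equivalent regular expression string."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # zero or more characters
        elif ch == "_":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # every other character matches itself
    return "".join(parts)


def like_match(value, pattern):
    """Return 1 if the whole value matches the LIKE pattern, otherwise 0."""
    regex = like_to_regex(pattern)
    return int(re.fullmatch(regex, value, re.DOTALL) is not None)


if __name__ == "__main__":
    print(like_to_regex("tr%0"))
    print(like_match("tr100", "tr%0"))
    print(like_match("abc", "a%"))
```

For the pattern 'tr%0' the translation yields tr.*0, the same regular expression used in the first example. The sketch applies the pattern to the whole string, which is how LIKE behaves; it makes no claim about how the =~ operator anchors its match.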