Using Traps to Monitor Appliance Health
Recommendations for configuring alerts that monitor appliance health.
When monitoring the health of the Integration Appliance, you can use one or both of the following methods:
- Poll - Actively monitor runtime resource usage, including appliance garbage collection cycles, appliance memory usage, and appliance disk usage.
- Trap - Receive notifications about hardware situations such as failed fans, high temperatures, or failed disks. For more information about hardware-related SNMP traps, see About the Platform Module.
For more information about creating and enabling notification alerts, see the WMC Online Help or the Cast Iron Web Management Console Guide in the IBM WebSphere Cast Iron Information Center.
Table 1 provides recommended thresholds for notifications regarding garbage collection, memory usage, and disk usage.
- Garbage Collection - This parameter counts the number of garbage collections that have occurred since the last system restart. A garbage collection rate above the recommended threshold can indicate that the appliance is overworked and might start to experience performance issues.
- Memory Usage - This parameter measures the percentage of total memory in use, expressed in hundredths of a percent. This number is updated after each garbage collection. Note: This value is provided as an integer, but the MIB causes the SNMP manager to display it in hundredths of a percent. For example, a raw value of 1234 displays as 12.34. Your SNMP management tool should handle this conversion automatically, but verify that it does.
- Disk Usage - This parameter measures the percentage of total work-in-progress (WIP) disk space that is in use, expressed in hundredths of a percent. The appliance uses this percentage to determine its job purging activities.
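Because memory usage and disk usage are reported as integers in hundredths of a percent, a script that reads the raw values directly, rather than through an SNMP manager that applies the MIB's display formatting, has to scale them itself. The following minimal Python sketch assumes you already have the raw integer value; the function name `raw_to_percent` is illustrative only.

```python
def raw_to_percent(raw: int) -> float:
    """Convert a raw value in hundredths of a percent (for example 1234) to a percent (12.34).

    Hypothetical helper: applies to values such as ciIaResPctMemoryUsed and
    ciIaResPctWipFull, which are reported in hundredths of a percent.
    """
    return raw / 100


print(raw_to_percent(1234))  # 12.34
print(raw_to_percent(8000))  # 80.0, the recommended memory usage threshold
```

Comparing raw values against raw thresholds (for example 8000 instead of 80%) avoids the conversion entirely, which is why the table below lists both forms.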
Parameters to Monitor | Recommended Thresholds | SNMP Name and OID |
---|---|---|
Garbage Collection | Create a notification that triggers an alert if this value changes quickly, by more than 6 counts in a 1-minute time period. | CASTIRON-IA-MIB::ciIaResNbrGarbageCollects .1.3.6.1.4.1.13336.2.2.2.1.1.2.1.0 |
Memory Usage | Create a notification that triggers an alert if this value goes over 80% (raw value of 8000). | CASTIRON-IA-MIB::ciIaResPctMemoryUsed .1.3.6.1.4.1.13336.2.2.2.1.1.2.2.0 |
Disk Usage | Create a notification that triggers an alert if this value goes over 75% (raw value of 7500). | CASTIRON-IA-MIB::ciIaResPctWipFull .1.3.6.1.4.1.13336.2.2.2.1.1.2.3.0 |
Note: The parameters to monitor, described in the table above, are for SNMP polling only.
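If you evaluate these thresholds in your own polling script instead of in a monitoring tool, the following Python sketch shows one possible approach. It assumes your poller already retrieves the raw integer values (how you fetch them, for example with an SNMP library or the Net-SNMP `snmpget` command, is out of scope) and passes in a timestamp in seconds for each sample. The class `GcRateMonitor` and the function `check_usage` are hypothetical names, and the sliding 60-second window is one interpretation of "more than 6 counts in a 1-minute time period".

```python
from collections import deque

# Thresholds from the table above (memory and disk are raw hundredths of a percent).
GC_MAX_DELTA = 6           # more than 6 garbage collections...
GC_WINDOW_SECONDS = 60.0   # ...within a 1-minute period
MEMORY_MAX_RAW = 8000      # 80.00%
DISK_MAX_RAW = 7500        # 75.00%


class GcRateMonitor:
    """Tracks polled ciIaResNbrGarbageCollects values over a sliding time window."""

    def __init__(self, max_delta=GC_MAX_DELTA, window=GC_WINDOW_SECONDS):
        self.max_delta = max_delta
        self.window = window
        self.samples = deque()  # (timestamp_seconds, gc_count) pairs, oldest first

    def add_sample(self, timestamp, gc_count):
        """Record one polled value; return True if the rate threshold is exceeded."""
        # The counter restarts at system restart; if it goes backwards, start a new window.
        if self.samples and gc_count < self.samples[-1][1]:
            self.samples.clear()
        self.samples.append((timestamp, gc_count))
        # Discard samples older than the window (the newest sample is never discarded).
        while timestamp - self.samples[0][0] > self.window:
            self.samples.popleft()
        return gc_count - self.samples[0][1] > self.max_delta


def check_usage(memory_raw, disk_raw):
    """Return alert messages for raw memory and WIP disk usage values."""
    alerts = []
    if memory_raw > MEMORY_MAX_RAW:
        alerts.append(f"memory usage {memory_raw / 100:.2f}% exceeds 80%")
    if disk_raw > DISK_MAX_RAW:
        alerts.append(f"WIP disk usage {disk_raw / 100:.2f}% exceeds 75%")
    return alerts


gc = GcRateMonitor()
print(gc.add_sample(0, 100))    # False
print(gc.add_sample(30, 104))   # False: 4 collections in 30 seconds
print(gc.add_sample(60, 108))   # True: 8 collections within 60 seconds
print(check_usage(8150, 7000))  # ['memory usage 81.50% exceeds 80%']
```

How precisely the rate check matches the recommendation depends on your polling interval: polling well under once a minute lets the window capture bursts that a once-a-minute poll could split across two samples.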