Administration Guide

Chapter 33. High Availability Cluster Multi-processing, Enhanced Scalability (HACMP ES) for AIX

Enhanced scalability (ES) is a feature of High Availability Cluster Multi-processing (HACMP) for AIX Version 4.2.2, which currently runs only on RS/6000 SP nodes.

This feature provides the same failover recovery as HACMP, and has the same event structure as previous HACMP versions (see HACMP for AIX, V4.2.2, Enhanced Scalability Installation and Administration Guide). Enhanced scalability also provides:

- Larger HACMP clusters, with scalability up to 16 nodes per cluster.
- Additional error coverage through user-defined events. Monitored areas can trigger user-defined events, which can range from the death of a process to paging space nearing capacity. Such events include pre- and post-events that can be added to the failover recovery process, if needed. Extra functions that are specific to the different implementations can be placed within the HACMP pre- and post-event streams.

  A rules file (/usr/sbin/cluster/events/rules.hacmprd) contains the HACMP events. User-defined events are added to this file. Each event definition includes the script files that are run when the event occurs.

  For more information about user-defined events and the rules file, see HACMP ES Event Monitoring and User-defined Events.
- HACMP client utilities for monitoring and detecting status changes (in one or more clusters) from AIX physical nodes outside of the HACMP cluster.
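
The idea of pre- and post-events wrapped around a recovery event, as described for user-defined events above, can be illustrated with a minimal sketch. This is not the format of rules.hacmprd and not an HACMP interface; the class EventRunner, its methods, and the node name node3 are hypothetical names chosen only for illustration. It shows one plausible ordering: pre-event steps, then the event's own handler, then post-event steps.

```python
class EventRunner:
    """Hypothetical illustration of pre-/post-event streams (not an HACMP API)."""

    def __init__(self):
        self.handlers = {}  # event name -> main handler
        self.pre = {}       # event name -> list of pre-event hooks
        self.post = {}      # event name -> list of post-event hooks

    def define(self, event, handler):
        """Register the main handler for an event."""
        self.handlers[event] = handler

    def add_pre(self, event, hook):
        """Add a hook that runs before the event's main handler."""
        self.pre.setdefault(event, []).append(hook)

    def add_post(self, event, hook):
        """Add a hook that runs after the event's main handler."""
        self.post.setdefault(event, []).append(hook)

    def run(self, event, *args):
        """Run pre-hooks, the main handler, then post-hooks, in that order."""
        handler = self.handlers[event]  # raises KeyError for an undefined event
        steps = self.pre.get(event, []) + [handler] + self.post.get(event, [])
        for step in steps:
            step(*args)


if __name__ == "__main__":
    runner = EventRunner()
    runner.define("node_down", lambda node: print(f"recovering resources of {node}"))
    runner.add_pre("node_down", lambda node: print(f"pre-event: notify operator about {node}"))
    runner.add_post("node_down", lambda node: print(f"post-event: restart application for {node}"))
    runner.run("node_down", "node3")
```

In a real cluster the steps would be the script files named in the event definitions; here they are plain Python callables so that the ordering is easy to see.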

The nodes in HACMP ES clusters exchange messages called heartbeats, or keepalive packets, by which each node informs the other nodes about its availability. A node that has stopped responding causes the remaining nodes in the cluster to invoke recovery. The recovery process is called a node_down event and may also be referred to as failover. The completion of the recovery process is followed by the re-integration of the node into the cluster. This is called a node_up event.
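
The heartbeat mechanism described above can be sketched as a simple timeout-based detector. This is a simplified illustration, not the way HACMP ES implements its heartbeats: the class HeartbeatMonitor, the timeout_s parameter, and the node names are assumptions made for the example. A node that has not been heard from within the timeout produces a node_down event, and a heartbeat from a node previously marked down produces a node_up event.

```python
import time


class HeartbeatMonitor:
    """Hypothetical timeout-based detector (illustration only, not HACMP code)."""

    def __init__(self, nodes, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock  # injectable so the logic can be tested without sleeping
        now = clock()
        self.last_seen = {node: now for node in nodes}
        self.up = {node: True for node in nodes}

    def heartbeat(self, node):
        """Record a keepalive; return "node_up" if the node had been marked down."""
        self.last_seen[node] = self.clock()
        if not self.up[node]:
            self.up[node] = True
            return "node_up"
        return None

    def check(self):
        """Return ("node_down", node) for each node that has just timed out."""
        now = self.clock()
        events = []
        for node, seen in self.last_seen.items():
            if self.up[node] and now - seen > self.timeout_s:
                self.up[node] = False
                events.append(("node_down", node))
        return events


if __name__ == "__main__":
    fake_time = [0.0]
    monitor = HeartbeatMonitor(["node1", "node2"], timeout_s=5.0,
                               clock=lambda: fake_time[0])
    fake_time[0] = 3.0
    monitor.heartbeat("node1")        # node1 is still alive
    fake_time[0] = 6.0
    print(monitor.check())            # node2 has been silent for longer than 5 seconds
    print(monitor.heartbeat("node2")) # node2 rejoins the cluster
```

Each node is reported down only once; the remaining nodes would start recovery on that report, and the later heartbeat from the restored node corresponds to its re-integration.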

There are two types of events: standard events that are anticipated within the operations of HACMP ES, and user-defined events that are associated with the monitoring of parameters in hardware and software components.

One of the standard events is the node_down event. For planning what should be done as part of the recovery process, HACMP offers two failover options: hot (or idle) standby, and mutual takeover.
