Administration Guide

Cluster Configuration

In a hot standby configuration, the AIX processor node that is the takeover node is not running any other workload. In a mutual takeover configuration, the AIX processor node that is the takeover node is running other workloads.

Generally, DB2 Universal Database Enterprise - Extended Edition (UDB EEE) runs in mutual takeover mode with partitions on each node. One exception is a scenario in which the catalog node is part of a hot standby configuration.

When planning a large DB2 installation on an RS/6000 SP using HACMP ES, you need to consider how to divide the nodes of the cluster within or between the RS/6000 SP frames. Having a node and its backup in different SP frames allows takeover in the event one frame goes down (that is, the frame power/switch board fails). However, such failures are expected to be exceedingly rare, because each SP frame has N+1 power supplies and N+1 fans, and each SP switch has redundant paths. In the case of a frame failure, manual intervention may be required to recover the remaining frames. This recovery procedure is documented in the SP Administration Guide. HACMP ES provides for recovery of SP node failures; recovery of frame failures depends on the proper layout of clusters within one or more SP frames.

Another planning consideration is how to manage big clusters. It is easier to manage a small cluster than a big one; however, it is also easier to manage one big cluster than many smaller ones. When planning, consider how your applications will be used in your cluster environment. If there is a single, large, homogeneous application running, for example, on 16 nodes, it is probably easier to manage the configuration as a single cluster rather than as eight two-node clusters. If the same 16 nodes contain many different applications with different networks, disks, and node relationships, it is probably better to group the nodes into smaller clusters. Keep in mind that nodes integrate into an HACMP cluster one at a time; it will be faster to start a configuration of multiple clusters rather than one large cluster. HACMP ES supports both single and multiple clusters, as long as a node and its backup are in the same cluster.

HACMP ES failover recovery allows pre-defined (also known as cascading) assignment of a resource group to a physical node, as well as floating (or rotating) assignment of a resource group to a physical node. The IP addresses, external disk volume groups, file systems or NFS file systems, and application servers within each resource group specify either an application or an application component, which HACMP ES can move between physical nodes through failover and reintegration. Failover and reintegration behavior is determined by the type of resource group created and by the number of nodes placed in the resource group.

For example, consider a DB2 database partition (logical node). If its log and table space containers were placed on external disks, and other nodes were linked to those disks, it would be possible for those other nodes to access these disks and to restart the database partition (on a takeover node). It is this type of operation that is automated by HACMP. HACMP ES can also be used to recover NFS file systems used by DB2 instance main user directories.
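For illustration only (the hostname is hypothetical, and the exact option names can vary by DB2 release), the operation being automated amounts to issuing DB2START with the RESTART clause so that the database partition comes up on the takeover host:

   db2start nodenum 1 restart hostname sw_node2 port 0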

Read the HACMP ES documentation thoroughly as part of your planning for recovery with DB2 UDB EEE. You should read the Concepts, Planning, Installation, and Administration guides, then build the recovery architecture for your environment. For each subsystem that you have identified for recovery, based on known points of failure, identify the HACMP clusters that you need, as well as the recovery nodes (either hot standby or mutual takeover). This is a starting point for completing the HACMP worksheets that are included in the documentation.

It is strongly recommended that both disks and adapters be mirrored in your external disk configuration. For DB2 physical nodes that are configured for HACMP, care is required to ensure that the nodes sharing a volume group can vary it on from the shared external disks. In a mutual takeover configuration, this arrangement requires some additional planning so that the paired nodes can access each other's volume groups without conflicts. For DB2 UDB EEE, this means that all container names must be unique across all databases.

One way to achieve uniqueness is to include the partition number as part of the name. You can specify a node expression for container string syntax when creating either SMS or DMS containers. When you specify the expression, the node number can be part of the container name or, if you specify additional arguments, the results of those arguments can be part of the container name. Use the argument " $N" ([blank]$N) to indicate the node expression. The argument must occur at the end of the container string, and can only be used in one of the following forms:

Table 56. Arguments for Creating Containers
(The node number is assumed to be five.)

   Syntax                          Example        Value
   [blank]$N                       " $N"          5
   [blank]$N+[number]              " $N+1011"     1016
   [blank]$N%[number]              " $N%3"        2
   [blank]$N+[number]%[number]     " $N+12%13"    4
   [blank]$N%[number]+[number]     " $N%3+20"     22

Notes:

  1. % is modulus.

  2. In all cases, the operators are evaluated from left to right.

Following are some examples of how to create containers using this special argument:
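These statements are sketches only; the table space names, paths, and sizes are illustrative and not taken from the original. The backslash before $N keeps the shell from expanding it, so DB2 receives the literal node expression. On database partition 5, the DMS container would resolve to a name ending in 5, and the two SMS containers would use the values 1 and 3 (5%2 and 5%2+2, evaluated left to right).

   db2 "CREATE TABLESPACE ts1 MANAGED BY DATABASE USING (FILE '/database/ts1/cont \$N' 10000)"
   db2 "CREATE TABLESPACE ts2 MANAGED BY SYSTEM USING ('/database/ts2/cont \$N%2', '/database/ts2/cont \$N%2+2')"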

Figure 104 and Figure 105 show an example of a DB2 SSA I/O subsystem configuration, and some of the planning necessary to ensure both a highly available external disk configuration, and the ability to access all volume groups without conflict.

Figure 104. No Single Point of Failure

Figure 105. Volume Group and Logical Volume Setup

Configuring a DB2 Database Partition

Once configured, each database partition in an instance is started by HACMP ES, one physical node at a time. Multiple clusters are recommended for starting parallel DB2 configurations that are larger than four nodes. Note that in a 64-node parallel DB2 configuration, it is faster to start 32 two-node HACMP clusters in parallel than four 16-node clusters.

A script file, rc.db2pe, is packaged with DB2 UDB EEE (and installed on each node in /usr/bin) to assist in configuring HACMP ES failover or recovery in either hot standby or mutual takeover configurations. In addition, rc.db2pe can customize DB2 buffer pool sizes during failover in mutual takeover configurations. (Buffer pool sizes need to be adjusted to ensure proper performance when two database partitions run on one physical node.)

When you create an application server in an HACMP configuration of a DB2 database partition, specify rc.db2pe as a start and stop script as follows:

   /usr/bin/rc.db2pe <instance> <dpn> <secondary dpn> start <use switch>
   /usr/bin/rc.db2pe <instance> <dpn> <secondary dpn> stop <use switch>
 
   where:
 
   <instance> is the instance name.
   <dpn> is the database partition number.
   <secondary dpn> is the "companion" database partition number in
      mutual takeover configurations only; in hot standby configurations,
      it is the same as <dpn>.
   <use switch> is usually blank; when blank, it indicates that
      the SP switch network is used for the hostname field
      in the db2nodes.cfg file (all traffic for DB2 is routed over the SP switch);
      if not blank, the name used is the host name of the SP node to be used.
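For reference, each line in db2nodes.cfg has the form "partition-number hostname logical-port [netname]". The following is a sketch with hypothetical switch hostnames, assuming all DB2 traffic is routed over the SP switch (the default when <use switch> is blank):

   1  sw_node1  0
   2  sw_node2  0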

The DB2 command LIST DATABASE DIRECTORY is used from within rc.db2pe to find all databases configured for this database partition. The script file then looks for the /usr/bin/reg.parms.DATABASE file and the /usr/bin/failover.parms.DATABASE file, where DATABASE is each of the databases configured for this database partition. In a mutual takeover configuration, it is recommended that you create the parameter files reg.parms.xxx and failover.parms.xxx. In the failover.parms.xxx file, the settings for BUFFPAGE, DBHEAP, and any others affecting buffer pools, should be adjusted to account for the possibility of more than one buffer pool. Sample files reg.parms.SAMPLE and failover.parms.SAMPLE are provided for your use.
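The layout of these parameter files is defined by the shipped reg.parms.SAMPLE and failover.parms.SAMPLE; consult those files for the actual format. Conceptually, the failover set scales memory-related parameters down so that two database partitions can share one physical node. The following is a hypothetical sketch of the effect, expressed as the equivalent configuration updates (the values are illustrative only):

   # Normal operation: one database partition per physical node
   db2 update db cfg for TESTDATA using BUFFPAGE 40000 DBHEAP 2400
   # After failover: two database partitions share one physical node
   db2 update db cfg for TESTDATA using BUFFPAGE 20000 DBHEAP 1200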

One of the important parameters in this environment is the start_stop_time database manager configuration parameter, which has a default value of 10 minutes. However, rc.db2pe sets this parameter to 2 minutes. You should set this parameter through rc.db2pe to a value of 10 minutes, or slightly more. In this context, the specified duration is the time interval between the failure of the partition and its recovery. If applications running on a partition are issuing frequent COMMITs, 10 minutes following a failure on a database partition should be sufficient time to roll back uncommitted transactions and to reach a point of consistency in the database on that partition. If your workload is heavy, or if you have many partitions, you may need to increase the duration to decrease the probability of timeouts occurring before the rollback operation completes.
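Whether you change the value by editing rc.db2pe or issue the command directly, the underlying database manager configuration update is the following (10 is the recommended starting point; the value is expressed in minutes):

   db2 update dbm cfg using START_STOP_TIME 10
   db2 get dbm cfg | grep -i start_stop_time    # verify the new value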

Following are examples of a hot standby configuration and a mutual takeover configuration. In both examples, the resource groups contain a Service IP switch alias address. This switch alias address is used for:

If your implementation does not require these aliases, they can be removed. If removed, be sure to set the MOUNT_NFS parameter to NO in the rc.db2pe script file.

Example of a Hot Standby Configuration

The assumption in this example is that a hot standby configuration exists between physical nodes 1 and 2, and that the DB2 instance name is POWERTP. The database partition is 1, and the database is TESTDATA, residing on file system /database.

   Resource group name: db2_dp_1
   Node Relationship: cascading
   Participating nodenames: node1_eth, node2_eth
   Service_IP_label: nfs_switch_1     (<<< this is the switch alias address)
   Filesystems: /database/powertp/NODE0001
   Volume Groups: DB2vg1
   Application Servers: db2_dp1_app
   Application Server Start Script: /usr/bin/rc.db2pe powertp 1 1 start
   Application Server Stop Script: /usr/bin/rc.db2pe powertp 1 1 stop

Example of a Mutual Takeover Configuration

The assumption in this example is that a mutual takeover configuration exists between physical nodes 1 and 2, and that the DB2 instance name is POWERTP. The database partitions are 1 and 2, and the database is TESTDATA, residing on file system /database.

   Resource group name: db2_dp_1
   Node Relationship: cascading
   Participating nodenames: node1_eth, node2_eth
   Service_IP_label: nfs_switch_1     (<<< this is the switch alias address)
   Filesystems: /database/powertp/NODE0001
   Volume Groups: DB2vg1
   Application Servers: db2_dp1_app
   Application Server Start Script: /usr/bin/rc.db2pe powertp 1 2 start
   Application Server Stop Script: /usr/bin/rc.db2pe powertp 1 2 stop
 
   Resource group name: db2_dp_2
   Node Relationship: cascading
   Participating nodenames: node2_eth, node1_eth
   Service_IP_label: nfs_switch_2     (<<< this is the switch alias address)
   Filesystems: /database/powertp/NODE0002
   Volume Groups: DB2vg2
   Application Servers: db2_dp2_app
   Application Server Start Script: /usr/bin/rc.db2pe powertp 2 1 start
   Application Server Stop Script: /usr/bin/rc.db2pe powertp 2 1 stop

Configuration of an NFS Server Node

The rc.db2pe script can also be used to make the NFS-mounted user directories of a DB2 parallel instance available. This is accomplished by setting the MOUNT_NFS parameter to YES in the rc.db2pe script file, and configuring the NFS failover server pair as follows:

Example of an NFS Server Takeover Configuration

The assumption in this example is that there is an NFS server file system /nfshome in the volume group nfsvg over the IP address "nfs_server". The DB2 instance name is POWERTP, and the home directory is /dbhome/powertp.

   Resource group name: nfs_server
   Node Relationship: cascading
   Participating nodenames: node1_eth, node2_eth
   Service_IP_label: nfs_server     (<<< this is the switch alias address)
   Filesystems: /nfshome
   Volume Groups: nfsvg
   Application Servers: nfs_server_app
   Application Server Start Script: /usr/bin/rc.db2pe powertp NFS SERVER start
   Application Server Stop Script: /usr/bin/rc.db2pe powertp NFS SERVER stop
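As a sketch only (the export options and mount point are assumptions, not taken from the original): the server side exports /nfshome to both participating nodes, and each node mounts it over the takeover-protected nfs_server address, which is roughly the effect rc.db2pe produces when MOUNT_NFS is set to YES:

   # /etc/exports on the NFS server nodes (hypothetical options)
   /nfshome -root=node1_eth:node2_eth,access=node1_eth:node2_eth
   # on each node, mount the instance user directories over the service address
   mount nfs_server:/nfshome /dbhome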

In this example:

Considerations When Configuring the SP Switch

When implementing HACMP ES with the SP switch, consider the following:

DB2 HACMP Configuration Examples

The following examples illustrate different failover support configurations and show what happens when failure occurs.

In the case of DB2 HACMP mutual takeover configurations (Figure 106, Figure 107, and Figure 108):

Figure 106. Mutual Takeover with NFS Failover - Normal

Figure 107. Mutual Takeover with NFS Failover - NFS Failover

Figure 108. Mutual Takeover with NFS Failover - DB2 Failover

In the case of DB2 HACMP hot standby configurations (Figure 109 and Figure 110):

Figure 109. Hot Standby with NFS Failover - Normal

Figure 110. Hot Standby with NFS Failover - DB2 Failover

In the case of DB2 HACMP mutual takeover without NFS failover configurations (Figure 111 and Figure 112):

Figure 111. Mutual Takeover without NFS Failover - Normal

Figure 112. Mutual Takeover without NFS Failover - DB2 Failover

DB2 HACMP Startup Recommendations

It is recommended that you do not specify that HACMP is to be started at boot time in /etc/inittab. HACMP should be started manually after the nodes are booted. This allows for non-disruptive maintenance of a failed node.

As an example of "disruptive maintenance", consider the case in which a node has a hardware failure and crashes. Failover is initiated automatically by HACMP, and recovery completes successfully. However, the failed node needs to be fixed. If HACMP were configured in /etc/inittab to start on reboot, this node would try to reintegrate as soon as it finished booting, which is not desirable in this case.

For "non-disruptive maintenance", consider manually starting HACMP on each node. In this way, failed nodes can be fixed and reintegrated without affecting the other nodes. The ha_cmd script is provided for controlling HACMP start and stop commands from the control workstation.
Note: When creating a DB2 instance for the first time, the following entry is appended to the /etc/inittab file:
   rcdb2:2:once:/etc/rc.db2 > /dev/console 2>&1 # Autostart DB2 Services
If HACMP or HACMP ES is enabled, update the /etc/inittab file by placing the above line before the HACMP entry. Following is a sample HACMP entry in the /etc/inittab file:
   clinit:a:wait:touch /usr/sbin/cluster/.telinit # HACMP for AIX
The HACMP (clinit) entry must be the last entry in the /etc/inittab file.
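Putting the two entries together, the tail of the /etc/inittab file should therefore read as follows, with the DB2 entry before the HACMP entry and the HACMP entry last:

   rcdb2:2:once:/etc/rc.db2 > /dev/console 2>&1 # Autostart DB2 Services
   clinit:a:wait:touch /usr/sbin/cluster/.telinit # HACMP for AIX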

