Administration Guide

Cluster Configuration

In a hot standby configuration, the AIX processor node that is the takeover node is not running any other workload. In a mutual takeover configuration, the AIX processor node that is the takeover node is running other workloads.

Generally, DB2 Universal Database Enterprise - Extended Edition (UDB EEE) runs in mutual takeover mode with partitions on each node. One exception is a scenario in which the catalog node is part of a hot standby configuration.

When planning a large DB2 installation on an RS/6000 SP using HACMP ES, you need to consider how to divide the nodes of the cluster within or between the RS/6000 SP frames. Having a node and its backup in different SP frames allows takeover in the event one frame goes down (that is, the frame power/switch board fails). However, such failures are expected to be exceedingly rare, because each SP frame has N+1 power supplies and N+1 fans, and each SP switch has redundant paths. In the case of a frame failure, manual intervention may be required to recover the remaining frames. This recovery procedure is documented in the SP Administration Guide. HACMP ES provides for recovery of SP node failures; recovery of frame failures depends on the proper layout of clusters within one or more SP frames.

Another planning consideration is how to manage big clusters. It is easier to manage a small cluster than a big one; however, it is also easier to manage one big cluster than many smaller ones. When planning, consider how your applications will be used in your cluster environment. If there is a single, large, homogeneous application running, for example, on 16 nodes, it is probably easier to manage the configuration as a single cluster rather than as eight two-node clusters. If the same 16 nodes contain many different applications with different networks, disks, and node relationships, it is probably better to group the nodes into smaller clusters. Keep in mind that nodes integrate into an HACMP cluster one at a time; it will be faster to start a configuration of multiple clusters rather than one large cluster. HACMP ES supports both single and multiple clusters, as long as a node and its backup are in the same cluster.

HACMP ES failover recovery allows pre-defined (also known as cascading) assignment of a resource group to a physical node, as well as floating (or rotating) assignment of a resource group to a physical node. The IP addresses, external disk volume groups, file systems or NFS file systems, and application servers within each resource group specify either an application or an application component, which HACMP ES can move between physical nodes through failover and reintegration. Failover and reintegration behavior is determined by the type of resource group created and by the number of nodes placed in the resource group.

For example, consider a DB2 database partition (logical node). If its log and table space containers were placed on external disks, and other nodes were linked to those disks, it would be possible for those other nodes to access these disks and to restart the database partition (on a takeover node). It is this type of operation that is automated by HACMP. HACMP ES can also be used to recover NFS file systems used by DB2 instance main user directories.
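For illustration only (the hostname is hypothetical, and the exact option names can vary by DB2 release), the operation being automated amounts to issuing DB2START with the RESTART clause so that the database partition comes up on the takeover host:

   db2start nodenum 1 restart hostname sw_node2 port 0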

Read the HACMP ES documentation thoroughly as part of your planning for recovery with DB2 UDB EEE. You should read the Concepts, Planning, Installation, and Administration guides, then build the recovery architecture for your environment. For each subsystem that you have identified for recovery, based on known points of failure, identify the HACMP clusters that you need, as well as the recovery nodes (either hot standby or mutual takeover). This is a starting point for completing the HACMP worksheets that are included in the documentation.

It is strongly recommended that both disks and adapters be mirrored in your external disk configuration. For DB2 physical nodes that are configured for HACMP, care is required to ensure that the nodes sharing a volume group can vary it on from the shared external disks. In a mutual takeover configuration, this arrangement requires some additional planning so that the paired nodes can access each other's volume groups without conflicts. For DB2 UDB EEE, this means that all container names must be unique across all databases.

One way to achieve uniqueness is to include the partition number as part of the name. You can specify a node expression for container string syntax when creating either SMS or DMS containers. When you specify the expression, the node number can be part of the container name or, if you specify additional arguments, the results of those arguments can be part of the container name. Use the argument " $N" ([blank]$N) to indicate the node expression. The argument must occur at the end of the container string, and can only be used in one of the following forms:

Table 56. Arguments for Creating Containers
(The node number is assumed to be five.)

   Syntax                          Example        Value
   [blank]$N                       " $N"          5
   [blank]$N+[number]              " $N+1011"     1016
   [blank]$N%[number]              " $N%3"        2
   [blank]$N+[number]%[number]     " $N+12%13"    4
   [blank]$N%[number]+[number]     " $N%3+20"     22

Notes:

  1. % is modulus.

  2. In all cases, the operators are evaluated from left to right.

Following are some examples of how to create containers using this special argument:
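These statements are sketches only; the table space names, paths, and sizes are illustrative and not taken from the original. The backslash before $N keeps the shell from expanding it, so DB2 receives the literal node expression. On database partition 5, the DMS container would resolve to a name ending in 5, and the two SMS containers would use the values 1 and 3 (5%2 and 5%2+2, evaluated left to right).

   db2 "CREATE TABLESPACE ts1 MANAGED BY DATABASE USING (FILE '/database/ts1/cont \$N' 10000)"
   db2 "CREATE TABLESPACE ts2 MANAGED BY SYSTEM USING ('/database/ts2/cont \$N%2', '/database/ts2/cont \$N%2+2')"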

Figure 104 and Figure 105 show an example of a DB2 SSA I/O subsystem configuration, and some of the planning necessary to ensure both a highly available external disk configuration, and the ability to access all volume groups without conflict.

Figure 104. No Single Point of Failure

Figure 105. Volume Group and Logical Volume Setup

Configuring a DB2 Database Partition

Once configured, each database partition in an instance is started by HACMP ES, one physical node at a time. Multiple clusters are recommended for starting parallel DB2 configurations that are larger than four nodes. Note that in a 64-node parallel DB2 configuration, it is faster to start 32 two-node HACMP clusters in parallel than four 16-node clusters.

A script file, rc.db2pe, is packaged with DB2 UDB EEE (and installed on each node in /usr/bin) to assist in configuring HACMP ES failover or recovery in either hot standby or mutual takeover configurations. In addition, rc.db2pe can customize DB2 buffer pool sizes during failover in mutual takeover configurations. (Buffer pool sizes need to be adjusted to ensure proper performance when two database partitions run on one physical node.)

When you create an application server in an HACMP configuration of a DB2 database partition, specify rc.db2pe as a start and stop script as follows:

   /usr/bin/rc.db2pe <instance> <dpn> <secondary dpn> start <use switch>
   /usr/bin/rc.db2pe <instance> <dpn> <secondary dpn> stop <use switch>
 
   where:
 
   <instance> is the instance name.
   <dpn> is the database partition number.
   <secondary dpn> is the "companion" database partition number in
      mutual takeover configurations only; in hot standby configurations,
      it is the same as <dpn>.
   <use switch> is usually blank; when blank, it indicates that
      the SP switch network is used for the hostname field
      in the db2nodes.cfg file (all traffic for DB2 is routed over the SP switch);
      if not blank, the name used is the host name of the SP node to be used.
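For reference, each line in db2nodes.cfg has the form "partition-number hostname logical-port [netname]". The following is a sketch with hypothetical switch hostnames, assuming all DB2 traffic is routed over the SP switch (the default when <use switch> is blank):

   1  sw_node1  0
   2  sw_node2  0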

The DB2 command LIST DATABASE DIRECTORY is used from within rc.db2pe to find all databases configured for this database partition. The script file then looks for the /usr/bin/reg.parms.DATABASE file and the /usr/bin/failover.parms.DATABASE file, where DATABASE is each of the databases configured for this database partition. In a mutual takeover configuration, it is recommended that you create the parameter files reg.parms.xxx and failover.parms.xxx. In the failover.parms.xxx file, the settings for BUFFPAGE, DBHEAP, and any others affecting buffer pools, should be adjusted to account for the possibility of more than one buffer pool. Sample files reg.parms.SAMPLE and failover.parms.SAMPLE are provided for your use.
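The layout of these parameter files is defined by the shipped reg.parms.SAMPLE and failover.parms.SAMPLE; consult those files for the actual format. Conceptually, the failover set scales memory-related parameters down so that two database partitions can share one physical node. The following is a hypothetical sketch of the effect, expressed as the equivalent configuration updates (the values are illustrative only):

   # Normal operation: one database partition per physical node
   db2 update db cfg for TESTDATA using BUFFPAGE 40000 DBHEAP 2400
   # After failover: two database partitions share one physical node
   db2 update db cfg for TESTDATA using BUFFPAGE 20000 DBHEAP 1200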

One of the important parameters in this environment is the start_stop_time database manager configuration parameter, which has a default value of 10 minutes. However, rc.db2pe sets this parameter to 2 minutes. You should set this parameter through rc.db2pe to a value of 10 minutes, or slightly more. In this context, the specified duration is the time interval between the failure of the partition and its recovery. If applications running on a partition are issuing frequent COMMITs, 10 minutes following a failure on a database partition should be sufficient time to roll back uncommitted transactions and to reach a point of consistency in the database on that partition. If your workload is heavy, or if you have many partitions, you may need to increase the duration to decrease the probability of timeouts occurring before the rollback operation completes.
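Whether you change the value by editing rc.db2pe or issue the command directly, the underlying database manager configuration update is the following (10 is the recommended starting point; the value is expressed in minutes):

   db2 update dbm cfg using START_STOP_TIME 10
   db2 get dbm cfg | grep -i start_stop_time    # verify the new value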

Following are examples of a hot standby configuration and a mutual takeover configuration. In both examples, the resource groups contain a Service IP switch alias address. This switch alias address is used for:

If your implementation does not require these aliases, they can be removed. If removed, be sure to set the MOUNT_NFS parameter to NO in the rc.db2pe script file.

Example of a Hot Standby Configuration

The assumption in this example is that a hot standby configuration exists between physical nodes 1 and 2, and that the DB2 instance name is POWERTP. The database partition is 1, and the database is TESTDATA, residing on file system /database.

   Resource group name: db2_dp_1
   Node Relationship: cascading
   Participating nodenames: node1_eth, node2_eth
   Service_IP_label: nfs_switch_1     (<<< this is the switch alias address)
   Filesystems: /database/powertp/NODE0001
   Volume Groups: DB2vg1
   Application Servers: db2_dp1_app
   Application Server Start Script: /usr/bin/rc.db2pe powertp 1 1 start
   Application Server Stop Script: /usr/bin/rc.db2pe powertp 1 1 stop

Example of a Mutual Takeover Configuration

The assumption in this example is that a mutual takeover configuration exists between physical nodes 1 and 2, and that the DB2 instance name is POWERTP. The database partitions are 1 and 2, and the database is TESTDATA, residing on file system /database.

   Resource group name: db2_dp_1
   Node Relationship: cascading
   Participating nodenames: node1_eth, node2_eth
   Service_IP_label: nfs_switch_1     (<<< this is the switch alias address)
   Filesystems: /database/powertp/NODE0001
   Volume Groups: DB2vg1
   Application Servers: db2_dp1_app
   Application Server Start Script: /usr/bin/rc.db2pe powertp 1 2 start
   Application Server Stop Script: /usr/bin/rc.db2pe powertp 1 2 stop
 
   Resource group name: db2_dp_2
   Node Relationship: cascading
   Participating nodenames: node2_eth, node1_eth
   Service_IP_label: nfs_switch_2     (<<< this is the switch alias address)
   Filesystems: /database/powertp/NODE0002
   Volume Groups: DB2vg2
   Application Servers: db2_dp2_app
   Application Server Start Script: /usr/bin/rc.db2pe powertp 2 1 start
   Application Server Stop Script: /usr/bin/rc.db2pe powertp 2 1 stop

Configuration of an NFS Server Node

The rc.db2pe script can also be used to make the NFS-mounted user directories of a DB2 parallel instance available. This is accomplished by setting the MOUNT_NFS parameter to YES in the rc.db2pe script file, and configuring the NFS failover server pair as follows:

Example of an NFS Server Takeover Configuration

The assumption in this example is that there is an NFS server file system /nfshome in the volume group nfsvg over the IP address "nfs_server". The DB2 instance name is POWERTP, and the home directory is /dbhome/powertp.

   Resource group name: nfs_server
   Node Relationship: cascading
   Participating nodenames: node1_eth, node2_eth
   Service_IP_label: nfs_server     (<<< this is the switch alias address)
   Filesystems: /nfshome
   Volume Groups: nfsvg
   Application Servers: nfs_server_app
   Application Server Start Script: /usr/bin/rc.db2pe powertp NFS SERVER start
   Application Server Stop Script: /usr/bin/rc.db2pe powertp NFS SERVER stop
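As a sketch only (the export options and mount point are assumptions, not taken from the original): the server side exports /nfshome to both participating nodes, and each node mounts it over the takeover-protected nfs_server address, which is roughly the effect rc.db2pe produces when MOUNT_NFS is set to YES:

   # /etc/exports on the NFS server nodes (hypothetical options)
   /nfshome -root=node1_eth:node2_eth,access=node1_eth:node2_eth
   # on each node, mount the instance user directories over the service address
   mount nfs_server:/nfshome /dbhome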

In this example:

Considerations When Configuring the SP Switch

When implementing HACMP ES with the SP switch, consider the following:

DB2 HACMP Configuration Examples

The following examples illustrate different failover support configurations and show what happens when failure occurs.

In the case of DB2 HACMP mutual takeover configurations (Figure 106, Figure 107, and Figure 108):

Figure 106. Mutual Takeover with NFS Failover - Normal

Figure 107. Mutual Takeover with NFS Failover - NFS Failover

Figure 108. Mutual Takeover with NFS Failover - DB2 Failover

In the case of DB2 HACMP hot standby configurations (Figure 109 and Figure 110):

Figure 109. Hot Standby with NFS Failover - Normal

Figure 110. Hot Standby with NFS Failover - DB2 Failover

In the case of DB2 HACMP mutual takeover without NFS failover configurations (Figure 111 and Figure 112):

Figure 111. Mutual Takeover without NFS Failover - Normal

Figure 112. Mutual Takeover without NFS Failover - DB2 Failover

DB2 HACMP Startup Recommendations

It is recommended that you do not specify that HACMP is to be started at boot time in /etc/inittab. HACMP should be started manually after the nodes are booted. This allows for non-disruptive maintenance of a failed node.

As an example of "disruptive maintenance", consider the case in which a node has a hardware failure and crashes. Failover is initiated automatically by HACMP, and recovery completes successfully. However, the failed node needs to be fixed. If HACMP were configured in /etc/inittab to start on reboot, this node would try to reintegrate as soon as it finished booting, which is not desirable in this case.

For "non-disruptive maintenance", consider manually starting HACMP on each node. In this way, failed nodes can be fixed and reintegrated without affecting the other nodes. The ha_cmd script is provided for controlling HACMP start and stop commands from the control workstation.
Note: When creating a DB2 instance for the first time, the following entry is appended to the /etc/inittab file:
   rcdb2:2:once:/etc/rc.db2 > /dev/console 2>&1 # Autostart DB2 Services
If HACMP or HACMP ES is enabled, update the /etc/inittab file by placing the above line before the HACMP entry. Following is a sample HACMP entry in the /etc/inittab file:
   clinit:a:wait:touch /usr/sbin/cluster/.telinit # HACMP for AIX
The HACMP (clinit) entry must be the last entry in the /etc/inittab file.
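Putting the two entries together, the tail of the /etc/inittab file should therefore read as follows, with the DB2 entry before the HACMP entry and the HACMP entry last:

   rcdb2:2:once:/etc/rc.db2 > /dev/console 2>&1 # Autostart DB2 Services
   clinit:a:wait:touch /usr/sbin/cluster/.telinit # HACMP for AIX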

