Generally, in an active/passive cluster failover configuration, one or more passive (standby) nodes are available to take over for failed nodes. Only the primary node is used for processing. When a node fails, the standby node takes over the resources and the identity of the failed node, and the services that the failed node provided are started on the standby node. After the takeover, clients can access the services without being aware that a different node now provides them.
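The takeover described above can be pictured with a small in-memory model. This is only an illustrative sketch, not how any particular cluster product implements failover: the `Node` class, the `take_over` and `locate` helpers, and the node names `node-a` and `node-b` are all hypothetical. The address 192.168.10.1 is the example database address used in this section.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical cluster node: a name plus the identity and services it holds."""
    name: str
    addresses: set = field(default_factory=set)  # service IPs this node answers on
    services: set = field(default_factory=set)   # services currently running here
    alive: bool = True


def take_over(failed: Node, standby: Node) -> None:
    """Move the failed node's identity (addresses) and services to the standby."""
    failed.alive = False
    standby.addresses |= failed.addresses
    standby.services |= failed.services
    failed.addresses = set()
    failed.services = set()


def locate(service_ip: str, nodes: list) -> str:
    """Return the name of the live node that currently answers on service_ip."""
    for node in nodes:
        if node.alive and service_ip in node.addresses:
            return node.name
    raise LookupError(f"no live node owns {service_ip}")


# Assumed example: node-a is the primary, node-b is the idle standby.
primary = Node("node-a", {"192.168.10.1"}, {"database"})
standby = Node("node-b")

before = locate("192.168.10.1", [primary, standby])
take_over(primary, standby)
after = locate("192.168.10.1", [primary, standby])
```

In this model the client-facing address never changes. Only the node behind it does, which is why clients can remain unaware of the takeover.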
The following figure illustrates an active/passive database failover configuration. The active and passive nodes share the same disk subsystem, although only the primary database server has access to it. The path from the standby node to the shared disk subsystem is not activated.
During normal operations, the application connects to the database server using the hostname dbprod, which resolves to the IP address 192.168.10.1.
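Because the application addresses the database by name rather than by physical node, a client can ride out a takeover by retrying the same hostname. The following is a minimal sketch under stated assumptions: `connect_with_retry` is a hypothetical helper, the port number in the usage note is a placeholder, and the injectable `connector` and `sleep` parameters exist only so the logic can be exercised without a network. Real database drivers usually provide their own reconnect settings, which should be preferred where available.

```python
import socket
import time


def connect_with_retry(host, port, attempts=5, delay=1.0,
                       connector=socket.create_connection, sleep=time.sleep):
    """Connect to host:port by name, retrying while a takeover may be in progress."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            # The hostname is passed on every attempt, so the client never
            # hard-codes which physical node currently provides the service.
            return connector((host, port), timeout=5)
        except OSError as exc:
            last_error = exc
            if attempt < attempts:
                sleep(delay)  # wait before trying again
    raise ConnectionError(
        f"{host}:{port} unreachable after {attempts} attempts"
    ) from last_error
```

A call such as `connect_with_retry("dbprod", 50000, attempts=40, delay=15)` would keep trying for roughly ten minutes, not counting the time each failed attempt itself takes. The retry budget should be sized to cover the expected failover time.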
During a node failure, the following typically occurs.
On the original primary node:
On the standby node:
These failover (takeover) steps can be automated. Software that can be used for this purpose includes:
When fully automated, the failover could take 5 to 10 minutes.
Subsequent sections describe in more detail how active/passive failover configurations are used to protect many of the Sterling Selling and Fulfillment Foundation components.