The planning and installation process described in this guide helps a system administrator get IBM Cluster Systems Management for Linux (CSM), hereafter referred to as Cluster Systems Management, up and running by setting up a management server and managed nodes. The managed nodes can be existing Linux systems, or the Linux operating system and CSM can be fully installed on the nodes of the cluster as part of the process (but not on the management server, which must be set up first).
The application is available as an installation image in a directory or on the CSM CD-ROM. This document describes the minimum hardware and software requirements needed to use this product. See the Specified Operating Environment.
Information is also provided about planning and pre-installation tasks that you need to perform so that the installation goes smoothly. Next, there is a step-by-step procedure for installing and setting up the cluster, either on an existing group of nodes (CSM-only installation) or as a full installation of both the operating system and CSM (Kickstart installation). Finally, a troubleshooting section is provided in the form of frequently asked questions. Read this document carefully and become familiar with it in its entirety before beginning the installation and setup tasks.
This section describes the minimum hardware and software that are required for the Cluster Systems Management specified operating environment. For more detailed information, see the announcement.
This product is supported on the IBM Cluster 1300: Model 7080 Primary Rack, 7080 Expansion Rack, 7081, and 7082.
For the management server, a minimum of 128MB of memory and 120MB of disk space is required for installing CSM, and an additional 1.5GB of disk space is required for a full installation.
For each managed node, a minimum of 128MB of memory and 20MB of disk space is required for CSM, plus the appropriate amount of additional disk space for the operating system and the RPMs that you choose to install.
TCP/IP and an Ethernet adapter
In configuring a Cluster Systems Management cluster, give particular attention to the following:
To support remote control, the following hardware is required:
A cluster of up to 128 nodes is supported.
IBM Cluster Systems Management for Linux has requirements for non-IBM software as well as IBM-developed software. As a convenience, the required non-IBM software that is not part of the Red Hat distribution is included on the CSM CD-ROM. Unless otherwise specified, the software is required on the management server and on the managed nodes.
The following file sets comprise the IBM Cluster Systems Management for Linux product:
The Red Hat Linux 7.1 distribution is required. The Red Hat 7.1 distribution includes the following required non-IBM software:
In addition, IBM provides the following required non-IBM software:
It is mandatory that the CSM client RPM be installed either on all of the General Parallel File System for Linux (GPFS) nodes or on none of them. The Resource Monitoring and Control (RMC) subsystem requires that the definitions of its resource classes be the same on all nodes within the GPFS cluster. If the CSM client RPM is installed on only some of the GPFS nodes, some of the RMC daemons will not be able to join their peer group successfully, an undesirable situation. The CSM management server node should not be part of the GPFS cluster because it adds other resource classes to RMC, and it is not practical or desirable to install the CSM server RPM on all of the GPFS nodes.
GPFS and CSM should be using the same level of RSCT code. The current release of RSCT is Version 2.2.
This section describes the tasks that you need to perform before installing this product. Before you do anything else, ensure that the required IBM software is installed.
Read IBM Cluster Systems Management for Linux Remote Control Guide and Reference before beginning the installation process. It contains illustrated information on hardware and networking setup and configuration, including the default installation values (for example, eth0, the value for the first Ethernet adapter that is connected to the cluster VLAN). It also provides a sample completed node-attribute table that you can use for guidance in completing your own blank template. A blank template is provided in Appendix A, Node Attributes Template, for your convenience.
There are several tasks that the administrator must do to prepare for installation of Cluster Systems Management:
configure
make
make install
During the full installation process, Kickstart installs the operating system on each node. The Kickstart post-installation script does any additional set-up. As the node boots for the first time, the makenode command is run, which converts the node from the PreManagedNode class to the ManagedNode class.
Before installing CSM, create a partition called /tftpboot that consists of 1GB of space for a full installation, which includes the operating system, or 100MB for a CSM-only installation. This will hold the required RPM and tarball packages for installation.
To create /tftpboot by using cfdisk, do the following:
Make a note of the device name and number of the new partition because you will need it for the next steps. Examples of the new partition name might be similar to /dev/hda7 or /dev/sda8.
mkfs /dev/device
(where device is the name of the new partition; for example, hda7 or sda8.)
mkdir /tftpboot
mount /dev/device /tftpboot
/dev/device /tftpboot ext2 defaults 1 2
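To confirm that the new partition is mounted and has the expected amount of space, you can run a standard command such as the following (the -h flag simply prints sizes in human-readable form):
df -h /tftpboot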
A distributed shell program (dsh) is used to run commands on the nodes. It is contained in the csm.dsh RPM and installed by the installms command. The dsh program uses a remote shell of your choice to issue remote commands to the managed nodes from the management server. The following preparation to enable the remote shell is required on each node before dsh is installed:
The DSH_REMOTE_CMD environment variable is used to specify a remote shell other than the default. This environment variable should always be set when CSM commands are issued because some CSM commands use dsh internally and will use rsh as the default if DSH_REMOTE_CMD is not set. The full installation process configures rsh on the nodes if DSH_REMOTE_CMD is set to rsh (or if DSH_REMOTE_CMD is not set). This configuration adds the management server to /root/.rhosts on the nodes and starts the rsh daemon on the nodes.
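For example, to use a secure shell instead of the rsh default, you might set the variable as follows (the path to ssh is an assumption; adjust it to match where the remote shell is installed on your system):
export DSH_REMOTE_CMD=/usr/bin/ssh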
For more information on dsh, see the man page or the IBM Cluster Systems Management for Linux Technical Reference.
Two installation processes are offered to install the nodes. If the Linux operating system is not installed, you can install both the Linux operating system and Cluster Systems Management on the nodes. This is referred to as a full installation. If a Linux operating system already exists, you can install just Cluster Systems Management. This is referred to as a CSM-only installation.
For both a full installation and a CSM-only installation, you can take one of two approaches. For a simple installation without interim verification, you can run the following commands in sequence:
The addnode command runs definenode, setupks (if this is a full installation), and then installnode automatically.
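For example, because addnode accepts the same usage and arguments as definenode, a simple installation driven by a node definition file might look like the following (the file name nodedef is illustrative):
addnode -f nodedef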
If you need more control and would like the ability to double-check and make interim changes during installation, run the underlying commands individually as follows:
For a CSM-only installation, do the following:
For a full installation, do the following:
All of these commands are run on the management server. Details on these commands can be found in their man pages or in IBM Cluster Systems Management for Linux Technical Reference.
The installms command performs the tasks that are necessary to make this system a management server. It installs the Open Source and IBM CSM software listed in Specified Operating Environment on the management server automatically if it is not already installed or if it is installed at a previous level.
IBM suggests that you set up the /tftpboot partition before you run installms. You also need to provide and mount the CSM distribution CD-ROM. The default mount point is /mnt/cdrom.
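For example, if the CD-ROM device on the management server is /dev/cdrom (the device name is an assumption; adjust it for your hardware), you might mount it as follows:
mount /dev/cdrom /mnt/cdrom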
The program first copies installation packages from a download directory or from the CD-ROM that contains the CSM application to /tftpboot/rpm and /tftpboot/tarball.
After installms has been run successfully, the management server is installed. You are ready to run the definenode or addnode command next.
For more information on installms, see the man page or the IBM Cluster Systems Management for Linux Technical Reference.
After the management server is installed by running installms, run definenode to define all of the nodes in the cluster or addnode to define and install the nodes in the cluster. These commands have certain prerequisites, which you need to be aware of.
Before you run definenode, you must prepare certain information and do some manual set up.
A node that has already been defined cannot be redefined with the definenode command. Including such a node in the command-line input causes the command to fail without defining any nodes. Including such a node in the node definition (nodedef) file causes the definition of that node to fail, but the other nodes specified in the file are defined successfully. An error message is issued for the undefined node or nodes.
Before you run definenode or addnode, information needs to be gathered and recorded on a template similar to that in Appendix A, Node Attributes Template. A completed example of this table can be found in IBM Cluster Systems Management for Linux Remote Control Guide and Reference. This information can be entered into a node definition (nodedef) file, or it can be entered at the command line.
A nodedef file allows you to enter the host names of the nodes that you want to define into a file, so that you avoid the error-prone task of typing all of the host names on the command line. If you intend to use a nodedef file, start with the sample file in /opt/csm/install/nodedef.sample and complete it with the information from the node-attribute planning template that you completed earlier.
See the nodedef man page or IBM Cluster Systems Management for Linux Technical Reference for more details about the node definition file.
Attention: IBM suggests using the nodedef file rather than the command line to define the nodes in the cluster.
The definenode command defines all the nodes in the cluster. It does not actually install the nodes. Node installation is done by installnode. If you run addnode, you do not need to run installnode because addnode runs installnode for you.
If any arguments are not provided, the command prompts you for each piece of information that it needs; if you inadvertently omit a required option, you are prompted for the missing information.
You can either use a node definition file to define the nodes, console servers, and service processors to the cluster, or you can enter the information at the command line. To define them by using a node definition file, type:
definenode -f nodedef
To see the arguments that you need to enter from the command line, type:
definenode -h
To define the node with the host name clsn02.ppd.pok.ibm.com to the cluster, type:
definenode -n clsn02.ppd.pok.ibm.com
The command prompts for missing information when some or all of the arguments are not provided.
See the man page or IBM Cluster Systems Management for Linux Technical Reference for details on definenode or addnode command-line syntax and more examples of the usage of the command.
See Example of definenode command run interactively for an example that demonstrates the interactive approach.
After definenode has been run successfully, verify the node definitions, and then run installnode. See Verifying node definitions for details.
Some error messages may be returned if definenode is not completely successful. See FAQs, hints, tips, and troubleshooting for troubleshooting information.
If you run the definenode command without any options, the program prompts you for all of the required information. If you omit only some of the required information, the program prompts you for just the missing pieces.
The following example shows sample input for nodes, console servers, and service processors with an interactive program. The example uses the definenode command, but the addnode command can be used instead with the same usage and arguments. Instead of requiring you to enter everything at once on the command line, the interactive program allows you to enter a little bit at a time. User input is shown in bold type.
Enter starting node name (hostname or IP address): clsn01
Enter number of nodes to define (default = 1): 12
Enter list of Hardware Control Points (press ENTER for none):
   Format: hwctrlpt[:method:spname][,...]
     hwctrlpt = Hardware Control Point hostname or IP address.
     method   = Power method (default=netfinity)
     spname   = Starting service processor name or 'hostname' (default=node01)
   Example: hwctrlpt1::node06,hwctrlpt2,hwctrlpt3
   Example: hwctrlpt1::hostname,hwctrlpt2::hostname
mgtn03,mgtn04::hostname
Enter list of Console Servers (press ENTER for none):
   Format: csname[:method:csnum:port][, ...]
     csname = Console server name (hostname or IP address)
     method = Console method (default=esp)
     csnum  = Console server number (default=1)
     port   = Starting console port number (default=1)
   Example: cs1:::4,cs2:conserver,cs3
mgtn02
Enter Hardware Type (default = netfinity): netfinity
Enter Install Method (csmonly or kickstart, default=csmonly): kickstart
definenode: Adding CSM Nodes:
definenode: Adding Node clsn01.ppd.pok.ibm.com(9.114.133.193)
definenode: Adding Node clsn02.ppd.pok.ibm.com(9.114.133.194)
definenode: Adding Node clsn03.ppd.pok.ibm.com(9.114.133.195)
definenode: Adding Node clsn04.ppd.pok.ibm.com(9.114.133.196)
definenode: Adding Node clsn05.ppd.pok.ibm.com(9.114.133.197)
definenode: Adding Node clsn06.ppd.pok.ibm.com(9.114.133.198)
definenode: Adding Node clsn07.ppd.pok.ibm.com(9.114.133.199)
definenode: Adding Node clsn08.ppd.pok.ibm.com(9.114.133.200)
definenode: Adding Node clsn09.ppd.pok.ibm.com(9.114.133.201)
definenode: Adding Node clsn10.ppd.pok.ibm.com(9.114.133.202)
definenode: Adding Node clsn11.ppd.pok.ibm.com(9.114.133.203)
definenode: Adding Node clsn12.ppd.pok.ibm.com(9.114.133.204)
After definenode has run, the management server has been set up with all the node information for CSM.
For a CSM-only installation, the cluster nodes are now ready to be installed.
For a full installation, setupks must still be run before verification can take place. Perform the activities in Running setupks before attempting to install the cluster nodes for a full installation.
This section describes how to verify and customize the cluster node definitions before the nodes are actually installed. Because the node installation has not yet taken place, you can still change any of the node definitions at this point.
Verify the CSM node information as follows:
If something needs to be corrected, you can either use rmnode -P to remove a node that was not successfully defined and then rerun definenode with the correct arguments, or use chnode -P to change any attributes of a node. Note that not all of the attributes for a node might be completed at this point. See the chnode, definenode, lsnode, and rmnode man pages or the IBM Cluster Systems Management for Linux Technical Reference for more information.
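For example, to inspect the premanaged node definitions and then remove and redefine a node that was defined incorrectly (the host name is illustrative), you might run:
lsnode -AlP
rmnode -P clsn02.ppd.pok.ibm.com
definenode -n clsn02.ppd.pok.ibm.com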
The definenode command is run to set up the hardware-control-point and console-server attributes that are needed by the rpower and rconsole commands. Before running setupks to perform a full installation, you need to make sure the internal service processor (ISP) passwords are correct.
The ISP password file is created from /etc/opt/csm/netfinity_power.config.templ when definenode is run. Afterwards, you can modify the netfinity_power.config file to specify individual passwords and user IDs for each node, if needed, or you can leave the default passwords if that is suitable. For more information on netfinity_power.config, see the man page or the IBM Cluster Systems Management for Linux Technical Reference.
More information and additional steps on changing the user IDs and passwords are supplied in IBM Cluster Systems Management for Linux Remote Control Guide and Reference.
The setupks command collects configuration information and uses a template to create the Kickstart configuration file for each node, containing the information that it has collected. The command also:
A sample annotated template is located in the Appendix. The current template is located on your system at /opt/csm/install/kscfg.tmpl.
You can use the template as-is or modify it. See the annotated kscfg.tmpl file in the Appendix for instructions on how to modify this file. Modifications made to the template file affect the entire cluster and should be made before running setupks.
You can also modify the generated Kickstart configuration file for each node. The Kickstart configuration file that setupks generates for each node is called /tftpboot/ks71/node-ip-address-kickstart. Modifying this file affects only the settings on that node and should be done after running setupks.
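For example, for the node with IP address 9.114.133.193 from the earlier definenode example, you might review or edit the generated file as follows (the exact file name is shown only to illustrate the naming pattern above):
vi /tftpboot/ks71/9.114.133.193-kickstart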
In particular, there are variables in the Kickstart configuration template and the generated file in the format #VARIABLE# that must not be deleted. Many of these are automatically customized with the appropriate values during the process of generating the Kickstart configuration file. For example, the following are some of the variables that are automatically customized (see Appendix B, Sample kscfg.tmpl File for all of the variables):
The following information is in the kscfg.tmpl file and in the generated Kickstart configuration file for each node, and can be modified if needed:
A sample disk partition table is provided, which can be modified.
A standard list is provided; you can modify the list, or you can use your own list.
This script does standard set-up and CSM-specific set-up. The script may be modified to suit your installation provided the marked sections are not altered.
Data is collected for each node for the /etc/dhcp.conf file and Kickstart configuration file.
The following information is collected for the /etc/dhcp.conf file. Any existing /etc/dhcp.conf file is saved to /etc/dhcp.conf.orig and a new file is created. After running setupks, any entries that were previously in /etc/dhcp.conf and are still needed must be restored manually.
This information is collected by the getmacs command and goes into the Macaddr attribute of the PreManagedNode object and into the /etc/dhcp.conf file, provided a value does not already exist. An existing value is not replaced. See IBM Cluster Systems Management for Linux Technical Reference for details.
A firstboot file is also generated for each node. The template file is in /opt/csm/install/firstboot.tmpl and the generated files for each node are in /tftpboot/bin/<node-hostname>.firstboot. The firstboot script is run the first time a node boots from its hard drive after it is installed. The purpose of the firstboot script is to run the makenode command. Code may be added to the template or script to run additional functions during firstboot.
This command is used to install the nodes in the cluster by running makenode remotely on each node for a CSM-only installation or by using Kickstart to run makenode for a full installation. The appropriate software listed in Specified Operating Environment is installed automatically by the installnode command on the nodes if it is not already installed or if it is installed at a previous level.
The monitorinstall command displays the installation status for each node. In addition, a makenode.log file is maintained on each node at /var/log/csm/makenode.log that records what happened during installation on that node, and a similar installnode.log file is maintained on the management server with information about the whole installation process. For a full installation, the /var/log/csm/kickstart.log and /var/log/csm/firstboot.log files on each node contain information about Kickstart and firstboot post-installation.
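For example, to spot-check the end of the makenode log on every node from the management server (assuming dsh is already configured as described earlier), you might run:
dsh -a tail /var/log/csm/makenode.log | dshbak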
For more information on installnode and makenode, see the man pages or the IBM Cluster Systems Management for Linux Technical Reference.
Before the installnode command can be run, the following must be done on the management server:
After a full installation, the boot order in the BIOS of the node can be left as is: floppy, CD-ROM, network, hard disk. In this case, every time the node boots, it uses DHCP to contact the management server, which uses pxelinux to boot the node from its hard drive. Alternatively, after the full installation is complete, you can change the boot order in the BIOS back to: floppy, CD-ROM, hard disk, network.
The installation monitor tool is started by running the monitorinstall command. The tool displays the status of the installation on each of the nodes. It returns the following types of status: installed, installing, not installed, failed install.
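For example, to start the tool and watch the status of the nodes as they install, type the following on the management server:
monitorinstall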
You can also run the rconsole command to view installation progress on each node for a full installation. Type the following:
rconsole -Pa
You can add another node to the cluster either by running definenode, setupks (if this is a full installation), and then installnode again, or by running addnode again. To add a new node to the cluster, do the following:
lsnode -Al
Type the following to see the attributes of premanaged nodes:
lsnode -AlP
definenode
You will be prompted for the required information.
lsnode -Al nodename
Removing a node from a cluster does not uninstall CSM and its prerequisites from the node. Rather, it disassociates the node from its management server. It removes the node from the database of the management server, and it informs the node that it is no longer attached to the management server. To remove a node from the cluster, type:
rmnode hostname
A removed node can be added back into the cluster by running definenode, setupks (if this is a full installation), and installnode again, or by running addnode again.
This section tells you how you can determine whether the installation was successful. It also gives you some suggestions on how to get started using Cluster Systems Management. After installation is successfully completed, remote RMC and CSM commands are enabled. To verify that the installation was successful, enter the following commands. If everything is as it should be, you should see the following results:
dsh -a date
A list of nodes with the date on each node is returned.
mkdir /cfmroot/etc
cp /etc/passwd /cfmroot/etc/passwd
startcondresp "CFMRootModTimeChanged" "CForce"
rpower -a query
A list of nodes with their associated power state is returned.
lsnode -p
The ping status of the nodes is returned.
To try out monitoring, use the following example.
startcondresp NodeReachability BroadcastEventsAnyTime
lscondition
lscondresp
nodegrp -a c5bn07,c5bn08,c5bn09,c5bn10,c5bn11 servers
nodegrp -a c5bn12,c5bn13 admin
nodegrp -l
The output is similar to:
admin
servers
nodegrp servers
The output is:
c5bn07.ppd.pok.ibm.com
c5bn08.ppd.pok.ibm.com
c5bn09.ppd.pok.ibm.com
c5bn10.ppd.pok.ibm.com
c5bn11.ppd.pok.ibm.com
dsh -N servers vmstat | dshbak
The output is similar to:
HOST: c5bn08.ppd.pok.ibm.com
----------------------------
   procs                  memory      swap        io     system        cpu
 r  b  w   swpd   free   buff  cache  si  so   bi   bo   in   cs  us  sy  id
 0  4  1 442440 192576  56292 635808   0   0    0    0    1    1   0   0   0

HOST: c5bn09.ppd.pok.ibm.com
----------------------------
   procs                  memory      swap        io     system        cpu
 r  b  w   swpd   free   buff  cache  si  so   bi   bo   in   cs  us  sy  id
 0  4  1 423692 214232  56240 615396   0   0    0    0    1    1   0   0   0

HOST: c5bn10.ppd.pok.ibm.com
----------------------------
   procs                  memory      swap        io     system        cpu
 r  b  w   swpd   free   buff  cache  si  so   bi   bo   in   cs  us  sy  id
 0  4  1 405904 162404  56248 604424   0   0    0    0    4    1   0   0   1

HOST: c5bn11.ppd.pok.ibm.com
----------------------------
   procs                  memory      swap        io     system        cpu
 r  b  w   swpd   free   buff  cache  si  so   bi   bo   in   cs  us  sy  id
 0  4  1 443564 135240  56212 636256   0   0    0    0    4    1   0   0   1