CSM & xCAT Coexistence HOWTO

Version: 1.3
Last revised: 5/23/2003
Author: Vallard Benincosa - vallard@us.ibm.com
As always, comments and suggestions are appreciated and welcomed.

Change History


5/23/2003 1.3 added switch, updated sections
2/19/2003 1.2 added remote bios instructions (6.1)
1/25/2003 1.1 initial release.

Table of Contents

1. Introduction
2. Install xcatcsm tools
   
3.  Installing CSM on an xCAT cluster
3.1 Run xcat2csm --migrate
3.2 Install CSM
3.3 Run xcat2csm
3.4 Other Uses For xcat2csm 
3.5 Verify Conserver and ATFTP work
   
4.  Installing xCAT on a CSM cluster
4.1 Get and install xCAT
4.2 What not to install
4.3 Run csm2xcat
   
5.  Installing a node with CSM or xCAT
5.1 Installing a node via xCAT with CSM installed
5.2 Installing a node via CSM with xCAT installed
   
6.  Common Tasks
6.1 CSM remote bios flash
6.2 Collecting MAC addresses via the switch
   

1. Introduction

xCAT and CSM can coexist on the same cluster, and in fact can be used at the same time. This document will not go into the details of installing xCAT, nor into the details of installing CSM. What it will cover is what to do to make the transition from one to the other. This will include a description of the tools found in the csm.ect rpm. So if you need to install xCAT, please go to http://xcat.org and follow the appropriate links. If you would like to install CSM, please go to
http://www-1.ibm.com/servers/eserver/clusters/library/csmsetup.html.
This document assumes that you have installed either xCAT or CSM on your system and that it is functioning properly.
Also, these tools (as of today, 1/20/2003) have only been tested on RedHat machines and may not work on SUSE, SLES, or other distributions of Linux.

2. Install xcatcsm tools

Assuming you have xCAT or CSM installed, you should now obtain the csm.ect rpm. This is available on the IBM alphaworks site at http://alphaworks.ibm.com/tech/ect4linux. Make sure that you have version 1.3.1-14 or greater. Now just install this like you would any other RPM.
# rpm -i /root/csm.ect-1.3.1-14.i386.rpm
You will need perl-DBI 1.21 installed, as it is required. Also, csm.ect will only work with CSM 1.3.1. (The csm.ect rpm itself does not require CSM to be installed; this is so that xcat2csm can run on a machine without CSM.)

3. Installing CSM on an xCAT cluster

So you have xCAT running on your system and it works perfectly. Now, you want to put CSM on the same cluster so that you can use event monitoring and some of the other CSM tools. If you are used to installing nodes via xCAT then you should do so. Define all the attributes in the /opt/xcat/etc/ tables and install the nodes like you would normally do, if you haven't done so already.

3.1 Run xcat2csm --migrate

The first thing to do is run xcat2csm --migrate. This tool is packaged inside the csm.ect rpm. By default, it is installed under /opt/csm/ect/bin:
[root@imaster4 RedHat]# /opt/csm/ect/bin/xcat2csm --migrate
Let's see if CSM is installed...................1.3.1.0
Migrating xCAT configuration files to CSM... 
Would you like to link xCAT install files for CSM to use?
(yes is recommended)  [y/n] y
linking files under /install/rh73
linking files under /install/rh73
linking files under /install/rh73/cdrom
linking files under /install/rh73/cdrom
linking files under /install/rh73/cdrom/RedHat
...
linking files under /install/rh73/RedHat
linking files under /install/rh73/RedHat/base
Migration Complete.  You can now install CSM
When the csm.ect rpm was installed, xcat2csm was put into the /opt/csm/ect/bin directory. Running this command will do a few things to make your system ready for a CSM install. Here is what it does:
  1. It makes a directory called /tmp/bak.
  2. It makes "hard" links from /install/ to the directories under /csminstall that CSM requires for full installs. This is so that you can still do xCAT Linux installs as well as CSM Linux installs. Hard links are used instead of soft links because soft links would fail if one of the directories were NFS mounted.
  3. It checks to see if you have conserver and atftpd running.
  4. If you do (as you should if xCAT is installed properly), then the atftpd and conserver files from the /etc/rc.d/init.d directory are moved to /tmp/bak.
  5. It shuts down the atftp and conserver services. This means that the xCAT remote console functions (rcons, wcons, etc.) will not work, nor will many of the installation functions, such as PXE boot.
You may be asking: why do I want to shut down those services? I need those. The reason is that CSM installs both of those services itself, and its install trips if they are already installed. But don't worry: once CSM is installed, those services will be installed again.
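
To make the hard-link step more concrete, here is a minimal sketch of how a directory tree can be mirrored with hard links. It is not the code xcat2csm actually runs; the function name link_tree is hypothetical and only illustrates the idea of recreating the directories and then linking each file.

```shell
# Hypothetical helper, NOT the actual xcat2csm code: mirror the tree SRC
# into DST, hard linking every regular file.
link_tree() {
    src=$1
    dst=$2
    # Directories cannot be hard linked, so recreate them first.
    (cd "$src" && find . -type d) | while IFS= read -r dir; do
        mkdir -p "$dst/$dir"
    done
    # Hard link each regular file into the matching directory under DST.
    (cd "$src" && find . -type f) | while IFS= read -r file; do
        ln -f "$src/$file" "$dst/$file"
    done
}
```

Both trees must live on the same filesystem, because hard links cannot cross filesystems. You can check that two paths really are the same file with: [ fileA -ef fileB ] && echo same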

I should note here that the atftp installed by CSM will be version 0.3. CSM installs an RPM while xCAT uses a tarball. The xCAT version will probably be 0.6, and you may decide that you would rather run 0.6 instead of 0.3. That's fine. Wait until CSM is installed, then do the following:

# vi /etc/xinetd.d/tftp
Add the following line inside the braces:
service tftp
{
        ...
        disable = yes
        ...
}

# service xinetd restart
# cp /tmp/bak/atftpd /etc/rc.d/init.d/
# service atftpd restart
It will be to your benefit to use the CSM conserver. This is because, when new nodes are added to CSM, the conserver.cf file is automatically updated.

3.2 Install CSM Management Server

Follow the directions in the CSM Installation guides. Do not define the nodes.
* Note: Make sure that you accept the CSM license before moving on. To accept the "try and buy" license, which is only good for a few weeks, run:
# /opt/csm/bin/csmconfig -L

If you have the full license, then pass the license file to the same option:
# /opt/csm/bin/csmconfig -L <license_file>

3.3 Run xcat2csm

This time, don't specify any arguments and the nodes will automatically be put in the CSM database:
[root@imaster4 xcat]# lsnode            # this is the CSM command - no nodes
[root@imaster4 xcat]# nodels            # this is the xCAT command - 1 node
i3
[root@imaster4 xcat]# /opt/csm/ect/bin/xcat2csm
Let's see if CSM is installed...................1.3.1.0
Let's see if there are xCAT tables in place......Yep.
Is c5cn20 an ELS or CPS console server? [els | cps]
els
Getting Current CSM configuration...
Defining i3 in CSM database....
[root@imaster4 xcat]# lsnode
i3.clusters.com
In the above example, I had only one node defined in xCAT and no nodes defined in CSM. I ran xcat2csm and it took the data from the xCAT tables and magically put it into the CSM database.

You should now go through and check the CSM database to make sure that the attributes are the way you like them.
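
One quick check is to confirm that every xCAT node now exists in CSM. The sketch below compares two saved node lists. It assumes, as in the example above, that nodels prints short host names and lsnode prints fully qualified names; the function name missing_in_csm is hypothetical.

```shell
# Hypothetical helper: print names in the xCAT list that are missing from
# the CSM list.  $1 = file with nodels output, $2 = file with lsnode output.
missing_in_csm() {
    xcat_sorted=$(mktemp)
    csm_sorted=$(mktemp)
    sort -u "$1" > "$xcat_sorted"
    # Assumption: lsnode prints fully qualified names; keep the short name.
    sed 's/\..*$//' "$2" | sort -u > "$csm_sorted"
    # comm -23 keeps the lines that appear only in the first file.
    comm -23 "$xcat_sorted" "$csm_sorted"
    rm -f "$xcat_sorted" "$csm_sorted"
}
```

To use it, save the output of nodels and lsnode to two files and pass them in that order. No output means every xCAT node has a CSM definition.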

3.4 Other Uses For xcat2csm

You do not need CSM installed to take xCAT tables and put them into a CSM nodedef file. If you run xcat2csm with no arguments when CSM is not installed, it will create a file: /tmp/nodedef that can be used by CSM's definenode command to define all of the nodes. It also creates a file called /tmp/nodegrpdef that takes all your nodegroups in xCAT and moves them to CSM's nodegroup. You have to use the nodegrp command to do this.
Here's how:
# definenode -f /tmp/nodedef
# nodegrp -f /tmp/nodegrpdef
You may also want to make a delta file to see what has changed in CSM:
# xcat2csm -d
(puts the file in /tmp/nodedef)
You can also use the -f option to specify where you want the nodedef file to go. (the nodegrp file will still go to /tmp/nodegrpdef)
# xcat2csm -f /home/csm/csmnodedef

3.5 Verify Conserver and ATFTP work

You should be able to run CSM's rconsole and xCAT's rcons, wcons, etc. If you can't, then compare /opt/xcat/etc/conserver.tab with /etc/opt/conserver/conserver.cf. You'll notice that CSM does not put "localhost" as an allowed client. It may not be worth your time to add "localhost" to the file, since CSM updates this file whenever a new node is added or console attributes are changed. Instead, you should update /opt/xcat/etc/conserver.tab to use the actual host name of the management server.
You can verify that atftp is working by going to /tmp and doing the following:
[root@imaster4 tmp]# atftp localhost
tftp> get pxelinux.0
tftp> quit

4. Installing xCAT on a CSM cluster

There are many advantages to adding xCAT tools to a CSM cluster. These advantages include additional functionality such as remote BIOS flash, many hardware setup functions, and HPC stack tools.

4.1 Get and install xCAT - well maybe...

Go to http://xcat.org for information on how to do this. But before you do, read on so that you don't install unnecessary things. ECT will begin shipping xCAT tools that it uses, so in many cases you will not have to install xCAT separately.

4.2 What not to install

You don't need to install atftp or conserver since these are already installed. You will still have all the same functionality any other xCAT server would have. However, you may want to use the xCAT atftp, since it is a later version and some of the bugs have been worked out. See section 3.1 for information on that.

4.3 Run csm2xcat

csm2xcat will convert the CSM database into xCAT tables. csm2xcat is not an onto function: xCAT has more tables than the CSM database has data to fill. For this reason, you should verify the tables and add any that you need once csm2xcat is done.
[root@imaster4 xcat]# csm2xcat
Writing tables in /opt/xcat/etc
Let's see if CSM is installed...................1.3.1.0
Getting Current CSM configuration...
csm2xcat complete

5. Installing a node with CSM or xCAT

If you started out with xCAT on your system and then followed the instructions in section 3 on migrating to CSM, then you may already be set up to do the installation of a node. You may use xCAT or CSM.

5.1 Installing a node via xCAT with CSM installed

Follow the standard xCAT procedures for installing a compute node. Once you are done, then you need to make that node part of the CSM cluster. You will find that the node is in PreManaged state in CSM:
[root@imaster4 bin]# lsnode -a InstallStatus
i3:  PreManaged
To make this node part of the CSM cluster, run updatenode. Before doing so, check that the RedHat CDs have been copied (or linked via xcat2csm --migrate) to the appropriate directory under /csminstall. You can run csmsetupks to do this if the CDs haven't been copied there yet.
[root@imaster4 RedHat]# csmsetupks -n i3
Copying Red Hat Images from /mnt/cdrom.
Insert Red Hat Linux 7.3 disk 1.
 Press Enter to continue.
...
Now that the CDs are in place run updatenode -P to put all the premanaged nodes in managed state. This will install the CSM client software on the nodes.
[root@imaster4 bin]# updatenode -P
i3.clusters.com: Setting Management Server to imaster4.
i3.clusters.com: Updating RPMs
i3.clusters.com: Node Install - Successful.
[root@imaster4 RedHat]# lsnode -a InstallStatus
i3:  Managed
If you had problems with this command, check that the following attributes have been set in the CSM database:
InstallPkgArchitecture, InstallOSName, ManagementServer, InstallDistribution*, and InstallCSMVersion.
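
On a larger cluster it helps to list only the nodes that are still PreManaged. The filter below is a minimal sketch that assumes the "name:  status" output format shown above; the function name premanaged_nodes is hypothetical.

```shell
# Hypothetical filter: read "lsnode -a InstallStatus" output on stdin and
# print the names of the nodes whose status is PreManaged.
premanaged_nodes() {
    awk -F: '{
        status = $2
        gsub(/[ \t]/, "", status)    # strip the padding after the colon
        if (status == "PreManaged") print $1
    }'
}
```

For example: lsnode -a InstallStatus | premanaged_nodes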

5.2 Installing a node via CSM with xCAT installed

There really shouldn't be any interference from xCAT when doing a full CSM install.
Provided the CDs were copied to the right directory, you should be able to accomplish the full install with 2 commands: csmsetupks and installnode.
You should note, however, that xCAT and CSM both overwrite the /etc/dhcpd.conf file. Both programs have a way of backing it up, but you may want to copy yours to a safe place before running csmsetupks (CSM) or makedhcp (xCAT).
[root@imaster4 etc]# csmsetupks -xn i3
Setting up PXE.
Generating /etc/dhcpd.conf file for MAC address collection.
Setting up Kickstart.
10684 blocks
Adding nodes to /etc/dhcpd.conf file for Kickstart install: i3.clusters.com.
[root@imaster4 etc]# installnode -n i3
...
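
The advice above about saving /etc/dhcpd.conf can be sketched as a small helper. The function name backup_file and the timestamped naming scheme are assumptions made for illustration; neither CSM nor xCAT ships this.

```shell
# Hypothetical helper: copy a file to <file>.<timestamp>.bak and print the
# name of the copy.  The naming scheme is an assumption for illustration.
backup_file() {
    src=$1
    copy="$src.$(date +%Y%m%d%H%M%S).bak"
    # -p preserves the mode and timestamps of the original file.
    cp -p "$src" "$copy" && echo "$copy"
}
```

Run "backup_file /etc/dhcpd.conf" before csmsetupks or makedhcp, and keep the printed name so you can restore the file later.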

6. Common CSM/xCAT tasks

6.1 CSM remote bios flash

Before I get more into this, let me mention that there is some work underway to make this process much easier. That should be out 3rd Quarter 2003.
There is an excellent remote flash document included in the xcat-dist-core tarball. Let me repeat what that says here, just so there is no mistake:
Remote Flashing is the ability to perform BIOS and Firmware upgrades remotely. xCAT support for remote flashing is very limited and should be considered experimental. Use at your own risk.
The same goes for CSM and the xcat2csm tools.

Provided you read the warning, let's go through an example of remotely flashing the BIOS of an x345 node that is currently a CSM Managed node. In this example, CSM has been completely installed but xCAT has not. Here is how to proceed:

  1. Get the latest xCAT distributions. Untar these files so that /opt/xcat is the top directory of the extracted files:
    [root@devmstr opt]# cd /opt/
    [root@devmstr opt]# tar zxvf xcat-dist-core-1.1.8.tgz
    xcat/bin/
    xcat/bin/rcad
    xcat/bin/rpower
    xcat/bin/nodeset
    ...
    xcat/windows/sid.cmd
    xcat/windows/reboot.cmd

    [root@devmstr opt]# tar zxvf xcat-dist-ibm-1.1.8.tgz
    xcat/flash/basefs.dos/command.com
    xcat/flash/tools/sr-cmdr.exe
    ...
    xcat/i686/sbin/hawkname

    [root@devmstr opt]# tar zxvf xcat-dist-oss-1.1.8.tgz
    ...
    [root@devmstr opt]# tar zxvf xcat-dist-intel-1.1.8.tgz
    ...

  2. Get the latest csm.ect that contains xcat2csm and csm2xcat. This is available off the alphaworks website: http://www.alphaworks.ibm.com/tech/ect4linux. Install this by running
    rpm -i csm.ect-<version>.rpm

  3. Fill in the HWModel attribute. You can do this by running
    chnode -n node HWModel=hw-model-type
    e.g.:
    [root@devmstr bin]# chnode node1-node3 HWModel=x345
    [root@devmstr bin]# lsnode -a HWModel
    node1:  x345
    node2:  x345
    node3:  x345
  4. Run csm2xcat
    [root@devmstr bin]# ./csm2xcat
    csm2xcat: env XCAT not defined!
    csm2xcat: XCATROOT  does not exist
    It doesn't appear that you have xCAT installed
    We will use "/opt/xcat" as XCATROOT
    Writing tables in /opt/xcat/etc
    Let's see if CSM is installed...................1.3.1.0
    Getting Current CSM configuration...
    csm2xcat complete
  5. Now take a look at the files that csm2xcat created in /opt/xcat/etc. There are several attributes that need to be filled in or checked so that remote flash will work.
  6. Make a flash directory for your nodes. Since I'm doing this for an x345, here's what I do:
    [root@devmstr flash]# cd /opt/xcat/flash
    [root@devmstr flash]# cp -r x340 x345
    [root@devmstr flash]# cd x345
    [root@devmstr x345]# echo " " > bios.post.bat

    (The bios.post.bat file tells the node to reboot when the BIOS update is complete.
    For the x345s we don't want to reboot after they are done, because they may go into an infinite loop. The reason for this is that we have no way to update the /tftpboot/pxelinux.cfg/<IP in HEX> file. The same is true for all machines that are not using the e100pro that the standard x340s, x342s, and x330s use.)
  7. Make the floppy into a dd file. Go to www.pc.ibm.com/support and get the latest BIOS update images, then make a floppy using the directions specified on the web page. Take the floppy, insert it into your Linux machine, and run:
    [root@vallard root]# dd if=/dev/fd0 of=/tmp/345.bios.2.05

    Now copy the 345.bios.2.05 file into place, link it, and edit autoexec.bat:
    cp /tmp/345.bios.2.05 /opt/xcat/flash/x345
    cd /opt/xcat/flash/x345
    ln -s 345.bios.2.05 bios.dd
    ln -s 345.bios.2.05 cmos.dd
    cd ../basefs.dos
    vi autoexec.bat

    Make it echo "DONE,DONE,DONE"
    and remove the lines below (or put #'s in front of them):
    #e100bpkt.com 0x60 16
    #sshdos -P -S -s #FLASHPASS# #FLASHUSER# #MIP#
    #boot e100bpkt.com -u

  8. Now run the following commands to update the BIOS:
    [root@devmstr bin]# /opt/xcat/flash/mkflash
    [root@devmstr bin]# ./rflash node2.clusters.com bios
    rflash can render a very large number of machines brain dead and useless. Are you sure you know what you are doing? (YES to go on)
    YES
    node2.clusters.com: flash x345-bios
    node2: ping to 176.60.22.8 failed
    [root@devmstr bin]#

    The ping fails because we have not yet set up xCAT fping. This is fine and you can ignore the ping message. All that remains is to reboot the node. If there were any other errors, then you should check the xCAT tables listed above.

    The xCAT documentation says that you shouldn't watch the reboot through the rconsole. I accidentally did this one time, and was pleased that it still functioned. This was only on an x345, however. I dare you to live dangerously, but don't come to me if you break something. This procedure is not warranted and we're not responsible if you break your machine.
    When the BIOS update is complete, you will see this on the screen:
    Thanks for using the POST/BIOS Update Utility

    A:\>cmosutil.exe /r a:\cmos
    NS417 chipset detected
    CMOS settings restored from file

    A:\>echo "DONE,DONE,DONE"
    "DONE,DONE,DONE"
    A:\>
    A:\>
    A:\>

    Now you need to replace the hex file so that the next time the node boots up it doesn't try to update its BIOS again:
    [root@devmstr pxelinux.cfg]# cd /tftpboot/pxelinux.cfg/
    [root@devmstr pxelinux.cfg]# ls -ltr | tail -1
    -rwxrwxrwx 1 root root 119 Feb 11 13:40 B03C320C
    [root@devmstr pxelinux.cfg]# cp default B03C320C
    cp: overwrite `B03C320C'? y
    [root@devmstr pxelinux.cfg]#

    Finally, reboot the node:
    rpower -n node2 reboot

    Unfortunately, at this point the rconsole output will be lost. This is because the BIOS has been set to defaults by the upgrade. Upon boot, the node will hang at a BIOS screen indicating errors. You should now go into the BIOS and configure console redirection, as well as put the startup sequence back to floppy, CDROM, Network, Hard Disk. This will get rid of the "126" configuration errors that were seen before. I believe the reason for this error message is just to show the user that the BIOS has been upgraded, but I am not sure about that.
  9. All done. After you have tested this a few times with one node and are ready to do the rest of your nodes (presuming they are all at the same BIOS level), run rflash against a range of nodes and reboot them. Once completed (and they all say "DONE,DONE,DONE"), copy /tftpboot/pxelinux.cfg/default over each HEX file in the /tftpboot/pxelinux.cfg directory (or you can just remove them all). Then reboot them again, modify the BIOS settings, and all should come out correctly.
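
The HEX files mentioned above are named after each node's IP address written as eight upper-case hex digits. The sketch below converts a dotted-quad address to that form so you can tell which file belongs to which node. The function name ip_to_hex is hypothetical; it is not an xCAT or CSM command.

```shell
# Hypothetical helper: print the upper-case hex form of a dotted-quad IP
# address, which is how the files in /tftpboot/pxelinux.cfg are named.
ip_to_hex() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address into its four octets
    IFS=$old_ifs
    printf '%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
}
```

For example, "ip_to_hex 176.60.50.12" prints B03C320C, the file name shown in the listing above.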

6.2 Collecting MAC addresses via the switch


Collecting MAC (Media Access Control) addresses via the switch is the process of displaying the MAC address of a node by querying the switch that its NIC is attached to. This is nice in that we don't have to go look at the back of each ethernet card for the MAC address, which may or may not be visible. There are several tasks that must be done first for the xCAT/CSM tools to be able to collect the MAC addresses:
  1. Install the csm.ect rpm. It is available at http://alphaworks.ibm.com/tech/ect4linux.
  2. There are only a few tables that need to be filled in by you.
  3. As long as these tables are all set up, then you should be able to run /opt/csm/bin/switch_mac -n node1-node3 to get the MAC address of those three nodes.
    [root@devmstr etc]# /opt/csm/bin/switch_mac -n node1,node2,node3,node4
    MACDATA:node1 eth0 NA 00:02:55:7b:07:6a NA NA
    MACDATA:node2 eth0 NA 00:02:55:7b:06:8e NA NA
    MACDATA:node3 eth0 NA 00:02:55:7b:06:5a NA NA
    MACDATA:node4 eth0 NA 00:02:55:7b:05:c0 NA NA
    
    You'll notice that the output is a bit strange. There are several fields. The important thing is that there is the node name (2nd field) and the MAC address (5th field). The reason for this output is for future implementations of getadapters which will be out in the May '03 release of CSM. The other fields should be ignored.
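
If you only want the node name and MAC address, the other fields can be filtered out. The following is a minimal sketch that assumes the MACDATA line format shown above; the function name macdata_to_pairs is hypothetical.

```shell
# Hypothetical filter: read switch_mac output on stdin and print
# "<node> <mac>" for every MACDATA line.
macdata_to_pairs() {
    awk '/^MACDATA:/ {
        sub(/^MACDATA:/, "")    # drop the tag so the node name becomes $1
        print $1, $4            # $4 is the MAC once the tag is removed
    }'
}
```

For example: /opt/csm/bin/switch_mac -n node1,node2 | macdata_to_pairs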