IBM Cluster Systems Management for Linux (CSM) provides a distributed system management solution for machines, or nodes, that are running the Linux operating system. With this software, an administrator can easily set up and maintain a Linux cluster by using functions like monitoring, hardware control, and configuration file management. The concepts and software are derived from IBM Parallel System Support Programs for AIX(R) (PSSP) and from applications available as open source tools.
Within a CSM cluster, nodes can be added, removed, changed, or listed (with persistent configuration information displayed about each node in the list). Commands can be run across nodes or node groups in the cluster, and responses can be gathered. Nodes and applications can be monitored as to whether they are up or down; CPU, memory, and system utilization can be monitored; and automated responses can be run when events occur in the cluster. Configuration File Manager is provided for synchronization of files across multiple nodes. A single management server is the control point for the CSM cluster.
Note that CSM manages a loose cluster of machines. It does not provide high availability services or fail-over technology, although high-availability clusters can be part of the set of machines that CSM is managing. For information on IBM high availability solutions, see http://www.ibm.com/servers/eserver/pseries/solutions/ha/index.html.
The following diagram (Figure 1) shows a remote-control-compatible hardware and networking configuration for CSM with IBM Cluster 1300 (xSeries(TM) 330) nodes. (See the IBM Cluster Systems Management for Linux Remote Control Guide and Reference for additional hardware configuration diagrams.) The management server connects to the Management VLAN and the Cluster VLAN through Ethernet adapters. The terminal server, an Equinox Serial Provider (ESP) in this example, connects to the Management VLAN through its Ethernet adapter, and to the cluster nodes through their serial (COM) ports as shown. (An ESP-16 can connect up to 16 nodes; other terminal servers may have different capacities.) The nodes must be connected to the Cluster VLAN through their Ethernet adapters, and directly or indirectly to an IBM Remote Supervisor Adapter (RSA) through their ISP ports. The Management VLAN connects to the RSAs in selected nodes. (One RSA is required for every 10 nodes.) Each RSA connects to its own node's ISP port, and up to 9 more node ISP ports are daisy-chained from there. Configuration of a Public VLAN is optional and can be defined by the system administrator.
Figure 1. CSM Remote Control Hardware and Networking Configuration for IBM xSeries 330 Nodes
More information on CSM tasks is provided as follows:
The IBM Cluster Systems Management for Linux Planning and Installation Guide provides a simple process for installing and configuring CSM on an existing Linux system, or for doing a full installation of CSM and Linux. The installation process allows you to do the following:
See the man pages or the IBM Cluster Systems Management for Linux Technical Reference for more information on the following setup commands and files:
The distributed management server provides a set of commands for managing nodes. It stores information about nodes in a central repository, and it defines static and dynamic node groups. These definitions are then available to the Configuration File Manager (cfm) command for configuration file management, to the dsh command for running shell commands remotely, to the hardware control commands, and to the Event Response Resource Manager (ERRM) for monitoring the cluster. All of these functions make use of the definitions stored in the node and node group repository. Thus, a node group is defined in only one place and can then be used by the other functions.
Persistent information about each node can be kept, including operating system type, hostname, machine type, model, and serial number. Currently, the user must enter this information manually for each node. In the future, the cluster infrastructure will collect much of this information automatically when a node joins the cluster. In addition, the status of each node is determined periodically by means of the fping command.
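For illustration only, the following commands sketch how this node information might be listed and changed from the management server. The host names, the attribute name, and the exact command options are assumptions for the example; the lsnode, chnode, and nodegrp man pages are the authoritative reference.

# List the host names of all nodes defined in the cluster.
lsnode

# Show the persistent attributes stored for one node (a long-listing
# option such as -l is assumed; the host name is made up).
lsnode -l clsn02.pok.ibm.com

# Correct a manually entered attribute; the attribute name shown is
# illustrative.
chnode clsn02.pok.ibm.com InstallOSName=Linux

# Define a static node group and then list its members (the -a option
# for adding members is assumed).
nodegrp -a clsn02.pok.ibm.com,clsn03.pok.ibm.com WebServers
nodegrp WebServers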
The node and node group commands are built on top of a Perl DBI layer backed by a set of DBDs (database drivers) so that data can be stored in a variety of formats and shared with other tools.
See the man pages or the IBM Cluster Systems Management for Linux Technical Reference for details on the following commands that manage node and node group information:
You can control the hardware on remote nodes by using the remote control commands. For example, you can control computers on a ship from an office on the mainland, provided the correct connectivity exists.
See the man pages or IBM Cluster Systems Management for Linux Technical Reference for details on the following commands:
rconsole - Opens a remote console.
rpower - Boots and resets hardware, powers hardware on and off, and queries the power state.
See the IBM Cluster Systems Management for Linux Remote Control Guide and Reference for additional information on controlling hardware.
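As a sketch of how these commands are typically invoked (the option names are assumptions based on typical usage and the host name is made up; see the man pages for the exact syntax in your release):

# Query the power state of a node, then power it on.
rpower -n clsn02.pok.ibm.com query
rpower -n clsn02.pok.ibm.com on

# Open a remote console to the same node through the terminal server.
rconsole -n clsn02.pok.ibm.com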
The distributed shell (dsh) command runs commands remotely across multiple nodes. It can optionally use any underlying remote shell that the user specifies; for example, a remote secure shell that complies with the IETF (Internet Engineering Task Force) Secure Shell protocol. By default, rsh is used.
The dsh command can retrieve a complete list of the nodes in the CSM cluster or the list of nodes in a specified node group.
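For example, commands like the following might be used; the -a, -N, and -c options are assumptions based on typical dsh and dshbak usage, and WebServers is the hypothetical node group from the earlier example.

# Run a command on every node in the cluster and collapse identical
# responses with dshbak.
dsh -a date | dshbak -c

# Run a command only on the nodes in one node group.
dsh -N WebServers cat /proc/loadavg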
See the man pages or IBM Cluster Systems Management for Linux Technical Reference for details on the following commands:
Configuration File Manager provides a file repository for the common configuration files among nodes in a cluster. In general, all the configuration files that need to be shared are stored in one location on the management server. Changes to these files are propagated and synchronized throughout the cluster. Though the files are common, there are mechanisms to allow for variations based on groups, IP address, and host name.
Configuration File Manager is built on top of the GNU software package cfengine, a scripting package that uses a class-based decision structure to test and configure UNIX-like systems attached to a TCP/IP network. Many capabilities are built into cfengine itself that a system administrator can use over and above what Configuration File Manager uses.
Configuration File Manager greatly enhances the copy functionality and usability of cfengine by providing the concept of a repository. Instead of requiring an administrator to write a cfengine script to keep files up to date, the repository allows automatic updating without script changes.
See the man pages or IBM Cluster Systems Management for Linux Technical Reference for details on the cfm and cforce commands.
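As a sketch of the usual workflow, assuming the repository root directory is /cfmroot (the location used by typical CSM installations; verify the path for your release):

# On the management server, place the common file in the repository at
# the path it should occupy on each node.
cp /etc/hosts /cfmroot/etc/hosts

# Push the repository contents to the nodes immediately instead of
# waiting for the next scheduled cfm run.
cforce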
At the time this document was written, detailed information on cfengine could be found at the following URL: http://www.cfengine.org.
CSM provides a flexible distributed system monitoring application. This monitoring application allows the administrator to define conditions of interest to monitor on a system. An event occurs when a monitored condition reaches a threshold that is defined in an event expression. When an event occurs, automated responses to the event take place. Multiple actions can be defined as components of a response, including sending notification, running a predefined script, or running a user-defined script.
A set of commands is provided for setting up the monitoring application to meet your needs. A set of predefined conditions and responses is provided to be used as is or to be copied and modified. System resources that can be monitored include:
The monitoring application, its components, and predefined conditions and responses are described in detail later in this book. Command syntax, descriptions, and examples are available as integrated man pages or in the IBM Cluster Systems Management for Linux Technical Reference.
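The following commands sketch how a predefined condition might be linked to a predefined response. The condition and response names shown are examples of the kind shipped with CSM; list the actual names on your system with lscondition and lsresponse first.

# List the predefined conditions and responses.
lscondition
lsresponse

# Start monitoring: run the response whenever the condition occurs
# (the names are illustrative).
startcondresp "NodeReachability" "BroadcastEventsAnyTime"

# Show which condition and response pairs are currently active.
lscondresp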
Security is provided by the operating system: only root can run CSM functions or modify data. Remote shells that conform to the IETF (Internet Engineering Task Force) Secure Shell protocol provide flexibility in the degree of security required by a specific environment; the remote shell that dsh uses can be specified by the user. Network security for the other functions is built on the identd daemon.
See Authorization for details on authorization, and the dsh man page or the IBM Cluster Systems Management for Linux Technical Reference for details on how to specify the remote shell of your choice by using the DSH_REMOTE_CMD environment variable.
The distributed shell (dsh) command uses the underlying rsh security protocol, or any underlying remote shell that the user specifies, such as a remote secure shell that complies with the IETF Secure Shell protocol. By default, rsh is used. It is the system administrator's responsibility to configure and enable remote shell access to other systems, and to fulfill the particular security obligations of a specified environment. See the man pages or the IBM Cluster Systems Management for Linux Technical Reference for details on the dsh and dshbak commands.
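For example, to have dsh use a secure shell instead of rsh, the DSH_REMOTE_CMD environment variable can be set before the command is run. The path to the ssh binary below is an example; use the location on your system.

# Use an IETF-compliant secure shell for all dsh invocations in this
# session.
export DSH_REMOTE_CMD=/usr/bin/ssh

# Commands now go out over the secure shell (the -a option, run on all
# nodes, is assumed as in the earlier example).
dsh -a uptime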
CSM implements authorization by using an access control list (ACL) file. You can create a new ACL file to apply access control to resource classes, or you can use the default ACL file, which contains the following permissions:
OTHER
    root@LOCALHOST   *   rw
    LOCALHOST        *   r
The ACL file is in stanza format. Each stanza begins with the stanza name, which is the name of a resource class. A stanza with the name of OTHER applies to all resource classes that are not otherwise specified in the file.
Each line of the stanza contains a user identifier, an object type, and an optional set of permissions. A stanza line indicates that the user at the host has the permissions to access the resource class or resource instances (or both) for the resource class named by the stanza. The user identifier can have one of the following three forms:
user_name@host_name
host_name
*
A host_name is a fully qualified host domain name or the keyword LOCALHOST. The first form specifies a user running an RMC application on the named host. If the host name is the keyword LOCALHOST, then the application is running on the same node as the RMC subsystem. The second form specifies any user running an RMC application on the named host. The third form specifies any user running an RMC application on any host.
The object type is one of the characters C, R or *. The letter C indicates that the permissions provide access to the resource class. The letter R indicates that the permissions provide access to all of the resource instances of the class. The asterisk indicates that the permissions provide access to both the resource class and all resource instances of the class.
The permissions provided are represented by one, both, or none of the characters r and w. The letter r indicates that the specified user at the specified host has read permission. The letter w indicates that the specified user at the specified host has write permission. Both letters indicate the user has read and write permission. If the permissions are omitted, then the user does not have access to the objects specified by the type character. Read permission allows you to register and unregister for events, query attribute values, and validate resource handles. Write permission allows you to run all other command interfaces. Note that no permissions are needed to query resource class and attribute definitions.
For any command issued against a resource class or its instances, the RMC subsystem examines the lines of the stanza matching the specified class in the order specified in the ACL file. The first line that contains 1) an identifier that matches the user issuing the command and 2) an object type that matches the objects specified by the command is the line used to determine access permissions. Therefore, lines containing more specific user identifiers and object types should be placed before lines containing less specific user identifiers and object types.
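As a small illustration of this ordering rule (the class name and host name are made up): in the stanza below, root at sys1.pok.ibm.com matches the first line and receives read and write permission, while any other user at that host falls through to the second, more general line and receives read permission only. If the two lines were reversed, root would match the general line first and receive only read permission.

Class_X
    root@sys1.pok.ibm.com   *   rw
    sys1.pok.ibm.com        *   r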
CSM implements authentication (identity verification) by using the Ident protocol. The identd daemon listens for TCP connections on well-known TCP port 113. Application servers connect to this daemon on the host where the client is running and provide identd with the local and remote port numbers of the connection. The daemon then returns the identity of the owner of the process connected to the remote port, if it exists. The application servers can then use this identity as the remote client's UNIX identity.
The security infrastructure assumes that identd is running and listening on port 113. Red Hat Linux includes identd. The identd code can also be downloaded from one of the following URLs:
The /etc/services file should contain the following:
auth 113/tcp authentication tap ident
The /etc/identd.conf file should contain the following comment lines:
#-- Disable username lookups (only return uid numbers)
#result:uid-only = no
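To confirm that identd is actually running and reachable, standard Linux networking tools (not CSM commands) can be used, for example:

# Verify that a listener is bound to TCP port 113.
netstat -tln | grep ':113'

# Optionally, connect to the daemon by hand; queries take the form
# described in RFC 1413.
telnet localhost 113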
The ACL file on the management server should look similar to Example 1 in Examples of ACL File Stanzas. The ACL file on a managed node should look similar to Example 2 in Examples of ACL File Stanzas.
A stanza begins with a line containing the stanza name, which must start in column one. A stanza line consists of leading white space (one or more blanks, tabs, or both) followed by one or more white-space-separated tokens. Comments may be present in the file. Any line in which the first non-white-space character is a pound sign (#) is a comment. Blank lines are also considered comment lines and are ignored. Any part of a line that begins with two consecutive forward slash characters (//), not surrounded by double quotes ("), is considered to be a comment from that point through the end of the line. The stanza lines in an ACL file each contain two or three tokens:
stanza_name
    user_identifier   type   permissions
    user_identifier   type   permissions
         .               .        .
         .               .        .
    user_identifier   type   permissions
The permissions token may be omitted.
Example 1. ACL file on the management server:

IBM.PreManagedNode
    root@clsn01.pok.ibm.com   *   rw
    clsn01.pok.ibm.com        *   r
    root@LOCALHOST            *   rw    // root on this node always has access
    LOCALHOST                 *   r     // Everyone else on this node can only read
IBM.ManagedNode
    root@clsn01.pok.ibm.com   *   rw
    clsn01.pok.ibm.com        *   r
    root@LOCALHOST            *   rw    // root on this node always has access
    LOCALHOST                 *   r     // Everyone else on this node can only read
IBM.NodeGroup
    root@clsn01.pok.ibm.com   *   rw    // root on this node always has access
    clsn01.pok.ibm.com        *   r     // Everyone else on this node can only read
    root@LOCALHOST            *   rw    // root on this node always has access
    LOCALHOST                 *   r     // Everyone else on this node can only read
Example 2. ACL file on a managed node:

IBM.ManagementServer
    root@clsn01.pok.ibm.com   *   rw    // Grant root on clsn01.pok.ibm.com read/write access
    clsn01.pok.ibm.com        *   r     // Everyone else on clsn01.pok.ibm.com has read-only access
    root@LOCALHOST            *   rw    // root on this node always has access
    LOCALHOST                 *   r     // Everyone else on this node has read-only access
OTHER
    clsn01.pok.ibm.com        *   r     // The default denies write access to everyone from clsn01.pok.ibm.com
    root@LOCALHOST            *   rw    // root on this node always has access
    LOCALHOST                 *   r     // Everyone else has read-only access
The following examples show how the ACL file can be modified.
Class_A
    user1@sys1.pok.ibm.com   R   rw
    root@sys1.pok.ibm.com    *   rw
    sys1.pok.ibm.com         *   r
    user1@sys3.pok.ibm.com   C   rw
    user2@sys3.pok.ibm.com   *
    sys3.pok.ibm.com         *   r
    root@LOCALHOST           *   rw
Class_B
    root@LOCALHOST           *   rw
    *                        *   r
OTHER
    root@sys1.pok.ibm.com    *   r
    root@LOCALHOST           *   rw
A sample ACL file is provided in /usr/sbin/rsct/cfg/ctrmc.acls. This file contains the following default permissions:
OTHER
    root@LOCALHOST   *   rw
    LOCALHOST        *   r
To change these defaults, you must copy the sample ACL file to /var/ct/cfg/ctrmc.acls and put your modifications in that file (or you can create a new ACL file with the same name and location). Then to activate your new permissions, type:
refresh -s ctrmc
Provided there are no errors in the modified ACL file, the permissions take effect. If errors are found, they are logged to /var/ct/IW/log/mc/default.
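Putting the procedure together, the steps described above look roughly like this (the editor is whatever you normally use; the paths and the refresh command are the ones given in this section):

# Start from the sample ACL file shipped with RSCT.
cp /usr/sbin/rsct/cfg/ctrmc.acls /var/ct/cfg/ctrmc.acls

# Edit the copy to add or change stanzas.
vi /var/ct/cfg/ctrmc.acls

# Activate the new permissions.
refresh -s ctrmc

# If the new permissions do not take effect, check the error log.
cat /var/ct/IW/log/mc/default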