Installing Platform LSF HPC
Installing Platform LSF HPC involves the following steps:
- Get a Platform LSF HPC license.
- Download Platform LSF HPC Packages.
- Run lsfinstall.
- Run hostsetup to configure host-based resources and set up automatic LSF startup on server hosts.
Running hostsetup is optional on AIX and Linux. You must run hostsetup on SGI hosts (IRIX, TRIX, and Altix), HP-UX hosts, and Linux QsNet hosts.
ENABLE_HPC_CONFIG
Make sure ENABLE_HPC_CONFIG=Y is specified in install.config to enable Platform LSF HPC features.
Get a Platform LSF HPC license
Before installing Platform LSF HPC, you must get a permanent or demo license key. Contact Platform Computing at license@platform.com to request a license key.
Copy the license key to a file named license.dat in the same directory where you downloaded the LSF HPC distribution tar files.
Download Platform LSF HPC Packages
Use FTP to download the Platform LSF HPC distribution packages.
Access to the Platform FTP site is controlled by login name and password.
The Platform LSF HPC distribution packages are located in /distrib/7.0/.
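The session below is only a sketch of a typical download; the FTP host name and the package names other than lsf7Update5_lsfinstall.tar.Z are placeholders, and you use the login name and password that Platform provides:
#ftp ftp.platform.com
Name: your_login
Password: your_password
ftp> cd /distrib/7.0/
ftp> binary
ftp> get lsf7Update5_lsfinstall.tar.Z
ftp> get lsf7Update5_<platform>.tar.Z
ftp> quit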
See the Platform LSF Version 7 lsf7Update5_release_notes.html file for information about downloading the LSF HPC distribution packages.
Before installing
If you are planning to use AFS with MPICH-GM and have made any custom changes to your existing AFS or MPICH-GM esub, create a backup of these files.
What lsfinstall does
- Installs Platform LSF HPC binary and configuration files
- Installs the LSF HPC license file
- Automatically configures the following files:
lsb.hosts
For the default host, lsfinstall enables "!" in the MXJ column of the HOSTS section of lsb.hosts. For example:
Begin Host
HOST_NAME   MXJ   r1m      pg     ls     tmp   DISPATCH_WINDOW   # Keywords
#hostA      ()    3.5/4.5  15/    12/15  0     ()                # Example
default     !     ()       ()     ()     ()    ()
HPPA11      !     ()       ()     ()     ()    ()                # pset host
End Host
lsb.modules
- Adds the external scheduler plugin module names to the PluginModule section of lsb.modules:
Begin PluginModule
SCH_PLUGIN          RB_PLUGIN   SCH_DISABLE_PHASES
schmod_default      ()          ()
schmod_fcfs         ()          ()
schmod_fairshare    ()          ()
schmod_limit        ()          ()
schmod_reserve      ()          ()
schmod_preemption   ()          ()
schmod_advrsv       ()          ()
...
schmod_cpuset       ()          ()
schmod_pset         ()          ()
schmod_rms          ()          ()
schmod_crayx1       ()          ()
schmod_crayxt3      ()          ()
End PluginModule
The LSF HPC plugin names must be configured after the standard LSF plugin names in the PluginModule list.
lsb.resources
For IBM POE jobs, lsfinstall configures the ReservationUsage section in lsb.resources to reserve HPS resources on a per-slot basis. Resource usage defined in the ReservationUsage section overrides the cluster-wide RESOURCE_RESERVE_PER_SLOT parameter defined in lsb.params if it also exists.
Begin ReservationUsage
RESOURCE          METHOD
adapter_windows   PER_SLOT
ntbl_windows      PER_SLOT
csss              PER_SLOT
css0              PER_SLOT
End ReservationUsage
lsb.queues
- Configures the hpc_ibm queue for IBM POE jobs and the hpc_ibm_tv queue for debugging IBM POE jobs through Etnus TotalView®:
Begin Queue
QUEUE_NAME           = hpc_ibm
PRIORITY             = 30
NICE                 = 20
# ...
RES_REQ              = select[ poe > 0 ]
EXCLUSIVE            = Y
REQUEUE_EXIT_VALUES  = 133 134 135
DESCRIPTION          = Platform HPC 7 for IBM. This queue is to run POE jobs ONLY.
End Queue
Begin Queue
QUEUE_NAME           = hpc_ibm_tv
PRIORITY             = 30
NICE                 = 20
# ...
RES_REQ              = select[ poe > 0 ]
REQUEUE_EXIT_VALUES  = 133 134 135
TERMINATE_WHEN       = LOAD PREEMPT WINDOW
RERUNNABLE           = NO
INTERACTIVE          = NO
DESCRIPTION          = Platform HPC 7 for IBM TotalView debug queue. This queue is to run POE jobs ONLY.
End Queue
- Configures the hpc_linux queue for LAM/MPI and MPICH-GM jobs and the hpc_linux_tv queue for debugging LAM/MPI and MPICH-GM jobs through Etnus TotalView®:
Begin Queue
QUEUE_NAME   = hpc_linux
PRIORITY     = 30
NICE         = 20
# ...
DESCRIPTION  = Platform HPC 7 for linux.
End Queue
Begin Queue
QUEUE_NAME      = hpc_linux_tv
PRIORITY        = 30
NICE            = 20
# ...
TERMINATE_WHEN  = LOAD PREEMPT WINDOW
RERUNNABLE      = NO
INTERACTIVE     = NO
DESCRIPTION     = Platform HPC 7 for linux TotalView Debug queue.
End Queue
By default, LSF sends a SIGUSR2 signal to terminate a job that has reached its run limit or deadline. Since LAM/MPI does not respond to the SIGUSR2 signal, you should configure the hpc_linux queue with a custom job termination action specified by the JOB_CONTROLS parameter (see the sketch below).
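A minimal sketch of such a setting in lsb.queues follows; the SIGTERM choice is an assumption, not a value that lsfinstall sets, so substitute whatever termination action your LAM/MPI jobs respond to:
Begin Queue
QUEUE_NAME   = hpc_linux
# ... existing hpc_linux parameters ...
JOB_CONTROLS = TERMINATE[SIGTERM]   # assumed example; replaces the default SIGUSR2 action
End Queue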
- Configures the rms queue for RMS jobs running in LSF HPC for Linux QsNet:
Begin Queue
QUEUE_NAME           = rms
PJOB_LIMIT           = 1
PRIORITY             = 30
NICE                 = 20
STACKLIMIT           = 5256
DEFAULT_EXTSCHED     = RMS[RMS_SNODE]   # LSF uses this scheduling policy if
                                        # -extsched is not defined.
# MANDATORY_EXTSCHED = RMS[RMS_SNODE]   # LSF enforces this scheduling policy
RES_REQ              = select[rms==1]
DESCRIPTION          = Run RMS jobs only on hosts that have resource 'rms' defined
End Queue
To make one of the LSF queues the default queue, set DEFAULT_QUEUE in lsb.params.
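For example, the following lsb.params fragment makes hpc_linux the default queue (the queue chosen here is only an illustration):
Begin Parameters
DEFAULT_QUEUE = hpc_linux
End Parameters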
Use the bqueues -l command to view the queue configuration details. Before using LSF HPC, see the Platform LSF Configuration Reference to understand queue configuration parameters in lsb.queues.
lsf.cluster.cluster_name
- Removes lsf_data and lsf_parallel from the PRODUCTS line of lsf.cluster.cluster_name if they are already there.
- For IBM POE jobs, configures the ResourceMap section of lsf.cluster.cluster_name to map the following shared resources for POE jobs to all hosts in the cluster:
Begin ResourceMap
RESOURCENAME      LOCATION
adapter_windows   [default]
ntbl_windows      [default]
poe               [default]
dedicated_tasks   (0@[default])
ip_tasks          (0@[default])
us_tasks          (0@[default])
End ResourceMap
lsf.conf
- Adds LSB_SUB_COMMANDNAME=Y to lsf.conf to enable the LSF_SUB_COMMANDLINE environment variable required by esub.
- LSF_ENABLE_EXTSCHEDULER=Y
LSF uses an external scheduler for topology-aware external scheduling.
- LSB_CPUSET_BESTCPUS=Y
LSF schedules jobs based on the shortest CPU radius in the processor topology using a best-fit algorithm for SGI cpuset allocation.
LSF_IRIX_BESTCPUS is obsolete.
- On SGI IRIX and SGI Altix hosts, sets the full path to the SGI vendor MPI library libxmpi.so.
You can specify multiple paths for LSF_VPLUGIN, separated by colons (:). For example, the following configures both /usr/lib32/libxmpi.so for SGI IRIX and /usr/lib/libxmpi.so for SGI Altix:
LSF_VPLUGIN="/usr/lib32/libxmpi.so:/usr/lib/libxmpi.so"
- On HP-UX hosts, sets the full path to the HP vendor MPI library libmpirm.sl:
LSF_VPLUGIN="/opt/mpi/lib/pa1.1/libmpirm.sl"
- LSB_RLA_PORT=port_number
Where port_number is the TCP port used for communication between the Platform LSF HPC topology adapter (RLA) and sbatchd. The default port number is 6883.
- LSB_SHORT_HOSTLIST=1
Displays an abbreviated list of hosts in bjobs and bhist for a parallel job where multiple processes of a job are running on a host. Multiple processes are displayed in the following format:
processes*hostA
lsf.shared
Defines the following shared resources required by LSF HPC in lsf.shared:
Begin Resource
RESOURCENAME     TYPE     INTERVAL  INCREASING  DESCRIPTION                                # Keywords
rms              Boolean  ()        ()          (RMS)
pset             Boolean  ()        ()          (PSET)
slurm            Boolean  ()        ()          (SLURM)
cpuset           Boolean  ()        ()          (CPUSET)
mpich_gm         Boolean  ()        ()          (MPICH GM MPI)
lammpi           Boolean  ()        ()          (LAM MPI)
mpichp4          Boolean  ()        ()          (MPICH P4 MPI)
mvapich          Boolean  ()        ()          (Infiniband MPI)
sca_mpimon       Boolean  ()        ()          (SCALI MPI)
ibmmpi           Boolean  ()        ()          (IBM POE MPI)
hpmpi            Boolean  ()        ()          (HP MPI)
sgimpi           Boolean  ()        ()          (SGI MPI)
intelmpi         Boolean  ()        ()          (Intel MPI)
crayxt3          Boolean  ()        ()          (Cray XT3 MPI)
crayx1           Boolean  ()        ()          (Cray X1 MPI)
fluent           Boolean  ()        ()          (fluent availability)
ls_dyna          Boolean  ()        ()          (ls_dyna availability)
nastran          Boolean  ()        ()          (nastran availability)
pvm              Boolean  ()        ()          (pvm availability)
openmp           Boolean  ()        ()          (openmp availability)
ansys            Boolean  ()        ()          (ansys availability)
blast            Boolean  ()        ()          (blast availability)
gaussian         Boolean  ()        ()          (gaussian availability)
lion             Boolean  ()        ()          (lion availability)
scitegic         Boolean  ()        ()          (scitegic availability)
schroedinger     Boolean  ()        ()          (schroedinger availability)
hmmer            Boolean  ()        ()          (hmmer availability)
adapter_windows  Numeric  30        N           (free adapter windows on css0 on IBM SP)
ntbl_windows     Numeric  30        N           (free ntbl windows on IBM HPS)
poe              Numeric  30        N           (poe availability)
css0             Numeric  30        N           (free adapter windows on css0 on IBM SP)
csss             Numeric  30        N           (free adapter windows on csss on IBM SP)
dedicated_tasks  Numeric  ()        Y           (running dedicated tasks)
ip_tasks         Numeric  ()        Y           (running IP tasks)
us_tasks         Numeric  ()        Y           (running US tasks)
End Resource
You should add the appropriate resource names under the RESOURCES column of the Host section of lsf.cluster.cluster_name.
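For example, a sketch of a Host section entry in lsf.cluster.cluster_name that attaches some of these resources to a host; the host name hostA and the resources chosen are placeholders, and the other columns are shown only to complete the line:
Begin Host
HOSTNAME   model   type   server  r1m   mem  swp  RESOURCES
hostA      !       !      1       3.5   ()   ()   (lammpi mpich_gm ibmmpi)
End Host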
Run lsfinstall
lsfinstall runs the LSF installation scripts and configuration utilities to install a new Platform LSF cluster or to upgrade LSF from a previous release.
Make sure ENABLE_HPC_CONFIG=Y is specified in install.config to enable Platform LSF HPC features.
- Log on to the file server host as root.
You can run lsfinstall as a non-root user, but to install a fully operational LSF cluster that all users can access, you should install as root.
- Download, uncompress, and extract lsf7Update5_lsfinstall.tar.Z to the distribution directory where you downloaded the LSF HPC product distribution tar files.
Do not extract the Platform LSF HPC distribution files.
- Change to the directory lsf7Update5_lsfinstall/.
- Read lsf7Update5_lsfinstall/install.config or lsf7Update5_lsfinstall/slave.config and decide which installation variables you need to set.
- Edit lsf7Update5_lsfinstall/install.config or lsf7Update5_lsfinstall/slave.config. Uncomment any other options you want in the template file, and replace the example values with your own settings.
The sample values in the install.config and slave.config template files are examples only. They are not default installation values.
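As an illustration only, a minimal install.config for a new cluster might look like the following; every value shown (paths, host names, cluster name, administrator account) is a placeholder you replace with your own settings, not a default:
LSF_TOP=/usr/share/hpc
LSF_ADMINS=lsfadmin
LSF_CLUSTER_NAME=cluster1
LSF_MASTER_LIST="hosta hostb"
LSF_TARDIR=/usr/share/lsf_distrib
LSF_LICENSE=/usr/share/lsf_distrib/license.dat
ENABLE_HPC_CONFIG=Y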
- Run lsfinstall as root:
#./lsfinstall -f install.config
You can install Platform LSF HPC as a non-root user with some limitations. During installation, lsfinstall detects that you are not root. You must choose to configure either a multi-user cluster or a single-user cluster:
- Single-user--Your user account must be the primary LSF administrator. You will be able to start LSF daemons, but only your user account can submit jobs to the cluster. Your user account must be able to read the system kernel information, such as /dev/kmem.
To run IBM POE jobs, you must manually change the ownership and setuid bit for swtbl_api and ntbl_api to root, and the file permission mode to -rwsr-xr-x (4755) so that the user ID bit for the owner is setuid. Use the following commands to set the correct owner, user ID bit, and file permission mode:
#chown root swtbl_api ntbl_api
#chmod 4755 swtbl_api ntbl_api
- Multi-user--By default, only root can start the LSF daemons. Any user can submit jobs to your cluster. To make the cluster available to other users, you must manually change the ownership and setuid bit for lsadmin and badmin to root, and the file permission mode to -rwsr-xr-x (4755) so that the user ID bit for the owner is setuid. Use the following commands to set the correct owner, user ID bit, and file permission mode for a multi-user cluster:
#chown root lsadmin badmin eauth swtbl_api ntbl_api
#chmod 4755 lsadmin badmin eauth swtbl_api ntbl_api
Run hostsetup
You must run hostsetup on SGI hosts (IRIX, TRIX, and Altix), HP-UX hosts, and Linux QsNet hosts. Running hostsetup is optional on all other systems.
- For SGI IRIX, TRIX, and Altix cpuset hosts, hostsetup adds the cpuset Boolean resource to the HOSTS section of lsf.cluster.cluster_name for each cpuset host.
- For HP-UX pset hosts, hostsetup adds the pset Boolean resource to the HOSTS section of lsf.cluster.cluster_name for each pset host.
- For Linux QsNet hosts, hostsetup:
Use the --boot="y" option on hostsetup to configure system scripts to automatically start and stop LSF daemons at system startup or shutdown. You must run hostsetup as root to use this option to modify the system scripts. The default is --boot="n".
For complete hostsetup usage, enter hostsetup -h.
- Log on to each LSF server host as root. Start with the LSF master host.
- Run hostsetup on each LSF server host. For example:
#cd /usr/share/hpc/7.0/install
#./hostsetup --top="/usr/share/hpc" --boot="y"
Optional configuration
After installation, you can define the following in lsf.conf:
- LSF_LOGDIR=directory
In large clusters, you should set LSF_LOGDIR to a local file system (for example, /var/log/lsf).
- LSB_RLA_WORKDIR=directory parameter, where directory is the location of the status files for RLA. Allows RLA to recover its original state when it restarts. When RLA first starts, it creates the directory defined by LSB_RLA_WORKDIR if it does not exist, then creates subdirectories for each host.
You should avoid using /tmp or any other directory that is automatically cleaned up by the system. Unless your installation has restrictions on the LSB_SHAREDIR directory, you should use the default:
LSB_SHAREDIR/cluster_name/rla_workdir
On IRIX or TRIX, you should not use a CXFS file system for LSB_RLA_WORKDIR.
- On Linux hosts running HP MPI, set the full path to the HP vendor MPI library libmpirm.so. For example, if HP MPI is installed in /opt/hpmpi:
LSF_VPLUGIN="/opt/hpmpi/lib/linux_ia32/libmpirm.so"
- LSB_RLA_UPDATE=time_seconds
Specifies how often the LSF scheduler refreshes information from RLA.
Default: 600 seconds
- LSB_RLA_HOST_LIST="host_name ..."
On Linux/QsNet hosts, the LSF scheduler can contact the RLA running on any host for RMS allocation requests. LSB_RLA_HOST_LIST defines a list of hosts to restrict which RLAs the LSF scheduler contacts.
If LSB_RLA_HOST_LIST is configured, you must list at least one host per RMS partition for the RMS partition to be considered for job scheduling.
Listed hosts must be defined in lsf.cluster.cluster_name.
- LSB_RLA_UPDATE=seconds
On Linux/QsNet hosts, specifies how often RLA should refresh its RMS information map.
Default: 600 seconds
- LSB_RMSACCT_DELAY=time_seconds
If set on Linux/QsNet hosts, RES waits the specified number of seconds before exiting to allow LSF and RMS job statistics to synchronize.
If LSB_RMSACCT_DELAY=0, RES waits forever until the database is up to date.
- LSB_RMS_MAXNUMNODES=integer
Maximum number of nodes in a Linux/QsNet system. Specifies a maximum value for the nodes argument to the topology scheduler options specified in:
Default: 1024
- LSB_RMS_MAXNUMRAILS=integer
Maximum number of rails in a Linux/QsNet system. Specifies a maximum value for the rails argument to the topology scheduler options specified in:
Default: 32
- LSB_RMS_MAXPTILE=integer
Maximum number of CPUs per node in a Linux/QsNet system. Specifies a maximum value for the RMS[ptile] argument to the topology scheduler options specified in:
Default: 32
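A sketch of how some of these optional entries might look in lsf.conf; the directory path, host names, and the LSB_RMSACCT_DELAY value are placeholders, while the remaining numbers simply repeat the defaults described above:
LSF_LOGDIR=/var/log/lsf
LSB_RLA_UPDATE=600
LSB_RLA_HOST_LIST="hosta hostb"
LSB_RMSACCT_DELAY=10
LSB_RMS_MAXNUMNODES=1024
LSB_RMS_MAXNUMRAILS=32
LSB_RMS_MAXPTILE=32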
Before using your cluster
After installing LSF and setting up your server hosts:
- Log on to the LSF master host as root.
- Set your environment:
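For example, assuming LSF_TOP is /usr/share/hpc as in the hostsetup examples above (adjust the path to your own installation):
For sh, ksh, or bash:
. /usr/share/hpc/conf/profile.lsf
For csh or tcsh:
source /usr/share/hpc/conf/cshrc.lsf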
- Use lsfstartup to start the cluster.
For a large cluster, where cluster management software exists, you should use /etc/init.d/lsf start instead of lsfstartup.
- Follow the steps in lsf7.0_lsfinstall/lsf_quick_admin.html to verify that your cluster is operating correctly.
- Have users run one of the shell environment files to switch their environment to the new cluster.
After testing your cluster, be sure all users include LSF_CONFDIR/cshrc.lsf or LSF_CONFDIR/profile.lsf in their .cshrc or .profile. Follow the steps in lsf7.0_lsfinstall/lsf_quick_admin.html for using LSF_CONFDIR/cshrc.lsf and LSF_CONFDIR/profile.lsf to set up the Platform HPC environment for users.
After the new cluster is up and running, users can start submitting jobs to it.
Upgrading Platform LSF HPC
If your cluster was installed or upgraded with lsfsetup, DO NOT use these steps. Before upgrading Platform LSF HPC, upgrade your cluster to at least Platform LSF Version 6.0.
Before upgrading
- Back up your existing LSF_CONFDIR, LSB_CONFDIR, and LSB_SHAREDIR according to the procedures at your site.
- Get an LSF HPC Version 7 license and create a license file (license.dat).
- Inactivate all queues to make sure that no new jobs will be dispatched during the upgrade.
After upgrading, remember to activate the queues again so pending jobs can be dispatched.
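For example, using badmin (the all keyword applies the change to every queue):
#badmin qinact all
(run the upgrade)
#badmin qact all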
- For SGI cpuset hosts, make sure all running jobs are done (all queues are drained of running jobs).
What lsfinstall does for upgrade
lsfinstall backs up the following configuration files for your current installation in LSF_CONFDIR:
- Configures the hpc_ibm queue for IBM POE jobs and the hpc_ibm_tv queue for debugging IBM POE jobs through Etnus TotalView.
- Configures the hpc_linux queue for LAM/MPI and MPICH-GM jobs and the hpc_linux_tv queue for debugging LAM/MPI and MPICH-GM jobs through Etnus TotalView.
- Configures the rms queue for RMS jobs running in LSF for Linux QsNet.
LSB_SUB_COMMANDNAME (lsf.conf)
If LSB_SUB_COMMANDNAME=N is already defined in lsf.conf, lsfinstall does not change this parameter; you must manually set it to LSB_SUB_COMMANDNAME=Y to enable the LSF_SUB_COMMANDLINE environment variable required by esub.
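A minimal sketch of that manual edit in lsf.conf:
# Change the existing line LSB_SUB_COMMANDNAME=N to:
LSB_SUB_COMMANDNAME=Y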
For SGI cpuset hosts, lsfinstall updates the following files:
lsb.modules--adds the schmod_cpuset external scheduler plugin module name to the PluginModule section, and comments out the schmod_topology module line.
lsf.conf
Sets the following parameters in lsf.conf:
- LSF_ENABLE_EXTSCHEDULER=Y
LSF uses an external scheduler for cpuset allocation.
- LSB_CPUSET_BESTCPUS=Y
LSF schedules jobs based on the shortest CPU radius in the processor topology using a best-fit algorithm for cpuset allocation.
LSF_IRIX_BESTCPUS is obsolete.
Comments out the following obsolete parameters in lsf.conf, and sets the corresponding RLA configuration:
- LSF_TOPD_PORT=port_number, replaced by LSB_RLA_PORT=port_number, using the same value as LSF_TOPD_PORT.
Where port_number is the TCP port used for communication between the Platform LSF HPC topology adapter (RLA) and sbatchd. The default port number is 6883.
- LSF_TOPD_WORKDIR=directory parameter, replaced by LSB_RLA_WORKDIR=directory parameter, using the same value as LSF_TOPD_WORKDIR.
Where directory is the location of the status files for RLA. Allows RLA to recover its original state when it restarts. When RLA first starts, it creates the directory defined by LSB_RLA_WORKDIR if it does not exist, then creates subdirectories for each host.
LSB_IRIX_NODESIZE is obsolete. If set in lsf.conf, it is ignored by the scheduler.
lsf.shared--defines the cpuset Boolean resource in lsf.shared.
Reusing install.config from your existing installation
You can reuse the install.config file from your existing installation to specify your installation options. The install.config file containing the options you specified for your original installation is located in LSF_TOP/lsf_version/install/.
If you change install.config to add new hosts in LSF_ADD_SERVERS and LSF_ADD_CLIENTS, or new LSF administrators in LSF_ADMINS, lsfinstall creates a new lsf.cluster.cluster_name file.
Run lsfinstall to upgrade
Make sure the following install.config variables are set for upgrade:
- ENABLE_HPC_CONFIG=Y enables configuration of Platform LSF HPC features
- LSF_TARDIR specifies the location of distribution packages for upgrade. For example:
LSF_TARDIR=/tmp
To migrate an existing Platform LSF Version 7 cluster to Platform LSF HPC, comment out LSF_TARDIR and make sure that no distribution tar files are in the directory where you run lsfinstall.
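For example, the relevant install.config lines for an upgrade might be (the LSF_TARDIR path is illustrative):
ENABLE_HPC_CONFIG=Y
LSF_TARDIR=/tmp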
- Log on to the file server host as root.
- Download, uncompress, and extract lsf7Update5_lsfinstall.tar.Z to the distribution directory where you downloaded the LSF HPC product distribution tar files.
- Change to the directory lsf7Update5_lsfinstall/.
- Read lsf7Update5_lsfinstall/install.config or lsf7Update5_lsfinstall/slave.config and decide which installation variables you need to set.
- Edit lsf7Update5_lsfinstall/install.config or lsf7Update5_lsfinstall/slave.config.
To enable configuration of Platform LSF HPC features, specify ENABLE_HPC_CONFIG=Y in install.config.
- Run lsfinstall as root:
#./lsfinstall -f install.config
Run hostsetup
Running hostsetup is optional on AIX and Linux. You must run hostsetup on SGI hosts (IRIX, TRIX, and Altix) and on HP-UX hosts.
- For SGI IRIX, TRIX, and Altix cpuset hosts, hostsetup adds the cpuset Boolean resource to the HOSTS section of lsf.cluster.cluster_name for each cpuset host.
- For HP-UX pset hosts, hostsetup adds the pset Boolean resource to the HOSTS section of lsf.cluster.cluster_name for each pset host.
- For Linux QsNet hosts, hostsetup:
Use the --boot="y" option on hostsetup to configure system scripts to automatically start and stop LSF HPC daemons at system startup or shutdown. You must run hostsetup as root to use this option to modify the system scripts. The default is --boot="n".
For complete hostsetup usage, enter hostsetup -h.
- Log on to each LSF server host as root. Start with the LSF master host.
- Run hostsetup on each LSF server host. For example:
#cd /usr/share/hpc/7.0/install
#./hostsetup --top="/usr/share/hpc" --boot="y"
After upgrading
- Log on to the LSF master host as root.
- Set your environment:
- Follow the steps in lsf7Update5_lsfinstall/lsf_quick_admin.html to update your license.
- Use the following commands to shut down the old LSF daemons:
#badmin hshutdown all
#lsadmin resshutdown all
#lsadmin limshutdown all
- Use the following commands to start Platform LSF HPC using the upgraded daemons:
#lsadmin limstartup all
#lsadmin resstartup all
#badmin hstartup all
- Follow the steps in lsf7.0_lsfinstall/lsf_quick_admin.html to verify that your upgraded cluster is operating correctly.
- Use the following command to reactivate all LSF HPC queues after upgrading:
#badmin qact all
- Have users run one of the shell environment files to switch their environment to the new cluster.
After testing your cluster, be sure all users include LSF_CONFDIR/cshrc.lsf or LSF_CONFDIR/profile.lsf in their .cshrc or .profile. Follow the steps in lsf7.0_lsfinstall/lsf_quick_admin.html for using LSF_CONFDIR/cshrc.lsf and LSF_CONFDIR/profile.lsf to set up the Platform HPC environment for users.
After your cluster is up and running, users can start submitting jobs to it.