- About Platform LSF HPC and MPICH-GM
- Configuring LSF HPC to Work with MPICH-GM
- Submitting MPICH-GM Jobs
- Using AFS with MPICH-GM
About Platform LSF HPC and MPICH-GM
MPICH is a freely available, portable implementation of the MPI standard for message-passing libraries, developed by Argonne National Laboratory jointly with Mississippi State University. MPICH is designed to provide high performance, portability, and a convenient programming environment.
MPICH-GM is used with high-performance Myrinet networks. Myrinet is a high-speed network that allows OS-bypass communication in large clusters. MPICH-GM integrates with Platform LSF HPC so users can run parallel jobs on hosts with at least one free GM port.
Requirements
- MPICH version 1.2.6 or later
You should upgrade all your hosts to the same version of MPICH-GM.
- GM version 1.5.1, or version 1.6.3 or later
Assumptions
- MPICH-GM is installed and configured correctly
- The user's current working directory is part of a shared file system reachable by all hosts
Glossary
MPI
(Message Passing Interface) A message-passing standard. It defines a message-passing API useful for parallel and distributed applications.
MPICH
A portable implementation of the MPI standard.
GM
A message-based communication system developed for Myrinet.
MPICH-GM
An MPI implementation based on MPICH for Myrinet.
PAM
(Parallel Application Manager) The supervisor of any parallel job.
PJL
(Parallel Job Launcher) Any executable script or binary capable of starting parallel tasks on all hosts assigned for a parallel job.
RES
(Remote Execution Server) An LSF daemon residing on each host. It monitors and manages all LSF tasks on the host.
TS
(TaskStarter) An executable responsible for starting a task on the local host and reporting the process ID and host name to the PAM.
For more information
- See the Myricom Web site at www.myrinet.com for software distribution and documentation on Myrinet clusters.
- See the Mathematics and Computer Science Division (MCS) of Argonne National Laboratory (ANL) MPICH Web page at www-unix.mcs.anl.gov/mpi/mpich/ for more information about MPICH.
Files installed by lsfinstall
During installation, lsfinstall copies these files to the following directories:
These files...        Are installed to...
TaskStarter           LSF_BINDIR
pam                   LSF_BINDIR
esub.mpich_gm         LSF_SERVERDIR
gmmpirun_wrapper      LSF_BINDIR
mpirun.lsf            LSF_BINDIR
pjllib.sh             LSF_BINDIR
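To confirm the files landed where expected, you can list them. This assumes LSF_BINDIR and LSF_SERVERDIR are set in your shell, for example by sourcing LSF's profile.lsf:
# Sketch: verify the installed MPICH-GM integration files.
ls -l "$LSF_BINDIR"/TaskStarter "$LSF_BINDIR"/pam "$LSF_BINDIR"/gmmpirun_wrapper \
      "$LSF_BINDIR"/mpirun.lsf "$LSF_BINDIR"/pjllib.sh "$LSF_SERVERDIR"/esub.mpich_gm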
Resources and parameters configured by lsfinstall
- External resources in lsf.shared:
Begin Resource
RESOURCENAME   TYPE     INTERVAL INCREASING DESCRIPTION
...
mpich_gm       Boolean  ()       ()         (MPICH GM MPI)
...
End Resource
The mpich_gm Boolean resource is used for mapping hosts with MPICH-GM available.
You should add the mpich_gm resource name under the RESOURCES column of the Host section of lsf.cluster.cluster_name, as shown in the example after this list.
- Parameter to lsf.conf:
LSB_SUB_COMMANDNAME=y
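For example, a Host section entry in lsf.cluster.cluster_name with mpich_gm added to the RESOURCES column might look like the following sketch (host name hypothetical; keep the other columns as they appear in your existing file):
Begin Host
HOSTNAME   model   type   server  r1m   mem   swp   RESOURCES
...
hostA      !       !      1       3.5   ()    ()    (mpich_gm)
...
End Host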
Configuring LSF HPC to Work with MPICH-GM
Configure GM port resources (optional)
If there are more processors on a node than there are available GM ports, you should configure the external static resource gm_ports to limit the number of jobs that can launch on that node.
Add the external static resource gm_ports in lsf.shared to keep track of the number of free Myrinet ports available on a host:
Begin Resource
RESOURCENAME   TYPE     INTERVAL INCREASING RELEASE DESCRIPTION
...
gm_ports       Numeric  ()       N          N       (number of free myrinet ports)
...
End Resource
lsf.cluster.cluster_name
Edit the resource map in lsf.cluster.cluster_name to configure which hosts in the cluster can collect gm_ports. For example, the following configures 13 GM ports on every host, the number available with GM 2.0 (with GM 1.x, only 5 GM ports are available; for mixed clusters, see the sketch after the ReservationUsage example below):
Begin ResourceMap
RESOURCENAME  LOCATION
...
gm_ports      13@[default]
...
End ResourceMap
Configure the gm_ports resource as PER_SLOT in a ReservationUsage section in lsb.resources:
Begin ReservationUsage
RESOURCE    METHOD
...
gm_ports    PER_SLOT
...
End ReservationUsage
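If your cluster mixes GM versions, the LOCATION column of the resource map can assign a different port count to each host. A sketch with hypothetical host names, giving 13 ports to a GM 2.0 host and 5 to a GM 1.x host:
Begin ResourceMap
RESOURCENAME  LOCATION
...
gm_ports      13@[hostA] 5@[hostB]
...
End ResourceMap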
gmmpirun_wrapper script
Modify the gmmpirun_wrapper script in LSF_BINDIR so that the mpirun.ch_gm command in the script points to:
MPIRUN_CMD="/path/mpirun.ch_gm"
where path is the path to the directory where the mpirun.ch_gm command is stored.
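For example, if MPICH-GM were installed under /opt/mpich-gm (a hypothetical location), the line would read:
MPIRUN_CMD="/opt/mpich-gm/bin/mpirun.ch_gm"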
lsf.conf (optional)
LSF_STRIP_DOMAIN
If the gm_board_info command returns host names that include domain names, you must not define LSF_STRIP_DOMAIN in lsf.conf. If the gm_board_info command returns host names without domain names, but LSF commands return host names that include domain names, you must define LSF_STRIP_DOMAIN in lsf.conf.
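For example, if gm_board_info reports short host names but LSF commands report names such as hostA.example.com (domain hypothetical), you would add:
LSF_STRIP_DOMAIN=.example.com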
To improve performance and scalability for large parallel jobs, tune the parameters described in Tuning PAM Scalability and Fault Tolerance. The user's environment can override these parameters.
Submitting MPICH-GM Jobs
bsub command
Use bsub to submit MPICH-GM jobs:
bsub -a mpich_gm -n number_cpus mpirun.lsf [-pam "pam_options"] [mpi_options] job [job_options]
- -a mpich_gm tells esub the job is an MPICH-GM job and invokes esub.mpich_gm.
- -n number_cpus specifies the number of processors required to run the job.
- mpirun.lsf reads the environment variable LSF_PJL_TYPE=mpich_gm set by esub.mpich_gm, and generates the appropriate pam command line to invoke MPICH-GM as the PJL.
For example:
% bsub -a mpich_gm -n 3 mpirun.lsf /examples/cpi
A job named cpi is dispatched and runs on 3 CPUs in parallel.
To limit the number of jobs using GM ports, specify a resource requirement in your job submission:
-R "rusage[gm_ports=1]"
Submitting a job with a job script
You can use a wrapper script to call the MPICH-GM job launcher. You can submit a job using a job script as an embedded script or directly as a job, for example:
% bsub -a mpich_gm -n 4 < embedded_jobscript
% bsub -a mpich_gm -n 4 jobscript
Your job script must use mpirun.lsf in place of the mpirun command.
For information on generic PJL wrapper script components, see Running Parallel Jobs.
See Administering Platform LSF for information about submitting jobs with job scripts.
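A minimal job script along these lines (file name and program path hypothetical, following the earlier cpi example):
#!/bin/sh
# myjobscript: submit with "bsub -a mpich_gm -n 4 myjobscript"
# Call mpirun.lsf in place of mpirun, as required above.
mpirun.lsf /examples/cpi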
Using AFS with MPICH-GM
Complete the following steps only if you are planning to use AFS with MPICH-GM.
The MPICH-GM package contains an esub.afs file, which combines the esub for MPICH-GM and the esub for AFS so that MPICH-GM and AFS can work together.
Steps
- Install and configure LSF HPC for AFS.
- Edit mpirun.ch_gm. The location of this script is defined with the MPIRUN_CMD parameter in the script LSF_BINDIR/gmmpirun_wrapper.
- Replace the following line:
exec($rsh,'-n',$_,$cmd_ln);
with:
exec($lsrun,'-m',$_,'/bin/sh','-c',"$cmd_ln < /dev/null");
- Add the following line to mpirun.ch_gm before the line $rsh="rsh";, replacing $LSF_BINDIR by the actual path:
$lsrun="$LSF_BINDIR/lsrun";
$rsh="rsh";
For example:
$lsrun="/usr/local/lsf/7.0/linux2.4-glibc2.1- x86/bin/lsrun";
- Comment out the following line:
#$rsh="rsh";
- Replace the following line:
exec($rsh,$_,$cmdline);
with:
exec($lsrun,'-m',$_,'/bin/sh','-c',$cmdline);
- Replace the following line:
exec($rsh,'-n',$_,$cmdline);
with:
exec($lsrun,'-m',$_,'/bin/sh','-c',"$cmdline</dev/null");
- Replace the following line:
die "$rsh $_ $argv{$lnode}->[0]:$!\n"
with:
die "$lsrun -m $_ sh -c $argv{$lnode}->[0]:$!\n"
- Save the mpirun.ch_gm file.