About Platform LSF HPC and LAM/MPI
LAM (Local Area Multicomputer) is an MPI programming environment and development system for heterogeneous computers on a network. With LAM, a dedicated cluster or an existing network computing infrastructure can act as one parallel computer solving one problem.
System requirements
Assumptions
- LAM/MPI is installed and configured correctly
- The user's current working directory is part of a shared file system reachable by all hosts
Glossary
LAM (Local Area Multicomputer): An MPI programming environment and development system for heterogeneous computers on a network.
MPI (Message Passing Interface): A message passing standard. It defines a message passing API useful for parallel and distributed applications.
PAM (Parallel Application Manager): The supervisor of any parallel job.
PJL (Parallel Job Launcher): Any executable script or binary capable of starting parallel tasks on all hosts assigned for a parallel job.
RES (Remote Execution Server): An LSF daemon residing on each host. It monitors and manages all LSF tasks on the host.
TS (TaskStarter): An executable responsible for starting a task on the local host and reporting the process ID and host name to the PAM.
Files installed by lsfinstall
During installation, lsfinstall copies these files to the following directories:

  These files...        Are installed to...
  TaskStarter           LSF_BINDIR
  pam                   LSF_BINDIR
  esub.lammpi           LSF_SERVERDIR
  lammpirun_wrapper     LSF_BINDIR
  mpirun.lsf            LSF_BINDIR
  pjllib.sh             LSF_BINDIR
Resources and parameters configured by lsfinstall
- External resources in lsf.shared:

  Begin Resource
  RESOURCE_NAME   TYPE      INTERVAL   INCREASING   DESCRIPTION
  ...
  lammpi          Boolean   ()         ()           (LAM MPI)
  ...
  End Resource

  The lammpi Boolean resource is used for mapping hosts with LAM/MPI available. You should add the lammpi resource name under the RESOURCES column of the Host section of lsf.cluster.cluster_name.

- Parameter to lsf.conf:

  LSB_SUB_COMMANDNAME=y
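After installation you can confirm that the resource is defined by listing the cluster's resources (a quick check, assuming a standard LSF client environment):

% lsinfo | grep lammpi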
Configuring LSF HPC to work with LAM/MPI
System setup
- For troubleshooting LAM/MPI jobs, edit the LSF_BINDIR/lammpirun_wrapper script and specify a log directory that all users can write to. For example:

  LOGDIR="/mylogs"

  Do not use LSF_LOGDIR for this log directory.

- Add the LAM/MPI home directory to your path. The LAM/MPI home directory is the directory that you specified as the prefix during LAM/MPI installation.

- Add the path to the LAM/MPI commands to the $PATH variable in your shell startup files ($HOME/.cshrc or $HOME/.profile).

- Edit lsf.cluster.cluster_name and add the lammpi resource for each host with LAM/MPI available. For example:

  Begin Host
  HOSTNAME   model   type   server   r1m   mem   swp   RESOURCES
  ...
  hosta      !       !      1        3.5   ()    ()    (lammpi)
  ...
  End Host
Submitting LAM/MPI Jobs
bsub command
Use bsub to submit LAM/MPI jobs:

bsub -a lammpi -n number_cpus [-q queue_name] mpirun.lsf [-pam "pam_options"] [mpi_options] job [job_options]

- -a lammpi tells esub that the job is a LAM/MPI job and invokes esub.lammpi.
- -n number_cpus specifies the number of processors required to run the job.
- -q queue_name specifies a LAM/MPI queue that is configured to use the custom termination action. If no queue is specified, the hpc_linux queue is used.
- mpirun.lsf reads the environment variable LSF_PJL_TYPE=lammpi set by esub.lammpi, and generates the appropriate pam command line to invoke LAM/MPI as the PJL.
% bsub -a lammpi -n 3 -q hpc_linux mpirun.lsf /examples/cpi

A job named cpi is submitted to the hpc_linux queue. It is dispatched and runs on 3 CPUs in parallel.

% bsub -a lammpi -n 3 -R "select[mem>100] rusage[mem=100:duration=5]" -q hpc_linux mpirun.lsf /examples/cpi

A job named cpi is submitted to the hpc_linux queue. It is dispatched and runs on 3 CPUs in parallel, with 100 MB of memory reserved for 5 minutes.

Submitting a job with a job script
A wrapper script is often used to call the LAM/MPI script. You can submit a job using a job script, either as an embedded script or directly as a job. For example:

% bsub -a lammpi -n 4 < embedded_jobscript
% bsub -a lammpi -n 4 jobscript

Your job script must use mpirun.lsf in place of the mpirun command; see the sketch at the end of this section.

For information on generic PJL wrapper script components, see Running Parallel Jobs. See Administering Platform LSF for information about submitting jobs with job scripts.
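For illustration, here is a minimal job script (the file name myjob.sh and the application path are hypothetical; the one requirement from this integration is that the script calls mpirun.lsf rather than mpirun):

#!/bin/sh
# myjob.sh -- hypothetical LAM/MPI job script
# Per-job setup that should run on the first execution host goes here.
echo "Assigned hosts and slots: $LSB_MCPU_HOSTS"
# Start the parallel tasks through LSF rather than calling mpirun directly.
mpirun.lsf /examples/cpi

Submit the script as an ordinary job:

% bsub -a lammpi -n 4 ./myjob.sh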
Job placement with LAM/MPI jobs
The
mpirun -np
option is ignored. You should use the LSB_PJL_TASK_GEOMETRY environment variable for consistency with other Platform LSF HPC MPI integrations. LSB_PJL_TASK_GEOMETRY overrides thempirun -np
option.The environment variable LSB_PJL_TASK_GEOMETRY is checked for all parallel jobs. If LSB_PJL_TASK_GEOMETRY is set users submit a parallel job (a job that requests more than 1 slot), LSF attempts to shape LSB_MCPU_HOSTS accordingly.
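For example, to place tasks 0 and 2 on one host and tasks 1 and 3 on another (csh syntax shown; the geometry string groups task IDs by host, following the format used by the other Platform LSF HPC integrations):

% setenv LSB_PJL_TASK_GEOMETRY "{(0,2)(1,3)}"
% bsub -a lammpi -n 4 mpirun.lsf /examples/cpi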
Log files
For troubleshooting LAM/MPI jobs, define LOGDIR in the LSF_BINDIR/lammpirun_wrapper script. Log files (lammpirun_wrapper.job[job_ID].log) are written to the LOGDIR directory. If LOGDIR is not defined, log messages are written to /dev/null.

For example, the log file for the job with job ID 123 is:

lammpirun_wrapper.job123.log
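With the LOGDIR value from the system setup example ("/mylogs"), you could inspect that log directly (the job ID 123 is hypothetical):

% cat /mylogs/lammpirun_wrapper.job123.log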