


Using Platform LSF HPC with MPICH-GM




About Platform LSF HPC and MPICH-GM

MPICH is a freely available, portable implementation of the MPI standard for message-passing libraries, developed by Argonne National Laboratory and Mississippi State University. MPICH is designed to provide high performance, portability, and a convenient programming environment.

MPICH-GM is used with high-performance Myrinet networks. Myrinet is a high-speed network that allows OS-bypass communications in large clusters. MPICH-GM integrates with Platform LSF HPC so users can run parallel jobs on hosts with at least one free GM port.

Requirements

Assumptions

Glossary

MPI

(Message Passing Interface) A message passing standard. It defines a message passing API useful for parallel and distributed applications.

MPICH

A portable implementation of the MPI standard.

GM

A message based communication system developed for Myrinet.

MPICH-GM

An MPI implementation based on MPICH for Myrinet.

PAM

(Parallel Application Manager) The supervisor of any parallel job.

PJL

(Parallel Job Launcher) Any executable script or binary capable of starting parallel tasks on all hosts assigned for a parallel job.

RES

(Remote Execution Server) An LSF daemon residing on each host. It monitors and manages all LSF tasks on the host.

TS

(TaskStarter) An executable responsible for starting a task on the local host and reporting the process ID and host name to the PAM.

For more information

Files installed by lsfinstall

During installation, lsfinstall copies these files to the following directories:

These files...       Are installed to...
TaskStarter          LSF_BINDIR
pam                  LSF_BINDIR
esub.mpich_gm        LSF_SERVERDIR
gmmpirun_wrapper     LSF_BINDIR
mpirun.lsf           LSF_BINDIR
pjllib.sh            LSF_BINDIR

Resources and parameters configured by lsfinstall



Configuring LSF HPC to Work with MPICH-GM

Configure GM port resources (optional)

If there are more processors on a node than there are available GM ports, you should configure the external static resource name gm_ports to limit the number of jobs that can launch on that node.

lsf.shared

Add the external static resource gm_ports in lsf.shared to keep track of the number of free Myrinet ports available on a host:

Begin Resource
RESOURCENAME  TYPE      INTERVAL INCREASING  RELEASE   DESCRIPTION
...
gm_ports      Numeric   ()       N           N   (number of free myrinet ports)
...
End Resource

lsf.cluster.cluster_name

Edit the resource map in lsf.cluster.cluster_name to configure the hosts in the cluster that can collect gm_ports. For example, the following configures 13 GM ports on every host (GM 2.x provides 13 usable ports per host; GM 1.x provides 5):

Begin ResourceMap
RESOURCENAME        LOCATION
...
gm_ports            13@[default]
...
End ResourceMap

lsb.resources

Configure the gm_ports resource as PER_SLOT in a ReservationUsage section in lsb.resources:

Begin ReservationUsage
RESOURCE      METHOD
...
gm_ports      PER_SLOT
...
End ReservationUsage
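
Because the METHOD is PER_SLOT, the amount in a job's rusage string is reserved once for every slot the job uses, not once per job. For example (the application path is a placeholder), the following submission reserves one GM port for each of its four slots, four ports in total:

```shell
# PER_SLOT reservation: this 4-slot job reserves 4 gm_ports in total
# (1 per slot). Replace ./my_mpi_app with your own application.
bsub -a mpich_gm -n 4 -R "rusage[gm_ports=1]" mpirun.lsf ./my_mpi_app
```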

gmmpirun_wrapper script

Modify the gmmpirun_wrapper script in LSF_BINDIR so that the MPIRUN_CMD parameter in the script points to the mpirun.ch_gm command:

MPIRUN_CMD="/path/mpirun.ch_gm"

where path is the path to the directory where the mpirun.ch_gm command is stored.

lsf.conf (optional)

LSF_STRIP_DOMAIN

If the gm_board_info command returns host names that include domain names, do not define LSF_STRIP_DOMAIN in lsf.conf. If the gm_board_info command returns host names without domain names, but LSF commands return host names that include domain names, you must define LSF_STRIP_DOMAIN in lsf.conf.
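
For example, if LSF commands report host names such as hostA.example.com while gm_board_info reports hostA, a line like the following in lsf.conf strips the domain suffix (the domain name here is a placeholder):

```shell
# lsf.conf: strip this suffix from host names returned by LSF commands
LSF_STRIP_DOMAIN=.example.com
```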

Performance tuning

To improve performance and scalability for large parallel jobs, tune the parameters described in Tuning PAM Scalability and Fault Tolerance.

Settings in the user's environment can override these parameters.



Submitting MPICH-GM Jobs

bsub command

Use bsub to submit MPICH-GM jobs.

bsub -a mpich_gm -n number_cpus mpirun.lsf 
[-pam "pam_options"] [mpi_options] job [job_options]

For example:

% bsub -a mpich_gm -n 3 mpirun.lsf /examples/cpi

The job /examples/cpi is dispatched and runs on 3 CPUs in parallel.

To limit the number of jobs using GM ports, specify a resource requirement in your job submission:

-R "rusage[gm_ports=1]"

Submitting a job with a job script

You can use a wrapper script to call the MPICH-GM job launcher. You can submit a job using a job script as an embedded script or directly as a job, for example:

% bsub -a mpich_gm -n 4 < embedded_jobscript
% bsub -a mpich_gm -n 4 jobscript

Your job script must use mpirun.lsf in place of the mpirun command.
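
For example, a minimal job script might look like the following (the application path and its arguments are placeholders):

```shell
#!/bin/sh
# Sample MPICH-GM job script: call mpirun.lsf instead of mpirun directly
# so LSF can place and monitor the parallel tasks.
mpirun.lsf ./my_mpi_app -arg1
```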

For information on generic PJL wrapper script components, see Running Parallel Jobs.

See Administering Platform LSF for information about submitting jobs with job scripts.



Using AFS with MPICH-GM


Complete the following steps only if you are planning to use AFS with MPICH-GM.

The MPICH-GM package contains an esub.afs file which combines the esub for MPICH-GM and the esub for AFS so that MPICH-GM and AFS can work together.

Steps

  1. Install and configure LSF HPC for AFS.
  2. Edit mpirun.ch_gm. The location of this script is defined with the MPIRUN_CMD parameter in the script LSF_BINDIR/gmmpirun_wrapper.
  3. Replace the following line:
    exec($rsh,'-n',$_,$cmd_ln);
    

    with:

    exec($lsrun,'-m',$_,'/bin/sh','-c',"$cmd_ln < /dev/null");
    
  4. Add the following line to mpirun.ch_gm before the line $rsh="rsh";, replacing $LSF_BINDIR with the actual path:
    $lsrun="$LSF_BINDIR/lsrun";
    $rsh="rsh";
    

    For example:

    $lsrun="/usr/local/lsf/7.0/linux2.4-glibc2.1-x86/bin/lsrun";
    
  5. Comment out the following line:
    #$rsh="rsh";
    
  6. Replace the following line:
    exec($rsh,$_,$cmdline);
    

    with:

    exec($lsrun,'-m',$_,'/bin/sh','-c',$cmdline);
    
  7. Replace the following line:
    exec($rsh,'-n',$_,$cmdline);
    

    with:

    exec($lsrun,'-m',$_,'/bin/sh','-c',"$cmdline</dev/null");
    
  8. Replace the following line:
    die "$rsh $_ $argv{$lnode}->[0]:$!\n" 
    

    with:

    die "$lsrun -m $_ sh -c $argv{$lnode}->[0]:$!\n" 
    
  9. Save the mpirun.ch_gm file.
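
After saving, you can optionally confirm that the edited script still parses cleanly before submitting any jobs:

```shell
# Syntax-check the modified Perl script without executing it
perl -c mpirun.ch_gm
```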





      Date Modified: August 20, 2009
Platform Computing: www.platform.com

Platform Support: support@platform.com
Platform Information Development: doc@platform.com

Copyright © 1994-2009 Platform Computing Corporation. All rights reserved.