Using Platform LSF with IBM POE




Running IBM POE Jobs

The IBM Parallel Operating Environment (POE) interfaces with the Resource Manager to allow users to run parallel jobs requiring dedicated access to the high performance switch.

The LSF integration for IBM High-Performance Switch (HPS) systems provides support for submitting POE jobs from AIX hosts to run on IBM HPS hosts.

An IBM HPS system consists of multiple nodes running AIX. The system can be configured with a high-performance switch to allow high bandwidth and low latency communication between the nodes. The allocation of the switch to jobs as well as the division of nodes into pools is controlled by the HPS Resource Manager.


Run chown to change the owner of nrt_api to root, and then use chmod to set the setuid bit (chmod u+s).
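For example, a minimal sketch, assuming nrt_api is installed in LSF_BINDIR (adjust the path for your installation):

chown root $LSF_BINDIR/nrt_api
chmod u+s $LSF_BINDIR/nrt_api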

hpc_ibm queue for POE jobs

During installation, lsfinstall configures a queue in lsb.queues named hpc_ibm for running POE jobs. It defines requeue exit values to enable requeuing of POE jobs if some users submit jobs requiring exclusive access to the node.

The poejob script exits with 133 when the job must be requeued. Do not submit other types of jobs to this queue; otherwise, they are requeued if they happen to exit with 133.

Begin Queue
QUEUE_NAME   = hpc_ibm
PRIORITY     = 30
NICE         = 20
...
RES_REQ = select[ poe > 0 ]
REQUEUE_EXIT_VALUES = 133 134 135 
...
DESCRIPTION  = This queue is to run POE jobs ONLY.
End Queue

Configuring LSF to run POE jobs

Ensure that the HPS node names are the same as their host names. That is, st_status should return the same names for the nodes that lsload returns.
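For example, run both commands and compare the node names in their output before continuing:

st_status    # node names reported by the Resource Manager
lsload       # host names reported by LSF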

To set up POE jobs

1. Configure per-slot resource reservation (lsb.resources).

2. Optional. Enable exclusive mode (lsb.queues).

3. Optional. Define resource management pools (rmpool) and node locking queue threshold.

4. Optional. Define system partitions (spname).

5. Allocate switch adapter specific resources.

6. Optional. Tune PAM parameters.

7. Reconfigure to apply the changes.

1. Configure per-slot resource reservation (lsb.resources)

To support the IBM HPS architecture, LSF must reserve resources based on job slots. During installation, lsfinstall configures the ReservationUsage section in lsb.resources to reserve HPS resources on a per-slot basis.

Resource usage defined in the ReservationUsage section overrides the cluster-wide RESOURCE_RESERVE_PER_SLOT parameter defined in lsb.params if it also exists.

Begin ReservationUsage
RESOURCE           METHOD
adapter_windows    PER_SLOT
ntbl_windows       PER_SLOT
csss               PER_SLOT
css0               PER_SLOT
End ReservationUsage

2. Optional. Enable exclusive mode (lsb.queues)

To support the MP_ADAPTER_USE and -adapter_use POE job options, you must enable the LSF exclusive mode for each queue. To enable exclusive mode, edit lsb.queues and set EXCLUSIVE=Y:

Begin Queue
...
EXCLUSIVE=Y
...
End Queue

3. Optional. Define resource management pools (rmpool) and node locking queue threshold

If you schedule jobs based on resource management pools, you must configure rmpools as a static resource in LSF. Resource management pools are collections of SP2 nodes that together contain all available SP2 nodes without any overlap.

For example, to configure two resource management pools, p1 and p2, made up of 6 SP2 nodes (sp2n1, sp2n2, sp2n3, ..., sp2n6):

  1. Edit lsf.shared and add an external resource called pool. For example:
    Begin Resource
    RESOURCENAME TYPE    INTERVAL INCREASING DESCRIPTION
    ...
    pool         Numeric ()       ()         (sp2 resource mgmt pool)
    lock         Numeric 60       Y          (IBM SP Node lock status)
    ...
    End Resource
    

    pool represents the resource management pool the node is in, and lock indicates whether the switch is locked.

  2. Edit lsf.cluster.cluster_name and allocate the pool resource. For example:
    Begin ResourceMap
    RESOURCENAME  LOCATION
    ...
    pool          (p1@[sp2n1 sp2n2 sp2n3] p2@[sp2n4 sp2n5 sp2n6])
    ...
    End ResourceMap
    
  3. Edit lsb.queues and add a threshold for the lock index in the hpc_ibm queue:
    Begin Queue
    QUEUE_NAME=hpc_ibm
    ...
    lock=0
    ...
    End Queue
    

    The scheduling threshold on the lock index prevents dispatching to nodes which are being used in exclusive mode by other jobs.
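    Because lock is an external load index collected by the ELIM, you can check its current value on a node with lsload, for example (sp2n1 is one of the example node names above):

    lsload -l sp2n1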

4. Optional. Define system partitions (spname)

If you schedule jobs based on system partition names, you must configure the static resource spname. System partitions are collections of HPS nodes that together contain all available HPS nodes without any overlap. For example, to configure two system partition names, spp1 and spp2, made up of 6 SP2 nodes (sp2n1, sp2n2, sp2n3, ..., sp2n6):

  1. Edit lsf.shared and add an external resource called spname. For example:
    Begin Resource
    RESOURCENAME TYPE    INTERVAL INCREASING DESCRIPTION
    ...
    spname       String  ()       ()         (sp2 sys partition name)
    ...
    End Resource
    
  2. Edit lsf.cluster.cluster_name and allocate the spname resource. For example:
    Begin ResourceMap
    RESOURCENAME  LOCATION
    ...
    spname        (spp1@[sp2n1 sp2n3 sp2n5] spp2@[sp2n2 sp2n4 sp2n6])
    ...
    End ResourceMap
    

5. Allocate switch adapter specific resources

If you use a switch adapter, you must define specific resources in LSF. During installation, lsfinstall defines the following external resources in lsf.shared. These resources are updated through elim.hpc:

Begin Resource
RESOURCENAME     TYPE    INTERVAL INCREASING DESCRIPTION
...
adapter_windows Numeric  30       N    (free adapter windows on css0 on IBM SP)
ntbl_windows    Numeric  30       N    (free ntbl windows on IBM HPS)
poe             Numeric  30       N    (poe availability)
css0            Numeric  30       N    (free adapter windows on css0 on IBM SP)
csss            Numeric  30       N    (free adapter windows on csss on IBM SP)
dedicated_tasks Numeric  ()       Y    (running dedicated tasks)
ip_tasks        Numeric  ()       Y    (running IP tasks)
us_tasks        Numeric  ()       Y    (running US tasks)
...
End Resource

You must edit lsf.cluster.cluster_name and allocate the external resources. For example, to configure a switch adapter for six SP2 nodes (sp2n1, sp2n2, sp2n3, ..., sp2n6):

Begin ResourceMap
RESOURCENAME      LOCATION
...
adapter_windows   [default]
ntbl_windows      [default]
css0              [default]
csss              [default]
dedicated_tasks   (0@[default])
ip_tasks          (0@[default])
us_tasks          (0@[default])
...
End ResourceMap

The adapter_windows and ntbl_windows resources are required for all POE jobs.

The other three resources are only required when you run IP and US jobs at the same time.

6. Optional. Tune PAM parameters

To improve performance and scalability for large POE jobs, tune the following lsf.conf parameters. The user's environment can override these.

7. Reconfigure to apply the changes

  1. Run badmin ckconfig to check the configuration changes.

    If any errors are reported, fix the problem and check the configuration again.

  2. Reconfigure the cluster:
    badmin reconfig
    Checking configuration files ...
    No errors found.
    Do you want to reconfigure? [y/n] y
    Reconfiguration initiated
    

    LSF checks for any configuration errors. If no fatal errors are found, you are asked to confirm reconfiguration. If fatal errors are found, reconfiguration is aborted.

POE ELIM (elim.hpc)

An external LIM (ELIM) for POE jobs is supplied with LSF.

On IBM HPS systems, ELIM uses the st_status or ntbl_status command to collect information from the Resource Manager.

PATH variable in elim

The ELIM searches the following path for the poe and st_status commands:

PATH="/usr/bin:/bin:/usr/local/bin:/local/bin:/sbin:/usr/sbin:/usr/ucb:/usr/sbi
n:
/usr/bsd:${PATH}"

If these commands are installed in a different directory, you must modify the PATH variable in LSF_SERVERDIR/elim.hpc to point to the correct directory.
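For example, if poe and st_status are installed under /opt/ibmhpc/bin (an example path, not a documented default), prepend that directory to the PATH assignment in LSF_SERVERDIR/elim.hpc:

PATH="/opt/ibmhpc/bin:/usr/bin:/bin:/usr/local/bin:/local/bin:/sbin:/usr/sbin:/usr/ucb:/usr/bsd:${PATH}"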

POE esub (esub.poe)

The esub for POE jobs, esub.poe, is installed by lsfinstall. It is invoked using the -a poe option of bsub. By default, the POE esub sets the environment variable LSF_PJL_TYPE=poe. The job launcher, mpirun.lsf, reads the environment variable LSF_PJL_TYPE=poe and generates the appropriate pam command line to invoke POE to start the job.

LSF options

The value of the bsub -n option overrides the POE -procs option. If -n is not specified, the esub sets the defaults LSB_SUB_NUM_PROCESSORS=1 and LSB_SUB_MAX_NUM_PROCESSORS=1.
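For example (my_prog is a placeholder executable), the following job runs with 8 tasks even though POE is given -procs 4, because esub.poe takes the task count from -n and ignores -procs:

bsub -a poe -n 8 mpirun.lsf my_prog -procs 4 -euilib ip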

POE options

If you specify -euilib us (US mode), then -euidevice must be css0 or csss (the HPS used for interprocess communication).

The -euidevice sn_all option is supported. The -euidevice sn_single option is ignored. POE jobs submitted with -euidevice sn_single use -euidevice sn_all.

POE PJL wrapper (poejob)

The POE PJL (Parallel Job Launcher) wrapper, poejob, parses the POE job options, and filters out those that have been set by LSF.

Submitting POE jobs

Use bsub to submit POE jobs, including parameters required for the application and POE. PAM launches POE and collects resource usage for all running tasks in the parallel job.

Syntax

bsub -a poe [bsub_options] mpirun.lsf program_name [program_options] [poe_options]

where:

-a poe

Invokes esub.poe.

Examples

Running US jobs

To submit a POE job in US mode that runs on six processors:

bsub -a poe -n 6 mpirun.lsf my_prog -euilib us -euidevice css0

Running IP jobs

To run POE jobs in IP mode, MP_EUILIB (or -euilib) must be set to IP (Internet Protocol communication subsystem). For example:

bsub -a poe -n 6 mpirun.lsf my_prog -euilib ip ...

POE -procs option


The POE -procs option is ignored by esub.poe. Use the bsub -n option to specify the number of processors required for the job. The default if -n is not specified is 1.

Submitting POE jobs with a job script

A wrapper script is often used to call the POE script. You can submit a job using a job script as an embedded script or directly as a job, for example:

bsub -a poe -n 4 < embedded_jobscript
bsub -a poe -n 4 jobscript

For information on generic PJL wrapper script components, see Running Parallel Jobs.

See Administering Platform LSF for information about submitting jobs with job scripts.

IBM SP Switch2 support

The SP Switch2 switch should be correctly installed and operational. By default, LSF only supports homogeneous clusters of IBM SP PSSP 3.4 or PSSP 3.5 SP Switch2 systems.

To verify the version of PSSP, run:

lslpp -l | grep ssp.basic

Output should look something like:

lslpp -l | grep ssp.basic
ssp.basic      3.2.0.9  COMMITTED  SP System Support Package
ssp.basic      3.2.0.9  COMMITTED  SP System Support Package

To verify the switch type, run:

SDRGetObjects Adapter css_type

Switch type              Value
SP_Switch_Adapter        2
SP_Switch_MX_Adapter     3
SP_Switch_MX2_Adapter    3
SP_Switch2_Adapter       5

SP_Switch2_Adapter indicates that you are using SP Switch2.

Use these values to configure the device_type variable in the script LSF_BINDIR/poejob. The default for device_type is 3.
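For example, on an SP Switch2 system you would edit LSF_BINDIR/poejob and set the value taken from the table above:

device_type=5    # SP_Switch2_Adapter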

IBM High Performance Switch (HPS) support

Running US jobs

Tasks of a parallel job running in US mode use the IBM pSeries High Performance Switch (HPS) exclusively for communication. HPS resources are referred to as network table windows. For US jobs to run, network table windows must be allocated ahead of the actual application startup.

You can run US jobs through LSF control (Load Leveler (LL) is not used). Job execution for US jobs has two stages:

  1. Load HPS network table windows using ntbl_api (HPS support through the AIX Switch Network Interface (SNI))
  2. Optional. Start the application using the POE wrapper poe_w command

Running IP jobs

IP jobs do not require loading of network table windows. You just start poe or poe_w with the proper host name list file supplied.

How jobs start

Starting a parallel job on a pSeries HPS system is similar to starting jobs on an SP Switch2 system:

  1. Load a table file to connect network table windows allocated to a task
  2. Launch the task over network table windows connected
  3. Unload the same table file to disconnect the network table window allocated to the task



Migrating IBM Load Leveler Job Scripts to Use LSF Options

You can integrate LSF with your POE jobs by modifying your job scripts to convert POE Load Leveler options to LSF options. After modifying your job scripts, your LSF job submission will be equivalent to a POE job submission:

bsub < jobscript becomes equivalent to llsubmit jobCmdFile

The following POE options are handled differently when converting to LSF options:

US options

Use the following combinations of US options as a guideline for converting them to LSF options.

-cpu_use unique

-adapter_use dedicated:
bsub -a poe -R "select[adapter_windows>0 && us_tasks==0] rusage[adapter_windows=1:us_tasks=1:dedicated_tasks=1]"

-adapter_use shared:
bsub -a poe -R "select[adapter_windows>0 && dedicated_tasks==0] rusage[adapter_windows=1:us_tasks=1]"

  • Set MXJ to ! for the hosts on which these jobs will run (see the lsb.hosts sketch after this list)
  • The slots can only run these jobs

-cpu_use multiple

-adapter_use dedicated:
bsub -a poe -R "select[adapter_windows>0 && us_tasks==0] rusage[adapter_windows=1:us_tasks=1:dedicated_tasks=1]"

-adapter_use shared:
bsub -a poe -R "select[adapter_windows>0 && dedicated_tasks==0] rusage[adapter_windows=1:us_tasks=1]"

  • Set MXJ ( ) for the hosts on which these jobs will run
  • The hosts can only run these jobs
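A minimal lsb.hosts sketch that sets MXJ to ! for the hosts these jobs run on (host names are placeholders; add any other columns your cluster already uses):

Begin Host
HOST_NAME   MXJ   # Keywords
sp2n1       !
sp2n2       !
End Host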

IP options

For IP jobs that do not use a switch, adapter_use does not apply. Use the following combinations of IP options as a guideline for converting them to LSF options.

-cpu_use unique

bsub -R "rusage[ip_tasks=1]"
  • Set MXJ to ! for the hosts on which these jobs will run
  • The slots can only run these jobs

-cpu_use multiple

bsub -R "rusage[ip_tasks=1]"
  • Set MXJ ( ) for the hosts on which these jobs will run
  • The hosts can only run these jobs

-nodes combinations

-nodes alone

Cannot convert to LSF. You must use span[hosts=1].

-nodes -tasks_per_node combination

bsub -n a*b -R "span[ptile=b]"
  • Only use if the poe options are:
    poe -nodes a -tasks_per_node b

-nodes -procs

bsub -n a*b -R "span[ptile=b]"
  • Only use if the poe options are:
    poe -nodes a -tasks_per_node b -procs a*b
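For example, with the Load Leveler settings used later in this document (node = 2, tasks_per_node = 8), a = 2 and b = 8:

bsub -a poe -n 16 -R "span[ptile=8]" mpirun.lsf mypoejob -euilib us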

Load Leveler directives

Load Leveler job commands are handled as follows:

Load Leveler command      Ignored   bsub option                 Special handling
account_no                Y                                     Use LSF accounting
arguments                 Y                                     Place job arguments in the job command line
blocking                            bsub -n with span[ptile]
all checkpoint commands   Y
class                               bsub -P or -J
comment                   Y
core_limit                          bsub -C
cpu_limit                           bsub -c or -n
data_limit                          bsub -D
dependency                          bsub -w
environment                                                     Set in the job script or in esub.poe
error                               bsub -e
executable                Y                                     Enter the job name in the job script
file_limit                          bsub -F
group                     Y
hold                                bsub -H
image_size                          bsub -v or -M
initialdir                Y                                     The working directory is the current directory
input                               bsub -i
job_cpu_limit                       bsub -c
job_name                            bsub -J
job_type                  Y                                     Handled by esub.poe
max_processors                      bsub -n min,max
min_processors                      bsub -n min,max
network                             bsub -R
node combinations                                               See -nodes combinations
notification                                                    Set in lsf.conf
notify_user                                                     Set in lsf.conf
output                              bsub -o
parallel_path             Y
preferences                         bsub -R "select[...]"
queue                               bsub -q
requirements                        bsub -R and -m
resources                           bsub -R                     Set rusage for each task according to the Load Leveler equivalent
rss_limit                           bsub -M
shell                     Y
stack_limit                         bsub -S
startdate                           bsub -b
step_name                 Y
task_geometry                                                   Use the LSB_PJL_TASK_GEOMETRY environment variable to specify task geometry for your jobs. LSB_PJL_TASK_GEOMETRY overrides any mpirun -n option.
total_tasks                         bsub -n
user_priority                       bsub -sp
wall_clock_limit                    bsub -W

Simple job script modifications

The following example shows how to convert the POE options in a Load Leveler command file to LSF options in your job scripts for a non-shared US or IP job.

Assumptions

Example Load Leveler command file

This example uses following POE job script to run an executable named mypoejob:

#!/bin/csh
#@ shell = /bin/csh
#@ environment = ENVIRONMENT=BATCH; COPY_ALL;\
#  MP_EUILIB=us; MP_STDOUTMODE=ordered; MP_INFOLEVEL=0;
#@ network.MPI = switch,dedicated,US
#@ job_type = parallel
#@ job_name = batch-test
#@ output = $(job_name).log
#@ error  = $(job_name).log
#@ account_no = USER1
#@ node = 2
#@ tasks_per_node = 8
#@ node_usage = not_shared
#@ wall_clock_limit = 1:00:00
#@ class = batch
#@ notification = never
#@ queue
# ---------------------------------------------
# Copy required workfiles to $WORKDIR, which is set
# to /scr/$user under the large GPFS work filesystem,
# named /scr.
cp ~/TESTS/mpihello $WORKDIR/mpihello

# Change directory to $WORKDIR
cd $WORKDIR

# Execute program mypoejob
poe mypoejob
poe $WORKDIR/mpihello

# Copy output data from $WORKDIR to appropriate archive FS,
# since we are currently running within a volatile 
# "scratch" filesystem.

# Clean unneeded files from $WORKDIR after job ends.
rm -f $WORKDIR/mpihello
echo "Job completed at: `date`"

To convert POE options in a Load Leveler command file to LSF options

  1. Make sure the queue hpc_ibm is available in lsb.queues.
  2. Set the EXCLUSIVE parameter of the queue:
    EXCLUSIVE=Y
    
  3. Create the job script for the LSF job. For example:
    #!/bin/csh
    # mypoe_jobscript
    # Start script ---------
    #BSUB -a poe
    #BSUB -n 16
    #BSUB -x
    #BSUB -o batch_test.%J_%I.out
    #BSUB -e batch_test.%J_%I.err
    #BSUB -W 60
    #BSUB -J batch_test
    #BSUB -q hpc_ibm
    setenv ENVIRONMENT BATCH
    setenv MP_EUILIB us
    # Copy required workfiles to $WORKDIR, which is set
    # to /scr/$user under the large GPFS work filesystem,
    # named /scr.
    cp ~/TESTS/mpihello $WORKDIR/mpihello
    
    # Change directory to $WORKDIR
    cd $WORKDIR
    
    # Execute program mypoejob
    mpirun.lsf mypoejob -euilib us
    mpirun.lsf $WORKDIR/mpihello -euilib us
    # Copy output data from $WORKDIR to appropriate archive FS,
    # since we are currently running within a volatile 
    # "scratch" filesystem.
    
    # Clean unneeded files from $WORKDIR after job ends.
    rm -f $WORKDIR/mpihello
    echo "Job completed at: `date`"
    # End script ---------
    
  4. Submit the job script as a redirected job, specifying the appropriate resource requirement string:
    bsub -R "select[adapter_windows>0] rusage[adapter_windows=1] span[ptile=8]" < 
    mypoe_jobscript
    

Comparing some of the converted options

POE                                    LSF
#@ environment = ENVIRONMENT=BATCH;    setenv ENVIRONMENT BATCH
   MP_EUILIB=us                        setenv MP_EUILIB us
#@ wall_clock_limit = 1:00:00          #BSUB -W 60
#@ output = $(job_name).log            #BSUB -o batch_test.%J_%I.out
#@ error = $(job_name).log             #BSUB -e batch_test.%J_%I.err
#@ node = 2                            #BSUB -n 16 -R "span[ptile=8]"
#@ tasks_per_node = 8
# Execute programs:                    # Execute programs:
poe mypoejob                           mpirun.lsf mypoejob -euilib us
poe $WORKDIR/mpihello                  mpirun.lsf $WORKDIR/mpihello -euilib us

Submitting the job

Compare the job script submission with the equivalent job submitted with all the LSF options on the command line:

bsub -x -a poe -q hpc_ibm -n 16 -R "select[adapter_windows>0] 
rusage[adapter_windows=1] span[ptile=8]" mpirun.lsf mypoejob -euilib us

To submit the same job as an IP job, substitute ip for us, and remove the select and rusage statements:

bsub -x -a poe -q hpc_ibm -n 16 -R "span[ptile=8]" mpirun.lsf mypoejob 
-euilib ip

To submit the job as a shared US or IP job, remove the bsub -x option from the job script or command line. This allows other jobs to run on the host your job is running on:

bsub -a poe -q hpc_ibm -n 16 -R "span[ptile=8]" mpirun.lsf mypoejob -euilib us

or

bsub -a poe -q hpc_ibm -n 16 -R "span[ptile=8]" mpirun.lsf mypoejob -euilib ip

Advanced job script modifications

If your environment runs any of the following:

your job scripts must use the us_tasks and dedicated_tasks LSF resources.

The following examples show how to convert the POE options in a Load Leveler command file to LSF options in your job scripts for several kinds of jobs.

-adapter_use dedicated and -cpu_use unique

Submitting the job



Controlling Allocation and User Authentication for IBM POE Jobs

About POE authentication

Establishing authentication for POE jobs means ensuring that users are permitted to run parallel jobs on the nodes they intend to use. POE supports two types of user authentication:

When interactive remote login to HPS execution nodes is not allowed, you can still run parallel jobs under the Parallel Environment (PE) through LSF. On systems with restricted access to the execution nodes, PE jobs under LSF use two wrapper programs for user authentication:

Enabling user authentication for POE jobs

To enable user authentication through the poe_w and pmd_w wrappers, you must set LSF_HPC_EXTENSIONS="LSB_POE_AUTHENTICATION" in /etc/lsf.conf.

Enforcing node and CPU allocation for POE jobs

To enable POE Allocation control, use LSF_HPC_EXTENSIONS="LSB_POE_ALLOCATION" in /etc/lsf.conf. poe_w enforces the LSF allocation decision from mbatchd.

For US jobs, swtbl_api and ntbl_api validate the network table windows data files with mbatchd. For IP and US jobs, the POE wrapper validates the POE host file against the information from mbatchd. If the information does not match, the job is terminated.

When LSF_HPC_EXTENSIONS="LSB_POE_ALLOCATION" is set:

Validation rules

Configuring POE allocation and authentication support

Configure services

  1. Register pmv4lsf (pmv3lsf) service with inetd:
    1. Add the following line to /etc/inetd.conf:
      pmv4lsf   stream  tcp    nowait  root   /etc/pmdv4lsf pmdv4lsf
      
    2. Make a symbolic link from pmd_w to /etc/pmdv4lsf.

      For example:

      # ln -s $LSF_BINDIR/pmd_w /etc/pmdv4lsf
      
      


      Both $LSF_BINDIR and /etc must be owned by root for the symbolic link to work. Symbolic links are not allowed under /etc on some AIX 5.3 systems, so you may need to copy $LSF_BINDIR/pmd_w to /etc/pmdv4lsf:
      cp -f $LSF_BINDIR/pmd_w /etc/pmdv4lsf

    3. Add pmv4lsf to /etc/services.

      For example:

      pmv4lsf           6128/tcp   #pmd wrapper
      
  2. Add poelsf service to /etc/services.

    The port defined for this service will be used by pmd_w and poe_w for communication with each other.

    poelsf      6129/tcp   #pmd_w - poe_w communication port
    
  3. Run one of the following commands to restart inetd:
    # refresh -s inetd
    # kill -1 "inetd_pid"
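    After restarting inetd, you can confirm that the service entries are in place, for example:

    grep pmv4lsf /etc/inetd.conf /etc/services
    grep poelsf /etc/services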
    

Configure parameters

  1. Create the /etc/lsf.conf file if it does not already exist, and add the following parameter:
    LSF_HPC_EXTENSIONS="LSB_POE_ALLOCATION LSB_POE_AUTHENTICATION"
    
  2. (Optional) Two optional parameters can be added to the lsf.conf file:
    • LSF_POE_TIMEOUT_BIND--time in seconds for poe_w to keep trying to set up a server socket to listen on.

      Default: 120 seconds.

    • LSF_POE_TIMEOUT_SELECT--time in seconds for poe_w to wait for connections from pmd_w.

      Default: 160 seconds.


Both LSF_POE_TIMEOUT_BIND and LSF_POE_TIMEOUT_SELECT can also be set as environment variables for poe_w to read.
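For example, an lsf.conf sketch that combines the required extensions with both optional timeouts set to their documented defaults:

LSF_HPC_EXTENSIONS="LSB_POE_ALLOCATION LSB_POE_AUTHENTICATION"
LSF_POE_TIMEOUT_BIND=120
LSF_POE_TIMEOUT_SELECT=160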

Example job scripts

For IP jobs

For the following job script:

#!/bin/sh
# mypoe_jobscript
#BSUB -o out.%J
#BSUB -n 2
#BSUB -m "hostA"
#BSUB -a poe

export MP_EUILIB=ip

mpirun.lsf ./hmpis

Submit the job script as a redirected job, specifying the appropriate resource requirement string:

bsub -R "select[poe>0]" < mypoe_jobscript

For US jobs

For the following job script:

#!/bin/sh
# mypoe_jobscript
#BSUB -o out.%J
#BSUB -n 2
#BSUB -m "hostA"
#BSUB -a poe

export MP_EUILIB=us

mpirun.lsf ./hmpis

Submit the job script as a redirected job, specifying the appropriate resource requirement string:

bsub -R "select[ntbl_windows>0] rusage[ntbl_windows=1] span[ptile=1]" < 
mypoe_jobscript

Limitations



Submitting IBM POE Jobs over InfiniBand

Platform LSF installation adds a shared nrt_windows resource to run and monitor POE jobs over the InfiniBand interconnect.

lsf.shared

Begin Resource
RESOURCENAME     TYPE    INTERVAL INCREASING DESCRIPTION
...
poe             Numeric  30       N    (poe availability)
dedicated_tasks Numeric  ()       Y    (running dedicated tasks)
ip_tasks        Numeric  ()       Y    (running IP tasks)
us_tasks        Numeric  ()       Y    (running US tasks)
nrt_windows     Numeric  30       N    (free nrt windows on IBM poe over IB)
...
End Resource

lsf.cluster.cluster_name

Begin ResourceMap
RESOURCENAME        LOCATION
poe                 [default]
nrt_windows         [default]
dedicated_tasks     (0@[default])
ip_tasks            (0@[default])
us_tasks            (0@[default])
End ResourceMap

Job Submission

Run bsub -a poe to submit an IP mode job:

bsub -a poe mpirun.lsf job job_options -euilib ip poe_options

Run bsub -a poe to submit a US mode job:

bsub -a poe mpirun.lsf job job_options -euilib us poe_options

If some of the AIX hosts do not have InfiniBand support (for example, hosts that still use HPS), you must explicitly tell LSF to exclude those hosts:

bsub -a poe -R "select[nrt_windows>0]" mpirun.lsf job job_options poe_options

Job monitoring

Run lsload to display the nrt_windows and poe resources:

lsload -l
HOST_NAME  status r15s  r1m  r15m ut pg  io ls it tmp   swp mem nrt_windows poe
hostA      ok     0.0   0.0  0.0  1% 8.1  4 1  0  1008M 4090M 6976M 128.0   1.0
hostB      ok     0.0   0.0  0.0  0% 0.7  1 0  0  1006M 4092M 7004M 128.0   1.0




