Using Platform LSF HPC with HP-UX Processor Sets


LSF HPC uses HP-UX processor sets (psets) to create an efficient execution environment that allows a mix of users and jobs to coexist in the HP Superdome cell-based architecture.



About HP-UX Psets

HP-UX processor sets (psets) are available as an optional software product for HP-UX 11i Superdome multiprocessor systems. A pset is a group of active processors reserved for exclusive use by the applications assigned to the set. A pset manages processor resources among applications and users.

The operating system restricts applications to run only on the processors in their assigned psets. Processes bound to a pset can only run on the CPUs belonging to that pset, so applications assigned to different psets do not contend for processor resources.

A newly created pset initially has no processors assigned to it.
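
Outside of LSF, psets can be examined and managed with the HP-UX psrset(1M) utility that ships with the Processor Sets product. The session below is only a sketch of typical usage; the pset and processor IDs are hypothetical, and the exact option syntax should be checked against psrset(1M) on your system:

psrset                 # list configured psets and their processor assignments
psrset -c 2 3          # create a new pset containing processors 2 and 3
psrset -b 1 12345      # bind process 12345 to pset 1 (assumed syntax)
psrset -d 1            # destroy pset 1; its processors return to the default pset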

Dynamic application binding

Each running application in the system is bound to some pset, which defines the processors that the application can run on.

Scheduling allocation domain

A pset defines a scheduling allocation domain that restricts applications to run only on the processors in its assigned pset.

System default pset

At system startup, the HP-UX system is automatically configured with one system default pset to which all enabled processors are assigned. Processor 0 is always assigned to the default pset. All users in the system can access the default pset.

For more information

See the HP-UX 11i system administration documentation for information about defining and managing psets.

How LSF HPC uses psets

Processor isolation

On HP-UX 11i Superdome multiprocessor systems, psets can be created and deallocated dynamically out of available machine resources. The pset provides processor isolation, so that a job requiring a specific number of CPUs runs only on those CPUs.

Processor distance

Processor distance is a value that measures how quickly a process running on one processor can access the local memory of another processor. The larger the value, the slower the memory access. For example, the processor distance between two processors within one cell is smaller than that between processors in different cells.

When creating a pset for a job, LSF uses a best-fit algorithm to choose processors as close to each other as possible, that is, the set of processors with the smallest processor distance. For example, on a system with 4 CPUs per cell, a 4-CPU job is allocated within a single cell when possible, rather than spread across two cells.

Pset creation and deallocation


When a job is submitted, LSF chooses the set of processors that best fits the job, creates a pset from them, and binds the job processes to the pset.

After the job finishes, LSF destroys the pset. If no host meets the CPU requirements, the job remains pending until processors become available to allocate the pset.

CPU 0 in the default pset 0 is always considered last for a job, and cannot be taken out of pset 0, since all system processes are running on it. LSF cannot create a pset with CPU 0; it only uses the default pset if it cannot create a pset without CPU 0.
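
For example, the following submission asks LSF to create a 2-CPU pset for the job with no special topology requirements (myjob is a placeholder command):

bsub -n 2 -ext "PSET[]" myjob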

LSF HPC topology adapter for psets (RLA)

RLA runs on each HP-UX 11i host. It is started and monitored by sbatchd, and it provides pset allocation services to external clients such as the pset scheduler plugin and sbatchd itself.

RLA maintains a status file in the directory defined by LSB_RLA_WORKDIR in lsf.conf, which keeps track of job pset allocation information. When RLA starts, it reads the status file and recovers the current status.
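
For example, an lsf.conf entry such as the following (the directory path is hypothetical) places the RLA status file under /var/lsf/rla:

LSB_RLA_WORKDIR=/var/lsf/rla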

Assumptions and limitations

Account mapping

User-level account and system account mapping are not supported. If a user account does not exist on the remote host, LSF cannot create a pset for it.

Resizable jobs

Jobs running in a pset cannot be resized.

Resource reservation

By default, job start time is not accurately predicted for pset jobs with topology options, so the forecast start time shown by bjobs -l is optimistic. LSF HPC may incorrectly indicate that the job can start at a certain time, when it actually cannot start until some time after the indicated time.

For a more accurate start-time estimate, configure time-based slot reservation. With time-based reservation, a set of pending jobs gets a future allocation and an estimated start time.

See Administering Platform LSF for more information about time-based slot reservation.
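
As a sketch, time-based slot reservation is enabled through the LSB_TIME_RESERVE_NUMJOBS parameter in lsf.conf; the value below (the number of pending jobs considered for future allocation) is only an example:

LSB_TIME_RESERVE_NUMJOBS=4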

Chunk jobs

Jobs submitted to a chunk job queue are not chunked together, but run outside of a pset as a normal LSF job.

Preemption

Suspending and resuming jobs

When a job is suspended with bstop, all CPUs in the pset are released and reassigned to the default pset (pset 0). Before resuming the job, LSF reallocates the pset and rebinds all job processes to it.
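
For example, using the job ID from the examples later in this section:

bstop 329      # CPUs in the job pset are returned to the default pset
bresume 329    # LSF reallocates a pset and rebinds the job processes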

Pre-execution and post-execution

Job pre-execution programs run within the job pset, since they are part of the job. Post-execution programs run outside of the job pset.



Configuring LSF HPC with HP-UX Psets

Automatic configuration at installation

lsb.modules

During installation, lsfinstall adds the schmod_pset external scheduler plugin module name to the PluginModule section of lsb.modules:

Begin PluginModule
SCH_PLUGIN              RB_PLUGIN           SCH_DISABLE_PHASES 
schmod_default               ()                      () 
schmod_fcfs                  ()                      () 
schmod_fairshare             ()                      () 
schmod_limit                 ()                      () 
schmod_preemption            ()                      () 
...
schmod_pset                  ()                      () 
End PluginModule


The schmod_pset plugin name must be configured after the standard LSF plugin names in the PluginModule list.

See the Platform LSF Configuration Reference for more information about lsb.modules.

lsf.conf

During installation, lsfinstall sets the following parameters in lsf.conf:
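
The exact parameters vary by installation; typically they include enabling the external scheduler and setting the port used to contact RLA. The values below are illustrative:

LSF_ENABLE_EXTSCHEDULER=Y    # let mbatchd use external scheduler plugins
LSB_RLA_PORT=6883            # TCP port used to communicate with RLA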

lsf.shared

During installation, the Boolean resource pset is defined in lsf.shared:

Begin Resource
RESOURCENAME     TYPE      INTERVAL  INCREASING   DESCRIPTION
...
pset             Boolean   ()        ()           (PSET)
...
End Resource


You should add the pset resource name under the RESOURCES column of the Host section of lsf.cluster.cluster_name. Hosts without the pset resource specified are not considered for scheduling pset jobs.
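
For example (hostA is a placeholder host name):

Begin Host
HOSTNAME   model    type     server  r1m  mem  swp  RESOURCES
hostA      !        !        1       -    -    -    (pset)
End Host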

lsb.hosts

For each pset host, lsfinstall enables "!" in the MXJ column of the HOSTS section of lsb.hosts for the HPPA11 host type.

For example:

Begin Host
HOST_NAME MXJ   r1m     pg    ls    tmp  DISPATCH_WINDOW  # Keywords
#hostA     () 3.5/4.5   15/   12/15  0      ()            # Example
default    !    ()      ()    ()     ()     ()            
HPPA11     !    ()      ()    ()     ()     ()            #pset host
End Host

lsf.cluster.cluster_name

For each pset host, hostsetup adds the pset Boolean resource to the HOST section of lsf.cluster.cluster_name.

Configuring default and mandatory pset options

Use the DEFAULT_EXTSCHED and MANDATORY_EXTSCHED queue parameters in lsb.queues to configure default and mandatory pset options.

DEFAULT_EXTSCHED=PSET[topology]

where topology is:

[CELLS=num_cells | PTILE=cpus_per_cell] [;CELL_LIST=cell_list]

Specifies default pset topology scheduling options for the queue.

-extsched options on the bsub command override any conflicting queue-level options set by DEFAULT_EXTSCHED.

For example, if the queue specifies:

DEFAULT_EXTSCHED=PSET[PTILE=2]

and a job is submitted with no topology requirements requesting 6 CPUs (bsub -n 6), a pset is allocated using 3 cells with 2 CPUs in each cell.

If the job is submitted:

bsub -n 6 -ext "PSET[PTILE=3]" myjob

The pset option in the command overrides the DEFAULT_EXTSCHED, so a pset is allocated using 2 cells with 3 CPUs in each cell.

MANDATORY_EXTSCHED=PSET[topology]

Specifies mandatory pset topology scheduling options for the queue.

MANDATORY_EXTSCHED options override any conflicting job-level options set by -extsched options on the bsub command.

For example, if the queue specifies:

MANDATORY_EXTSCHED=PSET[CELLS=2]

and a job is submitted with no topology requirements requesting 6 CPUs (bsub -n 6), a pset is allocated using 2 cells with 3 CPUs in each cell.

If the job is submitted:

bsub -n 6 -ext "PSET[CELLS=3]" myjob

MANDATORY_EXTSCHED overrides the pset option in the command, so a pset is allocated using 2 cells with 3 CPUs in each cell.

Use the CELL_LIST option in MANDATORY_EXTSCHED to restrict the cells available for allocation to pset jobs. For example, if the queue specifies:

MANDATORY_EXTSCHED=PSET[CELL_LIST=1-7]

job psets can only use cells 1 to 7; cell 0 is not used for pset jobs.
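
For example, a queue that spreads jobs across cells two CPUs per cell by default and never allocates cell 0 might be defined as follows (the queue name and description are placeholders):

Begin Queue
QUEUE_NAME         = pset
DEFAULT_EXTSCHED   = PSET[PTILE=2]
MANDATORY_EXTSCHED = PSET[CELL_LIST=1-7]
DESCRIPTION        = Queue for HP-UX pset jobs
End Queue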



Using LSF HPC with HP-UX Psets

Specifying pset topology options

To specify processor topology scheduling policy options for pset jobs, use the -ext (or -extsched) option of bsub, or the DEFAULT_EXTSCHED and MANDATORY_EXTSCHED parameters in lsb.queues.

If LSB_PSET_BIND_DEFAULT is set in lsf.conf and no pset options are specified for the job, Platform LSF HPC binds the job to the default pset 0. If LSB_PSET_BIND_DEFAULT is not set, Platform LSF HPC must still attach the job to a pset, so it binds the job to the same pset used by the Platform LSF HPC daemons.
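
For example, to bind jobs without pset options to the default pset (a sketch; the value shown is assumed):

LSB_PSET_BIND_DEFAULT=Y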

For more information about job operations, see Administering Platform LSF.

For more information about bsub, see the Platform LSF Command Reference.

Syntax

-ext[sched] "PSET[topology]"

where topology is:

[CELLS=num_cells | PTILE=cpus_per_cell][;CELL_LIST=cell_list]


You can specify either one CELLS or one PTILE option in the same PSET[] option, not both.

Priority of topology scheduling options

The options set by -extsched can be combined with the queue-level MANDATORY_EXTSCHED or DEFAULT_EXTSCHED parameters. If -extsched and MANDATORY_EXTSCHED set the same option, the MANDATORY_EXTSCHED setting is used. If -extsched and DEFAULT_EXTSCHED set the same options, the -extsched setting is used.

Topology scheduling options are applied in the following order of priority, from highest to lowest:

  1. Queue-level MANDATORY_EXTSCHED options, which override ...
  2. Job-level -ext options, which override ...
  3. Queue-level DEFAULT_EXTSCHED options

For example, if the queue specifies:

DEFAULT_EXTSCHED=PSET[CELLS=2]

and the job is submitted with:

bsub -n 4 -ext "PSET[PTILE=1]" myjob

The pset option in the job submission overrides DEFAULT_EXTSCHED, so the job runs in a pset allocated using 4 cells with 1 CPU in each cell, honoring the job-level PTILE option.

If the queue specifies:

MANDATORY_EXTSCHED=PSET[CELLS=2]

and the job is submitted with:

bsub -n 4 -ext "PSET[PTILE=1]" myjob

The job runs in a pset allocated using 2 cells with 2 CPUs in each cell, honoring the CELLS option in MANDATORY_EXTSCHED.

Partitioning the system for specific jobs (CELL_LIST)

Use the bsub -ext "PSET[CELL_LIST=cell_list]" option to partition a large Superdome machine. Instead of allocating CPUs from the entire machine, LSF creates a pset containing only the cells specified in the cell list.

Non-existent cells are ignored during scheduling, but the job can be dispatched as long as enough cells are available to satisfy the job requirements. For example, in a cluster with both 32-CPU and 64-CPU machines and a cell list specification CELL_LIST=1-15, jobs can use cells 1-7 on the 32-CPU machine, and cells 1-15 on the 64-CPU machine.
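
For example, the following submission restricts the job pset to cells 4 through 7, leaving the other cells free for other work (myjob is a placeholder command):

bsub -n 16 -ext "PSET[CELL_LIST=4-7]" myjob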

CELL_LIST and CELLS

You can use CELL_LIST with the PSET[CELLS=num_cells] option. The number of cells requested by the CELLS option must be less than or equal to the number of cells in the cell list; otherwise, the job remains pending.

CELL_LIST and PTILE

You can use CELL_LIST with the PSET[PTILE=cpus_per_cell] option. The PTILE option allows the job pset to spread across several cells. The number of required cells equals the number of requested processors divided by the PTILE value. The resulting number of cells must be less than or equal to the number of cells in the cell list; otherwise, the job remains pending.

For example, the following is a correct specification:

bsub -n 8 -ext "PSET[PTILE=2;CELL_LIST=1-4]" myjob

The job requests 8 CPUs spread over 4 cells (8/2=4), which is equal to the 4 cells requested in the CELL_LIST option.

Viewing pset allocations for jobs

bjobs -l

After a pset job starts to run, use bjobs -l to display the job pset ID. For example, if LSF creates pset 23 on hostA for job 329, bjobs shows:

bjobs -l 329

Job <329>, User <user1>, Project <default>, Status <RUN>, Queue <normal>, Ext
                     sched <PSET[]>, Command <sleep 60>
Thu Jan 22 12:04:31: Submitted from host <hostA>, CWD <$HOME>, 2 Processors 
                     Requested;
Thu Jan 22 12:04:38: Started on 2 Hosts/Processors <2*hostA>, Execution Home
                     </home/user1>, Execution CWD </home/user1>;
Thu Jan 22 12:04:38: psetid=hostA:23;
Thu Jan 22 12:04:39: Resource usage collected.
                     MEM: 1 Mbytes;  SWAP: 2 Mbytes;  NTHREAD: 1
                     PGID: 18440;  PIDs: 18440 


 SCHEDULING PARAMETERS:
           r15s   r1m  r15m   ut      pg    io   ls    it    tmp    swp    mem
 loadSched   -     -     -     -       -     -    -     -     -      -      -  
 loadStop    -     -     -     -       -     -    -     -     -      -      -  

 EXTERNAL MESSAGES:
 MSG_ID FROM       POST_TIME      MESSAGE                          ATTACHMENT 
 0        -            -             -                                  -     
 1      user1      Jan 22 12:04   PSET[]


The pset ID string for bjobs does not change after the job is dispatched.

bhist

Use bhist to display historical information about pset jobs:

bhist -l 329

Job <329>, User <user1>, Project <default>, Extsched <PSET[]>, Command <sleep
                     60>
Thu Jan 22 12:04:31: Submitted from host <hostA>, to Queue <normal>, CWD <$H
                     OME>, 2 Processors Requested;
Thu Jan 22 12:04:38: Dispatched to 2 Hosts/Processors <2*hostA>;
Thu Jan 22 12:04:38: psetid=hostA:23;
Thu Jan 22 12:04:39: Starting (Pid 18440);
Thu Jan 22 12:04:39: Running with execution home </home/user1>, Execution CWD
                     </home/user1>, Execution Pid <18440>;
Thu Jan 22 12:05:39: Done successfully. The CPU time used is 0.1 seconds;
Thu Jan 22 12:05:40: Post job process done successfully;

Summary of time in seconds spent in various states by  Thu Jan 22 12:05:40
  PEND     PSUSP    RUN      USUSP    SSUSP    UNKWN    TOTAL
  7        0        61       0        0        0        68

bacct

Use bacct to display accounting information about pset jobs:

bacct -l 329
Accounting information about jobs that are: 
  - submitted by all users.
  - accounted on all projects.
  - completed normally or exited
  - executed on all hosts.
  - submitted to all queues.
  - accounted on all service classes.
------------------------------------------------------------------------------

Job <331>, User <user1>, Project <default>, Status <DONE>, Queue <normal>, Co
                     mmand <sleep 60>
Thu Jan 22 18:23:14: Submitted from host <hostA>, CWD <$HOME>;
Thu Jan 22 18:23:23: Dispatched to <hostA>;
Thu Jan 22 18:23:23: psetid=hostA:23;
Thu Jan 22 18:24:24: Completed <done>.

Accounting information about this job:
     CPU_T     WAIT     TURNAROUND   STATUS     HOG_FACTOR    MEM    SWAP
      0.12        9             70     done         0.0017     1M      2M
------------------------------------------------------------------------------

SUMMARY:      ( time unit: second ) 
 Total number of done jobs:       1      Total number of exited jobs:     0
 Total CPU time consumed:       0.1      Average CPU time consumed:     0.1
 Maximum CPU time of a job:     0.1      Minimum CPU time of a job:     0.1
 Total wait time in queues:     9.0
 Average wait time in queue:    9.0
 Maximum wait time in queue:    9.0      Minimum wait time in queue:    9.0
 Average turnaround time:        70 (seconds/job)
 Maximum turnaround time:        70      Minimum turnaround time:        70
 Average hog factor of a job:  0.00 ( cpu time / turnaround time )
 Maximum hog factor of a job:  0.00      Minimum hog factor of a job:  0.00

Examples

The following examples assume a 4-CPU/cell HP Superdome system with no other jobs running:
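These illustrative submissions (myjob is a placeholder command) show the allocations implied by the topology options described above:

bsub -n 8 -ext "PSET[CELLS=2]" myjob
    # pset of 2 cells, 4 CPUs in each cell

bsub -n 8 -ext "PSET[PTILE=2]" myjob
    # pset of 4 cells (8/2), 2 CPUs in each cell

bsub -n 8 -ext "PSET[PTILE=2;CELL_LIST=1-4]" myjob
    # pset of 4 cells chosen from cells 1-4, 2 CPUs in each cell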


