Platform Computing Corp.

Job Priorities

User-Assigned Job Priority

User-assigned job priority provides controls that allow users to order their jobs in a queue. Job order is the first consideration to determine job eligibility for dispatch. Jobs are still subject to all scheduling policies regardless of job priority. Jobs with the same priority are ordered first come first served.

The job owner can change the priority of their own jobs. LSF and queue administrators can change the priority of all jobs in a queue.

User-assigned job priority is enabled for all queues in your cluster, and can be configured with automatic job priority escalation to automatically increase the priority of jobs that have been pending for a specified period of time.

Considerations

The btop and bbot commands move jobs relative to other jobs of the same priority. These commands do not change job priority.

Configure job priority

  1. To configure user-assigned job priority, edit lsb.params and define MAX_USER_PRIORITY. This configuration applies to all queues in your cluster.
  2. Use bparams -l to display the value of MAX_USER_PRIORITY.

Syntax
MAX_USER_PRIORITY=max_priority 

Where:

max_priority

Specifies the maximum priority a user can assign to a job. Valid values are positive integers. Larger values represent higher priority; 1 is the lowest.

LSF and queue administrators can assign priority beyond max_priority for jobs they own.

Example
MAX_USER_PRIORITY=100 

Specifies that 100 is the maximum job priority that can be specified by a user.

Specify job priority

Syntax
bsub -sp priority
bmod [-sp priority | -spn] job_ID 

Where:

-sp priority

Specifies the job priority. Valid values for priority are any integers between 1 and MAX_USER_PRIORITY (displayed by bparams -l). Incorrect job priorities are rejected.

LSF and queue administrators can specify priorities beyond MAX_USER_PRIORITY for jobs they own.

-spn

Sets the job priority to the default priority of MAX_USER_PRIORITY/2 (displayed by bparams -l).
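
The priority rules above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical and not part of LSF, and rounding MAX_USER_PRIORITY/2 down for odd values is an assumption.

```python
def default_priority(max_user_priority: int) -> int:
    # Default priority is MAX_USER_PRIORITY/2.
    # Assumption: odd values round down (integer division).
    return max_user_priority // 2


def is_valid_priority(priority: int, max_user_priority: int,
                      is_admin: bool = False) -> bool:
    # Valid values are integers starting at 1 (the lowest priority).
    if priority < 1:
        return False
    # LSF and queue administrators can exceed MAX_USER_PRIORITY
    # for jobs they own; other users cannot.
    return is_admin or priority <= max_user_priority


print(default_priority(100))         # 50
print(is_valid_priority(101, 100))   # False
```

With MAX_USER_PRIORITY=100, a job submitted without -sp would start at priority 50 under these assumptions.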

View job priority information

  1. Use the following commands to view job history, the current status and system configurations:
bhist -l job_ID

Displays the history of a job including changes in job priority.

bjobs -l [job_ID]

Displays the current job priority and the job priority at submission time. Job priorities are changed by the job owner, LSF and queue administrators, and automatically when automatic job priority escalation is enabled.

bparams -l

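Displays values for:

  • The maximum user priority, MAX_USER_PRIORITY
  • The default submission priority, MAX_USER_PRIORITY/2
  • The increment and interval used for automatic job priority escalation, JOB_PRIORITY_OVER_TIME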

Automatic Job Priority Escalation

Automatic job priority escalation automatically increases job priority of jobs that have been pending for a specified period of time. User-assigned job priority (see User-Assigned Job Priority) must also be configured.

As long as a job remains pending, LSF automatically increases the job priority beyond the maximum priority specified by MAX_USER_PRIORITY. Job priority is not increased beyond the value of max_int on your system.

Pending job resize allocation requests for resizable jobs inherit the job priority from the original job. When the priority of the allocation request gets adjusted, the priority of the original job is adjusted as well. The job priority of a running job is adjusted when there is an associated resize request for allocation growth. bjobs displays the updated job priority.

If necessary, a new pending resize request is regenerated after the job gets dispatched. The new job priority is used.

For requeued and rerun jobs, the dynamic priority value is reset. For migrated jobs, the existing dynamic priority value is carried forward. The priority is recalculated based on the original value.

Configure job priority escalation

  1. To configure job priority escalation, edit lsb.params and define JOB_PRIORITY_OVER_TIME.
  2. User-assigned job priority must also be configured.
  3. Use bparams -l to display the values of JOB_PRIORITY_OVER_TIME.

Syntax
JOB_PRIORITY_OVER_TIME=increment/interval 

Where:

increment

Specifies the value used to increase job priority every interval minutes. Valid values are positive integers.

interval

Specifies the frequency, in minutes, to increment job priority. Valid values are positive integers.

Example
JOB_PRIORITY_OVER_TIME=3/20 

Specifies that the priority of pending jobs is incremented by 3 every 20 minutes.
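
As a rough illustration of the escalation arithmetic, the Python sketch below computes the priority of a job that has been pending for a given number of minutes. The function name is hypothetical, and counting only completed intervals is an assumption, not confirmed LSF behavior.

```python
def escalated_priority(base_priority: int, pending_minutes: int,
                       increment: int, interval: int) -> int:
    # Assumption: priority rises once per completed interval.
    completed_intervals = pending_minutes // interval
    return base_priority + increment * completed_intervals


# JOB_PRIORITY_OVER_TIME=3/20, job submitted at priority 50, pending for one hour
print(escalated_priority(50, 60, 3, 20))  # 59
```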

Absolute Job Priority Scheduling

Absolute job priority scheduling (APS) provides a mechanism to control the job dispatch order to prevent job starvation.

When configured in a queue, APS sorts pending jobs for dispatch according to a job priority value calculated based on several configurable job-related factors. Each job priority weighting factor can contain subfactors. Factors and subfactors can be independently assigned a weight.

APS provides administrators with detailed yet straightforward control of the job selection process.

Scheduling priority factors

To calculate the job priority, APS divides job-related information into several categories. Each category becomes a factor in the calculation of the scheduling priority. You can configure the weight, limit, and grace period of each factor to get the desired job dispatch order.

LSF sums the value of each factor based on the weight of each factor.

Factor weight

The weight of a factor expresses the importance of the factor in the absolute scheduling priority. The factor weight is multiplied by the value of the factor to change the factor value. A positive weight increases the importance of the factor, and a negative weight decreases the importance of a factor. Undefined factors have a weight of 0, which causes the factor to be ignored in the APS calculation.

Factor limit

The limit of a factor sets the minimum and maximum absolute value of each weighted factor. Factor limits must be positive values.

Factor grace period

Each factor can be configured with a grace period. The factor is only counted as part of the APS value after the job has been pending for longer than the grace period.
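
One way to picture how weight, limit, and grace period combine for a single factor is the Python sketch below. It is an interpretation of the descriptions above, not LSF's implementation: the function name is hypothetical, and clamping the weighted value to the range from -limit to +limit is an assumption about how the limit is applied.

```python
from typing import Optional


def factor_contribution(value: float, weight: float = 0.0,
                        limit: Optional[float] = None,
                        grace_period_s: Optional[int] = None,
                        pending_s: int = 0) -> float:
    # The factor is not counted until the job has been pending
    # for longer than the grace period.
    if grace_period_s is not None and pending_s <= grace_period_s:
        return 0.0
    # An undefined factor has weight 0, so it contributes nothing.
    weighted = weight * value
    # Assumption: the limit bounds the absolute value of the weighted factor.
    if limit is not None:
        weighted = max(-limit, min(limit, weighted))
    return weighted


print(factor_contribution(50, weight=2.0, limit=80.0))  # 80.0
```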

Factors and subfactors

Factor: FS (user-based fairshare factor)

  Subfactors: The existing fairshare feature tunes the dynamic user priority.

  Metric: The fairshare factor automatically adjusts the APS value based on dynamic user priority. FAIRSHARE must be defined in the queue. The FS factor is ignored for non-fairshare queues. The FS factor is influenced by the following fairshare parameters in lsb.params:
    • CPU_TIME_FACTOR
    • RUN_TIME_FACTOR
    • RUN_JOB_FACTOR
    • HIST_HOURS

Factor: RSRC (resource factors)

  Subfactor: PROC
  Metric: Requested processors: the maximum of bsub -n min,max, the minimum of bsub -n min, or the value of PROCLIMIT in lsb.queues.

  Subfactor: MEM
  Metric: Total real memory requested (in MB). Memory requests appearing to the right of a || symbol in a usage string are ignored in the APS calculation. For multi-phase memory reservation, the APS value is based on the first phase of reserved memory.

  Subfactor: SWAP
  Metric: Total swap space requested (in MB). As with MEM, swap space requests appearing to the right of a || symbol in a usage string are ignored.

Factor: WORK (job attributes)

  Subfactor: JPRIORITY
  Metric: The job priority specified by:
    • Default specified by MAX_USER_PRIORITY in lsb.params
    • Users with bsub -sp or bmod -sp
    • Automatic priority escalation with JOB_PRIORITY_OVER_TIME in lsb.params

  Subfactor: QPRIORITY
  Metric: The priority of the submission queue.

Factor: ADMIN

  Metric: Administrators use bmod -aps to set this subfactor value for each job. A positive value increases the APS. A negative value decreases the APS. The ADMIN factor is added to the calculated APS value to change the factor value. The ADMIN factor applies to the entire job. You cannot configure separate weight, limit, or grace period factors. The ADMIN factor takes effect as soon as it is set.

Where LSF gets the job information for each factor

For each factor or subfactor, LSF gets the job information from the following sources:

MEM
  The value for jobs submitted with -R "rusage[mem]".
  For compound resource requirements submitted with -R "n1*{rusage[mem1]} + n2*{rusage[mem2]}", the value of MEM depends on whether resources are reserved per slot:
    • If RESOURCE_RESERVE_PER_SLOT=N, then MEM=mem1+mem2
    • If RESOURCE_RESERVE_PER_SLOT=Y, then MEM=n1*mem1+n2*mem2

SWAP
  The value for jobs submitted with -R "rusage[swp]".
  For compound resource requirements, SWAP is determined in the same manner as MEM.

PROC
  The value of n for jobs submitted with bsub -n (min, max), or the value of PROCLIMIT in lsb.queues.

JPRIORITY
  The dynamic priority of the job, updated every scheduling cycle and escalated at the interval defined by JOB_PRIORITY_OVER_TIME in lsb.params.

QPRIORITY
  The priority of the job submission queue.

FS
  The fairshare priority value of the submission user.
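
The two MEM rules for compound resource requirements can be written out as a short Python sketch. The function name and the sample numbers are hypothetical; only the two formulas come from the text above.

```python
def compound_mem(terms, reserve_per_slot: bool) -> int:
    """terms is a list of (n, mem) pairs, one per simple term
    of the compound resource requirement."""
    if reserve_per_slot:
        # RESOURCE_RESERVE_PER_SLOT=Y: MEM = n1*mem1 + n2*mem2
        return sum(n * mem for n, mem in terms)
    # RESOURCE_RESERVE_PER_SLOT=N: MEM = mem1 + mem2
    return sum(mem for _, mem in terms)


# Hypothetical request: -R "2*{rusage[mem=100]} + 4*{rusage[mem=50]}"
terms = [(2, 100), (4, 50)]
print(compound_mem(terms, reserve_per_slot=False))  # 150
print(compound_mem(terms, reserve_per_slot=True))   # 400
```

According to the text, SWAP is determined in the same manner, so the same arithmetic would apply to swp values.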

Enable absolute priority scheduling

Configure APS_PRIORITY in an absolute priority queue in lsb.queues.

APS_PRIORITY=WEIGHT[[factor, value] [subfactor, value]...]...] LIMIT[[factor, value] [subfactor, value]...]...] GRACE_PERIOD[[factor, value] [subfactor, value]...]...]

Pending jobs in the queue are ordered according to the calculated APS value.

If weight of a subfactor is defined, but the weight of parent factor is not defined, the parent factor weight is set as 1.

The WEIGHT and LIMIT factors are floating-point values. Specify a value for GRACE_PERIOD in seconds (value followed by s), minutes (value followed by m), or hours (value followed by h).

The default unit for grace period is hours.

For example, the following sets a grace period of 10 hours for the MEM factor, 10 minutes for the JPRIORITY factor, 10 seconds for the QPRIORITY factor, and 10 hours (default) for the RSRC factor:

GRACE_PERIOD[[MEM,10h] [JPRIORITY, 10m] [QPRIORITY,10s] [RSRC, 10]] 
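
To make the units concrete, the following Python sketch converts a grace period value to seconds. The function is a hypothetical helper for illustration and does not reflect how LSF parses lsb.queues.

```python
def grace_period_seconds(spec: str) -> int:
    units = {"s": 1, "m": 60, "h": 3600}
    spec = spec.strip()
    if spec and spec[-1] in units:
        return int(spec[:-1]) * units[spec[-1]]
    # No unit suffix: the default unit is hours.
    return int(spec) * units["h"]


for spec in ("10h", "10m", "10s", "10"):
    print(spec, grace_period_seconds(spec))
# 10h 36000
# 10m 600
# 10s 10
# 10 36000
```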

You cannot specify zero (0) for the WEIGHT, LIMIT, and GRACE_PERIOD of any factor or subfactor.

APS queues cannot configure cross-queue fairshare (FAIRSHARE_QUEUES) or host-partition fairshare.

Modify the system APS value (bmod)

The absolute scheduling priority for a newly submitted job is dynamic. Job priority is calculated and updated based on the formula specified by APS_PRIORITY in the absolute priority queue. Administrators can use bmod to manually override the calculated APS value.

Run bmod -apsn job_ID to undo the previous bmod -aps setting.

Assign a static system priority and ADMIN factor value

Administrators can use bmod -aps "system=value" to assign a static job priority for a pending job. The value cannot be zero (0).

In this case, the job's absolute priority is not calculated. The system APS priority is guaranteed to be higher than any calculated APS priority value. Jobs with higher system APS settings have priority over jobs with lower system APS settings.

The system APS value set by bmod -aps is preserved after mbatchd reconfiguration or mbatchd restart.

Use the ADMIN factor to adjust the APS value

Administrators can use bmod -aps "admin=value" to change the calculated APS value for a pending job. The ADMIN factor is added to the calculated APS value to change the factor value. The absolute priority of the job is recalculated. The value cannot be zero (0).

A bmod -aps command always overrides the previous bmod -aps command.

The ADMIN APS value set by bmod -aps is preserved after mbatchd reconfiguration or mbatchd restart.

Example bmod output

The following commands change the APS values for jobs 313 and 314:

bmod -aps "system=10" 313 
Parameters of job <313> are being changed 
bmod -aps "admin=10.00" 314 
Parameters of job <314> are being changed 

View modified APS values

Use bjobs -aps to see the effect of the changes:

bjobs -aps 
JOBID   USER   STAT   QUEUE  FROM_HOST EXEC_HOST   JOB_NAME   SUBMIT_TIME    APS 
313    user1   PEND   owners hostA                    myjob  Feb 12 01:09    (10) 
321    user1   PEND   owners hostA                    myjob  Feb 12 01:09      - 
314    user1   PEND   normal hostA                    myjob  Feb 12 01:08 109.00 
312    user1   PEND   normal hostA                    myjob  Feb 12 01:08  99.00 
315    user1   PEND   normal hostA                    myjob  Feb 12 01:08  99.00 
316    user1   PEND   normal hostA                    myjob  Feb 12 01:08  99.00 
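
The difference between the two bmod -aps forms can be sketched in Python as follows. This is an illustration only: the function name is hypothetical, and the assumption that job 314 would have had a calculated APS of 99.00, like the other jobs in the normal queue, is inferred from the output above rather than stated by it.

```python
from typing import Optional


def effective_aps(calculated: float, system: Optional[float] = None,
                  admin: float = 0.0) -> float:
    # A static system value replaces the calculated APS entirely.
    if system is not None:
        return float(system)
    # Otherwise the ADMIN factor is added to the calculated APS value.
    return calculated + admin


print(effective_aps(99.0, admin=10.0))  # 109.0, as shown for job 314
print(effective_aps(99.0, system=10))   # 10.0, shown as (10) for job 313
```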

Use bjobs -l to show APS values modified by the administrator:

bjobs -l 
Job <313>, User <user1>, Project <default>, Service Class 
<SLASamples>, Status <RUN>, Queue <normal>, Command <myjob>, System 
Absolute Priority <10> 
Job <314>, User <user1>, Project <default>, Status <PEND>, Queue 
<normal>, Command <myjob>, Admin factor value <10> 

Use bhist -l to see historical information about administrator changes to APS values. For example, after running these commands:

  1. bmod -aps "system=10" 108
  2. bmod -aps "admin=20" 108
  3. bmod -apsn 108

bhist -l shows the sequence of changes to job 108:

bhist -l 
Job <108>, User <user1>, Project <default>, Command <sleep 10000> 
Tue Feb 13 15:15:26: Submitted from host <HostB>, to Queue <normal>, CWD 
</scratch/user1>; 
Tue Feb 13 15:15:40: Parameters of Job are changed: 
   Absolute Priority Scheduling factor string changed to : system=10; 
Tue Feb 13 15:15:48: Parameters of Job are changed: 
   Absolute Priority Scheduling factor string changed to : admin=20; 
Tue Feb 13 15:15:58: Parameters of Job are changed: 
   Absolute Priority Scheduling factor string deleted; 
Summary of time in seconds spent in various states by  Tue Feb 13 15:16:02 
  PEND    PSUSP    RUN    USUSP    SSUSP    UNKWN    TOTAL 
   36       0       0       0        0        0        36 

Configure APS across multiple queues

Use QUEUE_GROUP in an absolute priority queue in lsb.queues to configure APS across multiple queues.

When APS is enabled in the queue with APS_PRIORITY, the FAIRSHARE_QUEUES parameter is ignored. The QUEUE_GROUP parameter replaces FAIRSHARE_QUEUES, which is obsolete in LSF 7.0.

Example 1

You want to schedule jobs from the normal queue and the short queue, factoring the job priority (weight 1) and queue priority (weight 10) in the APS value:

Begin Queue 
QUEUE_NAME   = normal 
PRIORITY     = 30 
NICE         = 20 
APS_PRIORITY = WEIGHT [[JPRIORITY, 1] [QPRIORITY, 10]] 
QUEUE_GROUP = short 
DESCRIPTION  = For normal low priority jobs, running only if hosts 
are lightly loaded. 
End Queue 
... 
Begin Queue    
QUEUE_NAME   = short 
PRIORITY     = 20 
NICE         = 20 
End Queue 

The APS value for jobs from the normal queue and the short queue is calculated as:

APS_PRIORITY = 1 * (1 * job_priority + 10 * queue_priority) 

The first 1 is the weight of the WORK factor; the second 1 is the weight of the job priority subfactor; the 10 is the weight of queue priority subfactor.

If you want the job priority to increase based on the pending time, you must configure the JOB_PRIORITY_OVER_TIME parameter in lsb.params.

Example 2

Extending example 1, you want to add user-based fairshare with a weight of 100 to the APS value in the normal queue:

Begin Queue 
QUEUE_NAME   = normal 
PRIORITY     = 30 
NICE         = 20 
FAIRSHARE = USER_SHARES [[user1, 5000] [user2, 5000] [others, 1]] 
APS_PRIORITY = WEIGHT [[JPRIORITY, 1] [QPRIORITY, 10] [FS, 100]] 
QUEUE_GROUP = short 
DESCRIPTION  = For normal low priority jobs, running only if hosts 
are lightly loaded. 
End Queue 

The APS value is now calculated as:

APS_PRIORITY = 1 * (1 * job_priority + 10 * queue_priority) + 100 * user_priority 

Example 3

Extending example 2, you now want to add swap space to the APS value calculation. The APS configuration changes to:

APS_PRIORITY = WEIGHT [[JPRIORITY, 1] [QPRIORITY, 10] [FS, 100] [SWAP, -10]] 

And the APS value is now calculated as:

APS_PRIORITY = 1 * (1 * job_priority + 10 * queue_priority) + 100 * user_priority + 1 * (-10 * SWAP) 
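
The formulas in the three examples follow one pattern: each parent factor weight multiplies the weighted sum of its subfactors, and an undefined parent weight is treated as 1. A minimal Python sketch of that pattern is shown below. The function, the dictionary layout, and the sample job values are hypothetical, and limits and grace periods are left out.

```python
def aps_value(factor_weights, subfactor_weights, values):
    total = 0.0
    for factor, subfactors in subfactor_weights.items():
        # A parent factor without a defined weight defaults to 1.
        parent_weight = factor_weights.get(factor, 1.0)
        total += parent_weight * sum(
            weight * values.get(name, 0.0)
            for name, weight in subfactors.items()
        )
    # FS has no subfactors; it is weighted directly.
    total += factor_weights.get("FS", 0.0) * values.get("FS", 0.0)
    return total


# Example 3 weights with made-up job values
factor_weights = {"FS": 100}
subfactor_weights = {
    "WORK": {"JPRIORITY": 1, "QPRIORITY": 10},
    "RSRC": {"SWAP": -10},
}
values = {"JPRIORITY": 50, "QPRIORITY": 30, "FS": 0.5, "SWAP": 2}
print(aps_value(factor_weights, subfactor_weights, values))  # 380.0
```

Here the result is 1 * (1 * 50 + 10 * 30) + 100 * 0.5 + 1 * (-10 * 2) = 380.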

View pending job order by the APS value

Run bjobs -aps to see APS information for pending jobs in the order of absolute scheduling priority. The order that the pending jobs are displayed is the order in which the jobs are considered for dispatch.

The APS value is calculated based on the current scheduling cycle, so jobs are not guaranteed to be dispatched in this order.

Pending jobs are ordered by APS value. Jobs with system APS values are listed first, from highest to lowest APS value. Jobs with calculated APS values are listed next ordered from high to low value. Finally, jobs not in an APS queue are listed. Jobs with equal APS values are listed in order of submission time.
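
The ordering rule for jobs in APS queues can be sketched as a sort key in Python. This is a simplified illustration: the function and field names are hypothetical, submission time is reduced to minutes since midnight, and jobs outside APS queues are not modeled. The sample jobs mirror the APS jobs in the bjobs -aps example later in this section.

```python
def aps_display_order(jobs):
    def sort_key(job):
        has_system = job.get("system_aps") is not None
        value = job["system_aps"] if has_system else job["aps"]
        # Tier 0 holds system APS values, tier 1 calculated values.
        # Within a tier, higher values come first and earlier
        # submission time breaks ties.
        return (0 if has_system else 1, -value, job["submit_minute"])
    return sorted(jobs, key=sort_key)


jobs = [
    {"id": 4, "aps": 270, "submit_minute": 840},
    {"id": 12, "aps": 355, "submit_minute": 870},
    {"id": 22, "system_aps": 60, "submit_minute": 870},
    {"id": 2, "aps": 360, "submit_minute": 660},
]
print([job["id"] for job in aps_display_order(jobs)])  # [22, 2, 12, 4]
```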

If queues are configured with the same priority, bjobs -aps may not show jobs in the expected dispatch order. Jobs may be dispatched in the order the queues are configured in lsb.queues. You should avoid configuring queues with the same priority.

Example bjobs -aps output

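The following example uses this configuration, as reflected in the calculations that follow the output:

  • The normal queue (priority 30) and the short queue (priority 20) use APS with a QPRIORITY weight of 10 and a JPRIORITY weight of 1
  • Jobs start with a job priority of 50
  • Job priority escalates by 5 every 10 minutes (JOB_PRIORITY_OVER_TIME=5/10 in lsb.params)

bjobs -aps was run at 14:41: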

bjobs -aps 
JOBID     USER     STAT    QUEUE    FROM_HOST    JOB_NAME     SUBMIT_TIME    APS 
  15     User2     PEND    priority    HostB       myjob     Dec 21 14:30     - 
  22     User1     PEND    Short       HostA       myjob     Dec 21 14:30     (60) 
   2     User1     PEND    Short       HostA       myjob     Dec 21 11:00     360 
  12     User2     PEND    normal      HostB       myjob     Dec 21 14:30     355 
   4     User1     PEND    Short       HostA       myjob     Dec 21 14:00     270 
   5     User1     PEND    Idle        HostA       myjob     Dec 21 14:01      - 

For job 2, APS = 10 * 20 + 1 * (50 + 220 * 5/10) = 360

For job 12, APS = 10 * 30 + 1 * (50 + 10 * 5/10) = 355

For job 4, APS = 10 * 20 + 1 * (50 + 40 * 5/10) = 270
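
The three calculations above can be reproduced with a short Python sketch. The function and parameter names are hypothetical, the defaults are read off the formulas (queue priority weight 10, job priority weight 1, starting job priority 50, escalation of 5 every 10 minutes), and the pending times are the values that appear in the formulas.

```python
def example_aps(queue_priority: int, pending_minutes: int,
                base_job_priority: int = 50,
                qpriority_weight: int = 10, jpriority_weight: int = 1,
                increment: int = 5, interval: int = 10) -> float:
    # Escalated job priority after the given pending time.
    job_priority = base_job_priority + pending_minutes * increment / interval
    return qpriority_weight * queue_priority + jpriority_weight * job_priority


print(example_aps(20, 220))  # job 2:  360.0
print(example_aps(30, 10))   # job 12: 355.0
print(example_aps(20, 40))   # job 4:  270.0
```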

View APS configuration for a queue

Run bqueues -l to see the current APS information for a queue:

bqueues -l normal 
 
QUEUE: normal 
  -- No description provided.  This is the default queue. 
 
PARAMETERS/STATISTICS 
PRIO NICE STATUS          MAX JL/U JL/P JL/H NJOBS  PEND   RUN SSUSP USUSP  RSV 
500   20  Open:Active       -    -    -    -     0     0     0     0     0    0 
 
SCHEDULING PARAMETERS 
           r15s   r1m  r15m   ut      pg    io   ls    it    tmp    swp    mem 
 loadSched   -     -     -     -       -     -    -     -     -      -      - 
 loadStop    -     -     -     -       -     -    -     -     -      -      - 
 
SCHEDULING POLICIES:  FAIRSHARE  APS_PRIORITY 
APS_PRIORITY: 
                   WEIGHT FACTORS    LIMIT FACTORS    GRACE PERIOD 
  FAIRSHARE            10000.00                -              - 
  RESOURCE            101010.00                -          1010h 
    PROCESSORS           -10.01                -              - 
    MEMORY              1000.00         20010.00             3h 
    SWAP               10111.00                -              - 
  WORK                     1.00                -              - 
    JOB PRIORITY     -999999.00         10000.00          4131s 
    QUEUE PRIORITY     10000.00            10.00              - 
 
USER_SHARES:  [user1, 10] 
 
SHARE_INFO_FOR: normal/ 
 USER/GROUP   SHARES  PRIORITY  STARTED  RESERVED  CPU_TIME  RUN_TIME 
user1       10       3.333      0        0         0.0        0 
 
USERS: all 
HOSTS:  all 
REQUEUE_EXIT_VALUES:  10 

Feature interactions

Fairshare

The default user-based fairshare can be a factor in APS calculation by adding the FS factor to APS_PRIORITY in the queue.

FCFS

APS overrides the job sort result of FCFS.

SLA scheduling

APS cannot be used together with SLA scheduling.

Job requeue

All requeued jobs are treated as newly submitted jobs for APS calculation. The job priority, system, and ADMIN APS factors are reset on requeue.

Rerun jobs

Rerun jobs are not treated the same as requeued jobs. A job typically reruns because the host failed, not through some user action (like job requeue), so the job priority is not reset for rerun jobs.

Job migration

Suspended (bstop) jobs and migrated jobs (bmig) are always scheduled before pending jobs. For migrated jobs, LSF keeps the existing job priority information.

If LSB_REQUEUE_TO_BOTTOM and LSB_MIG2PEND are configured in lsf.conf, the migrated jobs keep their APS information and must compete with other pending jobs based on the APS value. If you want to reset the APS value, use brequeue, not bmig.

Resource reservation

The resource reservation is based on queue policies. The APS value does not affect current resource reservation policy.

Preemption

The preemption is based on queue policies. The APS value does not affect the current preemption policy.

Chunk jobs

The first chunk job to be dispatched is picked based on the APS priority. Other jobs in the chunk are picked based on the APS priority and the default chunk job scheduling policies.

The following job properties must be the same for all chunk jobs:

Backfill scheduling

Not affected.

Advance reservation

Not affected.

Resizable jobs

For new resizable job allocation requests, the resizable job inherits the APS value from the original job. The subsequent calculations use factors as follows:

For each factor or subfactor, the behavior is as follows:

FAIRSHARE
  Resizable jobs submitted into fairshare queues or host partitions are subject to fairshare scheduling policies. The dynamic priority of the user who submitted the job is the most important criterion. LSF treats pending resize allocation requests as regular jobs and enforces the fairshare user priority policy to schedule them.
  The dynamic priority of users depends on:
    • Their share assignment
    • The slots their jobs are currently consuming
    • The resources their jobs consumed in the past
    • The adjustment made by the fairshare plugin (libfairshareadjust.*)
  Resizable job allocation changes affect the user priority calculation if RUN_JOB_FACTOR is greater than zero (0). Resize add requests increase the number of slots in use and decrease user priority. Resize release requests decrease the number of slots in use and increase user priority. The faster a resizable job grows, the lower the user priority becomes, and the less likely it is that a pending allocation request gets more slots.

MEM
  Use the value inherited from the original job.

PROC
  Use the MAX value of the resize request.

SWAP
  Use the value inherited from the original job.

JPRIORITY
  Use the value inherited from the original job. If automatic job priority escalation is configured, the dynamic value is calculated as described in Automatic Job Priority Escalation.
  For requeued and rerun resizable jobs, the JPRIORITY is reset, and the new APS value is calculated with the new JPRIORITY.
  For migrated resizable jobs, the JPRIORITY is carried forward, and the new APS value is calculated with the JPRIORITY continued from the original value.

QPRIORITY
  Use the value inherited from the original job.

ADMIN
  Use the value inherited from the original job.


Platform Computing Inc.
www.platform.com