Platform Computing Corp.

Resource Preemption

About Resource Preemption

Preemptive Scheduling and Resource Preemption

Resource preemption is a special type of preemptive scheduling. It is similar to job slot preemption.

Job Slot Preemption and Resource Preemption

If you enable preemptive scheduling, job slot preemption is always enabled. Resource preemption is optional. With resource preemption, you can configure preemptive scheduling based on other resources in addition to job slots.

Types of Resource Preemption

License Preemption

If you have configured a custom resource to manage software application licenses that are shared throughout the cluster (Network Floating Licenses), you can use preemptive scheduling to make these licenses more available to high-priority queues.

The license resource can be either static (network floating licenses managed within LSF) or dynamic and decreasing (network floating licenses outside of LSF control and measured with an ELIM).

Other Resources

Resource preemption works for any custom shared numeric resource (except increasing dynamic resources), so its use is not restricted to managing licenses. To preempt on a host-based resource such as memory, you could configure a custom resource that is "shared" on only one host.

Multiple Resource Preemption

If multiple resources are required, LSF can preempt multiple jobs until sufficient resources are available. For example, one or more jobs might be preempted for a single job that needs several units of a resource, or several different resources.

Using Resource Preemption

To allow your job to participate in resource preemption, you must use resource reservation to reserve the preemption resource (the cluster might be configured so that this occurs automatically). For dynamic resources, you must also specify a duration.

Resource reservation is part of the resource requirement, which can be specified at the job level, the queue level, or the application level.

You can use a task file to associate specific resource requirements with specific applications.

Dynamic Resources

Specify duration

If the preemption resource is dynamic, you must specify the duration part of the resource reservation string when you submit a preempting or preemptable job.
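
As a minimal sketch of the reservation string described above, the helper below builds the rusage section that the examples later in this document pass to bsub -R. The function name rusage_string is hypothetical, and the sketch assumes the duration is given in minutes. It prints the bsub command instead of running it, so it can be tried on a host without LSF.

```shell
#!/bin/sh
# Hypothetical helper: build the rusage part of a resource requirement.
#   $1 = preemption resource name
#   $2 = amount to reserve
#   $3 = duration (assumed to be in minutes)
rusage_string() {
    printf 'rusage[%s=%s:duration=%s]\n' "$1" "$2" "$3"
}

# Print the submission command rather than executing it.
echo bsub -q low -R "\"$(rusage_string LicenseA 1 2)\"" your_app
```

The same string can be used for both the preempting and the preemptable job; only the queue differs.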

Resources outside the control of LSF

If an ELIM is needed to determine the value of a dynamic resource (such as the number of software licenses available), LSF preempts jobs as necessary, then waits for the ELIM to report that the resources are available before starting the high-priority job. By default, LSF waits 300 seconds (5 minutes) for resources to become available. You can increase this time by setting PREEMPTION_WAIT_TIME in lsb.params.

If the preempted jobs do not release the resources, or the resources have been intercepted by a non-LSF user, the ELIM does not report any more of the resource becoming available, and LSF might preempt more jobs to get the resources.

Requirements for Resource Preemption

Custom Job Controls for Resource Preemption

Why you have to customize LSF

By default, LSF's preemption action is to send a suspend signal (SIGSTOP) to stop the application. Some applications do not release resources when they get SIGSTOP. If this happens, the preemption resource does not become available, and the preempting job is not successful.

To make resource preemption work, you must modify LSF's default preemption behavior so that the application releases the preemption resource when a job is preempted.

Customizing the SUSPEND action

Ask your application vendor what job control signals or actions cause your application to suspend a job and release the preemption resources. You need to replace the default SUSPEND action (the SIGSTOP signal) with another signal or script that works properly with your application when it suspends the job. For example, your application might be able to catch SIGTSTP instead of SIGSTOP.

By default, LSF sends SIGCONT to resume suspended jobs. You should find out if this causes your application to take the resources back when it resumes the job (for example, if it checks out a license again). If not, you need to modify the RESUME action also.

Any change you make to the SUSPEND job control affects all suspended jobs in the queue, including preempted jobs, jobs that are suspended because of load thresholds, and jobs that you suspend using LSF commands. Similarly, changes made to the RESUME job control also affect the whole queue.
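
The behavior expected from an application under a SIGTSTP-based SUSPEND action can be sketched in shell. This is an illustration only, not how any particular application works: release_license and acquire_license are hypothetical placeholders for whatever your application does to give up and take back the resource, and LICENSE_HELD is a stand-in for real state.

```shell
#!/bin/sh
# Sketch: catch SIGTSTP, release the resource, then really stop.
LICENSE_HELD=1

# Hypothetical placeholder: give the license back to the license server.
release_license() {
    LICENSE_HELD=0
}

# Hypothetical placeholder: check the license out again.
acquire_license() {
    LICENSE_HELD=1
}

on_suspend() {
    release_license
    # SIGSTOP cannot be caught, so this stops the process for real.
    kill -s STOP "$$"
    # Execution continues here after SIGCONT arrives (the RESUME action).
    acquire_license
}

# SIGTSTP can be caught; the default SIGSTOP cannot.
trap on_suspend TSTP
```

The point of the design is the ordering: the resource is released before the process stops, and taken back only after it is resumed.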

Killing Preempted Jobs

If you want to use resource preemption, but cannot get your application to release or take back the resource, you can configure LSF to kill the low-priority job instead of suspending it. This method is less efficient because when you kill a job, you lose all the work, and you have to restart the job from the beginning.
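
One possible way to configure this is sketched below for the low-priority queue. It assumes that the TERMINATE_WHEN parameter in lsb.queues with the PREEMPT value is available in your LSF version, so that preempted jobs receive the TERMINATE action instead of SUSPEND; check the lsb.queues reference for your release before relying on it.

```
Begin Queue
QUEUE_NAME=low
PRIORITY=20
...
TERMINATE_WHEN=PREEMPT
...
End Queue
```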

Resource Preemption Steps

To make resource preemption useful, you may need to work through all of these steps.

  1. Read.
     Before you set up resource preemption, you should understand the following:

  2. Plan.
     When you plan how to set up resource preemption, consider:

  3. Write the ELIM.
  4. Configure LSF.
     a. lsb.queues

  5. Reconfigure LSF to make your changes take effect.
  6. Operate.
  7. Track.
     Use bparams -l to view information about preemption configuration in your cluster.
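
For the reconfiguration step, the following sketch prints the commands that are usually run after editing each configuration file. The helper name reconfig_commands is hypothetical, and the mapping of files to commands is an assumption based on common LSF practice; confirm it against the reference for your version. The commands are printed, not executed.

```shell
#!/bin/sh
# Hypothetical helper: print the reconfiguration commands for a changed file.
reconfig_commands() {
    case "$1" in
        lsf.shared|lsf.cluster.*)
            # LIM configuration changed: reconfigure LIM, then restart mbatchd.
            echo "lsadmin reconfig"
            echo "badmin mbdrestart" ;;
        lsb.params|lsb.queues)
            # Batch configuration changed: reconfigure mbatchd.
            echo "badmin reconfig" ;;
        *)
            echo "unknown file: $1" >&2
            return 1 ;;
    esac
}

for f in lsf.shared lsb.queues
do
    reconfig_commands "$f"
done
```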

Configure Resource Preemption

  1. Configure preemptive scheduling (PREEMPTION in lsb.queues).
  2. Configure the preemption resources (PREEMPTABLE_RESOURCES in lsb.params).
     Job slots are the default preemption resource. To define additional resources to use with preemptive scheduling, set PREEMPTABLE_RESOURCES in lsb.params, and specify the names of the custom resources as a space-separated list.

  3. Customize the preemption action.
     Preemptive scheduling uses the SUSPEND and RESUME job control actions to suspend and resume preempted jobs. For resource preemption, it is critical that the preempted job releases the resource. You must modify LSF default job controls to make resource preemption work.

  4. Optional. Configure the preemption wait time.
     To specify how long LSF waits for the ELIM to report that the resources are available, set PREEMPTION_WAIT_TIME in lsb.params and specify the number of seconds to wait. You cannot specify any less than the default time (300 seconds).

    For example, to make LSF wait for 8 minutes, specify

    PREEMPTION_WAIT_TIME=480 
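
Taken together, the lsb.params settings from the steps above might look like the following fragment. It is a sketch only: the resource names LicenseA and pre_mem are borrowed from the examples later in this document, and listing both in one cluster is an assumption made for illustration.

```
Begin Parameters
...
PREEMPTABLE_RESOURCES = LicenseA pre_mem
PREEMPTION_WAIT_TIME = 480
...
End Parameters
```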
    

License Preemption Example

Configuration

This example uses LicenseA as the name of the preemption resource.

lsf.shared

Add the resource to the Resource section.

Begin Resource 
RESOURCENAME TYPE    INTERVAL INCREASING DESCRIPTION 
LicenseA     Numeric 60       N          (custom application) 
... 
End Resource

lsf.cluster.cluster_name

Add the resource to the ResourceMap section

Begin ResourceMap 
RESOURCENAME LOCATION 
LicenseA     [all] 
... 
End ResourceMap

lsb.params

Add the resource to the list of preemption resources.

... 
PREEMPTABLE_RESOURCES = LicenseA 
...

lsb.queues

Define a higher priority queue to be a PREEMPTIVE queue by adding one line in the queue definition.

Begin Queue 
QUEUE_NAME=high 
PRIORITY=40 
... 
PREEMPTION=PREEMPTIVE 
DESCRIPTION=jobs may preempt jobs in lower-priority queues 
... 
End Queue 

Configure a job control action in a lower-priority queue so that SIGTSTP is sent when the SUSPEND action is called. This assumes your application can catch SIGTSTP, release the resource (license) it used, and then suspend itself. You should also make sure that your application can catch SIGCONT while it is suspended, and check out the resource (license) again.

Begin Queue 
QUEUE_NAME=low 
PRIORITY=20 
... 
JOB_CONTROLS=SUSPEND[SIGTSTP] RESUME[SIGCONT] TERMINATE[SIGTERM] 
DESCRIPTION=jobs preempted by jobs in higher-priority queues 
... 
End Queue

ELIM

Write an ELIM to report the current available number of Application A licenses. This ELIM starts on the master host.
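
A minimal sketch of such an ELIM follows. It is not a vendor-supplied script: count_free_licenses is a hypothetical placeholder that you would replace with a query to your license server (here it only echoes an environment variable), and the 60-second sleep is chosen to match the INTERVAL set in lsf.shared. The sketch assumes the file is installed under a name such as elim or elim.license, and starts the reporting loop only in that case.

```shell
#!/bin/sh
# Sketch of an ELIM for the LicenseA resource.

# Hypothetical placeholder: replace with a real license server query.
count_free_licenses() {
    echo "${FREE_LICENSES:-0}"
}

# One report line: <number of resources> <name> <value>
format_report() {
    echo "1 LicenseA $1"
}

main() {
    while :
    do
        format_report "`count_free_licenses`"
        sleep 60
    done
}

# Run the loop only when invoked as elim or elim.<name> (assumed naming).
case "${0##*/}" in
    elim|elim.*) main ;;
esac
```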

Operation

Check how many LicenseA resources are available

Use bhosts -s LicenseA to check how many LicenseA licenses are available in the cluster. In this example, 2 licenses are available.

bhosts -s LicenseA 
RESOURCE     TOTAL RESERVED LOCATION 
LicenseA     2     0.0      hostA hostB ...

Using up all LicenseA resources

Submit 2 jobs to a low-priority queue to consume those 2 licenses.

bsub -J first -q low -R "rusage[LicenseA=1:duration=2]" your_app 
bsub -J second -q low -R "rusage[LicenseA=1:duration=2]" your_app 

After a while, those jobs are running and the LicenseA resource is used up.

bjobs 
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
201   you  RUN  low   hostx     hostA     /first   Aug 23 15:42
202   you  RUN  low   hostx     hostB     /second  Aug 23 15:43

bhosts -s LicenseA
RESOURCE     TOTAL RESERVED LOCATION 
LicenseA     0     2.0      hostA hostB ...

Preempting a job for the LicenseA resource

Submit a job to a high-priority queue to preempt a job from a low-priority queue for the resource LicenseA.

bsub -J third -q high -R "rusage[LicenseA=1:duration=2]" your_app 

After a while, the third job is running and the second job is suspended.

bjobs 
JOBID USER STAT  QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME 
203   you  RUN   high  hostx     hostA     /third   Aug 23 15:48 
201   you  RUN   low   hostx     hostA     /first   Aug 23 15:42 
202   you  SSUSP low   hostx     hostB     /second  Aug 23 15:43

bhosts -s LicenseA
RESOURCE     TOTAL RESERVED LOCATION 
LicenseA     0     2.0      hostA hostB ... 
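
When tracking a resource from a script, the bhosts -s output shown above can be reduced to the two numbers of interest. The helper below is a sketch under the assumption that the output has the column layout shown in this example (RESOURCE, TOTAL, RESERVED, LOCATION); resource_usage is a hypothetical name, and it reads the output on standard input so it can be tried without LSF.

```shell
#!/bin/sh
# Hypothetical helper: print "TOTAL RESERVED" for each row of a resource.
#   $1 = resource name; bhosts -s output is read from stdin.
resource_usage() {
    awk -v res="$1" 'NR > 1 && $1 == res { print $2, $3 }'
}

# Sample taken from the output above; on a cluster you would pipe
# the output of bhosts -s LicenseA into the function instead.
resource_usage LicenseA <<'EOF'
RESOURCE     TOTAL RESERVED LOCATION
LicenseA     0     2.0      hostA hostB
EOF
```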

Memory Preemption Example

Configuration

This example uses pre_mem as the name of the preemption resource.

lsf.shared

Add the resource to the Resource section.

Begin Resource 
RESOURCENAME TYPE    INTERVAL INCREASING DESCRIPTION
pre_mem      Numeric 60       N          (external memory usage reporter)
... 
End Resource

lsf.cluster.cluster_name

Add the resource to the "ResourceMap" section.

Begin ResourceMap 
RESOURCENAME LOCATION 
pre_mem      ([hostA] [hostB] ... [hostX])  
#List the hosts where you want memory preemption to occur. 
...
End ResourceMap

lsb.params

Add the resource to the list of preemption resources.

... 
PREEMPTABLE_RESOURCES=pre_mem 
...

lsb.queues

Define a higher-priority queue to be the PREEMPTIVE queue by adding one line in the queue definition.

Begin Queue 
QUEUE_NAME=high 
PRIORITY=40 
... 
PREEMPTION=PREEMPTIVE 
DESCRIPTION=preempt jobs in lower-priority queues 
... 
End Queue 

Configure a job control action in a lower-priority queue, and let SIGTSTP be sent when the SUSPEND action is called. This assumes your application can catch the signal SIGTSTP and release (free) the resource (memory) it used, then suspend itself. You should also make sure that your application can catch the signal SIGCONT while it is suspended, and consume the resource (memory) again.

Begin Queue 
QUEUE_NAME=low 
PRIORITY=20 
... 
JOB_CONTROLS=SUSPEND[SIGTSTP] RESUME[SIGCONT] TERMINATE[SIGTERM] 
DESCRIPTION=jobs may be preempted by jobs in higher-priority queues 
... 
End Queue

ELIM

This is an example of an ELIM that reports the current value of pre_mem. This ELIM starts on all the hosts that have the pre_mem resource.

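#!/bin/sh
# ELIM that reports the available memory on this host as pre_mem.
host=`hostname`
while :
do
    # Exit if lsload fails (for example, if LIM is not responding).
    lsload > /dev/null 2>&1
    if [ $? != 0 ] ; then exit 1
    fi
    # Take the mem column for this host and strip the trailing "M".
    memStr=`lsload -I mem -w "$host" | grep "$host" | awk '{print $3}' | sed 's/M//'`
    # Report format: <number of resources> <name> <value>
    reportStr="1 pre_mem $memStr"
    # printf replaces echo "...\c", which not every sh echo understands.
    printf '%s ' "$reportStr"
    sleep 60
done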
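
The ELIM above strips only a trailing "M" from the value reported by lsload. If your lsload prints larger values with a "G" suffix, which is an assumption to verify on your cluster, the value could be normalized to megabytes first. The helper name to_mb is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: convert a value such as 110M or 2G to whole megabytes.
to_mb() {
    echo "$1" | awk '
        /G$/ { sub(/G$/, ""); printf "%d\n", $0 * 1024; next }
        /M$/ { sub(/M$/, ""); printf "%d\n", $0; next }
             { printf "%d\n", $0 }'
}

to_mb 110M
to_mb 2G
```

In the ELIM, the sed step could then be replaced by a call to this function on the third column of the lsload output.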

Operation

Check how many pre_mem resources are available

Use bhosts -s pre_mem to check how much of the pre_mem resource is available on each host. In this example, 110 MB of memory is available on hostA.

bhosts -s pre_mem  
RESOURCE  TOTAL RESERVED LOCATION 
pre_mem   110   0.0      hostA 
pre_mem   50    0.0      hostB 
...

Using up some pre_mem resources

Submit 1 job to a low-priority queue to consume 100 MB pre_mem. Assume the application mem_app consumes 100 MB memory after it starts.

bsub -J first -q low -R "rusage[pre_mem=100:duration=2]" mem_app 

After a while, the first job is running and the pre_mem is reduced.

bjobs 
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME 
301   you  RUN  low   hostx     hostA     /first   Aug 23 16:42

bhosts -s pre_mem
RESOURCE TOTAL RESERVED LOCATION 
pre_mem  10    100.0      hostA 
pre_mem  50      0.0      hostB 
...

Preempting the job for pre_mem resources

Submit a job to a high-priority queue to preempt a job from a low-priority queue to get the resource pre_mem.

bsub -J second -q high -R "rusage[pre_mem=100:duration=2]" mem_app 

After a while, the second job is running and the first job is suspended.

bjobs 
JOBID USER STAT  QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME 
302   you  RUN   high  hostx     hostA     /second  Aug 23 16:48 
301   you  SSUSP low   hostx     hostA     /first   Aug 23 16:42

bhosts -s pre_mem
RESOURCE TOTAL RESERVED LOCATION 
pre_mem  10    100.0      hostA 
pre_mem  50      0.0      hostB 
... 
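
To confirm which jobs were preempted, the bjobs output above can be filtered for jobs in the SSUSP state. The sketch below assumes the header line shown in this example and locates the STAT column by name; suspended_jobs is a hypothetical helper that reads bjobs output on standard input.

```shell
#!/bin/sh
# Hypothetical helper: print the IDs of jobs whose STAT column is SSUSP.
suspended_jobs() {
    awk 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == "STAT") col = i; next }
         col && $col == "SSUSP" { print $1 }'
}

# Sample taken from the output above; on a cluster, pipe bjobs into it.
suspended_jobs <<'EOF'
JOBID USER STAT  QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
302   you  RUN   high  hostx     hostA     /second  Aug 23 16:48
301   you  SSUSP low   hostx     hostA     /first   Aug 23 16:42
EOF
```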

Platform Computing Inc.
www.platform.com