Resource Preemption
Contents
- About Resource Preemption
- Requirements for Resource Preemption
- Custom Job Controls for Resource Preemption
- Resource Preemption Steps
- Configure Resource Preemption
- License Preemption Example
- Memory Preemption Example
About Resource Preemption
Preemptive Scheduling and Resource Preemption
Resource preemption is a special type of preemptive scheduling. It is similar to job slot preemption.
Job Slot Preemption and Resource Preemption
If you enable preemptive scheduling, job slot preemption is always enabled. Resource preemption is optional. With resource preemption, you can configure preemptive scheduling based on other resources in addition to job slots.
Types of Resource Preemption
License Preemption
If you have configured a custom resource to manage software application licenses that are shared throughout the cluster (Network Floating Licenses), you can use preemptive scheduling to make these licenses more available to high-priority queues.
The license resource can be either static (network floating licenses managed within LSF) or dynamic and decreasing (network floating licenses outside of LSF control and measured with an ELIM).
Other Resources
Resource preemption works for any custom shared numeric resource (except increasing dynamic resources) so its use is not restricted to managing licenses. To preempt on a host-based resource, such as memory, you could configure a custom resource "shared" on only one host.
Multiple Resource Preemption
If multiple resources are required, LSF can preempt multiple jobs, until sufficient resources are available. For example, one or more jobs might be preempted for a job that needs:
- Multiple job slots
- Multiple licenses for one software application
- Multiple resources, such as a job slot, a license, and memory
- More of a resource than can be obtained by preempting just one job
Using Resource Preemption
To allow your job to participate in resource preemption, you must use resource reservation to reserve the preemption resource (the cluster might be configured so that this occurs automatically). For dynamic resources, you must specify a duration also.
Resource reservation is part of the resource requirement, which can be specified at the job level, the application level, or the queue level.
You can use a task file to associate specific resource requirements with specific applications.
Dynamic Resources
Specify duration
If the preemption resource is dynamic, you must specify the duration part of the resource reservation string when you submit a preempting or preemptable job.
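As a minimal sketch of this requirement, the shell helper below assembles the rusage part of a resource requirement string so that the duration is never left out. The function name rusage_string is hypothetical and not part of LSF; the resource name LicenseA, the queue name low, and the duration value 2 are borrowed from the license example later in this document.

```shell
#!/bin/sh
# Hypothetical helper (not part of LSF): build the rusage part of a
# resource requirement string for a dynamic preemption resource.
# $1 = resource name, $2 = amount to reserve, $3 = duration
rusage_string() {
    printf 'rusage[%s=%s:duration=%s]' "$1" "$2" "$3"
}

# Possible use when submitting a preemptable job:
#   bsub -q low -R "$(rusage_string LicenseA 1 2)" your_app
```

Requiring all three arguments in one place makes it harder to submit a job that reserves a dynamic resource without a duration.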
Resources outside the control of LSF
If an ELIM is needed to determine the value of a dynamic resource (such as the number of software licenses available), LSF preempts jobs as necessary, then waits for the ELIM to report that the resources are available before starting the high-priority job. By default, LSF waits 300 seconds (5 minutes) for resources to become available. This time can be increased (PREEMPTION_WAIT_TIME in lsb.params).
If the preempted jobs do not release the resources, or the resources have been intercepted by a non-LSF user, the ELIM does not report any more of the resource becoming available, and LSF might preempt more jobs to get the resources.
Requirements for Resource Preemption
Resource preemption depends on all these conditions:
- The preemption resources must be configured (PREEMPTABLE_RESOURCES in lsb.params).
- Jobs must reserve the correct amount of the preemption resource, using resource reservation (the rusage part of the resource requirement string).
- For dynamic preemption resources, jobs must specify the duration part of the resource reservation string.
- Jobs that use the preemption resource must be spread out among multiple queues of different priority, and preemptive scheduling must be configured so that preemption can occur among these queues (preemption can only occur if jobs are in different queues).
- Only a releasable resource can be a preemption resource. LSF must be configured to release the preemption resource when the job is suspended (RELEASE=Y in lsf.shared, which is the default). You must configure this no matter what your preemption action is.
- LSF's preemption behavior must be modified. By default, LSF's preemption action does not allow an application to release any resources, except for job slots and static shared resources.
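The first condition can be checked with a short script. The sketch below is only an illustration: the function name is_preemptable_resource is hypothetical, the path to lsb.params is passed in by the caller, and the check assumes that PREEMPTABLE_RESOURCES is written on a single line as a space-separated list.

```shell
#!/bin/sh
# Hypothetical check (not an LSF command): succeed if resource $1 is
# listed in the PREEMPTABLE_RESOURCES line of the lsb.params file $2.
is_preemptable_resource() {
    awk -F= -v res="$1" '
        /^[ \t]*PREEMPTABLE_RESOURCES[ \t]*=/ {
            # The value is a space-separated list of resource names.
            n = split($2, names, " ")
            for (i = 1; i <= n; i++)
                if (names[i] == res) found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$2"
}

# Possible use (the path to lsb.params depends on your installation):
#   is_preemptable_resource LicenseA /path/to/lsb.params && echo configured
```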
Custom Job Controls for Resource Preemption
Why you have to customize LSF
By default, LSF's preemption action is to send a suspend signal (SIGSTOP) to stop the application. Some applications do not release resources when they get SIGSTOP. If this happens, the preemption resource does not become available, and the preempting job is not successful.
You modify LSF's default preemption behavior to make the application release the preemption resource when a job is preempted.
Customizing the SUSPEND action
Ask your application vendor what job control signals or actions cause your application to suspend a job and release the preemption resources. You need to replace the default SUSPEND action (the SIGSTOP signal) with another signal or script that works properly with your application when it suspends the job. For example, your application might be able to catch SIGTSTP instead of SIGSTOP.
By default, LSF sends SIGCONT to resume suspended jobs. You should find out if this causes your application to take the resources back when it resumes the job (for example, if it checks out a license again). If not, you need to modify the RESUME action also.
Whatever changes you make to the SUSPEND job control affect all suspended jobs in the queue, including preempted jobs, jobs that are suspended because of load thresholds, and jobs that you suspend using LSF commands. Similarly, changes made to the RESUME job control also affect the whole queue.
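To illustrate what "catching" the suspend signal means on the application side, here is a minimal sketch of a wrapper written in shell. It is an assumption about how an application might behave, not a description of any particular product: release_resource, acquire_resource, on_suspend, on_resume, and the HELD variable are all hypothetical placeholders.

```shell
#!/bin/sh
# Sketch of an application wrapper. release_resource and acquire_resource
# are hypothetical placeholders for whatever your application does to
# give up and take back the preemption resource (for example, a license).
HELD=1

release_resource() {
    HELD=0    # placeholder: check the license back in, free memory, ...
}

acquire_resource() {
    HELD=1    # placeholder: check the license out again
}

on_suspend() {
    release_resource
    # A real wrapper would now stop itself, for example: kill -STOP $$
}

on_resume() {
    acquire_resource
}

# SIGSTOP cannot be caught, which is why the SUSPEND action has to send
# a catchable signal such as SIGTSTP instead.
trap on_suspend TSTP
trap on_resume CONT
```

The handlers run only when the queue's SUSPEND and RESUME job controls deliver SIGTSTP and SIGCONT to the job; with the default SIGSTOP the wrapper never gets a chance to release anything.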
Killing Preempted Jobs
If you want to use resource preemption, but cannot get your application to release or take back the resource, you can configure LSF to kill the low-priority job instead of suspending it. This method is less efficient because when you kill a job, you lose all the work, and you have to restart the job from the beginning.
- You can configure LSF to kill and requeue suspended jobs (use brequeue as the SUSPEND job control in lsb.queues). This kills all jobs suspended in the queue, not just preempted jobs.
- You can configure LSF to kill preempted jobs instead of suspending them (TERMINATE_WHEN=PREEMPT in lsb.queues). In this case, LSF does not restart the preempted job; you have to resubmit it manually.
Resource Preemption Steps
To make resource preemption useful, you may need to work through all of these steps.
- Read.
Before you set up resource preemption, you should understand the following:
- Preemptive Scheduling
- Resource Preemption
- Resource Reservation
- Customizing Resources
- Customizing Job Controls
- Plan.
When you plan how to set up resource preemption, consider:
- Custom job controls: Find out what signals or actions you can use with your application to control the preemption resource when you suspend and resume jobs.
- Existing cluster configuration: Your design might be based on preemptive queues or custom resources that are already configured in your cluster.
- Requirements for resource preemption: Your design must be able to work. For example, if the application license is the preemption resource, you cannot set up one queue for each type of application, because preemption occurs between different queues. If a host-based resource such as memory is the preemption resource, you cannot set up only one queue for each host, because preemption occurs when 2 jobs are competing for the same resource.
- Write the ELIM.
- Configure LSF.
a. lsb.queues
- Set PREEMPTION in at least one queue (to PREEMPTIVE in a high-priority queue, or to PREEMPTABLE in a low-priority queue).
- Set JOB_CONTROLS (or TERMINATE_WHEN) in the low-priority queues.
- Optional. Set RES_REQ to automatically reserve the custom resource.
b. lsf.shared
Define the custom resource in the Resource section.
c. lsb.params
- Set PREEMPTABLE_RESOURCES and specify the custom resource.
- Optional. Set PREEMPTION_WAIT_TIME to specify how many seconds to wait for dynamic resources to become available.
- Optional. Set PREEMPT_JOBTYPE to enable preemption of exclusive and backfill jobs. Specify one or both of the keywords EXCLUSIVE and BACKFILL. By default, exclusive and backfill jobs are only preempted if the exclusive low-priority job is running on a host that is different than the one used by the preemptive high-priority job.
d. lsf.cluster.cluster_name
Define how the custom resource is shared in the ResourceMap section.
e. lsf.task.cluster_name
Optional. Configure the RemoteTasks section to automatically reserve the custom resource.
- Reconfigure LSF to make your changes take effect.
- Operate.
- Use resource reservation to reserve the preemption resource (this might be configured to occur automatically). For dynamic resources, you must specify a duration as well as a quantity.
- Distribute jobs that use the preemption resource in a way that allows preemption to occur between queues (this should happen as a result of the cluster design).
- Track.
Use bparams -l to view information about preemption events in your cluster.
Configure Resource Preemption
- Configure preemptive scheduling (PREEMPTION in lsb.queues).
- Configure the preemption resources (PREEMPTABLE_RESOURCES in lsb.params).
Job slots are the default preemption resource. To define additional resources to use with preemptive scheduling, set PREEMPTABLE_RESOURCES in lsb.params, and specify the names of the custom resources as a space-separated list.
- Customize the preemption action.
Preemptive scheduling uses the SUSPEND and RESUME job control actions to suspend and resume preempted jobs. For resource preemption, it is critical that the preempted job releases the resource. You must modify LSF default job controls to make resource preemption work.
- Suspend using a custom job control.
- To modify the default suspend action, set JOB_CONTROLS in lsb.queues and replace the SUSPEND job control with a script or a signal that your application can catch. Do this for all queues where there could be preemptable jobs using the preemption resources.
- For example, if your application vendor tells you to use the SIGTSTP signal, set JOB_CONTROLS in lsb.queues and use SIGTSTP as the SUSPEND job control:
JOB_CONTROLS = SUSPEND [SIGTSTP]
- Kill jobs with brequeue.
- To kill and requeue preempted jobs instead of suspending them, set JOB_CONTROLS in lsb.queues and use brequeue as the SUSPEND job control:
JOB_CONTROLS = SUSPEND [brequeue $LSB_JOBID]
- Do this for all queues where there could be preemptable jobs using the preemption resources. This kills a preempted job, and then requeues it so that it has a chance to run and finish successfully.
- Kill jobs with TERMINATE_WHEN.
- To kill preempted jobs instead of suspending them, set TERMINATE_WHEN in lsb.queues to PREEMPT. Do this for all queues where there could be preemptable jobs using the preemption resources.
- If you do this, the preempted job does not get to run unless you resubmit it.
- Optional. Configure the preemption wait time.
To specify how long LSF waits for the ELIM to report that the resources are available, set PREEMPTION_WAIT_TIME in lsb.params and specify the number of seconds to wait. You cannot specify any less than the default time (300 seconds).
For example, to make LSF wait for 8 minutes, specify
PREEMPTION_WAIT_TIME=480
License Preemption Example
Configuration
This example uses LicenseA as the name of the preemption resource.
lsf.shared
Add the resource to the Resource section.
Begin Resource
RESOURCENAME  TYPE     INTERVAL  INCREASING  DESCRIPTION
LicenseA      Numeric  60        N           (custom application)
...
End Resource
lsf.cluster.cluster_name
Add the resource to the ResourceMap section.
Begin ResourceMap
RESOURCENAME  LOCATION
LicenseA      [all]
...
End ResourceMap
lsb.params
Add the resource to the list of preemption resources.
...
PREEMPTABLE_RESOURCES = LicenseA
...
lsb.queues
Define a higher-priority queue to be a PREEMPTIVE queue by adding one line in the queue definition.
Begin Queue
QUEUE_NAME=high
PRIORITY=40
...
PREEMPTION=PREEMPTIVE
DESCRIPTION=jobs may preempt jobs in lower-priority queues
...
End Queue
Configure a job control action in a lower-priority queue so that SIGTSTP is sent when the SUSPEND action is called. This assumes your application can catch the signal SIGTSTP and release the resource (license) it used, then suspend itself. You should also make sure that your application can catch the signal SIGCONT while it is suspended, and consume (check out) the resource (license) again.
Begin Queue
QUEUE_NAME=low
PRIORITY=20
...
JOB_CONTROLS=SUSPEND[SIGTSTP] RESUME[SIGCONT] TERMINATE[SIGTERM]
DESCRIPTION=jobs preempted by jobs in higher-priority queues
...
End Queue
ELIM
Write an ELIM to report the current available number of Application A licenses. This ELIM starts on the master host.
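A minimal sketch of such an ELIM follows. It assumes the same report format as the pre_mem ELIM in the memory example (the number of resources, then a name and value pair) and the 60-second interval configured for LicenseA in lsf.shared. The functions count_free_licenses, elim_report, and elim_main, and the FREE_LICENSES variable, are hypothetical; in particular, count_free_licenses is only a stand-in for a real query to your license server.

```shell
#!/bin/sh
# Hypothetical placeholder: replace the body with a real query to your
# license server. Here the count is read from a variable for illustration.
count_free_licenses() {
    echo "${FREE_LICENSES:-0}"
}

# Build one report: number of resources, then resource name and value.
elim_report() {
    printf '1 LicenseA %s\n' "$(count_free_licenses)"
}

# Report the current value once every 60 seconds.
elim_main() {
    while :
    do
        elim_report
        sleep 60
    done
}

# A real ELIM would end by calling: elim_main
```

Keeping the license query in its own function means only count_free_licenses has to change when the license manager does.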
Operation
Check how many LicenseA resources are available
Check the number of LicenseA existing in the cluster by using bhosts -s LicenseA. In this example, 2 licenses are available.
bhosts -s LicenseA
RESOURCE  TOTAL  RESERVED  LOCATION
LicenseA  2      0.0       hostA hostB ...
Using up all LicenseA resources
Submit 2 jobs to a low-priority queue to consume those 2 licenses.
bsub -J first -q low -R "rusage[LicenseA=1:duration=2]" your_app
bsub -J second -q low -R "rusage[LicenseA=1:duration=2]" your_app
After a while, those jobs are running and the LicenseA resource is used up.
bjobs
JOBID  USER  STAT  QUEUE  FROM_HOST  EXEC_HOST  JOB_NAME  SUBMIT_TIME
201    you   RUN   low    hostx      hostA      /first    Aug 23 15:42
202    you   RUN   low    hostx      hostB      /second   Aug 23 15:43
bhosts -s LicenseA
RESOURCE  TOTAL  RESERVED  LOCATION
LicenseA  0      2.0       hostA hostB ...
Preempting a job for the LicenseA resource
Submit a job to a high-priority queue to preempt a job from a low-priority queue for the resource LicenseA.
bsub -J third -q high -R "rusage[LicenseA=1:duration=2]" your_app
After a while, the third job is running and the second job is suspended.
bjobs
JOBID  USER  STAT   QUEUE  FROM_HOST  EXEC_HOST  JOB_NAME  SUBMIT_TIME
203    you   RUN    high   hostx      hostA      /third    Aug 23 15:48
201    you   RUN    low    hostx      hostA      /first    Aug 23 15:42
202    you   SSUSP  low    hostx      hostB      /second   Aug 23 15:43
bhosts -s LicenseA
RESOURCE  TOTAL  RESERVED  LOCATION
LicenseA  0      2.0       hostA hostB ...
Memory Preemption Example
Configuration
This example uses pre_mem as the name of the preemption resource.
lsf.shared
Add the resource to the Resource section.
Begin Resource
RESOURCENAME  TYPE     INTERVAL  INCREASING  DESCRIPTION
pre_mem       Numeric  60        N           (external memory usage reporter)
...
End Resource
lsf.cluster.cluster_name
Add the resource to the "ResourceMap" section.
Begin ResourceMap
RESOURCENAME  LOCATION
pre_mem       ([hostA] [hostB] ... [hostX])  #List the hosts where you want memory preemption to occur.
...
End ResourceMap
lsb.params
Add the resource to the list of preemption resources.
...
PREEMPTABLE_RESOURCES=pre_mem
...
lsb.queues
Define a higher-priority queue to be the PREEMPTIVE queue by adding one line in the queue definition.
Begin Queue
QUEUE_NAME=high
PRIORITY=40
...
PREEMPTION=PREEMPTIVE
DESCRIPTION=preempt jobs in lower-priority queues
...
End Queue
Configure a job control action in a lower-priority queue, and let SIGTSTP be sent when the SUSPEND action is called. This assumes your application can catch the signal SIGTSTP and release (free) the resource (memory) it used, then suspend itself. You should also make sure that your application can catch the signal SIGCONT while it is suspended, and consume the resource (memory) again.
Begin Queue
QUEUE_NAME=low
PRIORITY=20
...
JOB_CONTROLS=SUSPEND[SIGTSTP] RESUME[SIGCONT] TERMINATE[SIGTERM]
DESCRIPTION=jobs may be preempted by jobs in higher-priority queues
...
End Queue
ELIM
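This is an example of an ELIM that reports the current value of pre_mem. This ELIM starts on all the hosts that have the pre_mem resource.
#!/bin/sh
# Report the available memory on this host as the pre_mem resource.
host=`hostname`
while :
do
    # Exit if LSF load information is not available.
    lsload > /dev/null 2>&1
    if [ $? != 0 ] ; then
        exit 1
    fi
    # Take the mem column for this host (for example, 110M) and strip the unit.
    memStr=`lsload -I mem -w $host | grep $host | awk '{print $3}' | sed 's/M//'`
    # Report format: number of resources, then resource name and value.
    reportStr="1 pre_mem $memStr"
    # Write the report followed by a space and no newline, portably across shells.
    printf '%s ' "$reportStr"
    sleep 60
done
Operation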
Check how many pre_mem resources are available
Check the number of pre_mem existing on hostA by using bhosts -s pre_mem to display how much memory is available. In this example, 110 MB of memory is available on hostA.
bhosts -s pre_mem
RESOURCE  TOTAL  RESERVED  LOCATION
pre_mem   110    0.0       hostA
pre_mem   50     0.0       hostB
...
Using up some pre_mem resources
Submit 1 job to a low-priority queue to consume 100 MB of pre_mem. Assume the application mem_app consumes 100 MB of memory after it starts.
bsub -J first -q low -R "rusage[pre_mem=100:duration=2]" mem_app
After a while, the first job is running and the available pre_mem is reduced.
bjobs
JOBID  USER  STAT  QUEUE  FROM_HOST  EXEC_HOST  JOB_NAME  SUBMIT_TIME
301    you   RUN   low    hostx      hostA      /first    Aug 23 16:42
bhosts -s pre_mem
RESOURCE  TOTAL  RESERVED  LOCATION
pre_mem   10     100.0     hostA
pre_mem   50     0.0       hostB
...
Preempting the job for pre_mem resources
Submit a job to a high-priority queue to preempt a job from a low-priority queue to get the resource pre_mem.
bsub -J second -q high -R "rusage[pre_mem=100:duration=2]" mem_app
After a while, the second job is running and the first job is suspended.
bjobs
JOBID  USER  STAT   QUEUE  FROM_HOST  EXEC_HOST  JOB_NAME  SUBMIT_TIME
302    you   RUN    high   hostx      hostA      /second   Aug 23 16:48
301    you   SSUSP  low    hostx      hostA      /first    Aug 23 16:42
bhosts -s pre_mem
RESOURCE  TOTAL  RESERVED  LOCATION
pre_mem   10     100.0     hostA
pre_mem   50     0.0       hostB
...
Platform Computing Inc.
www.platform.com