Pre-Execution and Post-Execution Commands
Jobs can be submitted with optional pre- and post-execution commands. A pre- or post-execution command is an arbitrary command to run before the job starts or after the job finishes. Pre- and post-execution commands are executed in a separate environment from the job.
Contents
About Pre-Execution and Post-Execution Commands
Each batch job can be submitted with optional pre- and post-execution commands. Pre- and post-execution commands can be any executable command lines to be run before a job is started or after a job finishes.
Some batch jobs require resources that LSF does not directly support. Appropriate pre- and post-execution commands can handle situations such as:
- Reserving devices like tape drives
- Creating and deleting scratch directories for a job
- Customized scheduling
- Checking availability of software licenses
- Assigning jobs to run on specific processors on SMP machines
- Transferring data files needed for job processing
- Modifying system configuration files before and after running a job
By default, the pre- and post-execution commands are run under the same user ID, environment, and home and working directories as the batch job. If the command is not in your normal execution path, the full path name of the command must be specified.
For parallel jobs, the command is run on the first selected host.
The command path can contain up to 4094 characters for UNIX and Linux, or up to 255 characters for Windows, including the directory and file name.
Pre-execution commands
Pre-execution commands support job starting decisions which cannot be configured directly in LSF. LSF supports job-level, queue-level, and application-level (lsb.applications) pre-execution.
The pre-execution command returns information to LSF using its exit status. When a pre-execution command is specified, the job is held in the queue until the specified pre-execution command returns exit status zero (0).
If the pre-execution command exits with non-zero status, the batch job is not dispatched. The job goes back to the PEND state, and LSF tries to dispatch another job to that host. While the job is pending, other jobs can proceed ahead of the waiting job. The next time LSF tries to dispatch jobs this process is repeated.
If the pre-execution command exits with a value of 99, the job does not go back to the PEND state; it exits. This lets you abort the job if the pre-execution command fails.
LSF assumes that the pre-execution command runs without side effects. For example, if the pre-execution command reserves a software license or other resource, you must not reserve the same resource more than once for the same batch job.
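The exit-status rules above can be sketched as a small shell helper. This is an illustrative sketch only: the function name and the "free licenses" input are hypothetical, and only the meaning of the exit codes (0, non-zero, 99) comes from the text. Treating an unreadable license count as a reason to abort is a design choice made for the example.

```shell
#!/bin/sh
# Sketch of a pre-execution decision. pre_exec_status and its inputs are
# hypothetical; only the exit codes 0, non-zero, and 99 come from the text.

# pre_exec_status FREE NEEDED
# Prints the exit status a pre-execution command would return.
pre_exec_status() {
    free=$1
    needed=$2
    case $free in
        ''|*[!0-9]*)
            # The count is empty or not a number: abort the job.
            echo 99
            return ;;
    esac
    if [ "$free" -ge "$needed" ]; then
        echo 0      # the job is dispatched
    else
        echo 1      # the job returns to PEND and is retried later
    fi
}

# A real pre-execution script might end with something like:
#   exit "$(pre_exec_status "$(count_free_licenses)" 1)"
# where count_free_licenses is a hypothetical, site-specific command.
```

Because the check only reads state and reserves nothing, it can be repeated safely each time LSF retries the dispatch.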
Post-execution commands
If a post-execution command is specified, then the command is run after the job is finished regardless of the exit state of the job.
Post-execution commands are typically used to clean up some state left by the pre-execution and the job execution. LSF supports job-level, queue-level, and application-level (lsb.applications) post-execution.
Job-level commands
The bsub -E option specifies an arbitrary command to run before starting the batch job. When LSF finds a suitable host on which to run a job, the pre-execution command is executed on that host. If the pre-execution command runs successfully, the batch job is started.
The bsub -Ep option specifies job-level post-execution commands to run on the execution host after the job finishes.
Queue-level and application-level commands
In some situations (for example, license checking), it is better to specify a queue-level or application-level pre-execution command instead of requiring every job be submitted with the -E option of bsub.
Queue-level pre-execution commands run before application-level pre-execution commands. Job-level pre-execution commands (bsub -E) override application-level pre-execution commands.
Application-level pre-execution commands run on the execution host before the job associated with the application profile is dispatched on an execution host.
When a job finishes, the application-level post-execution commands run, followed by queue-level post-execution commands if any.
Application-level post-execution commands run on the execution host after the job associated with the application profile has finished running on the execution host. They also run if the PRE_EXEC command exits with a 0 exit status, but the job execution environment failed to be set up.
Post-execution job states
Some jobs may not be considered complete until some post-job processing is performed. For example, a job may need to exit from a post-execution job script, clean up job files, or transfer job output after the job completes.
By default, the DONE or EXIT job states do not indicate whether post-processing is complete, so jobs that depend on post-processing may start prematurely. Use the post_done and post_err keywords on the bsub -w command to specify job dependency conditions for job post-processing. The corresponding job states POST_DONE and POST_ERR indicate the state of the post-processing.
The bhist command displays the POST_DONE and POST_ERR states. The resource usage of post-processing is not included in the job resource usage.
After the job completes, you cannot perform any job control on the post-processing. Post-processing exit codes are not reported to LSF.
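As a rough illustration of the dependency keywords described above, the following sketch builds (but does not run) a bsub command line for a job that waits on the post-processing of an earlier job. The helper name, the job ID 312, and the program ./analyze_results are hypothetical.

```shell
#!/bin/sh
# Builds a bsub command line whose job waits for the post-processing of
# an earlier job. post_dep_cmd is a hypothetical helper; it only prints
# the command so that it can be inspected before use.
post_dep_cmd() {
    condition=$1   # post_done or post_err
    job_id=$2      # ID of the job whose post-processing is awaited
    shift 2
    printf 'bsub -w "%s(%s)" %s\n' "$condition" "$job_id" "$*"
}

post_dep_cmd post_done 312 ./analyze_results
# prints: bsub -w "post_done(312)" ./analyze_results
```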
Configuring Pre- and Post-Execution Commands
Pre-execution commands can be configured at the job level, in queues, or in application profiles.
Post-execution commands can be configured at the job level, in queues or in application profiles.
Order of command execution
Pre-execution commands run in the following order:
- The queue-level command
- The application-level or job-level command. If you specify a command at both the application and job levels, the job-level command overrides the application-level command; the application-level command is ignored.
Post-execution commands run in the following order:
- The application-level or job-level command. If you specify a command at both the application and job levels, the job-level command overrides the application-level command; the application-level command is ignored.
- The queue-level command
If both application-level (POST_EXEC in lsb.applications) and job-level post-execution commands are specified, job-level post-execution overrides application-level post-execution commands.
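If a post-execution command is specified at the following levels, the commands execute in the order shown:
- Queue, application, and job levels: job level, then queue level (the application-level command is ignored)
- Queue and application levels: application level, then queue level
- Queue and job levels: job level, then queue level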
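The ordering and override rules in this section can be expressed as a short shell function. This is a sketch for reasoning about configurations, not part of LSF; the function name and its y/n arguments are hypothetical.

```shell
#!/bin/sh
# exec_order PHASE QUEUE APP JOB
# PHASE is "pre" or "post"; the other arguments are "y" when a command
# is configured at that level. Prints the levels that run, in order.
exec_order() {
    phase=$1; queue=$2; app=$3; job=$4
    inner=""
    if [ "$job" = y ]; then
        inner=job              # job level overrides application level
    elif [ "$app" = y ]; then
        inner=application
    fi
    outer=""
    if [ "$queue" = y ]; then
        outer=queue
    fi
    if [ "$phase" = pre ]; then
        set -- $outer $inner   # queue level runs first
    else
        set -- $inner $outer   # queue level runs last
    fi
    echo "$*"
}

exec_order pre y y y    # queue job
exec_order post y y n   # application queue
```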
Job-level commands
Job-level pre-execution and post-execution commands require no configuration. Use the bsub -E option to specify an arbitrary command to run before the job starts. Use the bsub -Ep option to specify an arbitrary command to run after the job finishes running.
Example
The following example shows a batch job that requires a tape drive. The user program tapeCheck exits with status zero if the specified tape drive is ready:
bsub -E "/usr/share/bin/tapeCheck /dev/rmt01" myJob
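A matching pre- and post-execution pair for another use case mentioned earlier, creating and deleting a scratch directory, might look like the following sketch. The function names and SCRATCH_BASE are hypothetical, and the sketch assumes that LSB_JOBID is set in the environment of both commands.

```shell
#!/bin/sh
# Idempotent scratch-directory helpers for a pre-/post-execution pair.
# SCRATCH_BASE and the function names are hypothetical; LSB_JOBID is
# assumed to be set in the environment of the commands.
SCRATCH_BASE=${SCRATCH_BASE:-/tmp}

scratch_dir() {
    echo "$SCRATCH_BASE/scratch.${LSB_JOBID:?LSB_JOBID is not set}"
}

# Pre-execution: safe to repeat if LSF retries the command.
scratch_setup() {
    dir=$(scratch_dir) || return 1
    mkdir -p "$dir"
}

# Post-execution: removes the directory whether or not the job succeeded.
scratch_cleanup() {
    dir=$(scratch_dir) || return 1
    rm -rf "$dir"
}
```

Wrapped in two small scripts at site-chosen (hypothetical) paths, these could be passed to bsub -E and bsub -Ep respectively. Using mkdir -p keeps the pre-execution step repeatable, which matters because a failed dispatch causes the command to run again.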
Queue-level commands
Use the PRE_EXEC and POST_EXEC keywords in the queue definition (lsb.queues) to specify pre- and post-execution commands.
The following points should be considered when setting up pre- and post-execution commands at the queue level:
- If the pre-execution command exits with a non-zero exit code, then it is considered to have failed and the job is requeued to the head of the queue. This feature can be used to implement customized scheduling by having the pre-execution command fail if conditions for dispatching the job are not met.
- Environment variables set for the job are also set for the pre- and post-execution commands.
- When a job is dispatched from a queue which has a post-execution command, LSF remembers the post-execution command defined for the queue from which the job is dispatched. If the job is later switched to another queue or the post-execution command of the queue is changed, LSF still runs the original post-execution command for this job.
- When the post-execution command is run, the environment variable LSB_JOBEXIT_STAT is set to the exit status of the job. See the man page for the wait(2) command for the format of this exit status.
- The post-execution command is also run if a job is requeued because the job's execution environment fails to be set up, or if the job exits with one of the queue's REQUEUE_EXIT_VALUES. The LSB_JOBPEND environment variable is set if the job is requeued. If the job's execution environment could not be set up, LSB_JOBEXIT_STAT is set to 0.
- Running of post-execution commands upon restart of a rerunnable job may not always be desirable; for example, if the post-exec removes certain files, or does other cleanup that should only happen if the job finishes successfully. Use LSB_DISABLE_RERUN_POST_EXEC=Y in lsf.conf to prevent the post-exec from running when a job is rerun.
- If both queue and job-level pre-execution commands are specified, the job-level pre-execution is run after the queue-level pre-execution command.
- If both application-level and job-level post-execution commands are specified, job-level post-execution overrides application-level post-execution commands. Queue-level post-execution commands run after application-level post-execution and job-level post-execution commands.
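A post-execution script can inspect LSB_JOBEXIT_STAT to decide how much cleanup to do. The sketch below assumes the layout commonly used for wait(2) status values on Linux (low 7 bits hold the terminating signal, bits 8 to 15 hold the exit code); check the wait(2) man page on your platform before relying on it. The function name is hypothetical.

```shell
#!/bin/sh
# decode_exit_stat STATUS
# Splits a wait(2)-style status into a readable form, assuming the
# common Linux layout: low 7 bits = signal, bits 8-15 = exit code.
decode_exit_stat() {
    stat=$1
    sig=$(( stat & 127 ))
    code=$(( (stat >> 8) & 255 ))
    if [ "$sig" -ne 0 ]; then
        echo "signal $sig"
    else
        echo "exit $code"
    fi
}

# In a post-execution script (sketch):
#   [ -n "$LSB_JOBPEND" ] && echo "job was requeued"
#   decode_exit_stat "$LSB_JOBEXIT_STAT"
decode_exit_stat 256    # exit 1
decode_exit_stat 9      # signal 9
decode_exit_stat 0      # exit 0
```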
Example
The following queue specifies the pre-execution command /usr/share/lsf/pri_prexec and the post-execution command /usr/share/lsf/pri_postexec.
Begin Queue
QUEUE_NAME = priority
PRIORITY = 43
NICE = 10
PRE_EXEC = /usr/share/lsf/pri_prexec
POST_EXEC = /usr/share/lsf/pri_postexec
End Queue
See the lsb.queues template file for additional queue examples.
Application-level commands
Use the PRE_EXEC and POST_EXEC keywords in the application profile definition (lsb.applications) to specify pre- and post-execution commands.
The following points should be considered when setting up pre- and post-execution commands at the application level:
- When a job finishes, the application-level post-execution commands run, followed by queue-level post-execution commands if any.
- Environment variables set for the job are also set for the pre- and post-execution commands.
- Queue-level pre-execution commands run before application-level pre-execution commands. Job-level pre-execution commands (bsub -E) override application-level pre-execution commands.
- When a job is submitted to an application profile that has a post-execution command, the system remembers the post-execution command defined for the application profile from which the job is dispatched. If the job is later moved to another application profile or the post-execution command of the application profile is changed, the original post-execution command is run.
- When the post-execution command is run, the environment variable LSB_JOBEXIT_STAT is set to the exit status of the job. Refer to the man page for wait(2) for the format of this exit status.
- The post-execution command is also run if a job is requeued because the job's execution environment fails to be set up or if the job exits with one of the application profile's REQUEUE_EXIT_VALUES. The environment variable LSB_JOBPEND is set if the job is requeued. If the job's execution environment could not be set up, LSB_JOBEXIT_STAT is set to 0 (zero).
- If the pre-execution command exits with a non-zero exit code, it is considered to have failed, and the job is requeued to the head of the queue. Use this feature to implement customized scheduling by having the pre-execution command fail if conditions for dispatching the job are not met.
Example
Begin Application
NAME = catia
DESCRIPTION = CATIA V5
CPULIMIT = 24:0/hostA    # 24 hours of host hostA
FILELIMIT = 20000
DATALIMIT = 20000        # jobs data segment limit
CORELIMIT = 20000
PROCLIMIT = 5            # job processor limit
PRE_EXEC = /usr/share/lsf/catia_prexec
POST_EXEC = /usr/share/lsf/catia_postexec
REQUEUE_EXIT_VALUES = 55 34 78
End Application
See the lsb.applications template file for additional application profile examples.
Pre- and post-execution on UNIX and Linux
The entire contents of the configuration line of the pre- and post-execution commands are run under /bin/sh -c, so shell features can be used in the command.
For example, the following is valid:
PRE_EXEC = /usr/share/lsf/misc/testq_pre >> /tmp/pre.out
POST_EXEC = /usr/share/lsf/misc/testq_post | grep -v "Hey!"
The pre- and post-execution commands are run in /tmp.
Standard input and standard output and error are set to /dev/null. The output from the pre- and post-execution commands can be explicitly redirected to a file for debugging purposes.
The PATH environment variable is set to:
PATH='/bin /usr/bin /sbin /usr/sbin'
Pre- and post-execution on Windows
The pre- and post-execution commands are run under cmd.exe /c.
Note: For pre- and post-execution commands that execute on a Windows Server 2003, x64 Edition platform, users must have "Read" and "Execute" privileges for cmd.exe.
Standard input and standard output and error are set to NULL. The output from the pre- and post-execution commands can be explicitly redirected to a file for debugging purposes.
Setting a pre- and post-execution user ID
By default, both the pre- and post-execution commands are run as the job submission user. Use the LSB_PRE_POST_EXEC_USER parameter in lsf.sudoers to specify a different user ID for queue-level and application-level pre- and post-execution commands.
Example
If the pre- or post-execution commands perform privileged operations that require root permission, specify:
LSB_PRE_POST_EXEC_USER=root
See the Platform LSF Configuration Reference for information about the lsf.sudoers file.
Including job post-execution in job finish status reporting
By default, LSF releases resources for a job as soon as the job is finished and when sbatchd reports job finish status (DONE or EXIT) to mbatchd. Post-execution processing is not considered part of job processing. This makes it possible for a new job to be started before post-execution processing for a previous job is complete.
There are situations where you do not want the first job's post-execution affecting the second job's execution. Or the execution of a second job might crucially depend on the completion of post-execution of the previous job.
In other cases, you may want to include job post-execution in job accounting processes, or, if the post-execution is CPU intensive, you might not want a second job running at the same time. Finally, system configuration required by the original job may be changed or removed by a new job, which could prevent the first job from finishing normally.
To enable all associated processing to complete before LSF reports job finish status, configure JOB_INCLUDE_POSTPROC=Y in an application profile in lsb.applications or cluster wide in lsb.params.
When JOB_INCLUDE_POSTPROC is set:
- sbatchd sends both job finish status (DONE or EXIT) and post-execution status (POST_DONE or POST_ERR) to mbatchd at the same time
- The job remains in RUN state and holds its job slot until the job post-execution processing has finished
- Jobs can now depend on the completion of post-execution processing
- bjobs, bhist, and bacct will show the same time for both job finish and post-execution finish
- Job requeue will happen after post-execution processing, not when the job finishes
For job history and job accounting, the job CPU time and run time will also include the post-execution CPU time and run time.
Limitations and side-effects
Job query commands (bjobs, bhist) show that the job remains in RUN state until the post-execution processing is finished, even though the job itself has finished. Job control commands (bstop, bkill, bresume) will have no effect.
Rerunnable jobs may rerun after they have actually finished, because the host became unavailable before post-execution processing finished and mbatchd still considers the job to be in RUN state.
Job preemption is delayed until post-execution processing is finished.
Post-execution on SGI cpusets
Post-execution processing on SGI cpusets behaves differently from previous releases. If JOB_INCLUDE_POSTPROC=Y is specified in lsb.applications or cluster wide in lsb.params, post-execution processing is not attached to the job cpuset, and Platform LSF does not release the cpuset until post-execution processing has finished.
Preventing job overlap on hosts
You can use JOB_INCLUDE_POSTPROC to ensure that there is no execution overlap among running jobs. For example, you may have pre-execution processing to create a user execution environment at the desktop (mount a disc for the user, create rlogin permissions, and so on). You then configure post-execution processing to clean up the user execution environment set by the pre-exec.
If the post-execution for one job is still running when a second job is dispatched, pre-execution processing that sets up the user environment for the next job may not be able to run correctly because the previous job's environment has not yet been cleaned up by its post-exec.
You should configure jobs to run exclusively to prevent the actual jobs from overlapping, but in this case you also need to configure post-execution to be included in job finish status reporting.
Setting a post-execution timeout
Configure JOB_POSTPROC_TIMEOUT in an application profile in lsb.applications or cluster wide in lsb.params to control how long post-execution processing is allowed to run.
JOB_POSTPROC_TIMEOUT specifies a timeout in minutes for job post-execution processing. If post-execution processing takes longer than the timeout, sbatchd reports the post-execution has failed (POST_ERR status), and kills the process group of the job's post-execution processes.
The specified timeout must be greater than zero.
If JOB_INCLUDE_POSTPROC is enabled in the application profile or cluster wide in lsb.params, and sbatchd kills the post-execution processes because the timeout has been reached, the CPU time of the post-execution processing is set to 0, and the job CPU time will not include the CPU time of the post-execution processing.
Controlling how many times pre-execution commands are retried
By default, if job pre-execution fails, LSF retries the job automatically.
Configure MAX_PREEXEC_RETRY, LOCAL_MAX_PREEXEC_RETRY, or REMOTE_MAX_PREEXEC_RETRY to limit the number of times LSF retries job pre-execution. Pre-execution retry is configured cluster-wide (lsb.params), at the queue level (lsb.queues), and at the application level (lsb.applications). Pre-execution retry configured in lsb.applications overrides lsb.queues, and lsb.queues overrides lsb.params configuration.
Platform Computing Inc.
www.platform.com |