Pre-execution and post-execution processing

The pre- and post-execution processing feature provides a way to run commands on the execution host prior to and after completion of LSF jobs. Use pre-execution commands to set up an execution host with the required directories, files, software licenses, environment, and user permissions. Use post-execution commands to define post-job processing such as cleaning up job files or transferring job output.

Contents

  • About pre- and post-execution processing

  • Scope

  • Configuration to enable pre- and post-execution processing

  • Pre- and post-execution processing behavior

  • Configuration to modify pre- and post-execution processing

  • Pre- and post-execution processing commands

About pre- and post-execution processing

You can use the pre- and post-execution processing feature to run commands before a batch job starts or after it finishes. Typical uses of this feature include the following:
  • Reserving resources such as tape drives and other devices not directly configurable in LSF

  • Making job-starting decisions in addition to those directly supported by LSF

  • Creating and deleting scratch directories for a job

  • Customizing scheduling based on the exit code of a pre-execution command

  • Checking availability of software licenses

  • Assigning jobs to run on specific processors on SMP machines

  • Transferring data files needed for job execution

  • Modifying system configuration files before and after job execution

  • Using a post-execution command to clean up a state left by the pre-execution command or the job

Pre-execution and post-execution commands can be defined at the queue, application, and job levels.

The command path can contain up to 4094 characters for UNIX and Linux, or up to 255 characters for Windows, including the directory, file name, and expanded values for %J (job_ID) and %I (index_ID).
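As a rough illustration of the length limit, the following shell sketch expands %J and %I in a command path and checks the result against the 4094-character UNIX and Linux limit. The function name expand_cmd_path and the sample path are hypothetical; LSF performs the real expansion itself, and this helper only mimics it for checking a path by hand.

```shell
#!/bin/sh
# Hypothetical helper: expand %J and %I as the text describes and check
# the expanded length against the UNIX/Linux limit of 4094 characters.
MAX_CMD_PATH=4094

expand_cmd_path() {
    # $1 = command path template, $2 = job ID, $3 = array index.
    # The IDs are assumed to be plain digits, so they are safe inside sed.
    expanded=$(printf '%s\n' "$1" | sed -e "s/%J/$2/g" -e "s/%I/$3/g")
    if [ "${#expanded}" -gt "$MAX_CMD_PATH" ]; then
        echo "expanded path is ${#expanded} characters, limit is $MAX_CMD_PATH" >&2
        return 1
    fi
    printf '%s\n' "$expanded"
}
```

For example, expand_cmd_path '/scratch/%J/%I/setup.sh' 1234 7 prints /scratch/1234/7/setup.sh.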

When JOB_INCLUDE_POSTPROC is defined in an application profile, a job is considered to be in the RUN state while it is in the post-execution stage (regular jobs are in the DONE state at this point). If the job is also resizable, job grow requests are ignored, but job shrink requests can still be processed. In either case, LSF does not invoke the job resized notification command.

The following example illustrates how pre- and post-execution processing works for setting the environment prior to job execution and for transferring resulting files after the job runs.

[Figure: Default behavior (feature not enabled)]

[Figure: With pre- and post-execution processing enabled at the queue or application level]

Any executable command line can serve as a pre-execution or post-execution command. By default, the commands run under the same user account, environment, home directory, and working directory as the job. For parallel jobs, the commands run on the first execution host.
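A minimal sketch of a pre-execution helper that creates a per-job scratch directory, one of the typical uses listed above. SCRATCH_ROOT and the job_ID naming scheme are hypothetical, and the sketch assumes that LSB_JOBID is available in the environment the command inherits from the job.

```shell
#!/bin/sh
# Sketch of a pre-execution helper. SCRATCH_ROOT is a hypothetical site
# setting; LSB_JOBID is assumed to be present in the job environment.
make_scratch_dir() {
    # Refuse to guess a directory name when the job ID is unknown.
    [ -n "${LSB_JOBID:-}" ] || return 1
    dir="${SCRATCH_ROOT:-/tmp}/job_${LSB_JOBID}"
    mkdir -p "$dir" || return 1     # non-zero status tells LSF the setup failed
    chmod 700 "$dir" || return 1    # keep the directory private to the job owner
    printf '%s\n' "$dir"
}
```

A script used as the pre-execution command would call make_scratch_dir and exit with its status, so that a failed setup is reported to LSF through the exit code.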

Scope


Applicability

Details

Operating system

  • UNIX

  • Windows

  • A mix of UNIX and Windows hosts

Dependencies

  • UNIX and Windows user accounts must be valid on all hosts in the cluster and must have the correct permissions to successfully run jobs.

  • On a Windows Server 2003, x64 Edition platform, users must have read and execute privileges for cmd.exe.

Limitations

  • Applies to batch jobs only (jobs submitted using the bsub command)


Configuration to enable pre- and post-execution processing

The pre- and post-execution processing feature is enabled by defining at least one of the parameters PRE_EXEC or POST_EXEC at the application or queue level, or by using the -E option of the bsub command to specify a pre-execution command. In some situations, specifying a queue-level or application-level pre-execution command can have advantages over requiring users to use bsub -E. For example, license checking can be set up at the queue or application level so that users do not have to enter a pre-execution command every time they submit a job.


Configuration file

Parameter and syntax

Behavior

lsb.queues

PRE_EXEC=command

  • Enables pre-execution processing at the queue level.

  • The pre-execution command runs on the execution host before the job starts.

  • If the PRE_EXEC command exits with a non-zero exit code, LSF requeues the job to the front of the queue.

  • The PRE_EXEC command uses the same environment variable values as the job.

POST_EXEC=command

  • Enables post-execution processing at the queue level.

  • The POST_EXEC command uses the same environment variable values as the job.

  • The post-execution command for the queue remains associated with the job. The original post-execution command runs even if the job is requeued or if the post-execution command for the queue is changed after job submission.

  • Before the post-execution command runs, LSB_JOBEXIT_STAT is set to the exit status of the job. The success or failure of the post-execution command has no effect on LSB_JOBEXIT_STAT.

  • The post-execution command runs after the job finishes, even if the job fails.

  • Specify the environment variable $USER_POSTEXEC to allow UNIX users to define their own post-execution commands.

lsb.applications

PRE_EXEC=command

  • Enables pre-execution processing at the application level.

  • The pre-execution command runs on the execution host before the job starts.

  • If the PRE_EXEC command exits with a non-zero exit code, LSF requeues the job to the front of the queue.

  • The PRE_EXEC command uses the same environment variable values as the job.

POST_EXEC=command

  • Enables post-execution processing at the application level.

  • The POST_EXEC command uses the same environment variable values as the job.

  • The post-execution command for the application profile remains associated with the job. The original post-execution command runs even if the job is moved to a different application profile or is requeued, or if the post-execution command for the original application profile is changed after job submission.

  • Before the post-execution command runs, LSB_JOBEXIT_STAT is set to the exit status of the job. The success or failure of the post-execution command has no effect on LSB_JOBEXIT_STAT.

  • The post-execution command runs after the job finishes, even if the job fails.

  • Specify the environment variable $USER_POSTEXEC to allow UNIX users to define their own post-execution commands.
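For illustration, a queue definition in lsb.queues that enables both commands might look like the following sketch. The queue name and both script paths are hypothetical; only the PRE_EXEC and POST_EXEC parameter names come from the table above.

```
Begin Queue
QUEUE_NAME = scratch_q
PRE_EXEC   = /usr/local/lsf/scripts/make_scratch
POST_EXEC  = /usr/local/lsf/scripts/clean_scratch
End Queue
```

An application profile in lsb.applications accepts the same two parameters. On most installations, a change to either file is picked up after running badmin reconfig.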


Pre- and post-execution processing behavior

Pre- and post-execution processing applies to both UNIX and Windows hosts.

Host type

Environment

UNIX

  • The pre- and post-execution commands run in the /tmp directory under /bin/sh -c, which allows the use of shell features in the commands. The following example shows valid configuration lines:

    PRE_EXEC= /usr/share/lsf/misc/testq_pre >> /tmp/pre.out

    POST_EXEC= /usr/share/lsf/misc/testq_post | grep -v "Testing..."

  • LSF sets the PATH environment variable to PATH='/bin /usr/bin /sbin /usr/sbin'

  • The stdin, stdout, and stderr are set to /dev/null

Windows

  • The pre- and post-execution commands run under cmd.exe /c

  • The standard input, standard output, and standard error are set to NULL

  • The PATH is determined by the setup of the LSF Service


Note:

If the pre-execution or post-execution command is not in your usual execution path, you must specify the full path name of the command.
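To try out a command line before configuring it, the UNIX conditions described above can be approximated by hand. The following sketch is only an imitation of what LSF does: run_like_lsf is a hypothetical name, and the colon-separated PATH is an assumption about how the listed directories are applied.

```shell
#!/bin/sh
# Approximate the UNIX environment described above to try out a
# pre- or post-execution command line by hand. This only imitates LSF.
run_like_lsf() {
    (
        cd /tmp || exit 1
        # Restricted search path; commands elsewhere need a full path name.
        PATH='/bin:/usr/bin:/sbin:/usr/sbin'
        export PATH
        # Same shell invocation and null standard streams as described above.
        /bin/sh -c "$1" </dev/null >/dev/null 2>&1
    )
}
```

The function returns the exit status of the command, so a command that is not on the restricted path fails with status 127, which is a quick way to spot a missing full path name.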

Order of command execution

Pre-execution commands run in the following order:
  1. The queue-level command

  2. The application-level or job-level command. If you specify a command at both the application and job levels, the job-level command overrides the application-level command; the application-level command is ignored.


If a pre-execution command is specified at the …

Then the commands execute in the order of …

Queue, application, and job levels

  1. Queue level

  2. Job level

Queue and application levels

  1. Queue level

  2. Application level

Queue and job levels

  1. Queue level

  2. Job level

Application and job levels

  1. Job level


Post-execution commands run in the following order:
  1. The application-level or job-level command

  2. The queue-level command

If both application-level (POST_EXEC in lsb.applications) and job-level post-execution commands are specified, the job-level command overrides the application-level command; the application-level command is ignored.

If a post-execution command is specified at the …

Then the commands execute in the order of …

Queue, application, and job levels

  1. Job level

  2. Queue level

Queue and application levels

  1. Application level

  2. Queue level

Queue and job levels

  1. Job level

  2. Queue level
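The ordering rules in the two tables above can be summarized in a few lines of shell. This is a sketch for reasoning about a configuration, not something LSF provides; exec_order is a hypothetical name.

```shell
#!/bin/sh
# Print the order in which configured commands run, following the tables above.
# $1 = pre or post; remaining arguments = the levels that define a command
# (any of: queue application job).
exec_order() {
    phase=$1
    shift
    queue=""
    other=""
    for level in "$@"; do
        case $level in
            queue)       queue=queue ;;
            job)         other=job ;;                    # job level overrides application level
            application) [ -n "$other" ] || other=application ;;
        esac
    done
    if [ "$phase" = pre ]; then
        set -- $queue $other     # pre-execution: queue level runs first
    else
        set -- $other $queue     # post-execution: queue level runs last
    fi
    echo "$*"
}
```

For example, exec_order pre queue application job prints "queue job", and exec_order post queue application prints "application queue".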


Pre-execution command behavior

A pre-execution command returns information to LSF by means of the exit status. LSF holds the job in the queue until the specified pre-execution command returns an exit code of zero (0). If the pre-execution command exits with a non-zero value, the job pends until LSF tries again to dispatch it. While the job remains in the PEND state, LSF dispatches other jobs to the execution host.

If the pre-execution command exits with a value of 99, the job exits without pending. This allows you to cancel the job if the pre-execution command fails.

You must ensure that the pre-execution command runs without side effects; that is, you should define a pre-execution command that does not interfere with the job itself. For example, if you use the pre-execution command to reserve a resource, you cannot also reserve the same resource as part of the job submission.

LSF users can specify a pre-execution command at job submission. LSF first finds a suitable host on which to run the job and then runs the pre-execution command on that host. If the pre-execution command runs successfully and returns an exit code of zero, LSF runs the job.
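The three outcomes described above (zero, 99, and any other non-zero value) can be put to use in a pre-execution check such as the following sketch. The input file and lock directory are hypothetical stand-ins for whatever a job actually needs; only the meaning of the exit codes comes from the text.

```shell
#!/bin/sh
# Sketch of a pre-execution check. The input file and lock directory are
# hypothetical stand-ins for whatever the job really needs.
preexec_check() {
    input_file=$1
    lock_dir=$2
    # A missing input will not appear by itself: return 99 so that the
    # job exits instead of pending.
    [ -r "$input_file" ] || return 99
    # mkdir is atomic, so it doubles as a simple "resource busy" test.
    # Any other non-zero value leaves the job pending for a later attempt.
    mkdir "$lock_dir" 2>/dev/null || return 1
    return 0
}
```

A script used as the pre-execution command would call preexec_check and exit with its status. Because the lock directory is state left behind by the pre-execution command, a matching post-execution command would be expected to remove it.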

Post-execution command behavior

A post-execution command runs after the job finishes, regardless of the exit state of the job. Once a post-execution command is associated with a job, that command runs even if the job fails. You cannot configure the post-execution command to run only under certain conditions.

The resource usage of post-execution processing is not included in the job resource usage calculation, and post-execution command exit codes are not reported to LSF.
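Although a post-execution command always runs, the command itself can branch on LSB_JOBEXIT_STAT, which LSF sets before the command starts. The following sketch removes a scratch directory only when the job succeeded. SCRATCH_ROOT and the job_ID naming scheme are hypothetical, and the sketch assumes only that a value of zero means the job succeeded.

```shell
#!/bin/sh
# Sketch of a post-execution helper. SCRATCH_ROOT and the job_<ID> naming
# are hypothetical; LSB_JOBEXIT_STAT is set by LSF before the command runs.
cleanup_scratch_dir() {
    # Never build a path to delete from an empty job ID.
    [ -n "${LSB_JOBID:-}" ] || return 1
    dir="${SCRATCH_ROOT:-/tmp}/job_${LSB_JOBID}"
    if [ "${LSB_JOBEXIT_STAT:-1}" -eq 0 ]; then
        rm -rf "$dir"     # job succeeded: remove its scratch data
    fi
    # Otherwise the directory is left in place for inspection.
    return 0
}
```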

If POST_EXEC=$USER_POSTEXEC in either lsb.applications or lsb.queues, UNIX users can define their own post-execution commands:
setenv USER_POSTEXEC /path_name
where the path name for the post-execution command is an absolute path.

If POST_EXEC=$USER_POSTEXEC and …

Then …

The user defines the USER_POSTEXEC environment variable

  • LSF runs the post-execution command defined by the environment variable USER_POSTEXEC

  • After the user-defined command runs, LSF reports successful completion of post-execution processing

  • If the user-defined command fails, LSF reports a failure of post-execution processing

The user does not define the USER_POSTEXEC environment variable

  • LSF reports successful post-execution processing without actually running a post-execution command


Important:

Do not allow users to specify a post-execution command when the pre- and post-execution commands are set to run under the root account.
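The setenv line shown earlier is csh syntax. For users of sh-compatible shells, a counterpart might look like the following sketch, which also enforces the absolute-path requirement. The wrapper name set_user_postexec is hypothetical.

```shell
#!/bin/sh
# Bourne-shell counterpart of the csh "setenv USER_POSTEXEC /path_name" line.
# set_user_postexec is a hypothetical convenience wrapper.
set_user_postexec() {
    case $1 in
        /*) ;;      # absolute path: accepted
        *)  echo "USER_POSTEXEC must be an absolute path: $1" >&2
            return 1 ;;
    esac
    [ -x "$1" ] || { echo "not executable: $1" >&2; return 1; }
    USER_POSTEXEC=$1
    export USER_POSTEXEC
}
```

The variable must be set in the shell from which the job is submitted so that the job environment carries it. The executable check runs on the submission host, so it does not guarantee that the file exists on the execution host.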

Configuration to modify pre- and post-execution processing

Configuration parameters modify various aspects of pre- and post-execution processing behavior by:
  • Preventing a new job from starting until post-execution processing has finished

  • Controlling the length of time post-execution processing can run

  • Specifying a user account under which the pre- and post-execution commands run

  • Controlling how many times pre-execution retries

Configuration to modify when new jobs can start

When a job finishes, sbatchd reports a job finish status of DONE or EXIT to mbatchd. This causes LSF to release resources associated with the job, allowing new jobs to start on the execution host before post-execution processing from a previous job has finished.

In some cases, you might want to prevent the overlap of a new job with post-execution processing. Preventing a new job from starting prior to completion of post-execution processing can be configured at the application level or at the job level.

At the job level, the bsub -w option allows you to specify job dependencies; the keywords post_done and post_err cause LSF to wait for completion of post-execution processing before starting another job.

At the application level:

File

Parameter and syntax

Description

lsb.applications

lsb.params

JOB_INCLUDE_POSTPROC=Y

  • Enables completion of post-execution processing before LSF reports a job finish status of DONE or EXIT

  • Prevents a new job from starting on a host until post-execution processing is finished on that host


  • sbatchd sends both job finish status (DONE or EXIT) and post-execution processing status (POST_DONE or POST_ERR) to mbatchd at the same time

  • The job remains in the RUN state and holds its job slot until post-execution processing has finished

  • Job requeue happens (if required) after completion of post-execution processing, not when the job itself finishes

  • For job history and job accounting, the job CPU and run times include the post-execution processing CPU and run times

  • The job control commands bstop, bkill, and bresume have no effect during post-execution processing

  • If a host becomes unavailable during post-execution processing for a rerunnable job, mbatchd sees the job as still in the RUN state and reruns the job

  • LSF does not preempt jobs during post-execution processing

Configuration to modify the post-execution processing time

Controlling the length of time post-execution processing can run is configured at the application level.

File

Parameter and syntax

Description

lsb.applications

lsb.params

JOB_POSTPROC_TIMEOUT=minutes

  • Specifies the length of time, in minutes, that post-execution processing can run.

  • The specified value must be greater than zero.

  • If post-execution processing takes longer than the specified value, sbatchd reports post-execution failure—a status of POST_ERR—and kills the process group of the job’s post-execution processes. This kills the parent process only.

  • If JOB_INCLUDE_POSTPROC=Y and sbatchd kills the post-execution process group, post-execution processing CPU time is set to zero, and the job’s CPU time does not include post-execution CPU time.
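As a sketch, an application profile in lsb.applications that combines the two parameters described above might look like this. The profile name, the script path, and the 10-minute value are hypothetical.

```
Begin Application
NAME                 = archive_app
POST_EXEC            = /usr/local/lsf/scripts/archive_output
JOB_INCLUDE_POSTPROC = Y
JOB_POSTPROC_TIMEOUT = 10
End Application
```

With these settings, a job in this profile would hold its slot in the RUN state while the post-execution command runs, and sbatchd would report POST_ERR if that command ran for longer than 10 minutes.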


Configuration to modify the pre- and post-execution processing user account

Specifying a user account under which the pre- and post-execution commands run is configured at the system level. By default, both the pre- and post-execution commands run under the account of the user who submits the job.

File

Parameter and syntax

Description

lsf.sudoers

LSB_PRE_POST_EXEC_USER=user_name

  • Specifies the user account under which pre- and post-execution commands run (UNIX only)

  • This parameter applies only to pre- and post-execution commands configured at the application and queue levels; pre-execution commands defined at the job level with bsub -E run under the account of the user who submits the job

  • If the pre-execution or post-execution commands perform privileged operations that require root permissions on UNIX hosts, specify a value of root

  • You must edit the lsf.sudoers file on all UNIX hosts within the cluster and specify the same user account


Configuration to control how many times pre-execution retries

By default, if job pre-execution fails, LSF retries the job automatically. The job remains in the queue and pre-execution is retried 5 times by default, to minimize any impact to performance and throughput.

Limiting the number of times LSF retries job pre-execution is configured cluster-wide (lsb.params), at the queue level (lsb.queues), and at the application level (lsb.applications). Pre-execution retry configuration in lsb.applications overrides lsb.queues, and lsb.queues overrides lsb.params.


Configuration file

Parameter and syntax

Behavior

lsb.params

LOCAL_MAX_PREEXEC_RETRY=integer

  • Controls the maximum number of times to attempt the pre-execution command of a job on the local cluster.

  • Specify an integer greater than 0

    By default, the number of retries is unlimited.

MAX_PREEXEC_RETRY=integer

  • Controls the maximum number of times to attempt the pre-execution command of a job on the remote cluster.

  • Specify an integer greater than 0

    By default, the number of retries is 5.

REMOTE_MAX_PREEXEC_RETRY=integer

  • Controls the maximum number of times to attempt the pre-execution command of a job on the remote cluster.

    Equivalent to MAX_PREEXEC_RETRY

  • Specify an integer greater than 0

    By default, the number of retries is 5.

lsb.queues

LOCAL_MAX_PREEXEC_RETRY=integer

  • Controls the maximum number of times to attempt the pre-execution command of a job on the local cluster.

  • Specify an integer greater than 0

    By default, the number of retries is unlimited.

MAX_PREEXEC_RETRY=integer

  • Controls the maximum number of times to attempt the pre-execution command of a job on the remote cluster.

  • Specify an integer greater than 0

    By default, the number of retries is 5.

REMOTE_MAX_PREEXEC_RETRY=integer

  • Controls the maximum number of times to attempt the pre-execution command of a job on the remote cluster.

    Equivalent to MAX_PREEXEC_RETRY

  • Specify an integer greater than 0

    By default, the number of retries is 5.

lsb.applications

LOCAL_MAX_PREEXEC_RETRY=integer

  • Controls the maximum number of times to attempt the pre-execution command of a job on the local cluster.

  • Specify an integer greater than 0

    By default, the number of retries is unlimited.

MAX_PREEXEC_RETRY=integer

  • Controls the maximum number of times to attempt the pre-execution command of a job on the remote cluster.

  • Specify an integer greater than 0

    By default, the number of retries is 5.

REMOTE_MAX_PREEXEC_RETRY=integer

  • Controls the maximum number of times to attempt the pre-execution command of a job on the remote cluster.

    Equivalent to MAX_PREEXEC_RETRY

  • Specify an integer greater than 0

    By default, the number of retries is 5.
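The override rule for the three configuration files (lsb.applications over lsb.queues over lsb.params) amounts to taking the first value that is set. A small sketch, with the hypothetical name effective_retry_limit:

```shell
#!/bin/sh
# Pick the retry limit that applies to a job. Pass an empty string for
# any level where the parameter is not configured.
# $1 = lsb.applications value, $2 = lsb.queues value, $3 = lsb.params value
effective_retry_limit() {
    if [ -n "$1" ]; then
        echo "$1"       # application level overrides everything else
    elif [ -n "$2" ]; then
        echo "$2"       # queue level overrides the cluster-wide value
    else
        echo "$3"       # cluster-wide value, possibly empty
    fi
}
```

For example, effective_retry_limit "" 5 7 prints 5, because no application-level value is set and the queue-level value takes precedence over lsb.params.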


When pre-execution retry is configured, if the pre-execution command of a job fails and exits with a non-zero value, the number of pre-exec retries is set to 1. When the pre-exec retry limit is reached, the job is suspended with PSUSP status.

The retry count includes queue-level, application-level, and job-level pre-execution commands. When pre-execution retry is configured, a job is suspended when the sum of its queue-level and application-level pre-exec retries is greater than the value of the pre-execution retry parameter, or when the sum of its queue-level and job-level pre-exec retries is greater than that value.
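The counting rule above can be written as a one-line comparison. The following sketch uses the hypothetical name retry_limit_exceeded and takes the retry counts as arguments; it illustrates the arithmetic only and does not read anything from LSF.

```shell
#!/bin/sh
# Succeeds (status 0) when the combined retry count is greater than the limit.
# $1 = queue-level retries
# $2 = application-level or job-level retries
# $3 = configured pre-execution retry limit
retry_limit_exceeded() {
    [ $(( $1 + $2 )) -gt "$3" ]
}
```

With a limit of 5, three queue-level retries plus three job-level retries exceed the limit, while two plus three do not.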

The pre-execution retry limit is recovered when LSF is restarted and reconfigured. LSF replays the pre-execution retry limit in the PRE_EXEC_START or JOB_STATUS events in lsb.events.

Pre- and post-execution processing commands

Commands for submission

The bsub -E option specifies a pre-execution command. Post-execution commands cannot be specified using bsub; post-execution processing can only be defined at the queue and application levels.

The bsub -w option allows you to specify job dependencies that cause LSF to wait for completion of post-execution processing before starting another job.


Command

Description

bsub -E command

  • Defines the pre-execution command at the job level.

bsub -w 'post_done(job_id | "job_name")'

  • Specifies the job dependency condition required to prevent a new job from starting on a host until post-execution processing on that host has finished without errors.

bsub -w 'post_err(job_id | "job_name")'

  • Specifies the job dependency condition required to prevent a new job from starting on a host until post-execution processing on that host has exited with errors.
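When submission is scripted, the dependency expression can be assembled by a small helper. The following sketch builds the post_done or post_err condition in the syntax shown above, quoting job names and leaving numeric job IDs bare. The function name post_dependency is hypothetical.

```shell
#!/bin/sh
# Build a dependency expression for the -w option.
# $1 = done or err; $2 = job ID (digits only) or job name
post_dependency() {
    case $1 in
        done|err) ;;
        *) echo "expected done or err, got: $1" >&2; return 1 ;;
    esac
    [ -n "$2" ] || { echo "missing job ID or job name" >&2; return 1; }
    case $2 in
        *[!0-9]*) printf 'post_%s("%s")\n' "$1" "$2" ;;   # job name: quoted
        *)        printf 'post_%s(%s)\n' "$1" "$2" ;;     # numeric job ID
    esac
}
```

The result can be passed to bsub, for example bsub -w "$(post_dependency done 1234)" ./next_step, where ./next_step stands for the dependent job's command.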


Commands to monitor


Command

Description

bhist

bhist -l

  • Displays historical information about jobs, including the POST_DONE and POST_ERR states, if the user submitted the job with the -w option of bsub.

  • The CPU and run times shown do not include resource usage for post-execution processing unless the parameter JOB_INCLUDE_POSTPROC is defined in lsb.applications or lsb.params.

  • Displays the job exit code and reason if the pre-exec retry limit is exceeded.

bjobs -l

  • Displays information about pending, running, and suspended jobs. During post-execution processing, the job status will be RUN if the parameter JOB_INCLUDE_POSTPROC is defined in lsb.applications or lsb.params.

  • The resource usage shown does not include resource usage for post-execution processing.

  • Displays the job exit code and reason if the pre-exec retry limit is exceeded.

bacct

  • Displays accounting statistics for finished jobs.

  • The CPU and run times shown do not include resource usage for post-execution processing, unless the parameter JOB_INCLUDE_POSTPROC is defined in lsb.applications or lsb.params.


Commands to control


Command

Description

bmod -E command

  • Changes the pre-execution command at the job level.

bmod -w 'post_done(job_id | "job_name")'

  • Specifies the job dependency condition required to prevent a new job from starting on a host until post-execution processing on that host has finished without errors.

bmod -w 'post_err(job_id | "job_name")'

  • Specifies the job dependency condition required to prevent a new job from starting on a host until post-execution processing on that host has exited with errors.


Commands to display configuration


Command

Description

bapp -l

  • Displays information about application profiles configured in lsb.applications, including the values defined for PRE_EXEC, POST_EXEC, JOB_INCLUDE_POSTPROC, JOB_POSTPROC_TIMEOUT, LOCAL_MAX_PREEXEC_RETRY, MAX_PREEXEC_RETRY, and REMOTE_MAX_PREEXEC_RETRY.

bparams

  • Displays the value of parameters defined in lsb.params, including the values defined for LOCAL_MAX_PREEXEC_RETRY, MAX_PREEXEC_RETRY, and REMOTE_MAX_PREEXEC_RETRY.

bqueues -l

  • Displays information about queues configured in lsb.queues, including the values defined for PRE_EXEC and POST_EXEC, LOCAL_MAX_PREEXEC_RETRY, MAX_PREEXEC_RETRY, and REMOTE_MAX_PREEXEC_RETRY.


Use a text editor to view the lsf.sudoers configuration file.