Configuration to enable job submission and execution controls

This feature is enabled by the presence of at least one esub or one eexec executable in the directory specified by the parameter LSF_SERVERDIR in lsf.conf. LSF does not include a default esub or eexec; you should write your own executables to meet the job requirements of your site.


Executable file   UNIX naming convention           Windows naming convention

esub              LSF_SERVERDIR/esub.application   LSF_SERVERDIR\esub.application.exe
                                                   LSF_SERVERDIR\esub.application.bat

eexec             LSF_SERVERDIR/eexec              LSF_SERVERDIR\eexec.exe
                                                   LSF_SERVERDIR\eexec.bat


The name of your esub should indicate the application with which it runs. For example: esub.fluent.

Restriction:

The name esub.user is reserved. Do not use the name esub.user for an application-specific esub.

Valid file names contain only alphanumeric characters, underscores (_), and hyphens (-).
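
The naming rules above can be checked before an esub is installed. The following is a minimal sketch, not part of LSF: the function name is_valid_esub_name is hypothetical, and the sketch assumes that the character rule applies to the application part of the name (the text after "esub.").

```sh
# Hypothetical helper: checks an application name against the naming rules.
# Returns 0 if esub.<name> would be acceptable, 1 otherwise.
is_valid_esub_name() {
    case "$1" in
        "")               return 1 ;;  # empty application name
        user)             return 1 ;;  # esub.user is reserved
        *[!A-Za-z0-9_-]*) return 1 ;;  # contains a character outside the allowed set
        *)                return 0 ;;
    esac
}
```

For example, is_valid_esub_name fluent succeeds, while is_valid_esub_name user fails because the name is reserved.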

Once the LSF_SERVERDIR contains one or more esub executables, users can specify the esub executables associated with each job they submit. If an eexec exists in LSF_SERVERDIR, LSF invokes that eexec for all jobs submitted to the cluster.

The following esub executables are provided as separate packages, available from Platform Computing Inc. upon request:
  • esub.openmpi: OpenMPI job submission

  • esub.pvm: PVM job submission

  • esub.poe: POE job submission

  • esub.ls_dyna: LS-Dyna job submission

  • esub.fluent: FLUENT job submission

  • esub.afs or esub.dce: AFS or DCE security

  • esub.lammpi: LAM/MPI job submission

  • esub.mpich_gm: MPICH-GM job submission

  • esub.intelmpi: Intel® MPI job submission

  • esub.bproc: Beowulf Distributed Process Space (BProc) job submission

  • esub.mpich2: MPICH2 job submission

  • esub.mpichp4: MPICH-P4 job submission

  • esub.mvapich: MVAPICH job submission

  • esub.tv, esub.tvlammpi, esub.tvmpich_gm, esub.tvpoe: TotalView® debugging for various MPI applications

Environment variables used by esub

When you write an esub, you can use the following environment variables provided by LSF for the esub execution environment:

LSB_SUB_PARM_FILE

Points to a temporary file that LSF uses to store the bsub options entered in the command line. An esub reads this file at job submission and either accepts the values, changes the values, or rejects the job. Job submission options are stored as name-value pairs on separate lines with the format option_name=value.

For example, suppose a user submits the following job:

bsub -q normal -x -P myproject -R "r1m rusage[mem=100]" -n 90 myjob

The LSB_SUB_PARM_FILE then contains the following lines:
LSB_SUB_QUEUE="normal"
LSB_SUB_EXCLUSIVE=Y
LSB_SUB_RES_REQ="r1m rusage[mem=100]"
LSB_SUB_PROJECT_NAME="myproject"
LSB_SUB_COMMAND_LINE="myjob"
LSB_SUB_NUM_PROCESSORS=90
LSB_SUB_MAX_NUM_PROCESSORS=90
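
Because each line has the form option_name=value, one way for a shell esub to read the options is to source the file. The following is a minimal sketch under that assumption; the function name load_submit_options is hypothetical. Sourcing executes the file as shell text, so this approach is only suitable when the file content is trusted.

```sh
# Sketch: load the submission options into shell variables.
# Each option_name=value line is also a valid shell assignment,
# so the file can be read with the "." (dot) command.
load_submit_options() {
    if [ -n "$LSB_SUB_PARM_FILE" ] && [ -r "$LSB_SUB_PARM_FILE" ]; then
        . "$LSB_SUB_PARM_FILE"
    else
        echo "esub: cannot read LSB_SUB_PARM_FILE" >&2
        return 1
    fi
}

# Possible use inside an esub:
#   load_submit_options && echo "queue is $LSB_SUB_QUEUE" >&2
```

After the call, options that the user specified are available as ordinary variables such as $LSB_SUB_QUEUE; options that were not specified remain unset.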

An esub can change any or all of the job options by writing to the file specified by the environment variable LSB_SUB_MODIFY_FILE.

The temporary file pointed to by LSB_SUB_PARM_FILE stores the following information:

Option

bsub or bmod option

Data type

Description

LSB_SUB_ADDITIONAL

-a

string

String that contains the application name or names of the esub executables requested by the user.
Restriction:

This is the only option that an esub cannot change or add at job submission.

LSB_SUB_BEGIN_TIME

-b

integer

Begin time, in seconds since 00:00:00 GMT, Jan. 1, 1970

LSB_SUB_CHKPNT_DIR

-k

string

Checkpoint directory

The file path of the checkpoint directory can contain up to 4000 characters for UNIX and Linux, or up to 255 characters for Windows, including the directory and file name.

LSB_SUB_COMMAND_LINE

bsub job command argument

string

LSB_SUB_COMMANDNAME must be set in lsf.conf to enable esub to use this variable.

LSB_SUB_CHKPNT_PERIOD

-k

integer

Checkpoint period

LSB_SUB_DEPEND_COND

-w

string

Dependency condition

LSB_SUB_ERR_FILE

-e, -eo

string

Standard error file name

LSB_SUB_EXCLUSIVE

-x

boolean

Exclusive execution, specified by "Y"

LSB_SUB_EXTSCHED_PARAM

-ext

string

External scheduler options

LSB_SUB_HOLD

-H

boolean

Hold job

LSB_SUB_HOST_SPEC

-c or -W

string

Host specifier for the CPU time limit or the run time limit

LSB_SUB_HOSTS

-m

string

List of requested execution host names

LSB_SUB_IN_FILE

-i, -io

string

Standard input file name

LSB_SUB_INTERACTIVE

-I

boolean

Interactive job, specified by "Y"

LSB_SUB_LOGIN_SHELL

-L

string

Login shell

LSB_SUB_JOB_DESCRIPTION

-Jd

string

Job description

LSB_SUB_JOB_NAME

-J

string

Job name

LSB_SUB_JOB_WARNING_ACTION

-wa

string

Job warning action

LSB_SUB_JOB_ACTION_WARNING_TIME

-wt

integer

Job warning time period

LSB_SUB_MAIL_USER

-u

string

Email address to which LSF sends job-related messages

LSB_SUB_MAX_NUM_PROCESSORS

-n

integer

Maximum number of processors requested

LSB_SUB_MODIFY

bmod

boolean

Indicates that bmod invoked esub, specified by "Y".

LSB_SUB_MODIFY_ONCE

bmod

boolean

Indicates that the job options specified at job submission have already been modified by bmod, and that bmod is invoking esub again, specified by "Y".

LSB_SUB_NOTIFY_BEGIN

-B

boolean

LSF sends an email notification when the job begins, specified by "Y".

LSB_SUB_NOTIFY_END

-N

boolean

LSF sends an email notification when the job ends, specified by "Y".

LSB_SUB_NUM_PROCESSORS

-n

integer

Minimum number of processors requested.

LSB_SUB_OTHER_FILES

bmod -f

integer

Indicates the number of files to be transferred. The value is SUB_RESET if bmod is being used to reset the number of files to be transferred.

The file path of the directory can contain up to 4094 characters for UNIX and Linux, or up to 255 characters for Windows, including the directory and file name.

LSB_SUB_OTHER_FILES_number

bsub -f

integer

The number indicates the particular file transfer value in the specified file transfer expression.

For example, for bsub -f "a > b" -f "c < d", the following would be defined:

LSB_SUB_OTHER_FILES=2

LSB_SUB_OTHER_FILES_0="a > b"

LSB_SUB_OTHER_FILES_1="c < d"

LSB_SUB_OUT_FILE

-o, -oo

string

Standard output file name.

LSB_SUB_PRE_EXEC

-E

string

Pre-execution command.

The file path of the directory can contain up to 4094 characters for UNIX and Linux, or up to 255 characters for Windows, including the directory and file name.

LSB_SUB_PROJECT_NAME

-P

string

Project name.

LSB_SUB_PTY

-Ip

boolean

An interactive job with PTY support, specified by "Y"

LSB_SUB_PTY_SHELL

-Is

boolean

An interactive job with PTY shell support, specified by "Y"

LSB_SUB_QUEUE

-q

string

Submission queue name

LSB_SUB_RERUNNABLE

-r

boolean

"Y" specifies a rerunnable job

"N" specifies a nonrerunnable job (specified with bsub -rn). The job is not rerunnable even if it was submitted to a rerunnable queue or application profile.

For bmod -rn, the value is SUB_RESET.

LSB_SUB_RES_REQ

-R

string

Resource requirement string. Multiple resource requirement strings are not supported.

LSB_SUB_RESTART

brestart

boolean

"Y" indicates to esub that the job options are associated with a restarted job.

LSB_SUB_RESTART_FORCE

brestart -f

boolean

"Y" indicates to esub that the job options are associated with a forced restarted job.

LSB_SUB_RLIMIT_CORE

-C

integer

Core file size limit

LSB_SUB_RLIMIT_CPU

-c

integer

CPU limit

LSB_SUB_RLIMIT_DATA

-D

integer

Data size limit

For AIX, if the XPG_SUS_ENV=ON environment variable is set in the user's environment before the process is executed and a process attempts to set the limit lower than current usage, the operation fails with errno set to EINVAL. If the XPG_SUS_ENV environment variable is not set, the operation fails with errno set to EFAULT.

LSB_SUB_RLIMIT_FSIZE

-F

integer

File size limit

LSB_SUB_RLIMIT_PROCESS

-p

integer

Process limit

LSB_SUB_RLIMIT_RSS

-M

integer

Resident size limit

LSB_SUB_RLIMIT_RUN

-W

integer

Wall-clock run limit in seconds. (Note this is not in minutes, unlike the run limit specified by bsub -W)

LSB_SUB_RLIMIT_STACK

-S

integer

Stack size limit

LSB_SUB_RLIMIT_THREAD

-T

integer

Thread limit

LSB_SUB_TERM_TIME

-t

integer

Termination time, in seconds, since 00:00:00 GMT, Jan. 1, 1970

LSB_SUB_TIME_EVENT

-wt

string

Time event expression

LSB_SUB_USER_GROUP

-G

string

User group name

LSB_SUB_WINDOW_SIG

-s

integer

Window signal number

LSB_SUB2_JOB_GROUP

-g

string

Submits a job to a job group

LSB_SUB2_LICENSE_PROJECT

-Lp

string

Platform License Scheduler project name

LSB_SUB2_IN_FILE_SPOOL

-is

string

Spooled input file name

LSB_SUB2_JOB_CMD_SPOOL

-Zs

string

Spooled job command file name

LSB_SUB2_JOB_PRIORITY

-sp

integer

Job priority

For bmod -spn, the value is SUB_RESET.

LSB_SUB2_SLA

-sla

string

SLA scheduling options

LSB_SUB2_USE_RSV

-U

string

Advance reservation ID

LSB_SUB3_ABSOLUTE_PRIORITY

bmod -aps

bmod -apsn

string

For bmod -aps, the value is equal to the APS string given. For bmod -apsn, the value is SUB_RESET.

LSB_SUB3_AUTO_RESIZABLE

-ar

boolean

Job autoresizable attribute. LSB_SUB3_AUTO_RESIZABLE=Y if bsub -ar or bmod -ar is specified.

LSB_SUB3_AUTO_RESIZABLE=SUB_RESET if bmod -arn is used.

LSB_SUB3_APP

-app

string

Application profile name

For bmod -appn, the value is SUB_RESET.

LSB_SUB3_CWD

-cwd

string

Current working directory

LSB_SUB3_INIT_CHKPNT_PERIOD

-k init

integer

Initial checkpoint period

LSB_SUB_INTERACTIVE

LSB_SUB3_INTERACTIVE_SSH

bsub -IS

boolean

The session of the interactive job is encrypted with SSH.

LSB_SUB_INTERACTIVE

LSB_SUB_PTY

LSB_SUB3_INTERACTIVE_SSH

bsub -ISp

boolean

If LSB_SUB_INTERACTIVE, LSB_SUB_PTY, and LSB_SUB3_INTERACTIVE_SSH are all set to "Y", the session of the interactive job with PTY support is encrypted with SSH.

LSB_SUB_INTERACTIVE

LSB_SUB_PTY

LSB_SUB_PTY_SHELL

LSB_SUB3_INTERACTIVE_SSH

bsub -ISs

boolean

If LSB_SUB_INTERACTIVE, LSB_SUB_PTY, LSB_SUB_PTY_SHELL, and LSB_SUB3_INTERACTIVE_SSH are all set to "Y", the session of the interactive job with PTY shell support is encrypted with SSH.

LSB_SUB3_JOB_REQUEUE

-Q

string

String format parameter containing the job requeue exit values

For bmod -Qn, the value is SUB_RESET.

LSB_SUB3_MIG

-mig

-mign

integer

Migration threshold

LSB_SUB3_POST_EXEC

-Ep

string

Run the specified post-execution command on the execution host after the job finishes.

The file path of the directory can contain up to 4094 characters for UNIX and Linux, or up to 255 characters for Windows, including the directory and file name.

LSB_SUB3_RESIZE_NOTIFY_CMD

-rnc

string

Job resize notification command.

LSB_SUB3_RESIZE_NOTIFY_CMD=<cmd> if bsub -rnc or bmod -rnc is specified.

LSB_SUB3_RESIZE_NOTIFY_CMD=SUB_RESET if bmod -rncn is used.

LSB_SUB3_RUNTIME_ESTIMATION

-We

integer

Runtime estimate in seconds. (Note this is not in minutes, unlike the runtime estimate specified by bsub -We)

LSB_SUB3_RUNTIME_ESTIMATION_ACC

-We+

integer

Runtime estimate that is the accumulated run time plus the runtime estimate

LSB_SUB3_RUNTIME_ESTIMATION_PERC

-Wep

integer

Runtime estimate in percentage of completion

LSB_SUB3_USER_SHELL_LIMITS

-ul

boolean

Pass user shell limits to execution host

LSB_SUB_INTERACTIVE

LSB_SUB3_XJOB_SSH

bsub -IX

boolean

If both are set to "Y", the session between the X-client and X-server as well as the session between the execution host and submission host are encrypted with SSH.
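
As an illustration of how the numbered entries in the table above fit together, the following minimal sketch walks through LSB_SUB_OTHER_FILES and the matching LSB_SUB_OTHER_FILES_number values. The function name list_file_transfers is hypothetical, and the sketch assumes that the parameter file has already been read into shell variables.

```sh
# Sketch: print each file transfer expression, one per line.
# Assumes LSB_SUB_OTHER_FILES and LSB_SUB_OTHER_FILES_<n> are already
# set as shell variables (for example, by sourcing LSB_SUB_PARM_FILE).
list_file_transfers() {
    count=${LSB_SUB_OTHER_FILES:-0}
    # SUB_RESET (set when bmod resets the transfers) is not a count,
    # so any non-numeric value is treated as "no transfers".
    case "$count" in
        ''|*[!0-9]*) return 0 ;;
    esac
    i=0
    while [ "$i" -lt "$count" ]; do
        # Indirect lookup of the variable LSB_SUB_OTHER_FILES_<i>
        eval "transfer=\${LSB_SUB_OTHER_FILES_$i}"
        printf '%s\n' "$transfer"
        i=$((i + 1))
    done
}
```

With the values from the bsub -f "a > b" -f "c < d" example in the table, the function prints the two expressions in order, starting from index 0.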


LSB_SUB_MODIFY_FILE

Points to the file that esub uses to modify the bsub job option values stored in the LSB_SUB_PARM_FILE. You can change the job options by having your esub write the new values to the LSB_SUB_MODIFY_FILE in any order, using the same format shown for the LSB_SUB_PARM_FILE. The value SUB_RESET, integers, and boolean values do not require quotes. String parameters must be entered with quotes around each string, or space-separated series of strings.

When your esub runs at job submission, LSF checks the LSB_SUB_MODIFY_FILE and applies changes so that the job runs with the revised option values.

Restriction:

LSB_SUB_ADDITIONAL is the only option that an esub cannot change or add at job submission.
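
The following minimal sketch shows one way an esub might write a changed option to the LSB_SUB_MODIFY_FILE. The function name redirect_exclusive_jobs and the queue name exclusive_q are hypothetical, and the sketch assumes that the parameter file has already been read into shell variables.

```sh
# Sketch: route exclusive jobs to a hypothetical queue named "exclusive_q".
# Assumes LSB_SUB_EXCLUSIVE was already loaded from LSB_SUB_PARM_FILE.
redirect_exclusive_jobs() {
    if [ "$LSB_SUB_EXCLUSIVE" = "Y" ] && [ -n "$LSB_SUB_MODIFY_FILE" ]; then
        # String values are quoted; integers, booleans, and SUB_RESET are not.
        echo 'LSB_SUB_QUEUE="exclusive_q"' >> "$LSB_SUB_MODIFY_FILE"
    fi
}
```

The sketch appends to the file rather than overwriting it, so that lines written earlier by the same esub are preserved.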

LSB_SUB_MODIFY_ENVFILE

Points to the file that esub uses to modify the user environment variables with which the job is submitted (not specified by bsub options). You can change these environment variables by having your esub write the values to the LSB_SUB_MODIFY_ENVFILE in any order, using the format variable_name=value, or variable_name="string".

LSF uses the LSB_SUB_MODIFY_ENVFILE to change the environment variables on the submission host. When your esub runs at job submission, LSF checks the LSB_SUB_MODIFY_ENVFILE and applies changes so that the job is submitted with the new environment variable values. LSF associates the new user environment with the job so that the job runs on the execution host with the new user environment.
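
A minimal sketch of writing to the LSB_SUB_MODIFY_ENVFILE follows. The helper name set_job_env and the variable name in the usage note are hypothetical; the sketch uses the variable_name="string" form described above.

```sh
# Sketch: add or override one variable in the job's user environment.
# Usage: set_job_env NAME VALUE
# Appends NAME="VALUE" to the file named by LSB_SUB_MODIFY_ENVFILE.
set_job_env() {
    [ -n "$LSB_SUB_MODIFY_ENVFILE" ] || return 1
    printf '%s="%s"\n' "$1" "$2" >> "$LSB_SUB_MODIFY_ENVFILE"
}
```

For example, set_job_env SCRATCH_DIR /tmp/scratch would add the line SCRATCH_DIR="/tmp/scratch" to the file.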

LSB_SUB_ABORT_VALUE

Indicates to LSF that a job should be rejected. For example, if you want LSF to reject a job, your esub should contain the line:

exit $LSB_SUB_ABORT_VALUE

Restriction:

When an esub exits with the LSB_SUB_ABORT_VALUE, esub must not write to LSB_SUB_MODIFY_FILE or to LSB_SUB_MODIFY_ENVFILE.

If multiple esubs are specified and one of the esubs exits with a value of LSB_SUB_ABORT_VALUE, LSF rejects the job without running the remaining esubs and returns a value of LSB_SUB_ABORT_VALUE.
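
The following minimal sketch shows how a rejection check might be structured. The policy (every job must specify a project) and the function name check_project are hypothetical, and the sketch assumes that the parameter file has already been read into shell variables.

```sh
# Sketch: a hypothetical site policy that requires a project name.
# Assumes LSB_SUB_PROJECT_NAME was already loaded from LSB_SUB_PARM_FILE.
# Returns 0 if the job is acceptable, 1 if it should be rejected.
check_project() {
    if [ -z "$LSB_SUB_PROJECT_NAME" ]; then
        echo "Job rejected: specify a project with bsub -P" >&2
        return 1
    fi
    return 0
}

# In an esub, the check would run before anything is written to
# LSB_SUB_MODIFY_FILE or LSB_SUB_MODIFY_ENVFILE:
#   check_project || exit $LSB_SUB_ABORT_VALUE
```

Running the check first keeps the esub consistent with the restriction above, because a rejected job never reaches the code that writes to the modification files.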

LSB_INVOKE_CMD

Specifies the name of the LSF command that most recently invoked an external executable.

Environment variables used by eexec

When you write an eexec, you can use the following environment variables in addition to all user-environment or application-specific variables.

LS_EXEC_T

Indicates the stage or type of job execution. LSF sets LS_EXEC_T to:
  • START at the beginning of job execution

  • END at job completion

  • CHKPNT at job checkpoint start
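
An eexec can branch on the three LS_EXEC_T values listed above. The following is a minimal sketch; the function name eexec_stage_action and the action names it prints are hypothetical placeholders for site-specific work.

```sh
# Sketch: choose an action based on the execution stage in LS_EXEC_T.
# The printed action names are placeholders, not LSF terminology.
eexec_stage_action() {
    case "$LS_EXEC_T" in
        START)  echo "setup" ;;       # beginning of job execution
        END)    echo "cleanup" ;;     # job completion
        CHKPNT) echo "checkpoint" ;;  # job checkpoint start
        *)      echo "ignore" ;;      # unset or unrecognized value
    esac
}
```

The default branch keeps the eexec harmless if it is run outside of LSF or with a value that it does not recognize.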

LS_JOBPID

Stores the process ID of the LSF process that invoked eexec. If eexec is intended to monitor job execution, eexec must spawn a child and then have the parent eexec process exit. The eexec child should periodically test that the job process is still alive using the LS_JOBPID variable.
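
The monitoring pattern described above can be sketched as follows. The function names job_is_alive and monitor_job are hypothetical, and the polling interval is an arbitrary choice. The sketch uses kill -0, which sends no signal and only tests whether the process can be signaled; it therefore also reports failure if the caller lacks permission to signal the process.

```sh
# Sketch: returns 0 while the process with the given PID can be signaled.
job_is_alive() {
    kill -0 "$1" 2>/dev/null
}

# Polls until the watched process is gone, then reports it.
# $1 = PID to watch, $2 = poll interval in seconds (default 5).
monitor_job() {
    while job_is_alive "$1"; do
        sleep "${2:-5}"
    done
    echo "job process $1 has exited"
}

# In an eexec, the monitor would run in a background child so that
# the parent eexec process can exit:
#   monitor_job "$LS_JOBPID" 5 &
#   exit 0
```

Any site-specific action, such as cleanup, would replace the final echo in monitor_job.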