External job submission and execution controls

The job submission and execution controls feature enables you to use external, site-specific executables to validate, modify, and reject jobs, transfer data, and modify the job execution environment. By writing external submission (esub) and external execution (eexec) binaries or scripts, you can, for example, prevent the overuse of resources, specify execution hosts, or set required environment variables based on the job submission options.

Contents

  • About job submission and execution controls

  • Scope

  • Configuration to enable job submission and execution controls

  • Job submission and execution controls behavior

  • Configuration to modify job submission and execution controls

  • Job submission and execution controls commands

About job submission and execution controls

The job submission and execution controls feature uses the executables esub and eexec to control job options and the job execution environment.

External submission (esub)

An esub is an executable that you write to meet the job requirements at your site. The following are some of the things that you can use an esub to do:
  • Validate job options

  • Change the job options specified by a user

  • Change user environment variables on the submission host (at job submission only)

  • Reject jobs (at job submission only)

  • Pass data to stdin of eexec

When a user submits a job using bsub or modifies a job using bmod, LSF runs the esub executable(s) on the submission host before accepting the job. If the user submitted the job with options such as -R to specify required resources or -q to specify a queue, an esub can change the values of those options to conform to resource usage policies at your site.

Note:

When compound resource requirements are used at any level, an esub can create job-level resource requirements which overwrite most application-level and queue-level resource requirements. -R merge rules are explained in detail in Administering Platform LSF.

An esub can also change the user environment on the submission host prior to job submission so that when LSF copies the submission host environment to the execution host, the job runs on the execution host with the values specified by the esub. For example, an esub can add user environment variables to those already associated with the job.

[Figures: job submission without esub enabled; job submission with esub enabled]

An esub executable is typically used to enforce site-specific job submission policies and command-line syntax by validating or pre-parsing the command line. The file indicated by the environment variable LSB_SUB_PARM_FILE stores the values submitted by the user. An esub reads the LSB_SUB_PARM_FILE and then accepts or changes the option values or rejects the job. Because an esub runs before job submission, using an esub to reject incorrect job submissions improves overall system performance by reducing the load on the master batch daemon (mbatchd).

An esub can be used to:
  • Reject any job that requests more than a specified number of CPUs

  • Change the submission queue for specific user accounts to a higher priority queue

  • Check whether the job specifies an application and, if so, submit the job to the correct application profile
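
As a minimal sketch of the first use case above, the following esub rejects any job that requests more than a site-defined number of processors. The limit MAX_CPUS and the function name within_cpu_limit are hypothetical; the LSB_SUB_* variables and LSB_SUB_ABORT_VALUE are the ones LSF provides to an esub.

```shell
#!/bin/sh
# Sketch only: MAX_CPUS and within_cpu_limit are illustrative names, not part of LSF.
MAX_CPUS=64

# Succeeds when the requested processor count does not exceed MAX_CPUS.
# An empty request is treated as 1 processor.
within_cpu_limit() {
    requested=${1:-1}
    [ "$requested" -le "$MAX_CPUS" ]
}

# Act as an esub only when LSF has supplied the parameter file.
if [ -n "$LSB_SUB_PARM_FILE" ]; then
    . "$LSB_SUB_PARM_FILE"
    # Send messages to stderr; stdout of an esub is passed to eexec.
    exec 1>&2
    if ! within_cpu_limit "${LSB_SUB_MAX_NUM_PROCESSORS:-$LSB_SUB_NUM_PROCESSORS}"; then
        echo "Jobs may not request more than $MAX_CPUS processors"
        exit "$LSB_SUB_ABORT_VALUE"
    fi
fi
```

Keeping the check in a function makes it easy to exercise outside LSF before installing the script in LSF_SERVERDIR.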

Note:

If an esub executable fails, the job will still be submitted to LSF.

Multiple esub executables

LSF provides a master external submission executable (LSF_SERVERDIR/mesub) that supports the use of application-specific esub executables. Users can specify one or more esub executables using the -a option of bsub or bmod. When a user submits or modifies a job or when a user restarts a job that was submitted or modified with the -a option included, mesub runs the specified esub executables.

An LSF administrator can specify one or more mandatory esub executables by defining the parameter LSB_ESUB_METHOD in lsf.conf. If a mandatory esub is defined, mesub runs the mandatory esub for all jobs submitted to LSF in addition to any esub executables specified with the -a option.

The naming convention is esub.application. LSF always runs the executable named "esub" (without .application) if it exists in LSF_SERVERDIR.
Note:

All esub executables must be stored in the LSF_SERVERDIR directory defined in lsf.conf.

The mesub runs multiple esub executables in the following order:
  1. The mandatory esub or esubs specified by LSB_ESUB_METHOD in lsf.conf

  2. Any executable with the name "esub" in LSF_SERVERDIR

  3. One or more esubs in the order specified by bsub -a

[Figure: Example of multiple esub execution]

An esub runs only once, even if it is specified by both the bsub -a option and the parameter LSB_ESUB_METHOD.
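
The ordering and run-once rules can be illustrated with a short sketch. This is not how mesub is implemented; esub_run_order and append_once are hypothetical helpers that only reproduce the documented order (mandatory esubs, then the executable named "esub" if present, then the esubs given with bsub -a) and skip any esub that is already scheduled.

```shell
#!/bin/sh
# Illustration of the documented ordering only; not the mesub implementation.

# Append $1 to the global list "order" unless it is already present.
append_once() {
    case " $order " in
        *" $1 "*) ;;                 # already scheduled: runs only once
        *) order="$order $1" ;;
    esac
}

# $1: value of LSB_ESUB_METHOD
# $2: applications given to bsub -a
# $3: "yes" if an executable named "esub" exists in LSF_SERVERDIR
esub_run_order() {
    order=""
    for app in $1; do append_once "esub.$app"; done
    if [ "$3" = "yes" ]; then append_once "esub"; fi
    for app in $2; do append_once "esub.$app"; done
    # Print the list without the leading space.
    echo "${order# }"
}
```

For example, esub_run_order "dce" "fluent dce" yes prints esub.dce esub esub.fluent: esub.dce appears once even though it is named in both places.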

External execution (eexec)

An eexec is an executable that you write to control the job environment on the execution host.

[Figures: job execution without eexec enabled; job execution with eexec enabled]

The following are some of the things that you can use an eexec to do:
  • Set up the user environment variables on the execution host

  • Monitor job state or resource usage

  • Receive data from stdout of esub

  • Run a shell script to create and populate environment variables needed by jobs

  • Monitor the number of tasks running on a host and raise a flag when this number exceeds a pre-determined limit

  • Pass DCE credentials and AFS tokens using a combination of esub and eexec executables; LSF functions as a pipe for passing data from the stdout of esub to the stdin of eexec

An eexec can change the user environment variable values transferred from the submission host so that the job runs on the execution host with a different environment.

For example, if you have a mixed UNIX and Windows cluster, the submission and execution hosts might use different operating systems. In this case, the submission host environment might not meet the job requirements when the job runs on the execution host. You can use an eexec to set the correct user environment between the two operating systems.

Typically, an eexec executable is a shell script that creates and populates the environment variables required by the job. An eexec can also monitor job execution and enforce site-specific resource usage policies.

If an eexec executable exists in the directory specified by LSF_SERVERDIR, LSF invokes that eexec for all jobs submitted to the cluster. By default, LSF runs eexec on the execution host before the job starts. The job process that invokes eexec waits for eexec to finish before continuing with job execution.

Unlike a pre-execution command defined at the job, queue, or application levels, an eexec:
  • Runs at job start, finish, or checkpoint

  • Allows the job to run without pending if eexec fails; eexec has no effect on the job state

  • Runs for all jobs, regardless of queue or application profile

Scope


Applicability

Details

Operating system

  • UNIX and Linux

  • Windows

Security

  • Data passing between esub on the submission host and eexec on the execution host is not encrypted.

Job types

  • Batch jobs submitted with bsub or modified by bmod.

  • Batch jobs restarted with brestart.

  • Interactive tasks submitted with lsrun and lsgrun (eexec only).

Dependencies

  • UNIX and Windows user accounts must be valid on all hosts in the cluster, or the correct type of account mapping must be enabled.
    • For a mixed UNIX/Windows cluster, UNIX/Windows user account mapping must be enabled.

    • For a cluster with a non-uniform user name space, between-host account mapping must be enabled.

    • For a MultiCluster environment with a non-uniform user name space, cross-cluster user account mapping must be enabled.

  • User accounts must have the correct permissions to successfully run jobs.

  • An eexec that requires root privileges to run on UNIX must be configured to run as the root user.

Limitations

  • Only an esub invoked by bsub can change the job environment on the submission host. An esub invoked by bmod or brestart cannot change the environment.

  • Any esub messages provided to the user must be directed to standard error, not to standard output. Standard output from any esub is automatically passed to eexec.

  • An eexec can handle only one standard output stream from an esub as standard input to eexec. If any esub writes to standard output, you must make sure that your eexec handles that standard output correctly.

  • The esub/eexec combination cannot handle daemon authentication. To configure daemon authentication, you must enable external authentication, which uses the eauth executable.


Configuration to enable job submission and execution controls

This feature is enabled by the presence of at least one esub or one eexec executable in the directory specified by the parameter LSF_SERVERDIR in lsf.conf. LSF does not include a default esub or eexec; you should write your own executables to meet the job requirements of your site.


Executable file   UNIX naming convention           Windows naming convention

esub              LSF_SERVERDIR/esub.application   LSF_SERVERDIR\esub.application.exe
                                                   LSF_SERVERDIR\esub.application.bat

eexec             LSF_SERVERDIR/eexec              LSF_SERVERDIR\eexec.exe
                                                   LSF_SERVERDIR\eexec.bat


The name of your esub should indicate the application with which it runs. For example: esub.fluent.

Restriction:

The name esub.user is reserved. Do not use the name esub.user for an application-specific esub.

Valid file names contain only alphanumeric characters, underscores (_), and hyphens (-).

Once the LSF_SERVERDIR contains one or more esub executables, users can specify the esub executables associated with each job they submit. If an eexec exists in LSF_SERVERDIR, LSF invokes that eexec for all jobs submitted to the cluster.

The following esub executables are provided as separate packages, available from Platform Computing Inc. upon request:
  • esub.openmpi: OpenMPI job submission

  • esub.pvm: PVM job submission

  • esub.poe: POE job submission

  • esub.ls_dyna: LS-Dyna job submission

  • esub.fluent: FLUENT job submission

  • esub.afs or esub.dce: AFS or DCE security

  • esub.lammpi: LAM/MPI job submission

  • esub.mpich_gm: MPICH-GM job submission

  • esub.intelmpi: Intel® MPI job submission

  • esub.bproc: Beowulf Distributed Process Space (BProc) job submission

  • esub.mpich2: MPICH2 job submission

  • esub.mpichp4: MPICH-P4 job submission

  • esub.mvapich: MVAPICH job submission

  • esub.tv, esub.tvlammpi, esub.tvmpich_gm, esub.tvpoe: TotalView® debugging for various MPI applications

Environment variables used by esub

When you write an esub, you can use the following environment variables provided by LSF for the esub execution environment:

LSB_SUB_PARM_FILE

Points to a temporary file that LSF uses to store the bsub options entered in the command line. An esub reads this file at job submission and either accepts the values, changes the values, or rejects the job. Job submission options are stored as name-value pairs on separate lines with the format option_name=value.

For example, if a user submits the following job:

bsub -q normal -x -P myproject -R "rlm rusage[mem=100]" -n 90 myjob

the LSB_SUB_PARM_FILE contains the following lines:
LSB_SUB_QUEUE="normal"
LSB_SUB_EXCLUSIVE=Y
LSB_SUB_RES_REQ="rlm rusage[mem=100]"
LSB_SUB_PROJECT_NAME="myproject"
LSB_SUB_COMMAND_LINE="myjob"
LSB_SUB_NUM_PROCESSORS=90
LSB_SUB_MAX_NUM_PROCESSORS=90

An esub can change any or all of the job options by writing to the file specified by the environment variable LSB_SUB_MODIFY_FILE.
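
Because the parameter file is a plain list of option_name=value lines, an esub can inspect it with ordinary text tools as well as by sourcing it. The following sketch lists the names of the options that the user specified; list_sub_options is a hypothetical helper, not an LSF command.

```shell
#!/bin/sh
# Sketch: list_sub_options is a hypothetical helper, not an LSF command.
# Prints the name of every LSB_SUB* option present in the given parameter file.
list_sub_options() {
    sed -n 's/^\(LSB_SUB[A-Z0-9_]*\)=.*$/\1/p' "$1"
}

# When run as an esub, write the option names to stderr for auditing.
if [ -n "$LSB_SUB_PARM_FILE" ]; then
    list_sub_options "$LSB_SUB_PARM_FILE" >&2
fi
```

Reading the file this way avoids executing its contents, which can be preferable when an esub only needs to know which options were set.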

The temporary file pointed to by LSB_SUB_PARM_FILE stores the following information:

Option

bsub or bmod option

data type

Description

LSB_SUB_ADDITIONAL

-a

string

String that contains the application name or names of the esub executables requested by the user.
Restriction:

This is the only option that an esub cannot change or add at job submission.

LSB_SUB_BEGIN_TIME

-b

integer

Begin time, in seconds since 00:00:00 GMT, Jan. 1, 1970

LSB_SUB_CHKPNT_DIR

-k

string

Checkpoint directory

The file path of the checkpoint directory can contain up to 4000 characters for UNIX and Linux, or up to 255 characters for Windows, including the directory and file name.

LSB_SUB_COMMAND_LINE

bsub job command argument

string

LSB_SUB_COMMANDNAME must be set in lsf.conf to enable esub to use this variable.

LSB_SUB_CHKPNT_PERIOD

-k

integer

Checkpoint period

LSB_SUB_DEPEND_COND

-w

string

Dependency condition

LSB_SUB_ERR_FILE

-e, -eo

string

Standard error file name

LSB_SUB_EXCLUSIVE

-x

boolean

Exclusive execution, specified by "Y"

LSB_SUB_EXTSCHED_PARAM

-ext

string

External scheduler options

LSB_SUB_HOLD

-H

boolean

Hold job

LSB_SUB_HOST_SPEC

-c or -W

string

Host specifier, limits the CPU time or RUN time.

LSB_SUB_HOSTS

-m

string

List of requested execution host names

LSB_SUB_IN_FILE

-i, -io

string

Standard input file name

LSB_SUB_INTERACTIVE

-I

boolean

Interactive job, specified by "Y"

LSB_SUB_LOGIN_SHELL

-L

string

Login shell

LSB_SUB_JOB_DESCRIPTION

-Jd

string

Job description

LSB_SUB_JOB_NAME

-J

string

Job name

LSB_SUB_JOB_WARNING_ACTION

-wa

string

Job warning action

LSB_SUB_JOB_ACTION_WARNING_TIME

-wt

integer

Job warning time period

LSB_SUB_MAIL_USER

-u

string

Email address to which LSF sends job-related messages

LSB_SUB_MAX_NUM_PROCESSORS

-n

integer

Maximum number of processors requested

LSB_SUB_MODIFY

bmod

boolean

Indicates that bmod invoked esub, specified by "Y".

LSB_SUB_MODIFY_ONCE

bmod

boolean

Indicates that the job options specified at job submission have already been modified by bmod, and that bmod is invoking esub again, specified by "Y".

LSB_SUB_NOTIFY_BEGIN

-B

boolean

LSF sends an email notification when the job begins, specified by "Y".

LSB_SUB_NOTIFY_END

-N

boolean

LSF sends an email notification when the job ends, specified by "Y".

LSB_SUB_NUM_PROCESSORS

-n

integer

Minimum number of processors requested.

LSB_SUB_OTHER_FILES

bmod -f

integer

Indicates the number of files to be transferred. The value is SUB_RESET if bmod is being used to reset the number of files to be transferred.

The file path of the directory can contain up to 4094 characters for UNIX and Linux, or up to 255 characters for Windows, including the directory and file name.

LSB_SUB_OTHER_FILES_number

bsub -f

integer

The number indicates the particular file transfer value in the specified file transfer expression.

For example, for bsub -f "a > b" -f "c < d", the following would be defined:

LSB_SUB_OTHER_FILES=2

LSB_SUB_OTHER_FILES_0="a > b"

LSB_SUB_OTHER_FILES_1="c < d"

LSB_SUB_OUT_FILE

-o, -oo

string

Standard output file name.

LSB_SUB_PRE_EXEC

-E

string

Pre-execution command.

The file path of the directory can contain up to 4094 characters for UNIX and Linux, or up to 255 characters for Windows, including the directory and file name.

LSB_SUB_PROJECT_NAME

-P

string

Project name.

LSB_SUB_PTY

-Ip

boolean

An interactive job with PTY support, specified by "Y"

LSB_SUB_PTY_SHELL

-Is

boolean

An interactive job with PTY shell support, specified by "Y"

LSB_SUB_QUEUE

-q

string

Submission queue name

LSB_SUB_RERUNNABLE

-r

boolean

"Y" specifies a rerunnable job

"N" specifies a nonrerunnable job (specified with bsub -rn). The job is not rerunnable even if it was submitted to a rerunnable queue or application profile.

For bmod -rn, the value is SUB_RESET.

LSB_SUB_RES_REQ

-R

string

Resource requirement string—does not support multiple resource requirement strings

LSB_SUB_RESTART

brestart

boolean

"Y" indicates to esub that the job options are associated with a restarted job.

LSB_SUB_RESTART_FORCE

brestart -f

boolean

"Y" indicates to esub that the job options are associated with a forced restarted job.

LSB_SUB_RLIMIT_CORE

-C

integer

Core file size limit

LSB_SUB_RLIMIT_CPU

-c

integer

CPU limit

LSB_SUB_RLIMIT_DATA

-D

integer

Data size limit

For AIX, if the XPG_SUS_ENV=ON environment variable is set in the user's environment before the process is executed and a process attempts to set the limit lower than current usage, the operation fails with errno set to EINVAL. If the XPG_SUS_ENV environment variable is not set, the operation fails with errno set to EFAULT.

LSB_SUB_RLIMIT_FSIZE

-F

integer

File size limit

LSB_SUB_RLIMIT_PROCESS

-p

integer

Process limit

LSB_SUB_RLIMIT_RSS

-M

integer

Resident size limit

LSB_SUB_RLIMIT_RUN

-W

integer

Wall-clock run limit

LSB_SUB_RLIMIT_STACK

-S

integer

Stack size limit

LSB_SUB_RLIMIT_THREAD

-T

integer

Thread limit

LSB_SUB_TERM_TIME

-t

integer

Termination time, in seconds, since 00:00:00 GMT, Jan. 1, 1970

LSB_SUB_TIME_EVENT

-wt

string

Time event expression

LSB_SUB_USER_GROUP

-G

string

User group name

LSB_SUB_WINDOW_SIG

-s

boolean

Window signal number

LSB_SUB2_JOB_GROUP

-g

string

Submits a job to a job group

LSB_SUB2_LICENSE_PROJECT

-Lp

string

LSF License Scheduler project name

LSB_SUB2_IN_FILE_SPOOL

-is

string

Spooled input file name

LSB_SUB2_JOB_CMD_SPOOL

-Zs

string

Spooled job command file name

LSB_SUB2_JOB_PRIORITY

-sp

integer

Job priority

For bmod -spn, the value is SUB_RESET.

LSB_SUB2_SLA

-sla

string

SLA scheduling options

LSB_SUB2_USE_RSV

-U

string

Advance reservation ID

LSB_SUB3_ABSOLUTE_PRIORITY

bmod -aps

bmod -apsn

string

For bmod -aps, the value is equal to the APS string given. For bmod -apsn, the value is SUB_RESET.

LSB_SUB3_AUTO_RESIZABLE

-ar

boolean

Job autoresizable attribute. LSB_SUB3_AUTO_RESIZABLE=Y if bsub -ar or bmod -ar is specified.

LSB_SUB3_AUTO_RESIZABLE=SUB_RESET if bmod -arn is used.

LSB_SUB3_APP

-app

string

Application profile name

For bmod -appn, the value is SUB_RESET.

LSB_SUB3_CWD

-cwd

string

Current working directory

LSB_SUB3_INIT_CHKPNT_PERIOD

-k init

integer

Initial checkpoint period

LSB_SUB_INTERACTIVE

LSB_SUB3_INTERACTIVE_SSH

bsub -IS

boolean

The session of the interactive job is encrypted with SSH.

LSB_SUB_INTERACTIVE

LSB_SUB_PTY

LSB_SUB3_INTERACTIVE_SSH

bsub -ISp

boolean

If LSB_SUB_INTERACTIVE, LSB_SUB_PTY, and LSB_SUB3_INTERACTIVE_SSH are all specified by "Y", the session of the interactive job with PTY support is encrypted with SSH.

LSB_SUB_INTERACTIVE

LSB_SUB_PTY

LSB_SUB_PTY_SHELL

LSB_SUB3_INTERACTIVE_SSH

bsub -ISs

boolean

If LSB_SUB_INTERACTIVE, LSB_SUB_PTY, LSB_SUB_PTY_SHELL, and LSB_SUB3_INTERACTIVE_SSH are all specified by "Y", the session of the interactive job with PTY shell support is encrypted with SSH.

LSB_SUB3_JOB_REQUEUE

-Q

string

String format parameter containing the job requeue exit values

For bmod -Qn, the value is SUB_RESET.

LSB_SUB3_MIG

-mig

-mign

integer

Migration threshold

LSB_SUB3_POST_EXEC

-Ep

string

Run the specified post-execution command on the execution host after the job finishes.

The file path of the directory can contain up to 4094 characters for UNIX and Linux, or up to 255 characters for Windows, including the directory and file name.

LSB_SUB3_RESIZE_NOTIFY_CMD

-rnc

string

Job resize notification command.

LSB_SUB3_RESIZE_NOTIFY_CMD=<cmd> if bsub -rnc or bmod -rnc is specified.

LSB_SUB3_RESIZE_NOTIFY_CMD=SUB_RESET if bmod -rncn is used.

LSB_SUB3_RUNTIME_ESTIMATION

-We

integer

Runtime estimate

LSB_SUB3_RUNTIME_ESTIMATION_ACC

-We+

integer

Runtime estimate that is the accumulated run time plus the runtime estimate

LSB_SUB3_RUNTIME_ESTIMATION_PERC

-Wep

integer

Runtime estimate in percentage of completion

LSB_SUB3_USER_SHELL_LIMITS

-ul

boolean

Pass user shell limits to execution host

LSB_SUB_INTERACTIVE

LSB_SUB3_XJOB_SSH

bsub -IX

boolean

If both are set to "Y", the session between the X-client and X-server as well as the session between the execution host and submission host are encrypted with SSH.


LSB_SUB_MODIFY_FILE

Points to the file that esub uses to modify the bsub job option values stored in the LSB_SUB_PARM_FILE. You can change the job options by having your esub write the new values to the LSB_SUB_MODIFY_FILE in any order, using the same format shown for the LSB_SUB_PARM_FILE. The value SUB_RESET, integers, and boolean values do not require quotes. String parameters must be entered with quotes around each string, or space-separated series of strings.

When your esub runs at job submission, LSF checks the LSB_SUB_MODIFY_FILE and applies changes so that the job runs with the revised option values.
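
The quoting rules above can be sketched as follows. The helper write_job_modifications and the example queue name and processor count are assumptions for illustration; only the option names and the option_name=value format come from this document.

```shell
#!/bin/sh
# Sketch: write_job_modifications is a hypothetical helper.
# $1: file to write to, $2: queue name, $3: number of processors.
# Strings are quoted; integers and boolean values are not.
write_job_modifications() {
    {
        printf 'LSB_SUB_QUEUE="%s"\n' "$2"
        printf 'LSB_SUB_NUM_PROCESSORS=%s\n' "$3"
        printf 'LSB_SUB_MAX_NUM_PROCESSORS=%s\n' "$3"
        printf 'LSB_SUB_NOTIFY_END=Y\n'
    } >> "$1"
}

# When run as an esub, apply the example values (hypothetical queue "priority").
if [ -n "$LSB_SUB_MODIFY_FILE" ]; then
    write_job_modifications "$LSB_SUB_MODIFY_FILE" priority 4
fi
```

Appending with >> rather than overwriting with > keeps earlier changes intact when an esub writes more than one group of options.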

Restriction:

LSB_SUB_ADDITIONAL is the only option that an esub cannot change or add at job submission.

LSB_SUB_MODIFY_ENVFILE

Points to the file that esub uses to modify the user environment variables with which the job is submitted (not specified by bsub options). You can change these environment variables by having your esub write the values to the LSB_SUB_MODIFY_ENVFILE in any order, using the format variable_name=value, or variable_name="string".

LSF uses the LSB_SUB_MODIFY_ENVFILE to change the environment variables on the submission host. When your esub runs at job submission, LSF checks the LSB_SUB_MODIFY_ENVFILE and applies changes so that the job is submitted with the new environment variable values. LSF associates the new user environment with the job so that the job runs on the execution host with the new user environment.
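
As a minimal sketch, an esub might add one environment variable to the job as shown here. The helper set_job_env, the variable LICENSE_SERVER, and its value are hypothetical examples.

```shell
#!/bin/sh
# Sketch: set_job_env is a hypothetical helper; LICENSE_SERVER is an example variable.
# Appends variable_name="string" to the file named by $1.
# $2: variable name, $3: value.
set_job_env() {
    printf '%s="%s"\n' "$2" "$3" >> "$1"
}

# When run as an esub, add the example variable to the job environment.
if [ -n "$LSB_SUB_MODIFY_ENVFILE" ]; then
    set_job_env "$LSB_SUB_MODIFY_ENVFILE" LICENSE_SERVER "1700@lichost"
fi
```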

LSB_SUB_ABORT_VALUE
Indicates to LSF that a job should be rejected. For example, if you want LSF to reject a job, your esub should contain the line
exit $LSB_SUB_ABORT_VALUE
Restriction:

When an esub exits with the LSB_SUB_ABORT_VALUE, esub must not write to LSB_SUB_MODIFY_FILE or to LSB_SUB_MODIFY_ENVFILE.

If multiple esubs are specified and one of the esubs exits with a value of LSB_SUB_ABORT_VALUE, LSF rejects the job without running the remaining esubs and returns a value of LSB_SUB_ABORT_VALUE.

LSB_INVOKE_CMD

Specifies the name of the LSF command that most recently invoked an external executable.
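
One possible use of LSB_INVOKE_CMD is to skip environment changes when the esub was not invoked by bsub, since only an esub invoked by bsub can change the job environment. The sketch below assumes that the variable holds the bare command name (for example, bsub); that value format, the helper invoked_by_bsub, and the variable SITE_POLICY_APPLIED are assumptions to verify on your cluster.

```shell
#!/bin/sh
# Sketch: invoked_by_bsub is a hypothetical helper. It assumes LSB_INVOKE_CMD
# holds the bare command name (for example "bsub").
invoked_by_bsub() {
    [ "$1" = "bsub" ]
}

# Skip environment changes for bmod and brestart, which cannot apply them.
if [ -n "$LSB_SUB_MODIFY_ENVFILE" ] && invoked_by_bsub "$LSB_INVOKE_CMD"; then
    echo 'SITE_POLICY_APPLIED="yes"' >> "$LSB_SUB_MODIFY_ENVFILE"
fi
```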

Environment variables used by eexec

When you write an eexec, you can use the following environment variables in addition to all user-environment or application-specific variables.
LS_EXEC_T
Indicates the stage or type of job execution. LSF sets LS_EXEC_T to:
  • START at the beginning of job execution

  • END at job completion

  • CHKPNT at job checkpoint start
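
An eexec can branch on these three values with a case statement. In the following sketch, describe_stage is a hypothetical helper and the messages are placeholders for whatever site-specific action each stage requires.

```shell
#!/bin/sh
# Sketch: describe_stage is a hypothetical helper that maps LS_EXEC_T to a message.
describe_stage() {
    case "$1" in
        START)  echo "job starting" ;;
        END)    echo "job finished" ;;
        CHKPNT) echo "checkpoint starting" ;;
        *)      echo "unknown stage" ;;
    esac
}

# When LSF runs this as eexec, report the stage on stderr.
# Replace this with the logging mechanism used at your site.
if [ -n "$LS_EXEC_T" ]; then
    describe_stage "$LS_EXEC_T" >&2
fi
```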

LS_JOBPID

Stores the process ID of the LSF process that invoked eexec. If eexec is intended to monitor job execution, eexec must spawn a child and then have the parent eexec process exit. The eexec child should periodically test that the job process is still alive using the LS_JOBPID variable.

Job submission and execution controls behavior

The following examples illustrate how customized esub and eexec executables can control job submission and execution.

Validating job submission parameters using esub

When a user submits a job using bsub -P, LSF accepts any project name entered by the user and associates that project name with the job. This example shows an esub that supports project-based accounting by enforcing the use of valid project names for jobs submitted by users who are eligible to charge to those projects. If a user submits a job to any project other than proj1 or proj2, or if a user other than user1 or user2 submits a job to proj1, the esub exits with the value of LSB_SUB_ABORT_VALUE and LSF rejects the job.
#!/bin/sh
. $LSB_SUB_PARM_FILE
# Redirect stdout to stderr so echo can be used for error messages
exec 1>&2
# Check valid projects: reject anything other than proj1 or proj2
if [ "$LSB_SUB_PROJECT_NAME" != "proj1" ] && [ "$LSB_SUB_PROJECT_NAME" != "proj2" ]; then
   echo "Incorrect project name specified"
   exit $LSB_SUB_ABORT_VALUE
fi
USER=`whoami`
if [ "$LSB_SUB_PROJECT_NAME" = "proj1" ]; then
   # Only user1 and user2 can charge to proj1
   if [ "$USER" != "user1" ] && [ "$USER" != "user2" ]; then
      echo "You are not allowed to charge to this project"
      exit $LSB_SUB_ABORT_VALUE
   fi
fi

Changing job submission parameters using esub

The following example shows an esub that modifies job submission options and environment variables based on the user name that submits a job. This esub writes the changes to LSB_SUB_MODIFY_FILE for userA and to LSB_SUB_MODIFY_ENVFILE for userB. LSF rejects all jobs submitted by userC without writing to either file:
#!/bin/sh
. $LSB_SUB_PARM_FILE
# Redirect stdout to stderr so echo can be used for error messages
exec 1>&2
USER=`whoami`
# Make sure userA is using the right queue queueA
if [ "$USER" = "userA" ] && [ "$LSB_SUB_QUEUE" != "queueA" ]; then
   echo "userA has submitted a job to an incorrect queue"
   echo "...submitting to queueA"
   echo 'LSB_SUB_QUEUE="queueA"' > $LSB_SUB_MODIFY_FILE
fi
# Make sure userB is using the right shell (/bin/sh)
if [ "$USER" = "userB" ] && [ "$SHELL" != "/bin/sh" ]; then
   echo "userB has submitted a job using $SHELL"
   echo "...using /bin/sh instead"
   echo 'SHELL="/bin/sh"' > $LSB_SUB_MODIFY_ENVFILE
fi
# Deny userC the ability to submit a job
if [ "$USER" = "userC" ]; then
   echo "You are not permitted to submit a job."
   exit $LSB_SUB_ABORT_VALUE
fi

Monitoring the execution environment using eexec

This example shows how you can use an eexec to monitor job execution:
#!/bin/sh
# eexec
# Example script to monitor the number of jobs executing through RES.
# This script works in cooperation with an elim that counts the
# number of files in the TASKDIR directory. Each RES process on a host
# will have a file in the TASKDIR directory.
# Don’t want to monitor lsbatch jobs.
if [ "$LSB_JOBID" != "" ] ; then
    exit 0
fi
# Directory containing all the task files for the host.
# You can change this to whatever directory you wish;
# just make sure anyone has read/write permissions.
TASKDIR="/tmp/RES_dir"
# If TASKDIR does not exist, create it
if [ ! -d "$TASKDIR" ] ; then
   mkdir "$TASKDIR" > /dev/null 2>&1
fi
# Need to make sure LS_JOBPID and USER are defined;
# otherwise exit normally
if [ -z "$LS_JOBPID" ] ; then
    exit 0
elif [ -z "$USER" ] ; then
    exit 0
fi
taskFile="$TASKDIR/$LS_JOBPID.$USER"
# Fork grandchild to stay around for the duration of the task
touch $taskFile >/dev/null 2>&1
(
        (while : ;
        do
                kill -0 $LS_JOBPID >/dev/null 2>&1
                if [ $? -eq 0 ] ; then
                        sleep 10  # this is the poll interval
                                  # increase it if you want but
                                   # see the elim for its
                                  # corresponding update interval    
            else
                        rm $taskFile >/dev/null 2>&1 
                        exit 0
                fi
        done)&
)&
wait

Passing data between esub and eexec

A combination of esub and eexec executables can be used to pass AFS/DCE tokens from the submission host to the execution host. LSF passes data from the standard output of esub to the standard input of eexec. A daemon wrapper script can be used to renew the tokens.
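
The data path can be sketched with two small functions that would normally live in separate executables, an esub and the eexec. Both function names are hypothetical, and the token is a placeholder string rather than a real AFS or DCE credential; a shell pipe stands in for LSF, which connects the two in a real cluster.

```shell
#!/bin/sh
# Sketch: both functions are hypothetical. SITE_TOKEN is a placeholder name,
# not a real AFS or DCE credential format.

# esub side: anything written to stdout is passed by LSF to eexec.
esub_send() {
    printf 'SITE_TOKEN=%s\n' "$1"
}

# eexec side: read name=value lines from stdin and print the token value.
eexec_receive() {
    while IFS='=' read -r name value; do
        if [ "$name" = "SITE_TOKEN" ]; then
            echo "$value"
        fi
    done
}
```

Running esub_send abc123 | eexec_receive prints abc123. Because the data is not encrypted in transit, a real implementation should avoid sending secrets in clear text.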

Configuration to modify job submission and execution controls

Configuration parameters modify the behavior of job submission and execution controls by:
  • Defining a mandatory esub that applies to all jobs in the cluster

  • Specifying the eexec user account (UNIX only)

Configuration to define a mandatory esub


Configuration file

Parameter and syntax

Behavior

lsf.conf

LSB_ESUB_METHOD="esub_application [esub_application] …"

  • The specified esub or esubs run for all jobs submitted to the cluster, in addition to any esub specified by the user in the command line

  • For example, to specify a mandatory esub named esub.fluent, define LSB_ESUB_METHOD=fluent


Configuration to specify the eexec user account

The eexec executable runs under the submission user account. You can modify this behavior for UNIX hosts by specifying a different user account.

Configuration file

Parameter and syntax

Behavior

lsf.sudoers

LSF_EEXEC_USER=user_name

  • Changes the user account under which eexec runs


Job submission and execution controls commands

Commands for submission


Command

Description

bsub -a esub_application [esub_application] …

  • Specifies one or more esub executables to run at job submission

  • For example, to specify the esub named esub.fluent, use bsub -a fluent

  • LSF runs any esub executables defined by LSB_ESUB_METHOD, followed by the executable named "esub" if it exists in LSF_SERVERDIR, followed by the esub executables specified by the -a option

  • LSF runs eexec if an executable file with that name exists in LSF_SERVERDIR

brestart

  • Restarts a checkpointed job and runs the esub executables specified when the job was submitted

  • LSF runs any esub executables defined by LSB_ESUB_METHOD, followed by the executable named "esub" if it exists in LSF_SERVERDIR, followed by the esub executables specified by the -a option

  • LSF runs eexec if an executable file with that name exists in LSF_SERVERDIR

lsrun

  • Submits an interactive task; LSF runs eexec if an eexec executable exists in LSF_SERVERDIR

  • LSF runs eexec only at task startup (LS_EXEC_T=START)

lsgrun

  • Submits an interactive task to run on a set of hosts; LSF runs eexec if an eexec executable exists in LSF_SERVERDIR

  • LSF runs eexec only at task startup (LS_EXEC_T=START)


Commands to monitor

Not applicable: There are no commands to monitor the behavior of this feature.

Commands to control


Command

Description

bmod -a esub_application [esub_application] …

  • Resubmits a job and changes the esubs previously associated with the job

  • LSF runs any esub executables defined by LSB_ESUB_METHOD, followed by the executable named "esub" if it exists in LSF_SERVERDIR, followed by the esub executables specified by the -a option of bmod

  • LSF runs eexec if an executable file with that name exists in LSF_SERVERDIR

bmod -an

  • Dissociates from a job all esub executables that were previously associated with the job

  • LSF runs any esub executables defined by LSB_ESUB_METHOD, followed by the executable named "esub" if it exists in LSF_SERVERDIR

  • LSF runs eexec if an executable file with that name exists in LSF_SERVERDIR


Commands to display configuration


Command

Description

badmin showconf

  • Displays all configured parameters and their values set in lsf.conf or ego.conf that affect mbatchd and sbatchd.

    Use a text editor to view other parameters in the lsf.conf or ego.conf configuration files.

  • In a MultiCluster environment, badmin showconf only displays the parameters of daemons on the local cluster.


Use a text editor to view the lsf.sudoers configuration file.