lsb.params

The lsb.params file defines general parameters used by the LSF system. This file contains only one section, named Parameters. mbatchd uses lsb.params for initialization. The file is optional. If not present, the LSF-defined defaults are assumed.

Some of the parameters that can be defined in lsb.params control timing within the system. The default settings provide good throughput for long-running batch jobs while adding a minimum of processing overhead in the batch daemons.

This file is installed by default in LSB_CONFDIR/cluster_name/configdir.

Changing lsb.params configuration

After making any changes to lsb.params, run badmin reconfig to reconfigure mbatchd.

Parameters section

This section and all the keywords in this section are optional. If keywords are not present, the default values are assumed.

Parameters set at installation

The following parameter values are set at installation for the purpose of testing a new cluster:

Begin Parameters
DEFAULT_QUEUE  = normal   #default job queue name
MBD_SLEEP_TIME = 20       #mbatchd scheduling interval (60 secs is default)
SBD_SLEEP_TIME = 15       #sbatchd scheduling interval (30 secs is default)
JOB_ACCEPT_INTERVAL = 1   #interval for any host to accept a job 
                          #(default is 1 (one-fold of MBD_SLEEP_TIME))
End Parameters

With this configuration, jobs submitted to the LSF system will be started on server hosts quickly. If this configuration is not suitable for your production use, you should either remove the parameters to take the default values, or adjust them as needed.

For example, to avoid starting jobs on hosts whose load is already high, increase JOB_ACCEPT_INTERVAL. A longer interval gives each host more time to update its load indices after accepting a job, before another job is dispatched to it.

In production use, set DEFAULT_QUEUE to the normal queue, MBD_SLEEP_TIME to 60 seconds (the default), and SBD_SLEEP_TIME to 30 seconds (the default).

ABS_RUNLIMIT

Syntax

ABS_RUNLIMIT=y | Y

Description

If set, absolute (wall-clock) run time is used instead of normalized run time for all jobs submitted with the following values:
  • Run time limit specified by the -W option of bsub

  • RUNLIMIT queue-level parameter in lsb.queues

  • RUNLIMIT application-level parameter in lsb.applications

  • RUNTIME parameter in lsb.applications

The run time estimates and limits are not normalized by the host CPU factor.

Default

N (run limit and run time estimate are normalized)

ACCT_ARCHIVE_AGE

Syntax

ACCT_ARCHIVE_AGE=days

Description

Enables automatic archiving of LSF accounting log files, and specifies the archive interval. LSF archives the current log file if the length of time from its creation date exceeds the specified number of days.

See also

  • ACCT_ARCHIVE_SIZE also enables automatic archiving

  • ACCT_ARCHIVE_TIME also enables automatic archiving

  • MAX_ACCT_ARCHIVE_FILE enables automatic deletion of the archives

Default

-1 (Not defined; no limit to the age of lsb.acct)

ACCT_ARCHIVE_SIZE

Syntax

ACCT_ARCHIVE_SIZE=kilobytes

Description

Enables automatic archiving of LSF accounting log files, and specifies the archive threshold. LSF archives the current log file if its size exceeds the specified number of kilobytes.

See also

  • ACCT_ARCHIVE_AGE also enables automatic archiving

  • ACCT_ARCHIVE_TIME also enables automatic archiving

  • MAX_ACCT_ARCHIVE_FILE enables automatic deletion of the archives

Default

-1 (Not defined; no limit to the size of lsb.acct)

ACCT_ARCHIVE_TIME

Syntax

ACCT_ARCHIVE_TIME=hh:mm

Description

Enables automatic archiving of LSF accounting log file lsb.acct, and specifies the time of day to archive the current log file.

See also

  • ACCT_ARCHIVE_SIZE also enables automatic archiving

  • ACCT_ARCHIVE_AGE also enables automatic archiving

  • MAX_ACCT_ARCHIVE_FILE enables automatic deletion of the archives

Default

Not defined (no time set for archiving lsb.acct)
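The three archive triggers (ACCT_ARCHIVE_AGE, ACCT_ARCHIVE_SIZE, ACCT_ARCHIVE_TIME) can be combined into a single check. The following Python sketch is illustrative only: it assumes that LSF archives lsb.acct as soon as any enabled trigger fires, treats -1 or None as "not defined", and simplifies ACCT_ARCHIVE_TIME to a match on the current hour and minute. The function name should_archive is hypothetical and is not part of any LSF interface.

```python
from datetime import datetime, time, timedelta
from typing import Optional


def should_archive(created: datetime, size_kb: int, now: datetime,
                   acct_archive_age: int = -1,
                   acct_archive_size: int = -1,
                   acct_archive_time: Optional[time] = None) -> bool:
    """Hypothetical check: does any enabled archive trigger fire for lsb.acct?"""
    # ACCT_ARCHIVE_AGE: archive when the file is older than the given number of days.
    if acct_archive_age >= 0 and now - created > timedelta(days=acct_archive_age):
        return True
    # ACCT_ARCHIVE_SIZE: archive when the file is larger than the given number of KB.
    if acct_archive_size >= 0 and size_kb > acct_archive_size:
        return True
    # ACCT_ARCHIVE_TIME: simplified to "the current hh:mm equals the configured hh:mm".
    if acct_archive_time is not None and (now.hour, now.minute) == (
            acct_archive_time.hour, acct_archive_time.minute):
        return True
    return False
```

With no trigger defined, the function always returns False, which mirrors the "not defined" defaults above.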

CHUNK_JOB_DURATION

Syntax

CHUNK_JOB_DURATION=minutes

Description

Specifies the maximum CPU limit, run limit, or estimated run time, in minutes, that a job submitted to a chunk job queue can have and still be chunked.

When CHUNK_JOB_DURATION is set, the CPU limit or run limit set at the queue level (CPULIMIT or RUNLIMIT), application level (CPULIMIT or RUNLIMIT), or job level (-c or -W bsub options), or the run time estimate set at the application level (RUNTIME) must be less than or equal to CHUNK_JOB_DURATION for jobs to be chunked.

If CHUNK_JOB_DURATION is set, jobs are not chunked if:

  • No CPU limit, run time limit, or run time estimate is specified at any level, or

  • A CPU limit, run time limit, or run time estimate is greater than the value of CHUNK_JOB_DURATION.

The value of CHUNK_JOB_DURATION is displayed by bparams -l.

Examples

  • CHUNK_JOB_DURATION is not defined:
    • Jobs with no CPU limit, run limit, or run time estimate are chunked

    • Jobs with a CPU limit, run limit, or run time estimate less than or equal to 30 are chunked

    • Jobs with a CPU limit, run limit, or run time estimate greater than 30 are not chunked

  • CHUNK_JOB_DURATION=90:
    • Jobs with no CPU limit, run limit, or run time estimate are not chunked

    • Jobs with a CPU limit, run limit, or run time estimate less than or equal to 90 are chunked

    • Jobs with a CPU limit, run limit, or run time estimate greater than 90 are not chunked
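The chunking rule and the examples above can be expressed as a small predicate. This is a sketch under stated assumptions, not LSF's implementation: the name is_chunkable is hypothetical, all limits are assumed to be in minutes, and the 30-minute threshold used when CHUNK_JOB_DURATION is not defined is taken from the first example rather than from a documented default.

```python
from typing import Optional, Sequence

# Threshold used by the "not defined" example above (an assumption of this sketch).
UNDEFINED_THRESHOLD_MINUTES = 30


def is_chunkable(limits: Sequence[Optional[int]], chunk_job_duration: int = -1) -> bool:
    """limits: CPU limit, run limit, and run time estimate in minutes (None if unset)."""
    specified = [value for value in limits if value is not None]
    if chunk_job_duration < 0:
        # CHUNK_JOB_DURATION not defined: jobs without limits are chunked,
        # jobs with any limit above 30 are not.
        return all(value <= UNDEFINED_THRESHOLD_MINUTES for value in specified)
    if not specified:
        # CHUNK_JOB_DURATION defined: a job with no limit at any level is not chunked.
        return False
    return all(value <= chunk_job_duration for value in specified)
```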

Default

-1 (Not defined.)

CLEAN_PERIOD

Syntax

CLEAN_PERIOD=seconds

Description

For non-repetitive jobs, the amount of time, in seconds, that records of finished or killed jobs are kept in mbatchd core memory.

During this period, users can still see finished jobs with the bjobs command.

For jobs that finished more than CLEAN_PERIOD seconds ago, use the bhist command.

Default

3600 (1 hour)

COMMITTED_RUN_TIME_FACTOR

Syntax

COMMITTED_RUN_TIME_FACTOR=number

Description

Used only with fairshare scheduling. Committed run time weighting factor.

In the calculation of a user’s dynamic priority, this factor determines the relative importance of the committed run time in the calculation. If the -W option of bsub is not specified at job submission and a RUNLIMIT has not been set for the queue, the committed run time is not considered.

Valid Values

Any number from 0.0 to 1.0

Default

0.0

COMPUTE_UNIT_TYPES

Syntax

COMPUTE_UNIT_TYPES=type1 type2...

Description

Used to define valid compute unit types for topological resource requirement allocation.

The order in which compute unit types appear specifies the containment relationship between types. Finer grained compute unit types appear first, followed by the coarser grained type that contains them, and so on.

At most one compute unit type in the list can be followed by an exclamation mark designating it as the default compute unit type. If no exclamation mark appears, the first compute unit type in the list is taken as the default type.

Valid Values

Any space-separated list of alphanumeric strings.

Default

Not defined

Example

COMPUTE_UNIT_TYPES=cell enclosure! rack

Specifies three compute unit types, with the default type enclosure. Compute units of type rack contain type enclosure, and of type enclosure contain type cell.
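To make the ordering and default rules concrete, here is a hypothetical Python parser for a COMPUTE_UNIT_TYPES value. It assumes types are separated by whitespace and that a trailing ! marks the default; parse_compute_unit_types is an illustrative name, not an LSF function.

```python
from typing import List, Tuple


def parse_compute_unit_types(value: str) -> Tuple[List[str], str]:
    """Return (types ordered finest to coarsest, default type)."""
    types: List[str] = []
    marked: List[str] = []
    for token in value.split():
        is_default = token.endswith("!")
        name = token[:-1] if is_default else token
        if not name.isalnum():
            raise ValueError(f"invalid compute unit type: {token!r}")
        types.append(name)
        if is_default:
            marked.append(name)
    if not types:
        raise ValueError("COMPUTE_UNIT_TYPES is empty")
    if len(marked) > 1:
        raise ValueError("at most one type can be marked with '!'")
    # Without a '!' marker, the first (finest grained) type is the default.
    return types, (marked[0] if marked else types[0])
```

In the returned list, each type is contained in the type that follows it, so for the example above a cell is contained in an enclosure, which is contained in a rack.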

CONDENSE_PENDING_REASONS

Syntax

CONDENSE_PENDING_REASONS=ALL | PARTIAL | N

Description

Set to ALL, condenses all host-based pending reasons into one generic pending reason. This is equivalent to setting CONDENSE_PENDING_REASONS=Y.

Set to PARTIAL, condenses all host-based pending reasons except shared resource pending reasons into one generic pending reason.

If enabled, you can request a full pending reason list by running the following command:
badmin diagnose jobId
Tip:

You must be an LSF administrator or a queue administrator to run this command.

Examples

  • CONDENSE_PENDING_REASONS=ALL: If a job has no other pending reason, bjobs -p or bjobs -l displays the following:

    Individual host based reasons

  • CONDENSE_PENDING_REASONS=N: The pending reasons are not suppressed. Host-based pending reasons are displayed.

Default

N

CPU_TIME_FACTOR

Syntax

CPU_TIME_FACTOR=number

Description

Used only with fairshare scheduling. CPU time weighting factor.

In the calculation of a user’s dynamic share priority, this factor determines the relative importance of the cumulative CPU time used by a user’s jobs.

Default

0.7

DEFAULT_APPLICATION

Syntax

DEFAULT_APPLICATION=application_profile_name

Description

The name of the default application profile. The application profile must already be defined in lsb.applications.

When you submit a job to LSF without explicitly specifying an application profile, LSF associates the job with the specified application profile.

Default

Not defined. When a user submits a job without explicitly specifying an application profile, and no default application profile is defined by this parameter, LSF does not associate the job with any application profile.

DEFAULT_HOST_SPEC

Syntax

DEFAULT_HOST_SPEC=host_name | host_model

Description

The default CPU time normalization host for the cluster.

The CPU factor of the specified host or host model will be used to normalize the CPU time limit of all jobs in the cluster, unless the CPU time normalization host is specified at the queue or job level.

Default

Not defined

DEFAULT_JOBGROUP

Syntax

DEFAULT_JOBGROUP=job_group_name

Description

The name of the default job group.

When you submit a job to LSF without explicitly specifying a job group, LSF associates the job with the specified job group. The LSB_DEFAULT_JOBGROUP environment variable overrides the setting of DEFAULT_JOBGROUP. The bsub -g job_group_name option overrides both LSB_DEFAULT_JOBGROUP and DEFAULT_JOBGROUP.

Default job group specification supports macro substitution for project name (%p) and user name (%u). When you specify bsub -P project_name, the value of %p is the specified project name. If you do not specify a project name at job submission, %p is the project name defined by setting the environment variable LSB_DEFAULTPROJECT, or the project name specified by DEFAULT_PROJECT in lsb.params. If neither is set, the default project name is default.

For example, a default job group name specified by DEFAULT_JOBGROUP=/canada/%p/%u is expanded to the value for the LSF project name and the user name of the job submission user (for example, /canada/projects/user1).

Job group names must follow this format:
  • Job group names must start with a slash character (/). For example, DEFAULT_JOBGROUP=/A/B/C is correct, but DEFAULT_JOBGROUP=A/B/C is not correct.

  • Job group names cannot end with a slash character (/). For example, DEFAULT_JOBGROUP=/A/ is not correct.

  • Job group names cannot contain more than one slash character (/) in a row. For example, job group names like DEFAULT_JOBGROUP=/A//B or DEFAULT_JOBGROUP=A////B are not correct.

  • Job group names cannot contain spaces. For example, DEFAULT_JOBGROUP=/A/B C/D is not correct.

  • Project names and user names used for macro substitution with %p and %u cannot start or end with slash character (/).

  • Project names and user names used for macro substitution with %p and %u cannot contain spaces or more than one slash character (/) in a row.

  • Project names or user names containing slash character (/) will create separate job groups. For example, if the project name is canada/projects, DEFAULT_JOBGROUP=/%p results in a job group hierarchy /canada/projects.
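The substitution and naming rules above can be checked mechanically. The following Python sketch is a hypothetical illustration, not LSF code: the function names are invented for this example, the project lookup order follows the description of %p above, and "spaces" is interpreted as any whitespace character.

```python
from typing import Optional


def resolve_project(bsub_project: Optional[str], env_project: Optional[str],
                    default_project: Optional[str]) -> str:
    """%p: bsub -P, then LSB_DEFAULTPROJECT, then DEFAULT_PROJECT, then 'default'."""
    return bsub_project or env_project or default_project or "default"


def check_substitution(label: str, value: str) -> None:
    """Rules for project and user names substituted into %p and %u."""
    if value.startswith("/") or value.endswith("/"):
        raise ValueError(f"{label} cannot start or end with '/': {value!r}")
    if "//" in value or any(c.isspace() for c in value):
        raise ValueError(f"{label} cannot contain spaces or '//': {value!r}")


def check_job_group_name(name: str) -> None:
    """Format rules for a fully expanded job group name."""
    if not name.startswith("/"):
        raise ValueError(f"job group name must start with '/': {name!r}")
    if name.endswith("/"):
        raise ValueError(f"job group name cannot end with '/': {name!r}")
    if "//" in name or any(c.isspace() for c in name):
        raise ValueError(f"job group name cannot contain spaces or '//': {name!r}")


def expand_default_jobgroup(template: str, project: str, user: str) -> str:
    """Expand %p and %u in a DEFAULT_JOBGROUP value and validate the result."""
    check_substitution("project name", project)
    check_substitution("user name", user)
    name = template.replace("%p", project).replace("%u", user)
    check_job_group_name(name)
    return name
```

A project name such as canada/projects expands into two levels of the hierarchy, because the slash is kept as part of the substituted text.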

Example

DEFAULT_JOBGROUP=/canada/projects

Default

Not defined. When a user submits a job without explicitly specifying job group name, and the LSB_DEFAULT_JOBGROUP environment variable is not defined, LSF does not associate the job with any job group.

DEFAULT_PROJECT

Syntax

DEFAULT_PROJECT=project_name

Description

The name of the default project. Specify any string.

When you submit a job without specifying any project name, and the environment variable LSB_DEFAULTPROJECT is not set, LSF automatically assigns the job to this project.

Default

default

DEFAULT_QUEUE

Syntax

DEFAULT_QUEUE=queue_name ...

Description

Space-separated list of candidate default queues (candidates must already be defined in lsb.queues).

When you submit a job to LSF without explicitly specifying a queue, and the environment variable LSB_DEFAULTQUEUE is not set, LSF puts the job in the first queue in this list that satisfies the job’s specifications subject to other restrictions, such as requested hosts, queue status, etc.

Default

This parameter is set at installation to DEFAULT_QUEUE=normal.

When a user submits a job to LSF without explicitly specifying a queue, and there are no candidate default queues defined (by this parameter or by the user’s environment variable LSB_DEFAULTQUEUE), LSF automatically creates a new queue named default, using the default configuration, and submits the job to that queue.
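The queue selection order described above can be sketched as follows. This is an illustration under assumptions: choose_queue is a hypothetical name, the accepts callback stands in for LSF's real checks (requested hosts, queue status, and so on), LSB_DEFAULTQUEUE is assumed to be a space-separated list like DEFAULT_QUEUE, and returning None when no candidate fits is a choice of this sketch, since the text does not describe that case.

```python
from typing import Callable, Optional


def choose_queue(bsub_queue: Optional[str], lsb_defaultqueue: Optional[str],
                 default_queue: Optional[str],
                 accepts: Callable[[str], bool]) -> Optional[str]:
    """Pick the queue for a job; `accepts` stands in for LSF's per-queue checks."""
    if bsub_queue:
        return bsub_queue  # an explicitly requested queue is used as given
    # LSB_DEFAULTQUEUE takes precedence over DEFAULT_QUEUE in lsb.params.
    candidates = (lsb_defaultqueue or default_queue or "").split()
    if not candidates:
        return "default"  # LSF creates a queue named default in this case
    for queue in candidates:
        if accepts(queue):
            return queue  # first candidate that satisfies the job's specifications
    return None  # assumption: no candidate fits, so the job is not placed
```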

DEFAULT_SLA_VELOCITY

Syntax

DEFAULT_SLA_VELOCITY=num_slots

Description

For EGO-enabled SLA scheduling, the number of slots that the SLA should request for parallel jobs running in the SLA.

By default, an EGO-enabled SLA requests slots from EGO based on the number of jobs the SLA needs to run. If the jobs themselves require more than one slot, they will remain pending. To avoid this for parallel jobs, set DEFAULT_SLA_VELOCITY to the total number of slots that are expected to be used by parallel jobs.

Default

1

DETECT_IDLE_JOB_AFTER

Syntax

DETECT_IDLE_JOB_AFTER=time_minutes

Description

The minimum job run time before mbatchd reports that the job is idle.

Default

20 (mbatchd checks if the job is idle after 20 minutes of run time)

DISABLE_UACCT_MAP

Syntax

DISABLE_UACCT_MAP=y | Y

Description

Specify y or Y to disable user-level account mapping.

Default

N

EADMIN_TRIGGER_DURATION

Syntax

EADMIN_TRIGGER_DURATION=minutes

Description

Defines how often LSF_SERVERDIR/eadmin is invoked once a job exception is detected. Used in conjunction with job exception handling parameters JOB_IDLE, JOB_OVERRUN, and JOB_UNDERRUN in lsb.queues.

Tip:

Tune EADMIN_TRIGGER_DURATION carefully. Shorter values may raise false alarms; longer values may not trigger exceptions frequently enough.

Example

EADMIN_TRIGGER_DURATION=5

Default

1 minute

ENABLE_DEFAULT_EGO_SLA

Syntax

ENABLE_DEFAULT_EGO_SLA=service_class_name | consumer_name

Description

The name of the default service class or EGO consumer name for EGO-enabled SLA scheduling. If the specified SLA does not exist in lsb.serviceclasses, LSF creates one with the specified consumer name, velocity of 1, priority of 1, and a time window that is always open.

If the name of the default SLA is not configured in lsb.serviceclasses, it must be the name of a valid EGO consumer.

ENABLE_DEFAULT_EGO_SLA is required to turn on EGO-enabled SLA scheduling. All LSF resource management is delegated to Platform EGO, and all LSF hosts are under EGO control. When all jobs running in the default SLA finish, all allocated hosts are released to EGO after the default idle timeout of 120 seconds (configurable by MAX_HOST_IDLE_TIME in lsb.serviceclasses).

When you submit a job to LSF without explicitly using the -sla option to specify a service class name, LSF puts the job in the default service class specified by service_class_name.

Default

Not defined. When a user submits a job to LSF without explicitly specifying a service class, and there is no default service class defined by this parameter, LSF does not attach the job to any service class.

ENABLE_EVENT_STREAM

Syntax

ENABLE_EVENT_STREAM=Y | N

Description

Used only with event streaming for system performance analysis tools, such as the Platform LSF reporting feature.

Default

N (event streaming is not enabled)

ENABLE_EXIT_RATE_PER_SLOT

Syntax

ENABLE_EXIT_RATE_PER_SLOT=Y | N

Description

Scales the actual exit rate thresholds on a host according to the number of slots on the host. For example, if EXIT_RATE=2 in lsb.hosts or GLOBAL_EXIT_RATE=2 in lsb.params, and the host has 2 job slots, the job exit rate threshold will be 4.

Default

N

ENABLE_HIST_RUN_TIME

Syntax

ENABLE_HIST_RUN_TIME=y | Y

Description

Used only with fairshare scheduling. If set, enables the use of historical run time in the calculation of fairshare scheduling priority.

Default

N

ENABLE_HOST_INTERSECTION

Syntax

ENABLE_HOST_INTERSECTION=Y | N

Description

When enabled, allows job submission to any host that belongs to the intersection created when considering the queue the job was submitted to, any advance reservation hosts, or any hosts specified by bsub -m at the time of submission.

When disabled, a job submitted with specified hosts is accepted only if the specified hosts are a subset of the hosts defined in the queue.

The following commands are affected by ENABLE_HOST_INTERSECTION:
  • bsub

  • bmod

  • bmig

  • brestart

  • bswitch

If no hosts exist in the intersection, the job is rejected.
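As a rough model of the two modes, the following Python sketch computes the hosts a job may use. The function name eligible_hosts is hypothetical; host lists are modeled as plain sets, and advance reservation hosts are ignored when intersection is disabled because the text only mentions them for the enabled case.

```python
from typing import Optional, Set


def eligible_hosts(queue_hosts: Set[str], requested: Optional[Set[str]] = None,
                   reservation_hosts: Optional[Set[str]] = None,
                   enable_host_intersection: bool = False) -> Set[str]:
    """Return the hosts a job may use, or raise ValueError if the job is rejected."""
    if enable_host_intersection:
        # Intersect the queue's hosts with every other host list that applies.
        hosts = set(queue_hosts)
        for other in (requested, reservation_hosts):
            if other is not None:
                hosts &= other
    else:
        # Requested hosts (bsub -m) must be a subset of the queue's hosts.
        if requested is not None and not requested <= queue_hosts:
            raise ValueError("requested hosts are not a subset of the queue's hosts")
        hosts = set(requested if requested is not None else queue_hosts)
    if not hosts:
        raise ValueError("no hosts in the intersection; job rejected")
    return hosts
```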

Default

N

ENABLE_USER_RESUME

Syntax

ENABLE_USER_RESUME=Y | N

Description

Defines job resume permissions.

When this parameter is defined:

  • If the value is Y, users can resume their own jobs that have been suspended by the administrator.

  • If the value is N, jobs that are suspended by the administrator can only be resumed by the administrator or root; users do not have permission to resume a job suspended by another user or the administrator. Administrators can resume jobs suspended by users or administrators.

Default

N (users cannot resume jobs suspended by administrator)

ENFORCE_ONE_UG_LIMITS

Syntax

ENFORCE_ONE_UG_LIMITS=Y | N

Description

Upon job submission with the -G option and when user groups have overlapping members, defines whether only the specified user group’s limits (or those of any parent group) are enforced or whether the most restrictive user group limits of any overlapping user/user group are enforced.

  • If the value is Y, only the limits defined for the user group that you specify with -G during job submission apply to the job, even if there are overlapping members of groups.

    If you have nested user groups, the limits of a user's group parent also apply.

    View existing limits by running blimits.

  • If the value is N and the user group has members that overlap with other user groups, the strictest possible limits (that you can view by running blimits) defined for any of the member user groups are enforced for the job.

Default

N

EVENT_STREAM_FILE

Syntax

EVENT_STREAM_FILE=file_path

Description

Determines the path to the event data stream file used by system performance analysis tools such as Platform LSF Reporting.

Default

LSF_TOP/work/cluster_name/logdir/stream/lsb.stream

EVENT_UPDATE_INTERVAL

Syntax

EVENT_UPDATE_INTERVAL=seconds

Description

Used with duplicate logging of event and accounting log files. LSB_LOCALDIR in lsf.conf must also be specified. Specifies how often to back up the data and synchronize the directories (LSB_SHAREDIR and LSB_LOCALDIR).

If you do not define this parameter, the directories are synchronized when data is logged to the files, or when mbatchd is started on the first LSF master host. If you define this parameter, mbatchd synchronizes the directories only at the specified time intervals.

Use this parameter if NFS traffic is too high and you want to reduce network traffic.

Valid values

1 to 2147483647

Recommended values

Between 10 and 30 seconds, or longer depending on the amount of network traffic.

Note:

Avoid setting the value to exactly 30 seconds, because this will trigger the default behavior and cause mbatchd to synchronize the data every time an event is logged.

Default

-1 (Not defined.)

See also

LSB_LOCALDIR in lsf.conf

EXIT_RATE_TYPE

Syntax

EXIT_RATE_TYPE=[JOBEXIT | JOBEXIT_NONLSF] [JOBINIT] [HPCINIT]

Description

When host exception handling is configured (EXIT_RATE in lsb.hosts or GLOBAL_EXIT_RATE in lsb.params), specifies the type of job exit to be handled.

JOBEXIT

Job exited after it was dispatched and started running.

JOBEXIT_NONLSF

Job exited with exit reasons related to LSF and not related to a host problem (for example, user action or LSF policy). These jobs are not counted in the exit rate calculation for the host.

JOBINIT

Job exited during initialization because of an execution environment problem. The job did not actually start running.

HPCINIT

Job exited during initialization of a Platform LSF HPC job because of an execution environment problem. The job did not actually start running.

Default

JOBEXIT_NONLSF

EXTEND_JOB_EXCEPTION_NOTIFY

Syntax

EXTEND_JOB_EXCEPTION_NOTIFY=Y | y | N | n

Description

Sends extended information about a job exception in a notification email sent when a job exception occurs. Extended information includes:
  • JOB_ID

  • RUN_TIME

  • IDLE_FACTOR (Only applicable if the job has been idle.)

  • USER

  • QUEUE

  • EXEC_HOST

  • JOB_NAME

You can also set format options of the email in the eadmin script, located in the LSF_SERVERDIR directory. Valid values are fixed or full.

Default

N (Notification for job exception is standard and includes only job ID and either run time or idle factor.)

FAIRSHARE_ADJUSTMENT_FACTOR

Syntax

FAIRSHARE_ADJUSTMENT_FACTOR=number

Description

Used only with fairshare scheduling. Fairshare adjustment plugin weighting factor.

In the calculation of a user’s dynamic share priority, this factor determines the relative importance of the user-defined adjustment made in the fairshare plugin (libfairshareadjust.*).

A positive floating-point number both enables the fairshare plugin and acts as a weighting factor.

Default

0 (user-defined adjustment made in the fairshare plugin not used)

GLOBAL_EXIT_RATE

Syntax

GLOBAL_EXIT_RATE=number

Description

Specifies a cluster-wide threshold for exited jobs. If EXIT_RATE is not specified for the host in lsb.hosts, GLOBAL_EXIT_RATE defines a default exit rate for all hosts in the cluster. Host-level EXIT_RATE overrides the GLOBAL_EXIT_RATE value.

If the global job exit rate is exceeded for 5 minutes or the period specified by JOB_EXIT_RATE_DURATION, LSF invokes LSF_SERVERDIR/eadmin to trigger a host exception.

Example

GLOBAL_EXIT_RATE=10 defines a job exit rate of 10 jobs for all hosts.

Default

2147483647 (Unlimited threshold.)
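The interaction between host-level EXIT_RATE, GLOBAL_EXIT_RATE, and ENABLE_EXIT_RATE_PER_SLOT can be summarized in a few lines. This Python sketch is illustrative only: exit_rate_threshold is a hypothetical name, and leaving the unlimited default unscaled is an assumption of the sketch.

```python
from typing import Optional

UNLIMITED = 2147483647  # GLOBAL_EXIT_RATE default


def exit_rate_threshold(host_exit_rate: Optional[float] = None,
                        global_exit_rate: float = UNLIMITED,
                        host_slots: int = 1,
                        enable_exit_rate_per_slot: bool = False) -> float:
    """Effective job exit rate threshold for one host."""
    # Host-level EXIT_RATE in lsb.hosts overrides GLOBAL_EXIT_RATE.
    rate = host_exit_rate if host_exit_rate is not None else global_exit_rate
    # ENABLE_EXIT_RATE_PER_SLOT=Y scales the threshold by the host's job slots.
    if enable_exit_rate_per_slot and rate != UNLIMITED:
        rate = rate * host_slots
    return rate
```

The first test case reproduces the example under ENABLE_EXIT_RATE_PER_SLOT: a rate of 2 on a host with 2 job slots gives a threshold of 4.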

HIST_HOURS

Syntax

HIST_HOURS=hours

Description

Used only with fairshare scheduling. Determines a rate of decay for cumulative CPU time and historical run time.

To calculate dynamic user priority, LSF scales the actual CPU time using a decay factor, so that 1 hour of recently-used time is equivalent to 0.1 hours after the specified number of hours has elapsed.

To calculate dynamic user priority with historical run time, LSF scales the accumulated run time of finished jobs using the same decay factor, so that 1 hour of recently-used time is equivalent to 0.1 hours after the specified number of hours has elapsed.

When HIST_HOURS=0, CPU time accumulated by running jobs is not decayed.
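The text fixes only one point of the decay curve: after HIST_HOURS hours, 1 hour counts as 0.1 hours. Assuming the decay is exponential between those points (an assumption of this sketch), time used t hours ago is scaled by 0.1^(t / HIST_HOURS). The function name decayed_hours is hypothetical, and treating HIST_HOURS=0 as "no decay" follows the sentence above about running jobs.

```python
def decayed_hours(hours_used: float, elapsed_hours: float, hist_hours: float = 5) -> float:
    """Scale previously used time by an exponential decay factor (assumed form)."""
    if hist_hours == 0:
        return hours_used  # HIST_HOURS=0: no decay
    # After exactly hist_hours hours, the factor is 0.1.
    return hours_used * 0.1 ** (elapsed_hours / hist_hours)
```

With the default HIST_HOURS=5, one hour used 5 hours ago counts as about 0.1 hours, and one hour used 10 hours ago counts as about 0.01 hours.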

Default

5

JOB_ACCEPT_INTERVAL

Syntax

JOB_ACCEPT_INTERVAL=integer

Description

The number you specify is multiplied by the value of lsb.params MBD_SLEEP_TIME (60 seconds by default). The result of the calculation is the number of seconds to wait after dispatching a job to a host, before dispatching a second job to the same host.

If 0 (zero), a host may accept more than one job. By default, there is no limit to the total number of jobs that can run on a host, so if this parameter is set to 0, a very large number of jobs might be dispatched to a host all at once. This can overload your system to the point that it will be unable to create any more processes. It is not recommended to set this parameter to 0.

JOB_ACCEPT_INTERVAL set at the queue level (lsb.queues) overrides JOB_ACCEPT_INTERVAL set at the cluster level (lsb.params).

Note:

The parameter JOB_ACCEPT_INTERVAL only applies when there are running jobs on a host. A host running a short job which finishes before JOB_ACCEPT_INTERVAL has elapsed is free to accept a new job without waiting.
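The wait calculation described above is simple arithmetic; the following Python sketch spells it out, including the queue-level override and the note about hosts with no running jobs. The function and argument names are hypothetical.

```python
from typing import Optional


def accept_wait_seconds(job_accept_interval: int = 1, mbd_sleep_time: int = 60,
                        queue_job_accept_interval: Optional[int] = None,
                        host_has_running_jobs: bool = True) -> int:
    """Seconds to wait before dispatching another job to the same host."""
    if not host_has_running_jobs:
        return 0  # the interval only applies while jobs are running on the host
    # JOB_ACCEPT_INTERVAL in lsb.queues overrides the lsb.params value.
    interval = (queue_job_accept_interval if queue_job_accept_interval is not None
                else job_accept_interval)
    return interval * mbd_sleep_time
```

With the installation settings shown earlier (JOB_ACCEPT_INTERVAL=1, MBD_SLEEP_TIME=20), a host waits 20 seconds between dispatches; with both parameters at their defaults, it waits 60 seconds.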

Default

1

JOB_ATTA_DIR

Syntax

JOB_ATTA_DIR=directory

Description

The shared directory in which mbatchd saves the attached data of messages posted with the bpost command.

Use JOB_ATTA_DIR if you use bpost and bread to transfer large data files between jobs and want to avoid using space in LSB_SHAREDIR. By default, the bread command reads attachment data from the JOB_ATTA_DIR directory.

JOB_ATTA_DIR should be shared by all hosts in the cluster, so that any potential LSF master host can reach it. Like LSB_SHAREDIR, the directory should be owned and writable by the primary LSF administrator. The directory must have at least 1 MB of free space.

The attached data is stored under the directory in the format:

JOB_ATTA_DIR/timestamp.jobid.msgs/msg$msgindex

On UNIX, specify an absolute path. For example:

JOB_ATTA_DIR=/opt/share/lsf_work

On Windows, specify a UNC path or a path with a drive letter. For example:

JOB_ATTA_DIR=\\HostA\temp\lsf_work

or

JOB_ATTA_DIR=D:\temp\lsf_work

After adding JOB_ATTA_DIR to lsb.params, use badmin reconfig to reconfigure your cluster.

Valid values

JOB_ATTA_DIR can be any valid UNIX or Windows path up to a maximum length of 256 characters.

Default

Not defined

If JOB_ATTA_DIR is not specified, job message attachments are saved in LSB_SHAREDIR/info/.

JOB_DEP_LAST_SUB

Syntax

JOB_DEP_LAST_SUB=1

Description

Used only with job dependency scheduling.

If set to 1, whenever dependency conditions use a job name that belongs to multiple jobs, LSF evaluates only the most recently submitted job.

Otherwise, all the jobs with the specified name must satisfy the dependency condition.
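The difference between the two modes can be sketched in Python. Everything here is hypothetical: the Job record, the dependency_met function, and the choice to treat a name that matches no job as an unsatisfied condition are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Job:
    """Hypothetical job record: name, submission order, and status."""
    name: str
    submit_time: int
    status: str  # for example "DONE", "EXIT", "RUN", "PEND"


def dependency_met(jobs: List[Job], name: str,
                   condition: Callable[[Job], bool],
                   job_dep_last_sub: int = 0) -> bool:
    """Evaluate a name-based dependency condition over all jobs named `name`."""
    matches = [job for job in jobs if job.name == name]
    if not matches:
        return False  # assumption: an unmatched name never satisfies the condition
    if job_dep_last_sub == 1:
        # Only the most recently submitted job with this name is considered.
        matches = [max(matches, key=lambda job: job.submit_time)]
    return all(condition(job) for job in matches)
```

For two jobs named prep where the earlier one exited and the later one finished, a done(prep) style condition is satisfied only when JOB_DEP_LAST_SUB=1.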

Default

0

JOB_EXIT_RATE_DURATION

Syntax

JOB_EXIT_RATE_DURATION=minutes

Description

Defines how long LSF waits before checking the job exit rate for a host. Used in conjunction with EXIT_RATE in lsb.hosts for LSF host exception handling.

If the job exit rate is exceeded for the period specified by JOB_EXIT_RATE_DURATION, LSF invokes LSF_SERVERDIR/eadmin to trigger a host exception.

Tuning

Tip:

Tune JOB_EXIT_RATE_DURATION carefully. Shorter values may raise false alarms; longer values may not trigger exceptions frequently enough.

Example

JOB_EXIT_RATE_DURATION=10

Default

5 minutes

JOB_GROUP_CLEAN

Syntax

JOB_GROUP_CLEAN=Y | N

Description

If JOB_GROUP_CLEAN=Y, implicitly created job groups that are empty and have no limits assigned to them are automatically deleted.

Default

N (Implicitly created job groups are not deleted automatically; delete them manually with bgdel.)

JOB_INCLUDE_POSTPROC

Syntax

JOB_INCLUDE_POSTPROC=Y | N

Description

Specifies whether LSF includes the post-execution processing of the job as part of the job. When set to Y:
  • Prevents a new job from starting on a host until post-execution processing is finished on that host

  • Includes the CPU and run times of post-execution processing with the job CPU and run times

  • sbatchd sends both job finish status (DONE or EXIT) and post-execution processing status (POST_DONE or POST_ERR) to mbatchd at the same time

In the MultiCluster job forwarding model, the JOB_INCLUDE_POSTPROC value in the receiving cluster applies to the job.

In the MultiCluster job lease model, the JOB_INCLUDE_POSTPROC value applies to jobs running on remote leased hosts as if they were running on local hosts.

The variable LSB_JOB_INCLUDE_POSTPROC in the user environment overrides the value of JOB_INCLUDE_POSTPROC in an application profile in lsb.applications. JOB_INCLUDE_POSTPROC in an application profile in lsb.applications overrides the value of JOB_INCLUDE_POSTPROC in lsb.params.
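The precedence order in the previous paragraph (user environment, then application profile, then lsb.params) can be sketched as a first-match lookup. The function name include_postproc is hypothetical, and case-insensitive matching of Y is an assumption of this sketch.

```python
import os
from typing import Mapping, Optional


def include_postproc(app_value: Optional[str], params_value: Optional[str],
                     env: Mapping[str, str] = os.environ) -> bool:
    """Resolve JOB_INCLUDE_POSTPROC: user environment, application profile, lsb.params."""
    for value in (env.get("LSB_JOB_INCLUDE_POSTPROC"), app_value, params_value):
        if value:  # the first level that is set wins
            return value.strip().upper() == "Y"
    return False  # default N
```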

For SGI cpusets, if JOB_INCLUDE_POSTPROC=Y, LSF does not release the cpuset until post-execution processing has finished, even though post-execution processes are not attached to the cpuset.

Default

N (Post-execution processing is not included as part of the job, and a new job can start on the execution host before post-execution processing finishes.)

JOB_POSITION_CONTROL_BY_ADMIN

Syntax

JOB_POSITION_CONTROL_BY_ADMIN=Y | N

Description

Allows LSF administrators to control whether users can use btop and bbot to move jobs to the top and bottom of queues. When JOB_POSITION_CONTROL_BY_ADMIN=Y, only the LSF administrator (including any queue administrators) can use bbot and btop to move jobs within a queue.

Default

N

See also

bbot, btop

JOB_POSTPROC_TIMEOUT

Syntax

JOB_POSTPROC_TIMEOUT=minutes

Description

Specifies a timeout in minutes for job post-execution processing. The specified timeout must be greater than zero.

If post-execution processing takes longer than the timeout, sbatchd reports that post-execution has failed (POST_ERR status), and kills the entire process group of the job’s post-execution processes on UNIX and Linux. On Windows, only the parent process of the post-execution command is killed when the timeout expires. The child processes of the post-execution command are not killed.

If JOB_INCLUDE_POSTPROC=Y, and sbatchd kills the post-execution processes because the timeout has been reached, the CPU time of the post-execution processing is set to 0, and the job’s CPU time does not include the CPU time of post-execution processing.

JOB_POSTPROC_TIMEOUT defined in an application profile in lsb.applications overrides the value in lsb.params. JOB_POSTPROC_TIMEOUT cannot be defined in the user environment.

In the MultiCluster job forwarding model, the JOB_POSTPROC_TIMEOUT value in the receiving cluster applies to the job.

In the MultiCluster job lease model, the JOB_POSTPROC_TIMEOUT value applies to jobs running on remote leased hosts as if they were running on local hosts.

Default

2147483647 (Unlimited; post-execution processing does not time out.)

JOB_PRIORITY_OVER_TIME

Syntax

JOB_PRIORITY_OVER_TIME=increment/interval

Description

JOB_PRIORITY_OVER_TIME enables automatic job priority escalation when MAX_USER_PRIORITY is also defined.

Valid Values

increment

Specifies the value used to increase job priority every interval minutes. Valid values are positive integers.

interval

Specifies the frequency, in minutes, to increment job priority. Valid values are positive integers.

Default

-1 (Not defined.)

Example

JOB_PRIORITY_OVER_TIME=3/20

Increases the priority of pending jobs by 3 every 20 minutes.
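A minimal sketch of the escalation arithmetic, assuming the increment is applied once for each full interval a job has been pending. The function names are hypothetical, and no upper bound on the escalated priority is modeled because the text above does not describe one.

```python
from typing import Tuple


def parse_priority_over_time(value: str) -> Tuple[int, int]:
    """Split 'increment/interval' into two positive integers."""
    increment_text, interval_text = value.split("/")
    increment, interval = int(increment_text), int(interval_text)
    if increment <= 0 or interval <= 0:
        raise ValueError(f"increment and interval must be positive: {value!r}")
    return increment, interval


def escalated_priority(initial: int, pending_minutes: int, value: str) -> int:
    """Priority after `pending_minutes` of pending time; no upper cap is modeled."""
    increment, interval = parse_priority_over_time(value)
    return initial + increment * (pending_minutes // interval)
```

For JOB_PRIORITY_OVER_TIME=3/20, a job that starts at priority 10 and pends for 45 minutes has completed two full intervals, giving 10 + 2 * 3 = 16.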

See also

MAX_USER_PRIORITY

JOB_RUNLIMIT_RATIO

Syntax

JOB_RUNLIMIT_RATIO=integer | 0

Description

Specifies a ratio between a job run limit and the runtime estimate specified by bsub -We or bmod -We, -We+, -Wep. The ratio does not apply to the RUNTIME parameter in lsb.applications.

This ratio can be set to 0 and no restrictions are applied to the runtime estimate.

JOB_RUNLIMIT_RATIO prevents abuse of the runtime estimate. The value of this parameter is the ratio of run limit divided by the runtime estimate.

By default, the ratio value is 0. Only administrators can set or change this ratio. If the ratio changes, it only applies to newly submitted jobs. The changed value does not retroactively reapply to already submitted jobs.

If the ratio value is greater than 0:
  • If users specify a runtime estimate only (bsub -We), the job-level run limit is automatically set to runtime_ratio * runtime_estimate. Jobs running longer than this run limit are killed by LSF. If the job-level run limit is greater than the hard run limit in the queue, the job is rejected.

  • If the users specify a runtime estimate (-We) and job run limit (-W) at job submission, and the run limit is greater than runtime_ratio * runtime_estimate, the job is rejected.

  • If users want to modify the run limit to be greater than runtime_ratio * runtime_estimate, they must first increase the runtime estimate (bmod -We). Then they can increase the run limit.

  • LSF records whether the run limit was set with bsub -W or derived from runtime_ratio * runtime_estimate. When users modify the run limit with bmod -Wn, the run limit is automatically set to runtime_ratio * runtime_estimate. If the run limit was derived from runtime_ratio, LSF rejects the run limit modification.

  • If users modify the runtime estimate with bmod -We and the run limit is set by the user, the run limit is MIN(new_estimate * new_ratio, run_limit). If the run limit is set by runtime_ratio, the run limit is set to new_estimate * new_ratio.

  • If users remove the runtime estimate with bmod -Wen and the run limit was set by the user, the run limit is not changed. If the run limit was set by runtime_ratio, it is set to unlimited.
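The submission-time rules in the first two bullets can be sketched in Python. This is a minimal model, not LSF's implementation: the function name is hypothetical, rejection is modeled as a ValueError, and it does not cover the bmod cases or a submission that gives only -W.

```python
from typing import Optional


def submission_run_limit(ratio: int,
                         estimate: Optional[int],
                         run_limit: Optional[int],
                         queue_hard_limit: Optional[int] = None) -> Optional[int]:
    """Effective job-level run limit at submission (hypothetical model).

    Units are whatever -We and -W use. Raises ValueError where the
    rules say the job is rejected.
    """
    if ratio == 0 or estimate is None:
        # No ratio configured, or no runtime estimate: nothing to enforce.
        return run_limit
    cap = ratio * estimate
    if run_limit is None:
        # Only -We given: the run limit is derived from the ratio.
        if queue_hard_limit is not None and cap > queue_hard_limit:
            raise ValueError("derived run limit exceeds the queue hard run limit")
        return cap
    if run_limit > cap:
        # -W and -We both given, and -W exceeds ratio * estimate.
        raise ValueError("run limit exceeds ratio * runtime estimate")
    return run_limit
```

For example, with JOB_RUNLIMIT_RATIO=5 and bsub -We 10, submission_run_limit(5, 10, None) returns 50, while adding -W 60 (submission_run_limit(5, 10, 60)) raises ValueError.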

In the MultiCluster job forwarding model, the JOB_RUNLIMIT_RATIO values in both the sending and receiving clusters apply to the job. The run limit in the receiving cluster cannot be greater than the runtime estimate * JOB_RUNLIMIT_RATIO in the receiving cluster. Some examples:
  • Runtime estimate (for example, with bsub -We) is 10, JOB_RUNLIMIT_RATIO=5 in the sending cluster, JOB_RUNLIMIT_RATIO=0 in the receiving cluster: the run limit is 50, and the job runs.

  • Runtime estimate (for example, with bsub -We) is 10, JOB_RUNLIMIT_RATIO=5 in the sending cluster, JOB_RUNLIMIT_RATIO=3 in the receiving cluster: the run limit is 50, and the job pends.

  • Runtime estimate (for example, with bsub -We) is 10, JOB_RUNLIMIT_RATIO=5 in the sending cluster, JOB_RUNLIMIT_RATIO=6 in the receiving cluster: the run limit is 50, and the job runs.

  • Runtime estimate (for example, with bsub -We) is 10, JOB_RUNLIMIT_RATIO=0 in the sending cluster, JOB_RUNLIMIT_RATIO=5 in the receiving cluster: the run limit is 50, and the job runs.
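The four forwarding examples can be reproduced with the small Python model below. It assumes that the sending cluster's ratio, when non-zero, determines the run limit, that the receiving cluster's ratio is used otherwise, and that a run limit above the receiving cluster's cap leaves the job pending. The function name is hypothetical.

```python
from typing import Optional, Tuple


def forwarded_job(estimate: int, send_ratio: int,
                  recv_ratio: int) -> Tuple[Optional[int], str]:
    """Run limit and outcome for a forwarded job (hypothetical model).

    A ratio of 0 means that cluster imposes no ratio-based limit.
    """
    if send_ratio > 0:
        run_limit: Optional[int] = estimate * send_ratio
    elif recv_ratio > 0:
        run_limit = estimate * recv_ratio
    else:
        run_limit = None  # neither cluster derives a run limit
    # The receiving cluster caps the run limit at estimate * its own ratio.
    if run_limit is not None and recv_ratio > 0 and run_limit > estimate * recv_ratio:
        return run_limit, "pend"
    return run_limit, "run"


for send, recv in [(5, 0), (5, 3), (5, 6), (0, 5)]:
    print(send, recv, forwarded_job(10, send, recv))
```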

In the MultiCluster lease model, the JOB_RUNLIMIT_RATIO value applies to jobs running on remote leased hosts as if they were running on local hosts.

Default

0

JOB_SCHEDULING_INTERVAL

Syntax

JOB_SCHEDULING_INTERVAL=number [milliseconds]

Description

Time interval at which mbatchd sends jobs for scheduling to the scheduling daemon mbschd along with any collected load information. Specify in seconds, or include the keyword milliseconds (or ms) to specify in milliseconds. All values are converted to milliseconds for storage.

If set to 0, there is no interval between job scheduling sessions.
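The unit handling described above can be illustrated with a small Python parser. It is a sketch only: the function name is hypothetical, and it assumes the keyword appears as a separate token after the number, as in the syntax line.

```python
def scheduling_interval_ms(value: str) -> int:
    """Convert a JOB_SCHEDULING_INTERVAL value to milliseconds.

    Accepts "N" (seconds) or "N milliseconds" / "N ms". Hypothetical helper.
    """
    tokens = value.split()
    if len(tokens) not in (1, 2):
        raise ValueError(f"unexpected value: {value!r}")
    number = float(tokens[0])
    if number < 0:
        raise ValueError("interval must be >= 0")
    if len(tokens) == 2:
        if tokens[1] not in ("milliseconds", "ms"):
            raise ValueError(f"unknown unit: {tokens[1]!r}")
        return round(number)
    return round(number * 1000)  # plain numbers are seconds
```

With this sketch, "5" and "5000 ms" both yield 5000, the default.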

Valid values

Number of seconds greater than or equal to zero (0).

Default

5000 milliseconds

JOB_SPOOL_DIR

Syntax

JOB_SPOOL_DIR=dir

Description

Specifies the directory for buffering batch standard output and standard error for a job.

When JOB_SPOOL_DIR is defined, the standard output and standard error for the job are buffered in the specified directory.

Files are copied from the submission host to a temporary file in the directory specified by JOB_SPOOL_DIR on the execution host. LSF removes these files when the job completes.

If JOB_SPOOL_DIR is not accessible or does not exist, files are spooled to the default job output directory $HOME/.lsbatch.

For bsub -is and bsub -Zs, JOB_SPOOL_DIR must be readable and writable by the job submission user, and it must be shared by the master host and the submission host. If JOB_SPOOL_DIR is specified but the directory is not accessible or does not exist, bsub -is cannot write to the default directory LSB_SHAREDIR/cluster_name/lsf_indir, bsub -Zs cannot write to the default directory LSB_SHAREDIR/cluster_name/lsf_cmddir, and the job fails.

As LSF runs jobs, it creates temporary directories and files under JOB_SPOOL_DIR. By default, LSF removes these directories and files after the job is finished. See bsub for information about job submission options that specify the disposition of these files.

On UNIX, specify an absolute path. For example:
JOB_SPOOL_DIR=/home/share/lsf_spool
On Windows, specify a UNC path or a path with a drive letter. For example:
JOB_SPOOL_DIR=\\HostA\share\spooldir
or
JOB_SPOOL_DIR=D:\share\spooldir
In a mixed UNIX/Windows cluster, specify one path for the UNIX platform and one for the Windows platform. Separate the two paths by a pipe character (|):
JOB_SPOOL_DIR=/usr/share/lsf_spool | \\HostA\share\spooldir
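How a mixed-platform value splits into its two paths can be sketched in Python. The helper name is hypothetical, and it assumes the UNIX path comes first, as in the example above.

```python
def spool_dir_for(value: str, windows: bool) -> str:
    """Pick the platform-specific path from a JOB_SPOOL_DIR value.

    Hypothetical helper; assumes "UNIX path | Windows path" order.
    """
    paths = [p.strip() for p in value.split("|")]
    if len(paths) == 1:
        return paths[0]  # a single path is used as-is
    if len(paths) != 2:
        raise ValueError("expected at most one '|' separator")
    unix_path, windows_path = paths
    return windows_path if windows else unix_path
```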

Valid value

JOB_SPOOL_DIR can be any valid path.

The entire path including JOB_SPOOL_DIR can be up to 4094 characters on UNIX and Linux, or up to 255 characters on Windows. This maximum path length includes:
  • All directory and file paths attached to the JOB_SPOOL_DIR path

  • Temporary directories and files that the LSF system creates as jobs run.

The path you specify for JOB_SPOOL_DIR should be as short as possible to avoid exceeding this limit.
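A quick way to check a candidate JOB_SPOOL_DIR against these limits is sketched below in Python. The names of the directories and files LSF creates under JOB_SPOOL_DIR are not given here, so relative_parts is a hypothetical stand-in for them; the function name is also hypothetical.

```python
# Limits stated above for the full path, including JOB_SPOOL_DIR.
MAX_PATH_LEN = {"unix": 4094, "windows": 255}


def within_path_limit(spool_dir: str, relative_parts: list, platform: str) -> bool:
    """Check whether spool_dir plus the given sub-paths fits the limit.

    relative_parts stands in for the directories and files LSF creates
    under JOB_SPOOL_DIR; their real names are not specified here.
    """
    sep = "\\" if platform == "windows" else "/"
    full_path = sep.join([spool_dir.rstrip(sep), *relative_parts])
    return len(full_path) <= MAX_PATH_LEN[platform]
```

For example, on Windows the 17-character D:\share\spooldir leaves room for 237 more characters after the separator.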

Default

Not defined

Batch job output (standard output and standard error) is sent to the .lsbatch directory on the execution host:
  • On UNIX: $HOME/.lsbatch

  • On Windows: %windir%\lsbtmpuser_id\.lsbatch

If %HOME% is specified in the user environment, LSF uses that directory instead of %windir% for spooled output.

JOB_TERMINATE_INTERVAL

Syntax

JOB_TERMINATE_INTERVAL=seconds

Description

UNIX only.

Specifies the time interval in seconds between sending SIGINT, SIGTERM, and SIGKILL when terminating a job. When a job is terminated, the job is sent SIGINT, SIGTERM, and SIGKILL in sequence with a sleep time of JOB_TERMINATE_INTERVAL between sending the signals. This allows the job to clean up if necessary.
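The escalation can be illustrated outside LSF with Python's subprocess and signal modules. This sketch is not how sbatchd is implemented; the function name is hypothetical, and, unlike the behavior described above, it stops sending signals as soon as the process exits.

```python
import signal
import subprocess

# Signal order described for JOB_TERMINATE_INTERVAL.
TERMINATION_SEQUENCE = (signal.SIGINT, signal.SIGTERM, signal.SIGKILL)


def terminate(proc: subprocess.Popen, interval: float) -> list:
    """Send SIGINT, SIGTERM, then SIGKILL, waiting `interval` seconds after each.

    Simplification: stops escalating once the process has exited.
    Returns the signals actually sent. UNIX only.
    """
    sent = []
    for sig in TERMINATION_SEQUENCE:
        if proc.poll() is not None:
            break  # process already exited; nothing left to signal
        proc.send_signal(sig)
        sent.append(sig)
        try:
            proc.wait(timeout=interval)
        except subprocess.TimeoutExpired:
            pass  # still running after the interval: try the next signal
    return sent
```

Called as terminate(proc, 10), it mirrors the default 10-second interval: a job that handles SIGINT to clean up gets that much time before SIGTERM arrives.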

Default

10 (seconds)

LOCAL_MAX_PREEXEC_RETRY

Syntax

LOCAL_MAX_PREEXEC_RETRY=integer

Description

The maximum number of times to attempt the pre-execution command of a job on the local cluster.

Valid values

0 < LOCAL_MAX_PREEXEC_RETRY < 2147483647

Default

2147483647 (Unlimited number of preexec retry times.)

LSB_SYNC_HOST_STAT_LIM

Syntax

LSB_SYNC_HOST_STAT_LIM=y | Y

Description

Improves the speed with which mbatchd obtains host status, and therefore the speed with which LSF reschedules rerunnable jobs: the sooner LSF knows that a host has become unavailable, the sooner LSF reschedules any rerunnable jobs executing on that host. Useful for a large cluster.

When you define this parameter, mbatchd periodically obtains the host status from the master LIM, and then verifies the status by polling each sbatchd at an interval defined by the parameters MBD_SLEEP_TIME and LSB_MAX_PROBE_SBD.

Default

N. mbatchd obtains and reports host status, without contacting the master LIM, by polling each sbatchd at an interval defined by the parameters MBD_SLEEP_TIME and LSB_MAX_PROBE_SBD.

See also

MBD_SLEEP_TIME

LSB_MAX_PROBE_SBD in lsf.conf

MAX_ACCT_ARCHIVE_FILE

Syntax

MAX_ACCT_ARCHIVE_FILE=integer

Description

Enables automatic deletion of archived LSF accounting log files and specifies the archive limit.

Compatibility

ACCT_ARCHIVE_SIZE or ACCT_ARCHIVE_AGE should also be defined.

Example

MAX_ACCT_ARCHIVE_FILE=10

LSF maintains the current lsb.acct and up to 10 archives. Each time lsb.acct.9 is renamed to lsb.acct.10, the previous lsb.acct.10 is deleted.
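The numbering and deletion behavior in this example can be sketched in Python. It models only the rotation itself, not when LSF decides to archive (ACCT_ARCHIVE_SIZE, ACCT_ARCHIVE_AGE, or ACCT_ARCHIVE_TIME), and the function name is hypothetical.

```python
import os


def rotate_acct(logdir: str, max_archives: int) -> None:
    """Archive lsb.acct, keeping at most `max_archives` numbered archives."""
    base = os.path.join(logdir, "lsb.acct")
    # The oldest archive would be pushed past the limit, so delete it.
    oldest = f"{base}.{max_archives}"
    if os.path.exists(oldest):
        os.remove(oldest)
    # Shift lsb.acct.n to lsb.acct.n+1, highest number first so nothing is overwritten.
    for n in range(max_archives - 1, 0, -1):
        src = f"{base}.{n}"
        if os.path.exists(src):
            os.rename(src, f"{base}.{n + 1}")
    # The current file becomes lsb.acct.1 and a new, empty lsb.acct is started.
    if os.path.exists(base):
        os.rename(base, f"{base}.1")
    open(base, "w").close()
```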

See also

  • ACCT_ARCHIVE_AGE also enables automatic archiving

  • ACCT_ARCHIVE_SIZE also enables automatic archiving

  • ACCT_ARCHIVE_TIME also enables automatic archiving

Default

-1 (Not defined. No deletion of lsb.acct.n files).

MAX_CONCURRENT_JOB_QUERY

Syntax

MAX_CONCURRENT_JOB_QUERY=integer

Description

Defines how many concurrent job queries mbatchd can handle.

  • If mbatchd is using multithreading, a dedicated query port is defined by the parameter LSB_QUERY_PORT in lsf.conf. When mbatchd has a dedicated query port, the value of MAX_CONCURRENT_JOB_QUERY sets the maximum number of job queries for each child mbatchd that is forked by mbatchd. This means that the total number of job queries can be more than the number specified by MAX_CONCURRENT_JOB_QUERY.

  • If mbatchd is not using multithreading, the value of MAX_CONCURRENT_JOB_QUERY sets the maximum total number of job queries.

Valid values

1-100

Default

2147483647 (Unlimited concurrent job queries.)

See also

LSB_QUERY_PORT in lsf.conf

MAX_EVENT_STREAM_FILE_NUMBER

Syntax

MAX_EVENT_STREAM_FILE_NUMBER=integer

Description

Determines the maximum number of different lsb.stream.utc files that mbatchd uses. If the number of lsb.stream.utc files reaches this number, mbatchd logs an error message to the mbd.log file and stops writing events to the lsb.stream file.

Default

10

MAX_EVENT_STREAM_SIZE

Syntax

MAX_EVENT_STREAM_SIZE=integer

Description

Determines the maximum size in MB of the lsb.stream file used by system performance analysis tools such as Platform LSF Reporting. For LSF Reporting, the recommended size is 2000 MB.

When MAX_EVENT_STREAM_SIZE is reached, LSF logs a special event, EVENT_END_OF_STREAM, closes the stream, moves it to lsb.stream.0, and opens a new stream.

Applications that read the file should close it and reopen it once EVENT_END_OF_STREAM is logged.

Recommended value

2000 MB

Default

1024 MB

MAX_INFO_DIRS

Syntax

MAX_INFO_DIRS=num_subdirs

Description

The number of subdirectories under the LSB_SHAREDIR/cluster_name/logdir/info directory.

When MAX_INFO_DIRS is enabled, mbatchd creates the specified number of subdirectories in the info directory. Each subdirectory is named with an integer, starting with 0 for the first subdirectory. mbatchd writes the job files of all newly submitted jobs into these subdirectories, using the following formula to choose the subdirectory in which to store each job file:
subdirectory = jobID % MAX_INFO_DIRS
This formula ensures an even distribution of job files across the subdirectories.
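The formula can be expressed as a small Python helper. The function name and the example path are hypothetical; the helper only shows which subdirectory a given job ID maps to.

```python
import os


def job_file_dir(info_dir: str, job_id: int, max_info_dirs: int) -> str:
    """Directory that holds a job file, per subdirectory = jobID % MAX_INFO_DIRS."""
    if max_info_dirs == 0:
        # Default: no subdirectories; job files go directly into info/.
        return info_dir
    return os.path.join(info_dir, str(job_id % max_info_dirs))
```

With MAX_INFO_DIRS=10, job 1234 maps to info/4, and consecutive job IDs cycle through info/0 to info/9.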
Important:

If you are using local duplicate event logging, you must run badmin mbdrestart after changing MAX_INFO_DIRS for the changes to take effect.

Valid values

0-1024

Default

0 (no subdirectories under the info directory; mbatchd writes all jobfiles to the info directory)

Example

MAX_INFO_DIRS=10

mbatchd creates ten subdirectories from LSB_SHAREDIR/cluster_name/logdir/info/0 to LSB_SHAREDIR/cluster_name/logdir/info/9.

MAX_JOB_ARRAY_SIZE

Syntax

MAX_JOB_ARRAY_SIZE=integer

Description

Specifies the maximum number of jobs in a job array that can be created by a user for a single job submission. The maximum number of jobs in a job array cannot exceed this value.

A large job array allows a user to submit a large number of jobs to the system with a single job submission.

Valid values

Specify a positive integer between 1 and 2147483646

Default

1000

MAX_JOB_ATTA_SIZE

Syntax

MAX_JOB_ATTA_SIZE=integer | 0

Specify any number less than 20000.

Description

Maximum attached data size, in KB, that can be transferred to a job.

Maximum size for data attached to a job with the bpost command. Useful if you use bpost and bread to transfer large data files between jobs and you want to limit the usage in the current working directory.

0 indicates that jobs cannot accept attached data files.

Default

2147483647 (Unlimited; LSF does not set a maximum size of job attachments.)

MAX_JOB_MSG_NUM

Syntax

MAX_JOB_MSG_NUM=integer | 0

Description

Maximum number of message slots for each job. Maximum number of messages that can be posted to a job with the bpost command.

0 indicates that jobs cannot accept external messages.

Default

128

MAX_JOB_NUM

Syntax

MAX_JOB_NUM=integer

Description

The maximum number of finished jobs whose events are to be stored in the lsb.events log file.

Once the limit is reached, mbatchd starts a new event log file. The old event log file is saved as lsb.events.n, with subsequent sequence number suffixes incremented by 1 each time a new log file is started. Event logging continues in the new lsb.events file.

Default

1000

MAX_JOB_PREEMPT

Syntax

MAX_JOB_PREEMPT=integer

Description

The maximum number of times a job can be preempted. Applies to queue-based preemption only.

Valid values

0 < MAX_JOB_PREEMPT < 2147483647

Default

2147483647 (Unlimited number of preemption times.)

MAX_JOB_REQUEUE

Syntax

MAX_JOB_REQUEUE=integer

Description

The maximum number of times to requeue a job automatically.

Valid values

0 < MAX_JOB_REQUEUE < 2147483647

Default

2147483647 (Unlimited number of requeue times.)

MAX_JOBID

Syntax

MAX_JOBID=integer

Description

The job ID limit. The job ID limit is the highest job ID that LSF will ever assign, and also the maximum number of jobs in the system.

By default, LSF assigns job IDs up to 6 digits. This means that no more than 999999 jobs can be in the system at once.

Specify any integer from 999999 to 2147483646 (for practical purposes, you can use any 10-digit integer less than this value).

You cannot lower the job ID limit, but you can raise it to 10 digits. A higher limit allows longer-term job accounting and analysis, lets you have more jobs in the system, and makes job ID numbers roll over less often.

LSF assigns job IDs in sequence. When the job ID limit is reached, the count rolls over, so the next job submitted gets job ID "1". If the original job 1 remains in the system, LSF skips that number and assigns job ID "2", or the next available job ID. If you have so many jobs in the system that the low job IDs are still in use when the maximum job ID is assigned, jobs with sequential numbers could have totally different submission times.
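The rollover rule can be sketched in Python. The function name is hypothetical, and the set of in-use IDs stands in for the jobs still in the system.

```python
def next_job_id(last_id: int, max_jobid: int, in_use: set) -> int:
    """Next job ID after last_id, rolling over to 1 and skipping IDs in use.

    Hypothetical helper illustrating the rollover rule described above.
    """
    candidate = last_id
    for _ in range(max_jobid):
        # Wrap from MAX_JOBID back to 1.
        candidate = candidate + 1 if candidate < max_jobid else 1
        if candidate not in in_use:
            return candidate
    raise RuntimeError("every job ID up to MAX_JOBID is in use")
```

With the default MAX_JOBID=999999, next_job_id(999999, 999999, {1}) returns 2, because job 1 is still in the system.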

Example

MAX_JOBID=125000000

Default

999999

MAX_JOBINFO_QUERY_PERIOD

Syntax

MAX_JOBINFO_QUERY_PERIOD=integer

Description

Maximum time for job information query commands (for example, with bjobs) to wait.

When the time arrives, the query command processes exit, and all associated threads are terminated.

If the parameter is not defined, query command processes will wait for all threads to finish.

Specify a multiple of MBD_REFRESH_TIME.

Valid values

Any positive integer greater than or equal to one (1)

Default

2147483647 (Unlimited wait time.)

See also

LSB_BLOCK_JOBINFO_TIMEOUT in lsf.conf

MAX_PEND_JOBS

Syntax

MAX_PEND_JOBS=integer

Description

The maximum number of pending jobs in the system.

This is the hard system-wide pending job threshold. No user or user group can exceed this limit unless the job is forwarded from a remote cluster.

If the user or user group submitting the job has reached the pending job threshold as specified by MAX_PEND_JOBS, LSF will reject any further job submission requests sent by that user or user group. The system will continue to send the job submission requests with the interval specified by SUB_TRY_INTERVAL in lsb.params until it has made a number of attempts equal to the LSB_NTRIES environment variable. If LSB_NTRIES is not defined and LSF rejects the job submission request, the system will continue to send the job submission requests indefinitely as the default behavior.
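The retry behavior can be sketched in Python. The submit callable is a hypothetical stand-in for sending a submission request, and the sketch assumes LSB_NTRIES counts total attempts including the first; None models LSB_NTRIES not being set.

```python
import time
from typing import Callable, Optional


def submit_with_retries(submit: Callable[[], bool],
                        sub_try_interval: float,
                        ntries: Optional[int]) -> int:
    """Retry a rejected submission; return the number of attempts made.

    `submit` returns True if accepted. ntries=None models LSB_NTRIES
    not being set (retry indefinitely).
    """
    attempts = 0
    while True:
        attempts += 1
        if submit():
            return attempts
        if ntries is not None and attempts >= ntries:
            raise RuntimeError(f"submission rejected after {attempts} attempts")
        time.sleep(sub_try_interval)  # wait SUB_TRY_INTERVAL before retrying
```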

Default

2147483647 (Unlimited number of pending jobs.)

See also

SUB_TRY_INTERVAL

MAX_PREEXEC_RETRY

Syntax

MAX_PREEXEC_RETRY=integer

Description

MultiCluster job forwarding model only. The maximum number of times to attempt the pre-execution command of a job from a remote cluster.

If the job's pre-execution command fails all attempts, the job is returned to the submission cluster.
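The retry limit can be sketched in Python. The sketch assumes that a pre-execution command succeeds when it exits with status 0 and that retries happen back to back (no delay is modeled). The function name is hypothetical; the same loop also describes LOCAL_MAX_PREEXEC_RETRY for the local cluster.

```python
import subprocess
from typing import Sequence, Tuple


def run_preexec(cmd: Sequence[str], max_retry: int) -> Tuple[bool, int]:
    """Run a pre-execution command up to max_retry times.

    Returns (succeeded, attempts). In the forwarding model, a False result
    corresponds to the job being returned to the submission cluster.
    """
    for attempt in range(1, max_retry + 1):
        if subprocess.run(cmd).returncode == 0:
            return True, attempt
    return False, max_retry
```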

Valid values

0 < MAX_PREEXEC_RETRY < 2147483647

Default

5

MAX_SBD_CONNS

Syntax

MAX_SBD_CONNS=integer

Description

The maximum number of file descriptors mbatchd can have open and connected concurrently to sbatchd.

Controls the maximum number of connections that LSF can maintain to sbatchds in the system.

Do not exceed the file descriptor limit of the root process (the usual limit is 1024). Setting MAX_SBD_CONNS equal to or larger than this limit can cause mbatchd to die repeatedly, because mbatchd allocates all file descriptors to sbatchd connections. mbatchd can then run out of descriptors, which results in an mbatchd fatal error, such as failure to open lsb.events.

Use together with LSB_MAX_JOB_DISPATCH_PER_SESSION in lsf.conf.

Example

A reasonable setting is:

MAX_SBD_CONNS=768

For a large cluster, specify a value equal to the number of hosts in your cluster plus a buffer. For example, if your cluster includes 4000 hosts:
MAX_SBD_CONNS=4100
Important:

Set LSB_MAX_JOB_DISPATCH_PER_SESSION in lsf.conf equal to one-half the value of MAX_SBD_CONNS.
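The sizing guidance (hosts plus a buffer, staying below the descriptor limit, and a dispatch limit of half of MAX_SBD_CONNS) can be combined in a small Python helper. The function name is hypothetical, and the buffer of 100 only matches the 4000-host example.

```python
def sbd_connection_settings(num_hosts: int, buffer: int, fd_limit: int) -> dict:
    """Suggest MAX_SBD_CONNS and LSB_MAX_JOB_DISPATCH_PER_SESSION.

    fd_limit is the file descriptor limit of the process that runs mbatchd.
    """
    max_sbd_conns = num_hosts + buffer
    if max_sbd_conns >= fd_limit:
        # mbatchd also needs descriptors for files such as lsb.events.
        raise ValueError("MAX_SBD_CONNS must stay below the file descriptor limit")
    return {
        "MAX_SBD_CONNS": max_sbd_conns,
        "LSB_MAX_JOB_DISPATCH_PER_SESSION": max_sbd_conns // 2,
    }
```

On UNIX, resource.getrlimit(resource.RLIMIT_NOFILE) returns the soft and hard descriptor limits of the calling process, so if you use it to obtain fd_limit, run it in the same environment as mbatchd.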

Default

64

MAX_SBD_FAIL

Syntax

MAX_SBD_FAIL=integer

Description

The maximum number of retries for reaching a non-responding slave batch daemon, sbatchd.

The interval between retries is defined by MBD_SLEEP_TIME. If mbatchd fails to reach a host and has retried MAX_SBD_FAIL times, the host is considered unreachable.

If you define LSB_SYNC_HOST_STAT_LIM=Y, mbatchd obtains the host status from the master LIM before it polls sbatchd. When the master LIM reports that a host is unavailable (LIM is down) or unreachable (sbatchd is down) MAX_SBD_FAIL number of times, mbatchd reports the host status as unavailable or unreachable.

When a host becomes unavailable, mbatchd assumes that all jobs running on that host have exited and that all rerunnable jobs (jobs submitted with the bsub -r option) are scheduled to be rerun on another host.

Default

3

MAX_USER_PRIORITY

Syntax

MAX_USER_PRIORITY=integer

Description

Enables user-assigned job priority and specifies the maximum job priority a user can assign to a job.

LSF and queue administrators can assign a job priority higher than the specified value for jobs they own.

Compatibility

User-assigned job priority changes the behavior of btop and bbot.

Example

MAX_USER_PRIORITY=100

Specifies that 100 is the maximum job priority that can be specified by a user.

Default

-1 (Not defined.)

See also

  • bsub, bmod, btop, bbot

  • JOB_PRIORITY_OVER_TIME

MBD_EGO_CONNECT_TIMEOUT

Syntax

MBD_EGO_CONNECT_TIMEOUT=seconds

Description

For EGO-enabled SLA scheduling, timeout parameter for network I/O connection with EGO vemkd.

Default

0 seconds

MBD_EGO_READ_TIMEOUT

Syntax

MBD_EGO_READ_TIMEOUT=seconds

Description

For EGO-enabled SLA scheduling, timeout parameter for network I/O read from EGO vemkd after connection with EGO.

Default

0 seconds

MBD_EGO_TIME2LIVE

Syntax

MBD_EGO_TIME2LIVE=minutes

Description

For EGO-enabled SLA scheduling, specifies how long EGO should keep information about host allocations in case mbatchd restarts.

Default

0 minutes

MBD_QUERY_CPUS

Syntax

MBD_QUERY_CPUS=cpu_list

cpu_list defines the list of master host CPUs on which the mbatchd child query processes can run. Format the list as a white-space delimited list of CPU numbers.

For example, if you specify
MBD_QUERY_CPUS=1 2 3

the mbatchd child query processes will run only on CPU numbers 1, 2, and 3 on the master host.

Description

This parameter allows you to specify the master host CPUs on which mbatchd child query processes can run (hard CPU affinity). This improves mbatchd scheduling and dispatch performance by binding query processes to specific CPUs so that higher priority mbatchd processes can run more efficiently.

When you define this parameter, LSF runs mbatchd child query processes only on the specified CPUs. The operating system can also assign other processes to run on the bound CPUs if their utilization is lower than the utilization of the unbound CPUs.
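Hard CPU affinity of this kind is what Linux exposes through sched_setaffinity. The Python sketch below parses a whitespace-delimited CPU list and binds the current process to it using os.sched_setaffinity, which is available on Linux only. It illustrates the mechanism; it is not how LSF applies the setting, and the helper names are hypothetical.

```python
import os


def parse_cpu_list(value: str) -> set:
    """Turn a list such as "1 2 3" into {1, 2, 3}."""
    return {int(token) for token in value.split()}


def bind_to_cpus(value: str, pid: int = 0) -> set:
    """Restrict `pid` (0 = this process) to the listed CPUs; return the new mask."""
    os.sched_setaffinity(pid, parse_cpu_list(value))
    return os.sched_getaffinity(pid)
```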

Important

  1. You can specify CPU affinity only for master hosts that use one of the following operating systems:
    • Linux 2.6 or higher

    • Solaris 8 or higher

  2. If failover to a master host candidate occurs, LSF maintains the hard CPU affinity, provided that the master host candidate has the same CPU configuration as the original master host. If the configuration differs, LSF ignores the CPU list and reverts to default behavior.

Related parameters

To improve scheduling and dispatch performance of all LSF daemons, use MBD_QUERY_CPUS together with EGO_DAEMONS_CPUS (in ego.conf), which controls LIM CPU allocation, and LSF_DAEMONS_CPUS, which binds mbatchd and mbschd daemon processes to specific CPUs so that higher priority daemon processes can run more efficiently. For best performance, assign LIM, mbatchd, mbschd, and the mbatchd child query processes their own CPUs. For example, on a 4 CPU SMP host, the following configuration gives the best performance:
EGO_DAEMONS_CPUS=0
LSF_DAEMONS_CPUS=1:2
MBD_QUERY_CPUS=3

Default

Not defined

See also

LSF_DAEMONS_CPUS in lsf.conf

MBD_REFRESH_TIME

Syntax

MBD_REFRESH_TIME=seconds [min_refresh_time]

where min_refresh_time defines the minimum time (in seconds) that the child mbatchd will stay to handle queries. The valid range is 0 - 300.

Description

Time interval, in seconds, at which mbatchd forks a new child mbatchd to service query requests, so that the information sent back to clients stays up to date. A child mbatchd processes query requests by creating threads.

MBD_REFRESH_TIME applies only to UNIX platforms that support thread programming.

To enable MBD_REFRESH_TIME you must specify LSB_QUERY_PORT in lsf.conf. The child mbatchd listens to the port number specified by LSB_QUERY_PORT and creates threads to service requests until the job changes status, a new job is submitted, or MBD_REFRESH_TIME has expired.
  • If MBD_REFRESH_TIME is < min_refresh_time, the child mbatchd exits at MBD_REFRESH_TIME even if the job changes status or a new job is submitted before MBD_REFRESH_TIME expires.

  • If MBD_REFRESH_TIME > min_refresh_time:
    • the child mbatchd exits at min_refresh_time if a job changes status or a new job is submitted before the min_refresh_time

    • the child mbatchd exits after the min_refresh_time when a job changes status or a new job is submitted

  • If MBD_REFRESH_TIME > min_refresh_time, no job changes status, and no new job is submitted, the child mbatchd exits at MBD_REFRESH_TIME
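The cases above reduce to one expression, sketched in Python below with times measured in seconds from when the child mbatchd is forked. The function name is hypothetical, and the case where the two values are equal, which the text does not cover, is resolved as exiting at MBD_REFRESH_TIME.

```python
from typing import Optional


def child_exit_time(refresh_time: float, min_refresh_time: float = 10,
                    first_change: Optional[float] = None) -> float:
    """Seconds after it is forked at which a child mbatchd exits.

    first_change is when a job first changes status or a new job is
    submitted (None if neither happens before the child exits).
    """
    if first_change is None:
        return refresh_time  # nothing happened: exit at MBD_REFRESH_TIME
    # Exit no earlier than min_refresh_time after a change, but never later
    # than MBD_REFRESH_TIME; when MBD_REFRESH_TIME < min_refresh_time this
    # always yields MBD_REFRESH_TIME.
    return min(refresh_time, max(first_change, min_refresh_time))
```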

The value of this parameter must be between 0 and 300. Any values specified out of this range are ignored, and the system default value is applied.

The bjobs command may not display up-to-date information if two consecutive query commands are issued before a child mbatchd expires, because child mbatchd job information is not updated. If you use the bjobs command and do not get up-to-date information, you may need to decrease the value of this parameter. Note, however, that lower values of this parameter have a greater negative effect on performance.

The number of concurrent requests is limited by the number of concurrent threads that a process can have. This number varies by platform:
  • Sun Solaris, 2500 threads per process

  • AIX, 512 threads per process

  • Digital, 256 threads per process

  • HP-UX, 64 threads per process

Default

5 seconds if MBD_REFRESH_TIME is not defined or if the defined value is less than 5; 300 seconds if the defined value is more than 300

min_refresh_time default is 10 seconds

See also

LSB_QUERY_PORT in lsf.conf

MBD_SLEEP_TIME

Syntax

MBD_SLEEP_TIME=seconds

Description

Used in conjunction with the parameters SLOT_RESERVE, MAX_SBD_FAIL, and JOB_ACCEPT_INTERVAL.

Amount of time, in seconds, used as the base interval for calculating the values of these parameters.

Default

If not defined, 60 seconds. MBD_SLEEP_TIME is set at installation to 20 seconds.

MBD_USE_EGO_MXJ

Syntax

MBD_USE_EGO_MXJ=Y | N

Description

By default, when EGO-enabled SLA scheduling is configured, EGO allocates an entire host to LSF, which uses its own MXJ definition to determine how many slots are available on the host. LSF gets its host allocation from EGO, and runs as many jobs as the LSF configured MXJ for that host dictates.

MBD_USE_EGO_MXJ forces LSF to use the job slot maximum configured in the EGO consumer. This allows partial sharing of hosts (for example, a large SMP computer) among different consumers or workload managers. When MBD_USE_EGO_MXJ is set, LSF schedules jobs based on the number of slots allocated from EGO. For example, if hostA has 4 processors but EGO allocates 2 slots to an EGO-enabled SLA consumer, LSF can schedule a maximum of 2 jobs from that SLA on hostA.

Default

N (mbatchd uses the LSF MXJ)

MC_PENDING_REASON_PKG_SIZE

Syntax

MC_PENDING_REASON_PKG_SIZE=kilobytes | 0

Description

MultiCluster job forwarding model only. Pending reason update package size, in KB. Defines the maximum amount of pending reason data this cluster will send to submission clusters in one cycle.

Specify the keyword 0 (zero) to disable the limit and allow any amount of data in one package.

Default

512

MC_PENDING_REASON_UPDATE_INTERVAL

Syntax

MC_PENDING_REASON_UPDATE_INTERVAL=seconds | 0

Description

MultiCluster job forwarding model only. Pending reason update interval, in seconds. Defines how often this cluster will update submission clusters about the status of pending MultiCluster jobs.

Specify the keyword 0 (zero) to disable pending reason updating between clusters.

Default

300

MC_PLUGIN_SCHEDULE_ENHANCE

Syntax

MC_PLUGIN_SCHEDULE_ENHANCE= RESOURCE_ONLY

MC_PLUGIN_SCHEDULE_ENHANCE= COUNT_PREEMPTABLE [HIGH_QUEUE_PRIORITY] [PREEMPTABLE_QUEUE_PRIORITY] [PENDING_WHEN_NOSLOTS]

Note:

When any one of HIGH_QUEUE_PRIORITY, PREEMPTABLE_QUEUE_PRIORITY or PENDING_WHEN_NOSLOTS is defined, COUNT_PREEMPTABLE is enabled automatically.

Description

MultiCluster job forwarding model only. The parameter MC_PLUGIN_SCHEDULE_ENHANCE enhances the scheduler for the MultiCluster job forwarding model based on the settings selected. Use in conjunction with MC_PLUGIN_UPDATE_INTERVAL to set the data update interval between remote clusters. MC_PLUGIN_UPDATE_INTERVAL must be a non-zero value to enable the MultiCluster enhanced scheduler.

With the parameter MC_PLUGIN_SCHEDULE_ENHANCE set to a valid value, remote resources are considered as if MC_PLUGIN_REMOTE_RESOURCE=Y regardless of the actual setting. In addition, the submission cluster scheduler considers specific execution queue resources when scheduling jobs. See Using Platform LSF MultiCluster for details.

Note:

The parameter MC_PLUGIN_SCHEDULE_ENHANCE was introduced in LSF Version 7 Update 6. All clusters within a MultiCluster configuration must be running a version of LSF containing this parameter to enable the enhanced scheduler.

After a MultiCluster connection is established, counters take the time set in MC_PLUGIN_UPDATE_INTERVAL to update. Scheduling decisions made before this first interval has passed do not accurately account for remote queue workload.

Default

Not defined.

The enhanced scheduler is not used. If MC_PLUGIN_REMOTE_RESOURCE=Y in lsf.conf, remote resource availability is considered before jobs are forwarded to the queue with the most available slots.

See also

MC_PLUGIN_UPDATE_INTERVAL in lsb.params.

MC_PLUGIN_REMOTE_RESOURCE in lsf.conf.

MC_PLUGIN_UPDATE_INTERVAL

Syntax

MC_PLUGIN_UPDATE_INTERVAL=seconds | 0

Description

MultiCluster job forwarding model only; set for the execution cluster. The number of seconds between data updates between clusters.

A non-zero value enables collection of remote cluster queue data for use by the submission cluster enhanced scheduler.

Suggested value when enabled is MBD_SLEEP_TIME (set to 20 seconds at installation; 60 seconds if not defined).

A value of 0 disables collection of remote cluster queue data.

Default

0

See Also

MC_PLUGIN_SCHEDULE_ENHANCE in lsb.params.

MC_RECLAIM_DELAY

Syntax

MC_RECLAIM_DELAY=minutes

Description

MultiCluster resource leasing model only. The reclaim interval (how often to reconfigure shared leases) in minutes.

Shared leases are defined by Type=shared in the lsb.resources HostExport section.

Default

10 (minutes)

MC_RUSAGE_UPDATE_INTERVAL

Syntax

MC_RUSAGE_UPDATE_INTERVAL=seconds

Description

MultiCluster only. Enables resource use updating for MultiCluster jobs running on hosts in the cluster and specifies how often to send updated information to the submission or consumer cluster.

Default

300

MIN_SWITCH_PERIOD

Syntax

MIN_SWITCH_PERIOD=seconds

Description

The minimum period in seconds between event log switches.

Works together with MAX_JOB_NUM to control how frequently mbatchd switches the file. mbatchd checks if MAX_JOB_NUM has been reached every MIN_SWITCH_PERIOD seconds. If mbatchd finds that MAX_JOB_NUM has been reached, it switches the events file.
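The interaction between MIN_SWITCH_PERIOD and MAX_JOB_NUM can be sketched as a small Python simulation. It assumes that checks happen at whole multiples of MIN_SWITCH_PERIOD after startup and that the finished-job count restarts at zero after each switch; both are simplifications, and the function name is hypothetical.

```python
from typing import List


def switch_times(finish_times: List[float], max_job_num: int,
                 min_switch_period: float) -> List[float]:
    """Times at which lsb.events is switched in this simplified model."""
    events = sorted(finish_times)
    switches: List[float] = []
    count = 0
    if min_switch_period <= 0:
        # No minimum period: switch as soon as MAX_JOB_NUM is reached.
        for t in events:
            count += 1
            if count >= max_job_num:
                switches.append(t)
                count = 0
        return switches
    # Otherwise the limit is only checked every min_switch_period seconds.
    last = events[-1] if events else 0.0
    check, i = min_switch_period, 0
    while check <= last + min_switch_period:
        while i < len(events) and events[i] <= check:
            count += 1
            i += 1
        if count >= max_job_num:
            switches.append(check)
            count = 0
        check += min_switch_period
    return switches
```

With ten jobs finishing one second apart and MAX_JOB_NUM=3, this model switches at 3, 6, and 9 seconds when MIN_SWITCH_PERIOD is 0, but only at 5 and 10 seconds when it is 5.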

To significantly improve the performance of mbatchd for large clusters, set this parameter to a value equal to or greater than 600. This causes mbatchd to fork a child process that handles event switching, thereby reducing the load on mbatchd. mbatchd terminates the child process and appends delta events to new events after the MIN_SWITCH_PERIOD has elapsed.

Default

0

No minimum period. Log switch frequency is not restricted.

See also

MAX_JOB_NUM

NEWJOB_REFRESH

Syntax

NEWJOB_REFRESH=Y | N

Description

Enables a child mbatchd to get up-to-date information about new jobs from the parent mbatchd. When set to Y, job queries with bjobs display new jobs submitted after the child mbatchd was created.

If you have enabled multithreaded mbatchd support, the bjobs command may not display up-to-date information if two consecutive query commands are issued before a child mbatchd expires, because child mbatchd job information is not updated. Use NEWJOB_REFRESH=Y to enable the parent mbatchd to push new job information to a child mbatchd.

When NEWJOB_REFRESH=Y, as users submit new jobs, the parent mbatchd pushes the new job event to the child mbatchd. The parent mbatchd transfers the following kinds of new jobs to the child mbatchd:
  • Newly submitted jobs

  • Restarted jobs

  • Remote lease model jobs from the submission cluster

  • Remote forwarded jobs from the submission cluster

When NEWJOB_REFRESH=Y, you should set MBD_REFRESH_TIME to a value greater than 10 seconds.

Required parameters

LSB_QUERY_PORT must be enabled in lsf.conf.

Restrictions

The parent mbatchd only pushes the new job event to a child mbatchd. The child mbatchd is not aware of status changes of existing jobs. The child mbatchd will not reflect the results of job control commands (bmod, bmig, bswitch, btop, bbot, brequeue, bstop, bresume, and so on) invoked after the child mbatchd is created.

Default

N (Not defined. New jobs are not pushed to the child mbatchd.)

See also

MBD_REFRESH_TIME

NO_PREEMPT_FINISH_TIME

Syntax

NO_PREEMPT_FINISH_TIME=minutes | percentage

Description

Prevents preemption of jobs that will finish within the specified number of minutes or the specified percentage of the estimated run time or run limit.

Specifies that jobs due to finish within the specified number of minutes or percentage of job duration should not be preempted, where minutes is wall-clock time, not normalized time. Percentage must be greater than 0 and less than 100% (between 1% and 99%).

For example, if the job run limit is 60 minutes and NO_PREEMPT_FINISH_TIME=10%, the job cannot be preempted after it has been running for 54 minutes or longer.

If you specify a percentage for NO_PREEMPT_FINISH_TIME, the job must have a runtime estimate (bsub -We or RUNTIME in lsb.applications) or a run limit (bsub -W, RUNLIMIT in lsb.queues, or RUNLIMIT in lsb.applications).
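The arithmetic in the example can be sketched in Python. Here duration_min stands for the job's runtime estimate or run limit, and the function names are hypothetical; the sketch does not model the undefined (-1) default.

```python
def finish_protection_start(duration_min: float, setting: str) -> float:
    """Elapsed minutes after which NO_PREEMPT_FINISH_TIME protects a job.

    duration_min is the job's run limit or runtime estimate;
    setting is either minutes ("10") or a percentage ("10%").
    """
    if setting.endswith("%"):
        pct = float(setting[:-1])
        if not 0 < pct < 100:
            raise ValueError("percentage must be greater than 0 and less than 100")
        window = duration_min * pct / 100
    else:
        window = float(setting)  # wall-clock minutes
    return duration_min - window


def preemptable_near_finish(elapsed_min: float, duration_min: float,
                            setting: str) -> bool:
    """True while the job is still outside the protected finishing window."""
    return elapsed_min < finish_protection_start(duration_min, setting)
```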

Default

-1 (Not defined.)

NO_PREEMPT_RUN_TIME

Syntax

NO_PREEMPT_RUN_TIME=minutes | percentage

Description

Prevents preemption of jobs that have been running for the specified number of minutes or the specified percentage of the estimated run time or run limit.

Specifies that jobs that have been running for the specified number of minutes or longer should not be preempted, where minutes is wall-clock time, not normalized time. Percentage must be greater than 0 and less than 100% (between 1% and 99%).

For example, if the job run limit is 60 minutes and NO_PREEMPT_RUN_TIME=50%, the job cannot be preempted after it has been running for 30 minutes or longer.

If you specify a percentage for NO_PREEMPT_RUN_TIME, the job must have a runtime estimate (bsub -We or RUNTIME in lsb.applications) or a run limit (bsub -W, RUNLIMIT in lsb.queues, or RUNLIMIT in lsb.applications).
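The corresponding check for NO_PREEMPT_RUN_TIME can be sketched in Python. The function names are hypothetical; duration_min stands for the runtime estimate or run limit, which the sketch requires only when the setting is a percentage.

```python
from typing import Optional


def run_time_threshold(setting: str, duration_min: Optional[float] = None) -> float:
    """Elapsed minutes after which NO_PREEMPT_RUN_TIME protects a job."""
    if setting.endswith("%"):
        if duration_min is None:
            # A percentage needs a runtime estimate or run limit to apply to.
            raise ValueError("percentage requires a runtime estimate or run limit")
        pct = float(setting[:-1])
        if not 0 < pct < 100:
            raise ValueError("percentage must be greater than 0 and less than 100")
        return duration_min * pct / 100
    return float(setting)  # wall-clock minutes


def preemptable_by_run_time(elapsed_min: float, setting: str,
                            duration_min: Optional[float] = None) -> bool:
    """True while the job has not yet run long enough to be protected."""
    return elapsed_min < run_time_threshold(setting, duration_min)
```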

Default

-1 (Not defined.)

NQS_QUEUES_FLAGS

Syntax

NQS_QUEUES_FLAGS=integer

Description

For Cray NQS compatibility only. Used by LSF to get the NQS queue information.

If the NQS version on a Cray is NQS 1.1, 80.42 or NQS 71.3, this parameter does not need to be defined.

For other versions of NQS on Cray, define both NQS_QUEUES_FLAGS and NQS_REQUESTS_FLAGS.

To determine the value of this parameter, run the NQS qstat command. The value of Npk_int[1] in the output is the value you need for this parameter. Refer to the NQS chapter in Administering Platform LSF for more details.

Default

2147483647 (Not defined.)

NQS_REQUESTS_FLAGS

Syntax

NQS_REQUESTS_FLAGS=integer

Description

For Cray NQS compatibility only.

If the NQS version on a Cray is NQS 80.42 or NQS 71.3, this parameter does not need to be defined.

If the version is NQS 1.1 on a Cray, set this parameter to 251918848. This is the qstat flag that LSF uses to retrieve requests on Cray in long format.

For other versions of NQS on a Cray, run the NQS qstat command. The value of Npk_int[1] in the output is the value you need for this parameter. Refer to the NQS chapter in Administering Platform LSF for more details.

Default

2147483647 (Not defined.)

PARALLEL_SCHED_BY_SLOT

Syntax

PARALLEL_SCHED_BY_SLOT=y | Y

Description

If defined, LSF schedules jobs based on the number of slots assigned to the hosts instead of the number of CPUs. These slots can be defined by host in lsb.hosts or by slot limit in lsb.resources.

All slot-related messages still show the word “processors”, but actually refer to “slots” instead. Similarly, all scheduling activities also use slots instead of processors.

Default

N (Disabled.)

See also

  • JL/U and MXJ in lsb.hosts

  • SLOTS and SLOTS_PER_PROCESSOR in lsb.resources

PEND_REASON_MAX_JOBS

Syntax

PEND_REASON_MAX_JOBS=integer

Description

Number of jobs for each user per queue for which the scheduling daemon mbschd calculates pending reasons. Pending reasons are calculated at the interval set by PEND_REASON_UPDATE_INTERVAL.

Default

20 jobs
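A minimal sketch of the per-user, per-queue cap. For illustration only, it assumes that the first jobs in submission order are the ones chosen; this section does not say which jobs mbschd selects, and select_for_pend_reasons is a hypothetical name.

```python
from collections import defaultdict


def select_for_pend_reasons(pending, max_jobs=20):
    """Pick at most `max_jobs` pending jobs per (user, queue) pair.

    `pending` is a list of (job_id, user, queue) tuples, assumed to be in
    submission order. Returns the job IDs whose pending reasons would be
    calculated.
    """
    counts = defaultdict(int)
    selected = []
    for job_id, user, queue in pending:
        key = (user, queue)
        if counts[key] < max_jobs:
            counts[key] += 1
            selected.append(job_id)
    return selected


jobs = [(1, "alice", "normal"), (2, "alice", "normal"), (3, "alice", "normal"),
        (4, "alice", "short"), (5, "bob", "normal")]
print(select_for_pend_reasons(jobs, max_jobs=2))  # [1, 2, 4, 5]
```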

PEND_REASON_UPDATE_INTERVAL

Syntax

PEND_REASON_UPDATE_INTERVAL=seconds

Description

Time interval that defines how often pending reasons are calculated by the scheduling daemon mbschd.

Default

30 seconds

PG_SUSP_IT

Syntax

PG_SUSP_IT=seconds

Description

The time interval that a host should be interactively idle (it > 0) before jobs suspended because of a threshold on the pg load index can be resumed.

This parameter is used to prevent the case in which a batch job is suspended and resumed too often as it raises the paging rate while running and lowers it while suspended. If you are not concerned with the interference with interactive jobs caused by paging, the value of this parameter may be set to 0.

Default

180 seconds

PREEMPT_FOR

Syntax

PREEMPT_FOR=[GROUP_JLP] [GROUP_MAX] [HOST_JLU] [LEAST_RUN_TIME] [MINI_JOB] [USER_JLP] [OPTIMAL_MINI_JOB]

Description

If preemptive scheduling is enabled, this parameter is used to disregard suspended jobs when determining if a job slot limit is exceeded, to preempt jobs with the shortest running time, and to optimize preemption of parallel jobs.

If preemptive scheduling is enabled, more lower-priority parallel jobs may be preempted than necessary to start a high-priority parallel job.

Both running and suspended jobs are counted when calculating the number of job slots in use, except for the following limits:

  • The total job slot limit for hosts, specified at the host level

  • The total job slot limit for individual users, specified at the user level; by default, suspended jobs still count against the limit for user groups

Specify one or more of the following keywords. Use spaces to separate multiple keywords.

GROUP_JLP

Counts only running jobs when evaluating if a user group is approaching its per-processor job slot limit (SLOTS_PER_PROCESSOR, USERS, and PER_HOST=all in the lsb.resources file). Suspended jobs are ignored when this keyword is used.

GROUP_MAX

Counts only running jobs when evaluating if a user group is approaching its total job slot limit (SLOTS, PER_USER=all, and HOSTS in the lsb.resources file). Suspended jobs are ignored when this keyword is used. When preemptive scheduling is enabled, suspended jobs never count against the total job slot limit for individual users.

HOST_JLU

Counts only running jobs when evaluating if a user or user group is approaching its per-host job slot limit (SLOTS and USERS in the lsb.resources file). Suspended jobs are ignored when this keyword is used.

LEAST_RUN_TIME

Preempts the job that has been running for the shortest time. Run time is wall-clock time, not normalized run time.

MINI_JOB

Optimizes the preemption of parallel jobs by preempting only enough parallel jobs to start the high-priority parallel job.

OPTIMAL_MINI_JOB

Optimizes preemption of parallel jobs by preempting only low-priority parallel jobs based on the least number of jobs that will be suspended to allow the high-priority parallel job to start.

User limits and user group limits can interfere with preemption optimization of OPTIMAL_MINI_JOB. You should not configure OPTIMAL_MINI_JOB if you have user or user group limits configured.

You should configure PARALLEL_SCHED_BY_SLOT=Y when using OPTIMAL_MINI_JOB.

USER_JLP

Counts only running jobs when evaluating if a user is approaching their per-processor job slot limit (SLOTS_PER_PROCESSOR, USERS, and PER_HOST=all in the lsb.resources file). Suspended jobs are ignored when this keyword is used. When preemptive scheduling is enabled, suspended jobs never count against the total job slot limit for individual users.
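The idea behind OPTIMAL_MINI_JOB, suspending the fewest low-priority jobs that free enough slots, can be illustrated with a simplified sketch. It ignores hosts, queues, and limits and treats all freed slots as interchangeable, which real LSF scheduling does not; fewest_jobs_to_preempt is a hypothetical name. In this simplified model, taking the largest jobs first is guaranteed to minimize the count, because for any k the k largest jobs free the most slots.

```python
def fewest_jobs_to_preempt(running: dict[str, int], slots_needed: int) -> list[str]:
    """Return the fewest job IDs whose combined slots cover `slots_needed`.

    `running` maps low-priority job IDs to the number of slots each holds.
    Returns an empty list if preempting every job would still not be enough.
    """
    chosen: list[str] = []
    freed = 0
    # Largest jobs first: for any k, the k largest jobs free the most slots,
    # so the first prefix that covers the request is also the shortest one.
    for job_id, slots in sorted(running.items(), key=lambda kv: kv[1], reverse=True):
        if freed >= slots_needed:
            break
        chosen.append(job_id)
        freed += slots
    return chosen if freed >= slots_needed else []


low_priority = {"a": 2, "b": 4, "c": 1, "d": 3}
print(fewest_jobs_to_preempt(low_priority, 6))  # ['b', 'd']
```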

Default

0 (The parameter is not defined.)

Both running and suspended jobs are included in job slot limit calculations, except for job slot limits for hosts and individual users, where only running jobs are ever included.
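The effect of keywords such as GROUP_MAX and HOST_JLU on slot counting can be sketched as below: by default, suspended jobs count toward a limit; with the keyword, only running jobs do. The two job states and the slots_in_use helper are simplified, hypothetical stand-ins for LSF's internal accounting.

```python
def slots_in_use(jobs: list[tuple[str, int]], count_running_only: bool) -> int:
    """Sum the slots held by `jobs`, a list of (state, slots) pairs.

    State is "RUN" or "SUSP" in this simplified model. With
    `count_running_only` (as with GROUP_MAX, HOST_JLU, and similar keywords),
    suspended jobs are ignored.
    """
    return sum(slots for state, slots in jobs
               if state == "RUN" or (state == "SUSP" and not count_running_only))


group_jobs = [("RUN", 4), ("SUSP", 2), ("RUN", 1)]
limit = 6
print(slots_in_use(group_jobs, count_running_only=False) >= limit)  # True: limit reached
print(slots_in_use(group_jobs, count_running_only=True) >= limit)   # False: 5 of 6 used
```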

PREEMPT_JOBTYPE

Syntax

PREEMPT_JOBTYPE=[EXCLUSIVE] [BACKFILL]

Description

If preemptive scheduling is enabled, this parameter enables preemption of exclusive and backfill jobs.

Specify one or both of the following keywords. Separate keywords with a space.

EXCLUSIVE

Enables preemption of and preemption by exclusive jobs. LSB_DISABLE_LIMLOCK_EXCL=Y in lsf.conf must also be defined.

BACKFILL

Enables preemption of backfill jobs. Jobs from higher priority queues can preempt jobs from backfill queues that are either backfilling reserved job slots or running as normal jobs.

Default

Not defined. Exclusive and backfill jobs are only preempted if the exclusive low priority job is running on a different host than the one used by the preemptive high priority job.

PREEMPTABLE_RESOURCES

Syntax

PREEMPTABLE_RESOURCES=resource_name1 [resource_name2] [resource_name3] ...

Description

Enables license preemption when preemptive scheduling is enabled (has no effect if PREEMPTIVE is not also specified) and specifies the licenses that will be preemption resources. Specify shared numeric resources, static or decreasing, that LSF is configured to release (RELEASE=Y in lsf.shared, which is the default).

You must also configure LSF preemption actions to make the preempted application release its licenses. To kill preempted jobs instead of suspending them, set TERMINATE_WHEN=PREEMPT in lsb.queues, or set JOB_CONTROLS in lsb.queues and specify brequeue as the SUSPEND action.

Default

Not defined (if preemptive scheduling is configured, LSF preempts on job slots only)

PREEMPTION_WAIT_TIME

Syntax

PREEMPTION_WAIT_TIME=seconds

Description

Platform LSF License Scheduler only. You must also specify PREEMPTABLE_RESOURCES in lsb.params.

The amount of time LSF waits, after preempting jobs, for preemption resources to become available. Specify at least 300 seconds.

If LSF does not get the resources after this time, LSF might preempt more jobs.

Default

300 (seconds)

PRIVILEGED_USER_FORCE_BKILL

Syntax

PRIVILEGED_USER_FORCE_BKILL=y | Y

Description

If Y, only root or the LSF administrator can successfully run bkill -r. For any other users, -r is ignored. If not defined, any user can run bkill -r.

Default

Not defined.

REMOTE_MAX_PREEXEC_RETRY

Syntax

REMOTE_MAX_PREEXEC_RETRY=integer

Description

The maximum number of times to attempt the pre-execution command of a job on the remote cluster.

Valid values

0 < REMOTE_MAX_PREEXEC_RETRY < 2147483647

Default

5
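A minimal sketch of a bounded retry loop. It assumes that the limit counts every attempt, including the first, which the description does not state explicitly; run_preexec_with_retries and the preexec callable are hypothetical stand-ins for running the pre-execution command on the remote cluster.

```python
from typing import Callable, Tuple


def run_preexec_with_retries(preexec: Callable[[], bool],
                             max_retry: int = 5) -> Tuple[bool, int]:
    """Call `preexec` until it succeeds or `max_retry` attempts are used.

    Returns (succeeded, attempts_made).
    """
    if not 0 < max_retry < 2147483647:
        raise ValueError("REMOTE_MAX_PREEXEC_RETRY out of range")
    for attempt in range(1, max_retry + 1):
        if preexec():
            return True, attempt
    return False, max_retry


outcomes = iter([False, False, True])  # fails twice, then succeeds
print(run_preexec_with_retries(lambda: next(outcomes)))  # (True, 3)
```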

RESOURCE_RESERVE_PER_SLOT

Syntax

RESOURCE_RESERVE_PER_SLOT=y | Y

Description

If Y, mbatchd reserves resources on a per-slot basis instead of a per-host basis.

By default, mbatchd only reserves resources for parallel jobs on a per-host basis. For example, by default, the command:
bsub -n 4 -R "rusage[mem=500]" -q reservation my_job

requires the job to reserve 500 MB on each host where the job runs.

Some parallel jobs need to reserve resources based on job slots, rather than by host. In this example, if per-slot reservation is enabled by RESOURCE_RESERVE_PER_SLOT, the job my_job must reserve 500 MB of memory for each job slot (4*500=2 GB) on the host in order to run.

If RESOURCE_RESERVE_PER_SLOT is set, the following command reserves the resource my_resource on all 4 job slots instead of only 1 on the host where the job runs:
bsub -n 4 -R "my_resource > 0 rusage[my_resource=1]" myjob

Default

N (Not defined; reserve resources per-host.)
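The difference between the two reservation modes in the bsub -n 4 example above is simple arithmetic, sketched here for a host that runs all 4 slots of the job. reserved_on_host is a hypothetical helper, not an LSF interface.

```python
def reserved_on_host(rusage_amount: float, slots_on_host: int, per_slot: bool) -> float:
    """Amount of a resource reserved on one host for a parallel job.

    With per-slot reservation (RESOURCE_RESERVE_PER_SLOT=Y) the rusage value
    is multiplied by the number of the job's slots on that host; otherwise it
    is reserved once per host.
    """
    return rusage_amount * slots_on_host if per_slot else rusage_amount


print(reserved_on_host(500, 4, per_slot=False))  # 500 (MB, per host)
print(reserved_on_host(500, 4, per_slot=True))   # 2000 (MB, 4 * 500)
```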

RUN_JOB_FACTOR

Syntax

RUN_JOB_FACTOR=number

Description

Used only with fairshare scheduling. Job slots weighting factor.

In the calculation of a user’s dynamic share priority, this factor determines the relative importance of the number of job slots reserved and in use by a user.

Default

3.0

RUN_TIME_FACTOR

Syntax

RUN_TIME_FACTOR=number

Description

Used only with fairshare scheduling. Run time weighting factor.

In the calculation of a user’s dynamic share priority, this factor determines the relative importance of the total run time of a user’s running jobs.

Default

0.7
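RUN_JOB_FACTOR and RUN_TIME_FACTOR act as weights in the denominator of the dynamic share priority. The sketch below uses a simplified form of the fairshare formula described in Administering Platform LSF, keeping only the CPU time, run time, and job slot terms (CPU_TIME_FACTOR, default 0.7, is another lsb.params parameter). Other terms your LSF version may add, such as a fairshare adjustment, are omitted, and measuring times in hours is an assumption.

```python
def dynamic_priority(shares: float, cpu_time_h: float, run_time_h: float,
                     job_slots: int, cpu_time_factor: float = 0.7,
                     run_time_factor: float = 0.7,
                     run_job_factor: float = 3.0) -> float:
    """Simplified fairshare dynamic priority (higher means scheduled sooner).

    Only the CPU time, run time, and job slot terms are modeled here.
    """
    denominator = (cpu_time_h * cpu_time_factor
                   + run_time_h * run_time_factor
                   + (1 + job_slots) * run_job_factor)
    return shares / denominator


# More slots in use lowers the priority; RUN_JOB_FACTOR sets how much.
print(dynamic_priority(10, cpu_time_h=1, run_time_h=2, job_slots=1))
print(dynamic_priority(10, cpu_time_h=1, run_time_h=2, job_slots=4))
```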

SBD_SLEEP_TIME

Syntax

SBD_SLEEP_TIME=seconds

Description

The interval at which LSF checks the load conditions of each host, to decide whether jobs on the host must be suspended or resumed.

Job-level resource usage information is updated at most once every SBD_SLEEP_TIME seconds.

The update is done only if the value for the CPU time, resident memory usage, or virtual memory usage has changed by more than 10 percent from the previous update or if a new process or process group has been created.

Default

SBD_SLEEP_TIME is set at installation to 15 seconds. If not defined, 30 seconds.
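The update rule in the description can be sketched as a predicate. The field names and the handling of a previous value of zero are assumptions made for illustration; usage_changed is not an LSF function.

```python
def usage_changed(prev: dict, curr: dict, new_process: bool,
                  threshold: float = 0.10) -> bool:
    """Decide whether job-level resource usage should be reported again.

    `prev` and `curr` hold "cpu_time", "rss", and "vmem" samples. Reports when
    a new process or process group appeared, or when any value changed by
    more than `threshold` (10 percent) relative to the previous update.
    """
    if new_process:
        return True
    for key in ("cpu_time", "rss", "vmem"):
        old, new = prev[key], curr[key]
        if old == 0:
            if new != 0:  # assumption: any growth from zero counts as a change
                return True
            continue
        if abs(new - old) / old > threshold:
            return True
    return False


last = {"cpu_time": 100, "rss": 1000, "vmem": 2000}
print(usage_changed(last, {"cpu_time": 105, "rss": 1050, "vmem": 2100}, False))  # False
print(usage_changed(last, {"cpu_time": 105, "rss": 1200, "vmem": 2100}, False))  # True
```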

SCHED_METRIC_ENABLE

Syntax

SCHED_METRIC_ENABLE=Y | N

Description

Enables scheduler performance metric collection.

Use badmin perfmon stop and badmin perfmon start to dynamically control performance metric collection.

Default

N

SCHED_METRIC_SAMPLE_PERIOD

Syntax

SCHED_METRIC_SAMPLE_PERIOD=seconds

Description

Sets the default performance metric sampling period in seconds.

Cannot be less than 60 seconds.

Use badmin perfmon setperiod to dynamically change performance metric sampling period.

Default

60 seconds

SLA_TIMER

Syntax

SLA_TIMER=seconds

Description

For EGO-enabled SLA scheduling. Controls how often each service class is evaluated and a network message is sent to EGO communicating host demand.

Valid values

Positive integer between 2 and 21474847

Default

0 (Not defined.)

SUB_TRY_INTERVAL

Syntax

SUB_TRY_INTERVAL=integer

Description

The number of seconds for the requesting client to wait before resubmitting a job. This is sent by mbatchd to the client.

Default

60 seconds

See also

MAX_PEND_JOBS

SYSTEM_MAPPING_ACCOUNT

Syntax

SYSTEM_MAPPING_ACCOUNT=user_account

Description

Enables Windows workgroup account mapping, which allows LSF administrators to map all Windows workgroup users to a single Windows system account, eliminating the need to create multiple users and passwords in LSF. Users can submit and run jobs using their local user names and passwords, and LSF runs the jobs using the mapped system account name and password. With Windows workgroup account mapping, all users have the same permissions because all users map to the same system account.

To specify the user account, include the domain name in uppercase letters (DOMAIN_NAME\user_name).

Define this parameter for LSF Windows Workgroup installations only.

Default

Not defined

USE_SUSP_SLOTS

Syntax

USE_SUSP_SLOTS=Y | N

Description

If USE_SUSP_SLOTS=Y, allows jobs from a low priority queue to use slots held by suspended jobs in a high priority queue, which has a preemption relation with the low priority queue.

Set USE_SUSP_SLOTS=N to prevent low priority jobs from using slots held by suspended jobs in a high priority queue, which has a preemption relation with the low priority queue.

Default

Y
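A simplified sketch of how the setting changes the slots a low-priority job may use on a host. It models only free slots and slots held by suspended jobs from a higher-priority queue that has a preemption relation with the low-priority queue; slots_for_low_priority is a hypothetical name.

```python
def slots_for_low_priority(free_slots: int, suspended_high_priority_slots: int,
                           use_susp_slots: bool = True) -> int:
    """Slots a low-priority job could use on a host in this simplified model."""
    if use_susp_slots:
        # USE_SUSP_SLOTS=Y: slots held by suspended high-priority jobs are usable.
        return free_slots + suspended_high_priority_slots
    # USE_SUSP_SLOTS=N: only genuinely free slots are usable.
    return free_slots


print(slots_for_low_priority(2, 3))                        # 5
print(slots_for_low_priority(2, 3, use_susp_slots=False))  # 2
```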

Automatic time-based configuration

Variable configuration is used to automatically change LSF configuration based on time windows. You define automatic configuration changes in lsb.params by using if-else constructs and time expressions. After you change the files, reconfigure the cluster with the badmin reconfig command.

The expressions are evaluated by LSF every 10 minutes based on mbatchd start time. When an expression evaluates true, LSF dynamically changes the configuration based on the associated configuration statements. Reconfiguration is done in real time without restarting mbatchd, providing continuous system availability.

Example

# If 18:30-19:30 is your short job express period, make short
# the default queue during that window so that jobs are subject
# to the thresholds of that queue.
# For all other hours, normal is the default queue.
#if time(18:30-19:30)
DEFAULT_QUEUE=short
#else
DEFAULT_QUEUE=normal
#endif
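The #if time(18:30-19:30) test in the example can be sketched as follows. This only illustrates the window check: it assumes both ends of the window are inclusive and that a window may wrap past midnight, and in_window and default_queue are hypothetical names. It does not model the 10-minute evaluation cycle based on mbatchd start time.

```python
from datetime import time


def in_window(now: time, start: time, end: time) -> bool:
    """True if `now` falls in [start, end]; handles windows that wrap midnight."""
    if start <= end:
        return start <= now <= end
    return now >= start or now <= end


def default_queue(now: time) -> str:
    """Mirror of the example: short during 18:30-19:30, normal otherwise."""
    return "short" if in_window(now, time(18, 30), time(19, 30)) else "normal"


print(default_queue(time(19, 0)))   # short
print(default_queue(time(12, 15)))  # normal
```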