Enhanced scheduler decisions can be customized to consider characteristics of remote queues before forwarding a job. Remote queue attributes such as queue priority, number of preemptable jobs, and queue workload are sent to the submission scheduler. The decisions made by the scheduler, based on this information, depend on the setting of MC_PLUGIN_SCHEDULE_ENHANCE in lsb.params.
Queue workload and configuration are considered in conjunction with remote resource availability (MC_PLUGIN_REMOTE_RESOURCE=Y is automatically set in lsf.conf).
The submission cluster receives up-to-date information about each queue in remote clusters. This information is considered during job forwarding decisions.
Queue information is collected by the submission cluster when MC_PLUGIN_SCHEDULE_ENHANCE (on the submission cluster) is set to a valid value. Information is sent by each execution cluster when MC_PLUGIN_UPDATE_INTERVAL (on the execution cluster) is defined, and the submission cluster is collecting queue information.
Some jobs may be forwarded between counter update intervals. The submission scheduler increments locally stored counter information as jobs are forwarded, and reconciles incoming counter updates to account for all jobs.
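The bookkeeping between updates can be pictured with a minimal Python sketch. It is illustrative only: the class and method names are hypothetical, it tracks only the pending slots counter, and it assumes that each incoming update already reflects every job forwarded so far. How the scheduler actually reconciles updates is not described here.

```python
class RemoteQueueView:
    """Hypothetical submission-side estimate of one remote queue's pending slots."""

    def __init__(self, reported_pending=0):
        self.reported_pending = reported_pending  # value from the last counter update
        self.forwarded_since_update = 0           # slots forwarded since that update

    def on_forward(self, slots):
        # A job was forwarded between updates: adjust the local estimate.
        self.forwarded_since_update += slots

    def on_update(self, reported_pending):
        # Assumption: the update already accounts for every forwarded job,
        # so the local adjustment is discarded.
        self.reported_pending = reported_pending
        self.forwarded_since_update = 0

    @property
    def pending_estimate(self):
        return self.reported_pending + self.forwarded_since_update


view = RemoteQueueView(reported_pending=20)
view.on_forward(4)                  # a 4-slot job is forwarded before the next update
estimate_between_updates = view.pending_estimate
view.on_update(22)                  # next counter update arrives
estimate_after_update = view.pending_estimate
```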
The following counter information is collected for each queue:
total slots: The total number of slots, on all hosts, to which jobs from this queue can be dispatched. This includes slots on hosts with the status ok, and with the status closed due to running jobs.
available slots: The free slots, that is, the slots (out of the total slots) that do not currently have a job running.
running slots: The number of slots currently running jobs from the queue.
pending slots: The number of slots required by jobs pending on the queue.
preemptable available slots: The number of slots the queue can access through preemption.
After a MultiCluster connection is established, counters take the time set in MC_PLUGIN_UPDATE_INTERVAL to update. Scheduling decisions made before this first interval has passed do not accurately account for remote queue workload.
The parameter MC_PLUGIN_SCHEDULE_ENHANCE was introduced in LSF Version 7 Update 6. All clusters within a MultiCluster configuration must be running a version of LSF containing this parameter to enable the enhanced scheduler.
The information considered by the job-forwarding scheduler when accounting for workload and remote resources depends on the setting of MC_PLUGIN_SCHEDULE_ENHANCE in lsb.params. Valid settings for this parameter are:
RESOURCE_ONLY
Jobs are forwarded to the remote queue with the requested resources and the largest value of (available slots)-(pending slots).
COUNT_PREEMPTABLE
Jobs are forwarded as with RESOURCE_ONLY, but if no appropriate queues have free slots, the best queue is selected based on the largest value of (preemptable available slots)-(pending slots).
COUNT_PREEMPTABLE with HIGH_QUEUE_PRIORITY
Jobs are forwarded as with COUNT_PREEMPTABLE, but jobs are forwarded to the highest priority remote queue.
COUNT_PREEMPTABLE with PREEMPTABLE_QUEUE_PRIORITY
Jobs are forwarded as with COUNT_PREEMPTABLE, but queue selection is based on which queues can preempt lowest priority queue jobs.
COUNT_PREEMPTABLE with PENDING_WHEN_NOSLOTS
Jobs are forwarded as with COUNT_PREEMPTABLE, but if no queues have free slots even after preemption, submitted jobs pend.
COUNT_PREEMPTABLE with HIGH_QUEUE_PRIORITY and PREEMPTABLE_QUEUE_PRIORITY
COUNT_PREEMPTABLE with HIGH_QUEUE_PRIORITY and PENDING_WHEN_NOSLOTS
If no appropriate queues have free slots, queues with free slots after jobs are preempted are considered.
If no queues have free slots even after preemption, submitted jobs pend.
COUNT_PREEMPTABLE with PREEMPTABLE_QUEUE_PRIORITY and PENDING_WHEN_NOSLOTS
If no queues have free slots even after preemption, submitted jobs pend.
COUNT_PREEMPTABLE with HIGH_QUEUE_PRIORITY and PREEMPTABLE_QUEUE_PRIORITY and PENDING_WHEN_NOSLOTS
If no queues have free slots even after preemption, submitted jobs pend.
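As a rough illustration of the settings described above, the following Python sketch chooses a remote queue under RESOURCE_ONLY, COUNT_PREEMPTABLE, and PENDING_WHEN_NOSLOTS. It is not LSF code: RemoteQueue and pick_queue are hypothetical names, the sketch assumes every queue passed in already has the requested resources, and it treats "has free slots" as available slots greater than zero. HIGH_QUEUE_PRIORITY and PREEMPTABLE_QUEUE_PRIORITY are not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RemoteQueue:
    name: str
    available: int               # available slots
    pending: int                 # pending slots
    preemptable_available: int   # preemptable available slots


def pick_queue(queues: List[RemoteQueue],
               count_preemptable: bool = False,
               pending_when_noslots: bool = False) -> Optional[str]:
    """Return the chosen queue name, or None if the job should pend."""
    if not queues:
        return None
    # Assumption: a queue "has free slots" when its available slots exceed zero.
    any_free = any(q.available > 0 for q in queues)
    if any_free or not count_preemptable:
        # RESOURCE_ONLY rule: largest (available slots)-(pending slots).
        return max(queues, key=lambda q: q.available - q.pending).name
    # COUNT_PREEMPTABLE fallback: no queue has free slots.
    if pending_when_noslots and not any(q.preemptable_available > 0 for q in queues):
        return None  # PENDING_WHEN_NOSLOTS: the job pends
    # Largest (preemptable available slots)-(pending slots).
    return max(queues, key=lambda q: q.preemptable_available - q.pending).name


queues = [
    RemoteQueue("queueA@remote1", available=0, pending=20, preemptable_available=80),
    RemoteQueue("queueB@remote2", available=0, pending=5, preemptable_available=70),
]
choice = pick_queue(queues, count_preemptable=True)
```

With both queues full, the fallback compares 80-20=60 against 70-5=65 and selects queueB@remote2.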
The figure shown illustrates the scheduler decision-making process for valid settings of MC_PLUGIN_SCHEDULE_ENHANCE.
When an advance reservation is active on a remote cluster, slots within the advance reservation are excluded from the number of available slots. Inactive advance reservations do not affect the number of available slots since the slots may still be available for backfill jobs.
Hosts in a hostgroup that are not all configured with the same boolean resources can cause ineffective job-forwarding decisions by the scheduler.
For example, a job may be forwarded to a queue accessing a hostgroup with many slots available, only some of which have the required boolean resource. If there are not enough slots to run the job, it returns to the submission cluster, which may continue forwarding the same job back to the same queue.
A remote queue hostgroup satisfies host type requirements when any one of the available hosts is of the host type requested by a job. As with boolean resources, the submission cluster assumes all slots within a hostgroup are of the same host type. Other hostgroup configurations can result in unexpected job-forwarding decisions.
Submission cluster scheduler considers whether remote resources exist, and only forwards jobs to a queue with free slots or space in the MultiCluster pending job threshold (IMPT_JOBBKLG).
If no appropriate queues with free slots or space for new pending jobs are found, the best queue is selected based on the number of preemptable jobs and the pending job workload.
Submission cluster scheduler considers whether remote resources exist, and only forwards jobs to a queue with free slots or space in the MultiCluster pending job threshold (IMPT_JOBBKLG). If no appropriate queues with free slots or space for new pending jobs are found, the best queue is selected based on which queues can preempt lower priority jobs.
If no queues have free slots even after preemption, jobs pend on the submission cluster.
All scheduler options are configured.
Submission cluster scheduler considers whether remote resources exist, and only forwards jobs to a queue with free slots or space in the MultiCluster pending job threshold (IMPT_JOBBKLG).
If no appropriate queues with free slots or space for new pending jobs are found, the best queue is selected based on the number of free slots after preempting low priority jobs and preemptable jobs.
If no queues have free slots even after preemption, jobs pend on the submission cluster.
MultiCluster job forwarding is enabled from a send-queue on Cluster1 to the receive-queues HighPriority@Cluster2 and HighPriority@Cluster3. Both remote clusters have lower priority queues for running local jobs, and the high priority queues can preempt jobs from the lower priority queues. The scheduler on Cluster1 has the following information about the remote clusters:
Example 1: MC_PLUGIN_SCHEDULE_ENHANCE=COUNT_PREEMPTABLE:
Cluster2 has a total of 70 running slots out of 100 total slots, leaving 30 available slots, with 20 pending slots. The value of (available slots)-(pending slots) for Cluster2 is 30-20=10. Cluster3 has a total of 90 running slots out of 100 total slots, leaving 10 available slots, with 5 pending slots. The value of (available slots)-(pending slots) for Cluster3 is 10-5=5. Thus a job forwarded from Cluster1 is sent to HighPriority@Cluster2.
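The arithmetic in Example 1 can be checked with a few lines of Python. This is only a worked calculation using the figures stated in the example. It assumes that available slots equal total slots minus running slots, which is what the example's numbers imply; the dictionary layout and function name are illustrative.

```python
# Counter values as stated in Example 1.
queues = {
    "HighPriority@Cluster2": {"total": 100, "running": 70, "pending": 20},
    "HighPriority@Cluster3": {"total": 100, "running": 90, "pending": 5},
}


def free_margin(counters):
    # Assumption: available slots = total slots - running slots.
    available = counters["total"] - counters["running"]
    return available - counters["pending"]


margins = {name: free_margin(c) for name, c in queues.items()}
target = max(margins, key=margins.get)
print(margins)  # {'HighPriority@Cluster2': 10, 'HighPriority@Cluster3': 5}
print(target)   # HighPriority@Cluster2
```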
Example 2: MC_PLUGIN_SCHEDULE_ENHANCE=COUNT_PREEMPTABLE PREEMPTABLE_QUEUE_PRIORITY:
In both Cluster2 and Cluster3, running jobs occupy all 100 slots. LowPriority@Cluster2 has a queue priority of 30, while LowPriority@Cluster3 has a queue priority of 20. Thus a job forwarded from Cluster1 is sent to HighPriority@Cluster3, where slots can be preempted from the lowest priority queue.
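The choice in Example 2 reduces to finding the receive-queue whose preemptable queue has the lowest priority. A minimal Python sketch using the priorities stated in the example follows; the variable names are illustrative, and the sketch assumes a smaller number means a lower queue priority, as the example implies.

```python
# Priority of the queue that each receive-queue can preempt (from Example 2).
preemptable_queue_priority = {
    "HighPriority@Cluster2": 30,  # LowPriority@Cluster2
    "HighPriority@Cluster3": 20,  # LowPriority@Cluster3
}

# Assumption: a smaller value is a lower priority, so the minimum is preferred.
target = min(preemptable_queue_priority, key=preemptable_queue_priority.get)
print(target)  # HighPriority@Cluster3
```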
Example 3: MC_PLUGIN_SCHEDULE_ENHANCE=COUNT_PREEMPTABLE HIGH_QUEUE_PRIORITY PREEMPTABLE_QUEUE_PRIORITY:
Cluster2 has a total of 70 running slots out of 100 total slots, leaving 30 available slots, with 20 pending slots. The value of (available slots)-(pending slots) for Cluster2 is 30-20=10. Cluster3 has a total of 90 running slots out of 100 total slots, leaving 10 available slots, with 5 pending slots. The value of (available slots)-(pending slots) for Cluster3 is 10-5=5.
Although (available slots)-(pending slots) is higher for Cluster2, Cluster3 contains a higher priority queue. Thus a job forwarded from Cluster1 is sent to HighPriority@Cluster3.
Example 4: MC_PLUGIN_SCHEDULE_ENHANCE=COUNT_PREEMPTABLE HIGH_QUEUE_PRIORITY PREEMPTABLE_QUEUE_PRIORITY:
In both Cluster2 and Cluster3, running jobs occupy all 100 slots, so (preemptable available slots)-(pending slots) is considered. For HighPriority@Cluster2 this number is (80-20)=60; for HighPriority@Cluster3 this number is (70-5)=65. Both queues have the same priority, thus a job forwarded from Cluster1 is sent to HighPriority@Cluster3.