Two MultiCluster models

There are two ways to share resources between clusters using MultiCluster. The two models can be combined. For example, Cluster1 forwards jobs to Cluster2 using the job forwarding model, and Cluster2 borrows resources from Cluster3 using the resource leasing model.

Job forwarding model

In this model, the cluster that is starving for resources sends jobs over to the cluster that has resources to spare. To work together, two clusters must set up compatible send-jobs and receive-jobs queues.

With this model, scheduling of MultiCluster jobs is a process with two scheduling phases: the submission cluster selects a suitable remote receive-jobs queue, and forwards the job to it; then the execution cluster selects a suitable host and dispatches the job to it. This method automatically favors local hosts; a MultiCluster send-jobs queue always attempts to find a suitable local host before considering a receive-jobs queue in another cluster.
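
The two phases, and the preference for local hosts, can be sketched roughly as follows. This is only an illustrative model, not LSF code: the Cluster class, the pick_host method, and the schedule function are hypothetical names, and host selection is reduced to "take the first free host", whereas real scheduling considers queue configuration, resource requirements, and much more.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Cluster:
    """Hypothetical stand-in for a cluster and its free hosts."""
    name: str
    free_hosts: list[str] = field(default_factory=list)

    def pick_host(self) -> Optional[str]:
        # Assumption: any free host is "suitable"; return None if there is none.
        return self.free_hosts.pop(0) if self.free_hosts else None


def schedule(job: str, local: Cluster,
             remotes: list[Cluster]) -> Optional[tuple[str, str, str]]:
    """Return (job, cluster name, host), or None if the job stays pending."""
    # The send-jobs queue tries local hosts before any remote cluster.
    host = local.pick_host()
    if host is not None:
        return (job, local.name, host)
    # Phase 1: the submission cluster selects a remote cluster to forward to.
    for remote in remotes:
        # Phase 2: the execution cluster selects a host and dispatches the job.
        host = remote.pick_host()
        if host is not None:
            return (job, remote.name, host)
    return None


cluster1 = Cluster("cluster1", ["hostA"])
cluster2 = Cluster("cluster2", ["hostX"])
for job in ("job1", "job2", "job3"):
    print(job, schedule(job, cluster1, [cluster2]))
```

In this sketch the first job lands on the local host, the second is forwarded to the remote cluster, and the third finds no free host anywhere and remains pending.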

Resource leasing model

In this model, the cluster that is starving for resources takes resources away from the cluster that has resources to spare. To work together, the provider cluster must “export” resources to the consumer, and the consumer cluster must configure a queue to use those resources.

In this model, each cluster schedules work on a single system image, which includes both borrowed hosts and local hosts.
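
As a rough illustration of the single system image, the sketch below models a provider exporting hosts to a consumer, after which the consumer schedules over one combined pool. The LeaseCluster class and its methods are hypothetical, the host@cluster label is only an illustrative tag, and the sketch assumes that an exported host leaves the provider's pool entirely; it does not reflect how LSF is configured or implemented.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LeaseCluster:
    """Hypothetical model of a cluster taking part in resource leasing."""
    name: str
    local_hosts: list[str] = field(default_factory=list)
    borrowed_hosts: list[str] = field(default_factory=list)

    def export_hosts(self, hosts: list[str], consumer: LeaseCluster) -> None:
        # Provider side: hand the named hosts over to the consumer.
        for host in hosts:
            self.local_hosts.remove(host)  # raises ValueError if not local
            consumer.borrowed_hosts.append(f"{host}@{self.name}")

    def schedulable_hosts(self) -> list[str]:
        # Consumer side: local and borrowed hosts form one scheduling pool.
        return self.local_hosts + self.borrowed_hosts


provider = LeaseCluster("cluster3", ["hostP1", "hostP2"])
consumer = LeaseCluster("cluster2", ["hostC1"])
provider.export_hosts(["hostP2"], consumer)
print(consumer.schedulable_hosts())
print(provider.schedulable_hosts())
```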

Choosing a model

Consider your own goals and priorities when choosing the best resource-sharing model for your site.

  • The job forwarding model can make resources available to jobs from multiple clusters. This flexibility allows maximum throughput when each cluster’s resource usage fluctuates. The resource leasing model can give one cluster exclusive control of a dedicated resource, which can be more efficient when there is a steady amount of work.

  • The lease model is the more transparent of the two to users and supports the same scheduling features as a single cluster.

  • The job forwarding model has a single point of administration, while the lease model shares administration between provider and consumer clusters.

Resizable jobs

Resizable jobs are not supported across MultiCluster clusters. This implies the following behaviors:

  • For the job forwarding model, once a job is forwarded to a remote cluster, the job is not autoresizable.

  • For the lease model, the initial allocation for the job may contain leased hosts. However, once the job allocation includes a leased host, LSF does not generate a pending allocation request, and LSF never allocates leased hosts to pending allocation requests.

  • You cannot run bresize commands to grow or shrink allocations from the submission cluster in either the lease model or the job forwarding model.

In the job forwarding model, only bresize release is supported, and only from the execution cluster:

  • The submission cluster logs all events related to bresize release in its lsb.events file.

  • The submission cluster logs JOB_RESIZE events in its lsb.acct file after the allocation changes.

  • Users can view allocation changes from the submission cluster through bjobs, bhist, bacct, busers, bqueues, and similar commands.