This section describes the Platform LSF MultiCluster product ("MultiCluster"), its features and benefits.
Benefits of MultiCluster
Within an organization, sites may have separate, independently managed LSF clusters. Having multiple LSF clusters could solve problems related to:
When you have more than one cluster, it is desirable to allow the clusters to cooperate to reap the following benefits of global load sharing:
- Access to a diverse collection of computing resources
- Enterprise grid computing becomes a reality
- Get better performance and computing capabilities
- Use idle machines to process jobs
- Use multiple machines to process a single parallel job
- Increase user productivity
- Add resources anywhere and make them available to the entire organization
- Plan computing resources globally based on total computing demand
- Increase computing power in an economical way
MultiCluster enables a large organization to form multiple cooperating clusters of computers so that load sharing happens not only within clusters, but also among them. MultiCluster enables:
- Load sharing across a large number of hosts
- Co-scheduling between different clusters
- Resource ownership and autonomy to be enforced
- Non-shared user accounts and file systems to be supported
- Communication limitations among the clusters to be taken into consideration in job scheduling
Two MultiCluster Models
There are two different ways to share resources between clusters using MultiCluster. These models can be combined. For example, Cluster1 forwards jobs to Cluster2 using the job forwarding model, and Cluster2 borrows resources from Cluster3 using the resource leasing model.
Job forwarding model
In this model, the cluster that is starving for resources sends jobs over to the cluster that has resources to spare. To work together, two clusters must set up compatible send-jobs and receive-jobs queues.
With this model, scheduling of MultiCluster jobs is a process with two scheduling phases: the submission cluster selects a suitable remote receive-jobs queue, and forwards the job to it; then the execution cluster selects a suitable host and dispatches the job to it. This method automatically favors local hosts; a MultiCluster send-jobs queue always attempts to find a suitable local host before considering a receive-jobs queue in another cluster.
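The ordering of the two scheduling phases can be illustrated with a small, simplified model. This sketch is not LSF code: the `Cluster` class, the `free_slots` dictionary, and the `schedule` function are hypothetical names, and host selection is reduced to "first host with a free slot". It only shows the order described above: try local hosts first, then forward the job to the remote cluster, which chooses the execution host itself.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Cluster:
    """Hypothetical stand-in for an LSF cluster: a name plus free slots per host."""
    name: str
    free_slots: dict = field(default_factory=dict)

    def dispatch(self) -> Optional[str]:
        """Pick the first host with a free slot and consume that slot."""
        for host, slots in self.free_slots.items():
            if slots > 0:
                self.free_slots[host] = slots - 1
                return host
        return None


def schedule(submission: Cluster, execution: Cluster) -> Optional[str]:
    """Return where the job runs, or None if it must stay pending."""
    # The send-jobs queue looks for a suitable local host first.
    host = submission.dispatch()
    if host is not None:
        return host
    # Phase 1: no local host is available, so the job is forwarded.
    # Phase 2: the execution cluster selects the host on its own.
    host = execution.dispatch()
    if host is not None:
        return f"{host}@{execution.name}"
    return None


cluster1 = Cluster("cluster1", {"hostA": 1})
cluster2 = Cluster("cluster2", {"hostE": 1, "hostF": 1})

# Submit four jobs to cluster1; only the first one fits locally.
placements = [schedule(cluster1, cluster2) for _ in range(4)]
print(placements)
```

In this toy run the first job stays on the local host, the next two are forwarded, and the fourth finds no free slot in either cluster.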
Resource leasing model
In this model, the cluster that is starving for resources takes resources away from the cluster that has resources to spare. To work together, the provider cluster must "export" resources to the consumer, and the consumer cluster must configure a queue to use those resources.
In this model, each cluster schedules work on a single system image, which includes both borrowed hosts and local hosts.
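As a rough illustration of the "single system image" idea, the sketch below merges local and borrowed hosts into one pool that a consumer cluster could schedule against. The function name `consumer_view` and the shape of the `leases` dictionary are assumptions made for this example, not LSF data structures. Borrowed hosts are labeled in the `host@cluster` form that `bhosts` shows for them.

```python
def consumer_view(local_hosts, leases):
    """Merge local hosts and borrowed hosts into one scheduling pool.

    local_hosts: {host: slots} for the consumer cluster's own hosts.
    leases: {provider_cluster: {host: exported_slots}} (hypothetical shape).
    """
    pool = dict(local_hosts)
    for provider, hosts in leases.items():
        for host, slots in hosts.items():
            # Borrowed hosts keep the provider name as a suffix.
            pool[f"{host}@{provider}"] = slots
    return pool


pool = consumer_view({"hostA": 2}, {"cluster2": {"hostE": 1, "hostF": 1}})
for host, slots in pool.items():
    print(host, slots)
```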
Choosing a model
Consider your own goals and priorities when choosing the best resource-sharing model for your site.
- The job forwarding model can make resources available to jobs from multiple clusters. This flexibility allows maximum throughput when each cluster's resource usage fluctuates. The resource leasing model can give one cluster exclusive control of a dedicated resource, which can be more efficient when there is a steady amount of work.
- The lease model is the most transparent to users and supports the same scheduling features as a single cluster.
- The job forwarding model has a single point of administration, while the lease model shares administration between provider and consumer clusters.
Resizable jobs
Resizable jobs are not supported across MultiCluster clusters. This implies the following behaviors:
- For the forwarding model, once a job is forwarded to a remote cluster, the job is not autoresizable.
- For the lease model, the initial allocation for the job may contain leased hosts. However, once the job allocation includes a leased host, LSF does not generate a pending allocation request. LSF does not allocate any leased hosts to pending allocation requests.
- You cannot run bresize commands to grow or shrink allocations from submission clusters in either the lease model or the job forwarding model.
Only bresize release is supported in the job forwarding model, from the execution cluster:
- The submission cluster logs all events related to bresize release in the submission cluster lsb.events file.
- The submission cluster logs JOB_RESIZE events in the lsb.acct file after the allocation is changed.
- Users should be able to view allocation changes from the submission cluster through bjobs, bhist, bacct, busers, bqueues, and so on.
Testing the Resource Leasing Model
The following instructions explain how to configure the lease model on two clusters.
In this example, cluster2 is the resource provider; it exports hosts to cluster1.
- In the provider cluster, edit the LSF_TOP/conf/lsbatch/cluster_name/configdir/lsb.resources file to specify the hosts to be exported.
  For example, in cluster2, one job slot each on hostE and hostF will be exported and can be used by cluster1:

    Begin HostExport
    PER_HOST     = hostE hostF
    SLOTS        = 1
    DISTRIBUTION = ([cluster1, 100])
    End HostExport

- Reconfigure the cluster:

    % badmin reconfig

- Use the bclusters command to make sure the cluster is configured correctly.
  For example, in cluster2:

    % bclusters
    ...
    [Resource Lease Information]
    REMOTE_CLUSTER  RESOURCE_FLOW  STATUS
    cluster1        EXPORT         conn

- In the consumer cluster, edit the LSF_TOP/conf/lsbatch/cluster_name/configdir/lsb.queues file and add a queue that will use the hosts borrowed from the provider cluster as if they were local resources.
  For example, in cluster1:

    Begin Queue
    QUEUE_NAME  = ssimodel
    HOSTS       = all@cluster2
    DESCRIPTION = Jobs in this queue will use cluster2 hosts
    End Queue

- Reconfigure the cluster:

    % badmin reconfig

- Use the bclusters command to make sure the queue is configured correctly.
  For example, in cluster1:

    % bclusters
    ...
    [Resource Lease Information]
    REMOTE_CLUSTER  RESOURCE_FLOW  STATUS
    cluster2        IMPORT         conn

- Submit a job to the queue in cluster1. It must run on a host borrowed from cluster2.
Example
Submit a job to the queue named ssimodel in cluster1:

    % bsub -q ssimodel -R "type==any" sleep 500
    Job <204> is submitted to queue <ssimodel>.
    % bjobs
    JOBID  USER   STAT  QUEUE     FROM_HOST  EXEC_HOST       JOB_NAME   SUBMIT_TIME
    204    user1  RUN   ssimodel  hostA      hostE@cluster2  sleep 500  Nov 13 12:15
    % bhosts
    HOST_NAME       STATUS  JL/U  MAX  NJOBS  RUN  SSUSP  USUSP  RSV
    hostE@cluster2  ok      -     1    1      1    0      0      0
    hostA           ok      -     -    0      0    0      0      0

You can also view this job from cluster2, where it has a different job ID:

    % bjobs
    JOBID  USER   STAT  QUEUE              FROM_HOST       EXEC_HOST  JOB_NAME   SUBMIT_TIME
    854    user1  RUN   ssimodel@cluster1  hostA@cluster1  hostE      sleep 500  Nov 13 12:15
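When this check is repeated often, the lease table can be read from captured `bclusters` output with a short script. The sketch below works on text that has already been saved; it does not call LSF. The function name `lease_status` is hypothetical, and the parser assumes the three-column layout shown in the examples in this section.

```python
def lease_status(bclusters_text):
    """Return {remote_cluster: (resource_flow, status)} from captured bclusters text."""
    result = {}
    in_table = False
    for line in bclusters_text.splitlines():
        fields = line.split()
        # The column header marks the start of the lease table.
        if fields[:3] == ["REMOTE_CLUSTER", "RESOURCE_FLOW", "STATUS"]:
            in_table = True
            continue
        if in_table:
            if len(fields) != 3:
                break  # a blank line or a new section ends the table
            cluster, flow, status = fields
            result[cluster] = (flow, status)
    return result


# Sample text copied from the cluster2 example in this section.
sample = """\
[Resource Lease Information]
REMOTE_CLUSTER  RESOURCE_FLOW  STATUS
cluster1        EXPORT         conn
"""

leases = lease_status(sample)
print(leases)
print(all(status == "conn" for _, status in leases.values()))
```

Keying on the column header rather than the section title keeps the parser independent of how the title line is spelled or spaced.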
Testing the Job Forwarding Model
The following instructions explain how to configure the job forwarding model on two clusters.
In this example, cluster2 is the execution cluster; it runs jobs for cluster1.
- In the submission cluster, edit the LSF_TOP/conf/lsbatch/cluster_name/configdir/lsb.queues file and add a queue to send jobs to the execution cluster.
  For example, configure a queue called sendq in cluster1 that will send all jobs to execute in cluster2:

    Begin Queue
    QUEUE_NAME  = sendq
    SNDJOBS_TO  = receiveq@cluster2
    HOSTS       = none
    DESCRIPTION = Jobs submitted to this queue will be run in cluster2
    End Queue

  HOSTS = none specifies that this queue cannot place jobs on any local hosts.
- Reconfigure the cluster:

    % badmin reconfig

- In the execution cluster, edit the LSF_TOP/conf/lsbatch/cluster_name/configdir/lsb.queues file and add a queue to receive jobs sent from the submission cluster.
  For example, configure a queue called receiveq in cluster2 that will receive jobs from cluster1:

    Begin Queue
    QUEUE_NAME   = receiveq
    PRIORITY     = 40
    RCVJOBS_FROM = cluster1
    End Queue

- Reconfigure the cluster:

    % badmin reconfig

- Use the bclusters command to make sure the queues are configured correctly.
  For example, in cluster1:

    % bclusters
    LOCAL_QUEUE  JOB_FLOW  REMOTE    CLUSTER   STATUS
    sendq        send      receiveq  cluster2  ok

  For example, in cluster2:

    % bclusters
    LOCAL_QUEUE  JOB_FLOW  REMOTE  CLUSTER   STATUS
    receiveq     recv      -       cluster1  ok

- Submit a job to make sure the queues are configured correctly.
Example
Submit a job to cluster1 that will run in cluster2:

    % bsub -q sendq -R "type==any" sleep 500
    Job <103> is submitted to queue <sendq>.
    % bjobs
    JOBID  USER   STAT  QUEUE  FROM_HOST  EXEC_HOST       JOB_NAME   SUBMIT_TIME
    103    user1  RUN   sendq  hostA      hostE@cluster2  sleep 500  Nov 13 11:44

You can also view this job from cluster2, where it has a new job ID:

    JOBID  USER   STAT  QUEUE     FROM_HOST       EXEC_HOST  JOB_NAME   SUBMIT_TIME
    899    user1  RUN   receiveq  hostA@cluster1  hostE      sleep 500  Nov 13 11:44
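Because the job forwarding model depends on matching send-jobs and receive-jobs queues, it can help to cross-check the `bclusters` tables from both clusters. The sketch below parses saved output from each side and reports the queue pairs that both clusters agree on. It does not call LSF; `parse_job_flow` and `forwarding_pairs` are hypothetical helper names, and the five-column layout is assumed from the examples in this section.

```python
def parse_job_flow(bclusters_text):
    """Parse the job forwarding table into a list of row dictionaries."""
    keys = ("local_queue", "job_flow", "remote", "cluster", "status")
    rows = []
    for line in bclusters_text.splitlines():
        fields = line.split()
        # Data rows have five columns; skip the header and anything else.
        if len(fields) == 5 and fields[0] != "LOCAL_QUEUE":
            rows.append(dict(zip(keys, fields)))
    return rows


def forwarding_pairs(sub_name, sub_rows, exec_name, exec_rows):
    """Return (send_queue, receive_queue) pairs that both clusters agree on."""
    # Queues in the execution cluster that accept jobs from the submission cluster.
    receivers = {
        row["local_queue"]
        for row in exec_rows
        if row["job_flow"] == "recv"
        and row["cluster"] == sub_name
        and row["status"] == "ok"
    }
    return [
        (row["local_queue"], row["remote"])
        for row in sub_rows
        if row["job_flow"] == "send"
        and row["cluster"] == exec_name
        and row["status"] == "ok"
        and row["remote"] in receivers
    ]


# Sample text copied from the two bclusters examples in this section.
cluster1_out = """\
LOCAL_QUEUE  JOB_FLOW  REMOTE    CLUSTER   STATUS
sendq        send      receiveq  cluster2  ok
"""
cluster2_out = """\
LOCAL_QUEUE  JOB_FLOW  REMOTE  CLUSTER   STATUS
receiveq     recv      -       cluster1  ok
"""

pairs = forwarding_pairs(
    "cluster1", parse_job_flow(cluster1_out),
    "cluster2", parse_job_flow(cluster2_out),
)
print(pairs)
```

An empty result would suggest that the send-jobs queue points at a queue the execution cluster does not list as receiving from the submission cluster, or that one side does not report an ok status.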
Date Modified: March 13, 2009
Platform Computing: www.platform.com
Platform Support: support@platform.com
Platform Information Development: doc@platform.com
Copyright © 1994-2009 Platform Computing Corporation. All rights reserved.