The resource leasing model was developed to be transparent to the user.
- Configuring the Provider Cluster
- Configuring the Consumer Cluster
- Special Considerations under the Lease Model
Overview of Lease Model
Two clusters agree that one cluster will borrow resources from the other, taking control of the resources. Both clusters must change their configuration to make this possible, and the arrangement, called a "lease", does not expire, although it might change due to changes in the cluster configuration.
With this model, scheduling of jobs is always done by a single cluster. When a queue is configured to run jobs on borrowed hosts, LSF schedules jobs as if the borrowed hosts actually belonged to the cluster.
How the lease model works
- Setup: Configure the two clusters properly (the provider cluster must export the resources, and the consumer cluster must have a queue that requests remote resources), then start up the clusters.
- Establishing a lease: In the consumer cluster, submit jobs to the queue that requests remote resources. At this point, a lease is established that gives the consumer cluster control of the remote resources.
- If the provider did not export the resources requested by the consumer, there is no lease. The provider continues to use its own resources as usual, and the consumer cannot use any resources from the provider.
- If the consumer did not request the resources exported to it, there is no lease. However, when entire hosts are exported, the provider cannot use the resources it has exported, so neither cluster can use them; they are wasted.
- Changes to the lease:
- The lease does not expire. To modify or cancel the lease, change the export policy in the provider cluster.
- If you export a group of workstations allowing LSF to automatically select the hosts for you, these hosts do not change until the lease is modified. However, if the original lease could not include the requested number of hosts, LSF can automatically update the lease to add hosts that become available later on.
- If the configuration changes and some resources are no longer exported, jobs from the consumer cluster that have already started to run using those resources will be killed and requeued automatically.
If LSF selects the hosts to export, and the new export policy allows some of the same hosts to be exported again, then LSF tries to re-export the hosts that already have jobs from the consumer cluster running on them (in this case, the jobs continue running without interruption). If LSF has to kill some jobs from the consumer cluster to remove some hosts from the lease, it selects the hosts according to job run time, so it kills the most recently started jobs.
Using the Lease Model
Submit jobs
LSF schedules jobs on whatever resources are available, so jobs submitted to a queue that uses borrowed hosts automatically use the borrowed resources.
To submit a job and request a particular host borrowed from another cluster, use the format host_name@cluster_name to specify the host. For example, to run a job on hostA in cluster4:

bsub -q myqueue -m hostA@cluster4 myjob

This will not work when you first start up the MultiCluster grid; the remote host names are not recognized until the lease has been established.
The bmod syntax also allows you to specify borrowed hosts in the same format host_name@cluster_name.

Administration
The administrator of the consumer cluster can open and close borrowed hosts using badmin. Use the format host_name@cluster_name to specify the borrowed host. This action only affects scheduling on the job slots that belong to that consumer cluster. For example, if slots on a host are shared among multiple consumers, one consumer can close the host, but the others will not be affected or be aware of any change.

You must be the administrator of the provider cluster to shut down or start up a host. This action affects the consumer cluster as well.
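For example, the consumer cluster administrator could close a borrowed host and later reopen it (the host and cluster names here are illustrative):

badmin hclose hostA@cluster2
badmin hopen hostA@cluster2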
When you define a host group in lsb.hosts, or a host partition, you can use the keyword allremote to indicate all borrowed hosts available to the cluster. You cannot define a host group that includes borrowed hosts specified by host name or cluster name.

Compute units defined in lsb.hosts can use wildcards to include the names of borrowed hosts available to the cluster. You cannot define a compute unit that includes borrowed hosts specified directly by host name or cluster name. Hosts running LSF 7 Update 4 or earlier cannot satisfy compute unit resource requirements, and thus cannot be included in compute units.
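For example, a host group covering all borrowed hosts could be defined in lsb.hosts as in the following sketch (the group name is illustrative):

Begin HostGroup
GROUP_NAME      GROUP_MEMBER
borrowed_grp    (allremote)      # all hosts borrowed from any provider cluster
End HostGroup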
The pre-execution command retry limit (MAX_PREEXEC_RETRY and REMOTE_MAX_PREEXEC_RETRY), job requeue limit (MAX_JOB_REQUEUE), and job preemption retry limit (MAX_JOB_PREEMPT) configured in lsb.params, lsb.queues, and lsb.applications apply to jobs running on remote leased hosts as if they were running on local hosts.

Tracking
By default, bhosts only shows information about hosts and resources that are available to the local cluster, and information about jobs that are scheduled by the local cluster. Therefore, borrowed resources are included in the summary, but exported resources are not normally included (the exception is reclaimed resources, which are shown during the times that they are available to the local cluster).

For borrowed resources, the host name is displayed in the format host_name@cluster_name. The number of job slots shown is the number available to the consumer cluster, and the JL/U and host status shown are determined by the consumer cluster and are relative to it. For example, the consumer might see closed or closed_Full status, while the provider sees ok status.
- Cluster1 has borrowed one job slot on hostA. It shows the borrowed host is closed because that job slot is in use by a running job.
bhosts
HOST_NAME        STATUS  JL/U  MAX  NJOBS  RUN  SSUSP  USUSP  RSV
hostA@cluster2   closed  -     1    1      1    0      0      0

- Cluster2 has kept 3 job slots on hostA for its own use. It shows the host is open, because all the available slots are free.

bhosts
HOST_NAME        STATUS  JL/U  MAX  NJOBS  RUN  SSUSP  USUSP  RSV
hostA            ok      -     3    0      0    0      0      0

bhosts options also display information about the exported resources. When they do, the provider cluster does not display JL/U or host status; this status information is determined by the consumer cluster and does not affect the provider. A separate option displays information about exported shared resources.
The bjobs command shows all jobs associated with hosts in the cluster, including MultiCluster jobs. Jobs from remote clusters can be identified by the FROM_HOST column, which shows the remote cluster name and the submission or consumer cluster job ID in the format host_name@remote_cluster_name:remote_job_ID.

If the MultiCluster job is running under the job forwarding model, the QUEUE column shows a local queue, but if the MultiCluster job is running under the resource leasing model, the name of the remote queue is shown in the format queue_name@remote_cluster_name.

Use -w or -l to prevent the MultiCluster information from being truncated.

For the resource leasing model, bclusters shows information about each lease.
- Status
  - ok means that the resources are leased, and the resources that belong to the provider are being used by the consumer.
  - conn indicates that a connection has been established but the lease has not yet started, probably because the consumer has not yet attempted to use the shared resources. If this status persists in a production environment, it could mean that the consumer cluster is not properly configured.
  - disc indicates that there is no connection between the two clusters.
- Resource flow
Resource Exporting
lsb.resources file
The lsb.resources file contains MultiCluster configuration information for the lease model, including the export policies that describe the hosts and resources that are exported, and the clusters that can use them.

You must reconfigure the cluster to make the configuration take effect.
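For example, after editing the export policies in lsb.resources, you can reconfigure with the standard badmin command:

badmin reconfig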
Resources that can be exported
To export resources, you must always export job slots on hosts, so that the consumer cluster can start jobs on the borrowed hosts.
By default, all the jobs on a host compete for its resources. To help share resources fairly when a host's job slots are divided among multiple clusters, you can also export quantities of memory and swap space for the use of the consumer cluster.
By default, shared resources such as software licenses are not exported. You can create a separate policy to export these resources.
Who can use exported resources
The export policy defines the consumers of exported resources. By default, resources that are exported cannot be used by the provider; this applies to job slots on a host and also to resources like memory.
With resource reclaim, exported job slots can be reclaimed by the provider if the consumer is not using them to run jobs. In this way, the provider can share in the use of the exported job slots. For more information, see Shared Lease.
Creating an Export Policy
An export policy defined in lsb.resources is enclosed by the lines:

Begin HostExport
...
End HostExport

In each policy, you must specify which hosts to export, how many job slots, and the distribution of resources. Optionally, you can specify quantities of memory and swap space.
To export hosts of HostExport Type==DLINUX, specifying swap space is mandatory. See Exporting Other Resources.
Configure as many different export policies as you need.
Each export policy corresponds to a separate lease agreement.
Export policy examples
This simple export policy exports a single job slot on a single host to a single consumer cluster:
Begin HostExport
PER_HOST=HostA
SLOTS=1
DISTRIBUTION=([Cluster5, 1])
End HostExport

This simple policy exports all the resources on a single Linux host to a single consumer cluster:

Begin HostExport
RES_SELECT=type==LINUX
NHOSTS=1
DISTRIBUTION=([Cluster5, 1])
End HostExport

Exporting hosts
To export job slots or other resources, you must specify which hosts the resources are located on. There are two ways to specify the hosts you want to export: you can list host names, or you can specify resource requirements and let LSF find hosts that match them. The method you use to specify the exported hosts determines how LSF shares the hosts among competing consumer clusters.
If you have a group of similar hosts, you can share a portion of these hosts with other clusters. To choose this method, let LSF automatically select the hosts to export. The group of hosts can be shared among multiple consumer clusters, but each host is leased to only one consumer cluster, and all the job slots on the host are exported to the consumer.
You can share a powerful multiprocessor host among multiple clusters. To choose this method, export one or more hosts by name and specify the number of job slots to export. The exported job slots on each host are divided among multiple consumer clusters.
Distributing exported resources
An export policy exports specific resources. The distribution statement in lsb.resources partitions these resources, assigning a certain amount exclusively to each consumer cluster. Clusters that are not named in the distribution list do not get to use any of the resources exported by the policy.

The simplest distribution policy assigns all of the exported resources to a single consumer cluster:
DISTRIBUTION=([Cluster5, 1])
The syntax for the distribution list is a series of share assignments. Enclose each share assignment in square brackets, as shown, and use a space to separate multiple share assignments. Enclose the full list in parentheses:
DISTRIBUTION=([share_assignment] ...)
The share assignment determines what fraction of the total resources is assigned to each cluster.
The syntax of each share assignment is the cluster name, a comma, and the number of shares.
[cluster_name, number_shares]
- cluster_name
Specify the name of a cluster allowed to use the exported resources.
- number_shares
Specify a positive integer representing the number of shares of exported resources assigned to the cluster.
The number of shares assigned to a cluster is only meaningful when you compare it to the number assigned to other clusters, or to the total number. The total number of shares is just the sum of all the shares assigned in each share assignment.
- In this example, resources are leased to 3 clusters in an even 1:1:1 ratio. Each cluster gets 1/3 of the resources.
DISTRIBUTION=([C1, 1] [C2, 1] [C3, 1])

- In this example, resources are leased to 3 clusters in an uneven ratio. There are 5 shares assigned in total, so C1 gets 2/5 of the resources, C2 gets the same, and C3 gets 1/5 of the resources.

DISTRIBUTION=([C1, 2] [C2, 2] [C3, 1])
Exporting Workstations
These steps describe how to share part of a large farm of identical hosts. This is most useful for reallocating resources among different departments to meet a temporary need for more processing power.
- Create the new policy.
- Specify the hosts that are affected by the policy. Each host is entirely exported; the provider cluster does not save any job slots on the exported hosts for its own use. See Allowing LSF to select the hosts you want to export.
- Specify the distribution policy. This determines which clusters share in the use of the exported job slots. See Distribution policy for automatically selected hosts.
- Optional. Share additional resources (any combination of memory, swap space, or shared resources). See Exporting Other Resources and Exporting Shared Resources.
Allowing LSF to select the hosts you want to export
To export a set of hosts that meet certain resource requirements, specify both RES_SELECT and NHOSTS in lsb.resources.

For RES_SELECT, specify the selection criteria using the same syntax as the "select" part of the resource requirement string (normally used in the LSF bsub command). For details about resource selection syntax, see Administering Platform LSF. For this parameter, if you do not specify the required host type, the default is "type==any".

For NHOSTS, specify the maximum number of hosts to export.

Begin HostExport
RES_SELECT=type==LINUX
NHOSTS=4
...
End HostExport

In this example, we want to export 4 Linux hosts. If the cluster has 5 Linux hosts available, 4 are exported and the last one is not. If the cluster has only 3 Linux hosts available at this time, then only 3 hosts are exported, but LSF can update the lease automatically if another host becomes available to export later on.
Use lshosts to view the host types that are available in your cluster.

Distribution policy for automatically selected hosts
For syntax of the distribution policy, see Distributing exported resources.
When you export hosts by specifying the resource selection statement, multiple hosts are divided among multiple consumer clusters, but each host is entirely exported to a single consumer cluster. All the job slots on a host are exported to the consumer cluster, along with all its other host-based resources including swap space and memory.
Begin HostExport
RES_SELECT=type==LINUX
NHOSTS=2
DISTRIBUTION=([C1, 1] [C2, 1])
End HostExport

In this example, 2 hosts that match the resource requirements are selected; suppose they are HostA and HostB, and each has 2 job slots. All job slots on each host are exported. Resources are shared evenly between 2 clusters, so each cluster gets 1/2 of the resources.

Since the hosts are automatically selected, each host is distributed to only one consumer cluster: the first host, HostA, goes to Cluster1, and the second host, HostB, goes to Cluster2. Assume each host has 2 job slots for use by the consumer cluster. Cluster1 gets 2 job slots on HostA, and Cluster2 gets 2 job slots on HostB.

In this example there is an even distribution policy, but it is still possible for one consumer cluster to get more resources than the other, if the exported hosts are not all identical.
Exporting Special Hosts
These steps describe how to share a large multiprocessor host among multiple clusters. This is most useful for allowing separate departments to share the cost and use of a very powerful host. It might also be used to give multiple clusters occasional access to a host that has some unique feature.
- Create the new policy.
- Specify the hosts that are affected by the policy. See Naming the hosts you want to export.
- Specify how many job slots you want to export from each host. Optionally, reduce the number of job slots available to the local cluster by the same amount. See Controlling job slots.
- Specify the distribution policy. This determines which clusters share in the use of the exported job slots. See Distribution policy for named hosts.
- Optional. Share additional resources (any combination of memory, swap space, or shared resources). See Exporting Other Resources and Exporting Shared Resources.
Naming the hosts you want to export
Specify the name of a host in the PER_HOST parameter in lsb.resources:

Begin HostExport
PER_HOST=HostA

If you specify multiple hosts, the policy applies to all the hosts you specify:

Begin HostExport
PER_HOST=HostA HostB HostC

Controlling job slots
Use the SLOTS parameter to specify the number of job slots to export from each host.
By default, the provider can still run the usual number of jobs at all times. The additional jobs that the consumer clusters are allowed to start might overload the host. If you are concerned with keeping the host's performance consistent, reduce the job slot configuration in the local cluster to compensate for the number of slots exported to remote clusters.
For example, this policy exports 4 job slots on each host:
Begin HostExport
PER_HOST=HostA HostB
SLOTS=4

- Default configuration of lsb.hosts in the provider cluster:

HOST_NAME   MXJ
HostA       6
HostB       8

- How you can update lsb.hosts to compensate for the exported job slots:

HOST_NAME   MXJ
HostA       2
HostB       4

Distribution policy for named hosts
For syntax of the distribution policy, see Distributing exported resources.
When you export hosts by specifying host names, the job slots on each host are divided among multiple consumer clusters, so each cluster gets a part of each host.
Begin HostExport
PER_HOST=HostA HostB
SLOTS=2
DISTRIBUTION=([C1, 1] [C2, 1])
End HostExport

In this example, 2 job slots are exported from HostA and HostB. Resources are shared evenly between 2 clusters, so each cluster is entitled to 1/2 of the resources.

Because the hosts are specified by name, the distribution policy is applied at the job slot level. The first job slot on HostA goes to Cluster1, and the second job slot on HostA goes to Cluster2. Similarly, one job slot on HostB goes to Cluster1, and the other job slot on HostB goes to Cluster2. Each consumer cluster can start 2 jobs, one on HostA, and one on HostB.

The provider cluster can always use the number of job slots that are configured in the provider cluster (no matter how many slots are exported). You might want to adjust the configuration of the provider cluster after exporting hosts and reduce the number of job slots (MXJ in lsb.hosts); otherwise, you might notice a difference in performance because of the extra jobs that can be started by the consumer clusters.
Exporting Other Resources
Once you have exported a host, you can export memory and swap space in addition to job slots.
By default, the consumer cluster borrows a job slot but is not guaranteed that there will be free memory or swap space, because all jobs on the host compete for the host's resources. If these resources are exported, each consumer cluster schedules work as if only the exported amount is available (the exported amount acts as a limit for the consumer cluster), and the provider cluster can no longer use the amount that has been exported.
- The distribution policies that apply to job slots also apply to other resources.
- If the provider cluster doesn't have the amount that is specified in the export policy, it will export as much as it has.
To export hosts of HostExport Type==DLINUX, exporting swap space is mandatory. If you do not specify swap space, the hosts of this host type are filtered out because the resource is seen as unavailable.
Exporting memory
To export memory, set MEM in the lsb.resources host export policy, and specify the number of MB per host.

Exporting swap space

To export swap space, set SWP in the lsb.resources host export policy, and specify the number of MB per host.
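For example, a policy that exports quantities of memory and swap space along with job slots might look like the following sketch (the host name, slot count, and MB values are illustrative):

Begin HostExport
PER_HOST=HostA
SLOTS=2
MEM=100       # MB of memory per host, for the consumer cluster
SWP=200       # MB of swap space per host, for the consumer cluster
DISTRIBUTION=([C1, 1])
End HostExport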
Exporting Shared Resources
In addition to job slots and some other built-in resources, it is possible to export numeric shared resources (for example, representing software application licenses). The resource definitions in lsf.shared must be the same in both clusters.

Export policies for shared resources are defined in lsb.resources, after the export policies for hosts. The configuration is different: shared resources are not exported per host.

When you export a shared resource to a consumer cluster, you must already have a host export policy that exports hosts to the same consumer cluster, and the shared resource must be available on one or more of those exported hosts. Otherwise, the export policy does not have any effect.
Configure shared resource export
In lsb.resources, configure a resource export policy for each resource as shown:

Begin SharedResourceExport
NAME = AppX
NINSTANCES = 10
DISTRIBUTION = ([C1, 30] [C2, 70])
End SharedResourceExport

In each policy, you specify one shared numeric resource (here, a license for ApplicationX), the maximum number of instances you want to export, and the distribution, using the same syntax as a host export policy. See Distributing exported resources.
If some quantity of the resource is available, but not the full amount you configured, LSF exports as many instances of the resource as are available to the exported hosts.
Shared Lease
Optional.
You can export resources from a cluster and enable shared lease, which allows the provider cluster to share in the use of the exported resources. This type of lease dynamically balances the job slots according to the load in each cluster.
Only job slots are shared. If you export memory, swap space, or shared resources, they become available exclusively to the consumer cluster.
About shared lease
By default, exported resources are for the exclusive use of the consumer; they cannot be used by the provider. If they are not being used by the consumer, they are wasted.
There is a way to lease job slots to a cluster part-time. With shared lease, both provider and consumer clusters can have the opportunity to take any idle job slots. The benefit of the shared lease is that the provider cluster has a chance to share in the use of its exported resources, so the average resource usage is increased.
Shared lease is not compatible with advance reservation.
If you enable shared leasing, each host can only be exported to a single consumer cluster. Therefore, when shared leasing is enabled, you can export a group of workstations to multiple consumers using RES_SELECT syntax, but you cannot share a powerful multiprocessor host among multiple consumer clusters using PER_HOST syntax unless the distribution policy specifies just one cluster.
How it works
By default, a lease is exclusive, which means a fixed amount of exported resources is always dedicated exclusively to a consumer cluster. However, if you configure leases to be shared, the job slots exported by each export policy can also become available to the provider cluster.
Reclaimable resources are job slots that are exported with shared leasing enabled. The reclaim process is managed separately for each lease, so the set of job slots exported by one resource export policy to one consumer cluster is managed as a group.
When the provider cluster is started, the job slots are allocated to the provider cluster, except for one that is reserved for the consumer cluster, to allow a lease to be made. Therefore, all but one slot is initially available to the provider cluster, and one slot could be available to the consumer. The lease is made when the consumer schedules a job to run on the single job slot that is initially available to it.
To make job slots available to a different cluster, LSF automatically modifies the lease contract. The lease will go through a temporary "inactive" phase each time. When a lease is updated, the slots controlled by the corresponding export policy are distributed as follows: the slots that are being used to run jobs remain under the control of the cluster that is using them, but the slots that are idle are all made available to just one cluster.
To determine which cluster will reclaim the idle slots each time, LSF considers the number of idle job slots in each cluster:
idle_slots_provider = available_slots_provider - used_slots_provider
idle_slots_consumer = available_slots_consumer - used_slots_consumer

The action depends on the relative quantity of idle slots in each cluster.
- If the consumer has more idle slots:

  idle_slots_consumer > idle_slots_provider

  then the provider reclaims the idle slots from the consumer, and all the idle slots go to the provider cluster.

- If the provider has more idle slots:

  idle_slots_provider > idle_slots_consumer

  then the reverse happens, and all the idle slots go to the consumer cluster.

- However, if each cluster has an equal number of idle slots:

  idle_slots_consumer = idle_slots_provider

  then the lease does not get updated.
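As an illustrative example of the reclaim calculation, suppose the provider has 6 available slots with 4 in use, and the consumer has 4 available slots with 1 in use (all numbers hypothetical):

idle_slots_provider = 6 - 4 = 2
idle_slots_consumer = 4 - 1 = 3

Because the consumer has more idle slots (3 > 2), the provider reclaims the idle slots, and all 5 idle slots become available to the provider cluster.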
LSF evaluates the status at regular intervals, specified by MC_RECLAIM_DELAY in lsb.params.

The calculations are performed separately for each set of reclaimable resources, so if a provider cluster has multiple resource export policies, some leases could be reconfigured in favor of the provider while others are reconfigured in favor of the consumer.
Configure shared leasing
To make a shared lease, set TYPE=shared in the resource export policy (lsb.resources HostExport section). Remember that each resource export policy using PER_HOST syntax must specify just one cluster in the distribution policy if the lease is shared.

Begin HostExport
PER_HOST=HostA
SLOTS=4
TYPE=shared
DISTRIBUTION=([C1, 1])
End HostExport

In this example, HostA is exported with shared leasing enabled, so the lease can be reconfigured at regular intervals, allowing LSF to give any idle job slots to the cluster that needs them the most.

Optional. To set the reclaim interval, set MC_RECLAIM_DELAY in lsb.params and specify how often to reconfigure a shared lease, in minutes. The interval is the same for every lease in the cluster.

The default interval is 10 minutes.
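For example, to reconfigure shared leases every 30 minutes, you could set the following in lsb.params (the value is illustrative):

Begin Parameters
MC_RECLAIM_DELAY = 30     # reclaim interval, in minutes
End Parameters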
Borrowing Resources
When you add new hosts to a single LSF cluster, you might need to update your queues to start sending work to the new hosts. This is often not necessary, because queues with the default configuration can use all hosts in the local cluster.
However, when a MultiCluster provider cluster exports resources to a consumer cluster, the default queue configuration does not allow the consumer cluster to use those resources. You must update your queue configuration to start using the borrowed resources.
By default, LSF queues only use hosts that belong to the submission cluster. Queues can use borrowed resources when they are configured to use borrowed hosts (and the provider cluster's export policy must be compatible).
If your clusters do not have a shared file system, then parallel jobs that require a common file space could fail if they span multiple clusters. One way to prevent this is to submit these jobs to a queue that uses hosts all from one cluster (for example, configure the queue to use local hosts or borrowed hosts, but not both).
Configure a queue to use borrowed resources
To configure a queue to use borrowed resources, edit the lsb.queues HOSTS parameter and specify the hosts you want to borrow from one or more other clusters.
- The keyword all does not include borrowed hosts, only hosts that belong to the consumer cluster.
- The keyword allremote specifies the group of borrowed hosts belonging to all provider clusters.
- The keyword others does not include borrowed hosts, only hosts that belong to the consumer cluster.
- The keyword none is not compatible with the resource leasing model.
- You can specify a borrowed host in the format host_name@cluster_name. Make sure you configure this correctly; LSF does not validate the names of borrowed hosts when you reconfigure the cluster.
- You can specify a host group that includes borrowed resources; see Host groups or host partitions.
- You can specify all the hosts borrowed from another cluster in the format all@cluster_name.
- Queues configured with the keyword all can use all available resources that belong to the consumer cluster. You can specify additional clusters or hosts to use selected borrowed resources as well.

  HOSTS = all all@cluster2 hostB@cluster4

- Queues configured with the keyword allremote can use all available borrowed resources, from all other clusters. You can also specify additional host names to use selected resources that belong to the consumer cluster.

  HOSTS = hostB hostC allremote

- Queues configured with both keywords can use all available resources, whether the hosts are borrowed or belong to the consumer cluster.

  HOSTS = all allremote

You can specify preference levels for borrowed resources, as well as for local resources. If your clusters do not have a common file system, the extra overhead of file transfer between clusters can affect performance if a job involves large files. In this case, you should give preference to local hosts.

  HOSTS = all+1 allremote
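Putting this together, a complete queue that uses both local and borrowed hosts, preferring local hosts, might look like the following lsb.queues sketch (the queue name is illustrative):

Begin Queue
QUEUE_NAME = myqueue
HOSTS      = all+1 allremote     # prefer local hosts over borrowed hosts
End Queue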
Running Parallel Jobs with the Lease Model
To run parallel jobs (specifying multiple processors with bsub -n) across clusters, you must configure the RemoteClusters list in each cluster. By default, this list is not configured. For more information on running parallel jobs, see Administering Platform LSF.
- If you do not already have a RemoteClusters list, create the RemoteClusters list and include the names of all remote clusters (the same list as lsf.shared). This enables proper communication among all clusters, and enables cross-cluster parallel jobs for all clusters.
- If you have a RemoteClusters list, and you do not want to run parallel jobs on resources from all provider clusters, configure the RECV_FROM column in lsf.cluster.cluster_name.
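For example, a RemoteClusters list that runs parallel jobs on resources from cluster2 but not cluster3 might look like the following sketch in lsf.cluster.cluster_name (the cluster names are illustrative, and the Y/N values assume that RECV_FROM controls whether parallel jobs use resources from that provider):

Begin RemoteClusters
CLUSTERNAME   RECV_FROM
cluster2      Y
cluster3      N
End RemoteClusters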