Setup Overview
- System requirements
- Installation and configuration procedures
- Licensing MultiCluster
- Installing MultiCluster products
- Setting common ports
- Setting common resource definitions
- Defining participating clusters and valid master hosts
System requirements
The setup procedures will guide you through configuring your system to meet each requirement. However, you might find it helpful to understand the system requirements before you begin. This section includes:
- Requirements to install MultiCluster
- Requirements for MultiCluster communication to occur between 2 clusters
- Requirements for resource sharing to occur between 2 clusters
- Requirements for jobs to run across clusters
Requirements to install MultiCluster
MultiCluster is a licensed product; you will have to obtain a license from Platform Computing in order to run MultiCluster.
You can use MultiCluster to link two or more LSF clusters. Then, the participating clusters can be configured to share resources.
MultiCluster files are automatically installed by LSF's regular Setup program (lsfinstall). Install LSF and make sure each cluster works properly as a standalone cluster before you proceed to configure MultiCluster.
Requirements for MultiCluster communication to occur between 2 clusters
- The local master host must be configured to communicate with the remote cluster:
  - The local cluster can only communicate with other clusters if they are specified in lsf.shared. See Defining participating clusters and valid master hosts.
  - If the RemoteClusters section in lsf.cluster.cluster_name is defined, the local cluster has a list of recognized clusters, and is only aware of those clusters. See Restricted Awareness of Remote Clusters.
- The local master host must be able to contact the master host of the remote cluster:
  - The valid master host list for remote clusters is used to locate the current master host on that cluster and to ensure that any remote host is a valid master host for its cluster. See Defining participating clusters and valid master hosts.
- Participating clusters must use the same port numbers for the LSF daemons RES, mbatchd, and sbatchd, and the LIM daemon. By default, all clusters have identical settings. See Setting common ports.
Requirements for resource sharing to occur between 2 clusters
- The local cluster must use the same resource definitions as the remote cluster:
  - Clusters should have common definitions of host types, host models, and resources. Each cluster finds this information in lsf.shared. See Setting common resource definitions.
- A host cannot belong to more than one cluster.
- The local cluster and the remote cluster must have compatible configurations, with the resource owner sharing the resource and the resource consumer seeking to use the resource.
- For the job forwarding model, see Enabling MultiCluster Queues.
- For the resource leasing model, see Creating an Export Policy and Borrowing Resources.
Requirements for jobs to run across clusters
- The user must have a valid user account in each cluster.
- By default, LSF expects that the user accounts will have the same name in each cluster. If clusters do not share a file system and common user name space, you can configure account mapping. See Account mapping between clusters.
- LSF must be able to transfer job files and data files between clusters.
- Dynamic IP addressing is not supported across clusters. LSF client hosts require a fixed IP address to communicate with a host that belongs to another cluster.
- If you use floating client hosts, do not share lsf.conf files. You must configure separate lsf.conf files for each cluster.
- If you use static clients (listed in lsf.cluster.cluster_name), you may choose to share one lsf.conf file across multiple clusters. LSF client hosts can only use servers in their local cluster, so if you do this, you must list at least one host from each cluster in the LSF_SERVER_HOSTS line (see the example below). To improve performance, configure separate lsf.conf files for each cluster instead of sharing lsf.conf.
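For example, a shared lsf.conf for static client hosts might list one or more server hosts from each cluster. This is only an illustrative sketch; the host names are placeholders that match the example clusters used later in this chapter:
# Shared lsf.conf for static LSF client hosts (illustrative host names)
# hostA and hostB are servers in cluster1; hostD is a server in cluster2
LSF_SERVER_HOSTS="hostA hostB hostD"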
Installation and configuration procedures
To install and configure MultiCluster, take these steps:
- Plan the cluster
- Required tasks to establish communication between clusters
- Additional tasks that might be required to establish communication between clusters
- Testing communication between clusters
- Required tasks to establish resource sharing
- Optional tasks
Plan the cluster
- Read the overview to learn how MultiCluster can be useful to you. See MultiCluster Overview.
- Decide which clusters will participate. Read about setup to learn about the issues that could prevent clusters from working together. See MultiCluster Setup.
- Decide which resources you want to share.
- Decide how you will share the resources among clusters. To learn about the various configuration options, see MultiCluster Job Forwarding Model or MultiCluster Resource Leasing Model.
- Read about setup to learn about configuration options common to both models. See MultiCluster Setup.
Required tasks to establish communication between clusters
- For each participating cluster, obtain and install a valid MultiCluster license. See Licensing MultiCluster.
- For each participating cluster, add the MultiCluster product to the LSF cluster configuration file. See Installing MultiCluster products.
- For resource sharing to work between clusters, the clusters should have common definitions of host types, host models, and resources. Configure this information in lsf.shared. See Setting common resource definitions.
- To establish communication, clusters must be aware of other clusters and know how to contact them. Add each cluster name and its master host name to the Cluster section of lsf.shared. See Defining participating clusters and valid master hosts.
Additional tasks that might be required to establish communication between clusters
- By default, LSF assumes a uniform user name space within a cluster and between clusters. If your user accounts are not uniform across clusters, configure account mapping. See Non-Uniform Name Spaces.
- With MultiCluster, LSF daemons can use non-privileged ports. By default, LSF daemons in a MultiCluster environment use privileged port authentication. See Security of Daemon Communication.
Testing communication between clusters
- Restart each cluster using the lsadmin and badmin commands:
% lsadmin limrestart all
% badmin mbdrestart
- To verify that MultiCluster is enabled, run lsclusters and bclusters:
% lsclusters
CLUSTER_NAME   STATUS   MASTER_HOST   ADMIN    HOSTS   SERVERS
cluster1       ok       hostA         admin1       1         1
cluster2       ok       hostD         admin2       3         3
% bclusters
[Remote Batch Information]
No local queue sending/receiving jobs from remote clusters
Required tasks to establish resource sharing
- Optional. Run a simple test of resource sharing. See Testing the Resource Leasing Model or Testing the Job Forwarding Model.
- Configure resource-sharing policies between clusters. See MultiCluster Job Forwarding Model or MultiCluster Resource Leasing Model.
Optional tasks
- By default, all the clusters in a MultiCluster environment are aware of all the other clusters. This makes it possible for clusters to share resources or information. You can restrict awareness of remote clusters at the cluster level. See Restricted Awareness of Remote Clusters.
- With MultiCluster, LSF daemons can use non-privileged ports (by default, LSF daemons in a MultiCluster environment use privileged port authentication). You can also choose the method of daemon authentication. See Security of Daemon Communication and Authentication Between Clusters.
- When a local cluster requests load or host information from a remote cluster, the information is cached. If the local cluster is required to display the same information again, LSF displays the cached information, unless the cache has expired. The expiry period for cached information is configurable. See Cache thresholds.
- The default configuration of LSF is that clusters share information about the resources used by other clusters, and the information is updated every 5 minutes by the execution or provider cluster. You can disable the feature or modify how often MultiCluster resource use is updated. See Configuring resource use updating for MultiCluster jobs.
- To learn about optional features related to each configuration model, see MultiCluster Job Forwarding Model or MultiCluster Resource Leasing Model.
Licensing MultiCluster
To license MultiCluster, do the following:
- Send the license server host IDs for each participating cluster to your LSF vendor, and Platform Computing will generate license keys for you.
- Append the new FEATURE lines to your existing LSF license.dat files, so that each participating cluster is appropriately licensed. The feature line required to license MultiCluster is lsf_multicluster (see the example below).
- To make the change take effect, restart each LSF cluster and each license server.
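As an illustration only, a MultiCluster FEATURE line in license.dat follows the usual FLEXlm layout. Everything after the feature name below (vendor daemon, version, expiry date, license count, and key) is a placeholder; your real values come from Platform Computing:
FEATURE lsf_multicluster lsf_ld 7.000 31-dec-2010 50 ABCD1234EFGH5678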
Installing MultiCluster products
MultiCluster files are automatically installed by LSF's regular Setup program (lsfinstall). Install LSF and make sure each cluster works properly as a standalone cluster before you proceed to configure MultiCluster.
To make each cluster run MultiCluster, add LSF_MultiCluster to the products specified in the Parameters section of lsf.cluster.cluster_name:
Begin Parameters
PRODUCTS=LSF_Base LSF_Manager LSF_MultiCluster
End Parameters
Setting common ports
Participating clusters must use the same port numbers for the LIM, RES, mbatchd, and sbatchd daemons.
By default, all clusters have the identical settings, as shown:
LSF_LIM_PORT=7869
LSF_RES_PORT=6878
LSB_MBD_PORT=6881
LSB_SBD_PORT=6882
The default for LSF_LIM_PORT has changed to accommodate the Platform EGO default port configuration. On EGO, default ports start with lim at 7869, and are numbered consecutively for the EGO pem, vemkd, and egosc daemons. This is different from previous LSF releases, where the default LSF_LIM_PORT was 6879. LSF res, sbatchd, and mbatchd continue to use the default pre-7.0 ports 6878, 6881, and 6882.
lim
,res
,sbatchd
, andmbatchd
. EGOpem
,vemkd
, andegosc
use default EGO ports starting at 7870, if they do not conflict with existinglim
,res
,sbatchd
, andmbatchd
ports.To check your port numbers, check the
LSF_TOP/conf/lsf.conf
file in each cluster. (LSF_TOP is the LSF installation directory. On UNIX, this is defined in theinstall.config
file). Make sure you have identical settings in each cluster for the following parameters:Setting common resource definitions
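One quick way to compare the settings is to extract the port lines from each cluster's lsf.conf and confirm that they match. The paths below are examples only:
% grep -E '^(LSF_LIM_PORT|LSF_RES_PORT|LSB_MBD_PORT|LSB_SBD_PORT)=' /share/cluster1/conf/lsf.conf
% grep -E '^(LSF_LIM_PORT|LSF_RES_PORT|LSB_MBD_PORT|LSB_SBD_PORT)=' /share/cluster2/conf/lsf.conf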
Setting common resource definitions
For resource sharing to work between clusters, the clusters should have common definitions of host types, host models, and resources. Each cluster finds this information in lsf.shared, so the best way to configure MultiCluster is to make sure lsf.shared is identical for each cluster. If you do not have a shared file system, replicate lsf.shared across all clusters.
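For example, a fragment of a common lsf.shared might define the host types and shared resources that every cluster uses. The entries below are illustrative only, not a required set:
Begin HostType
TYPENAME
DEFAULT
LINUX86
SOL64
End HostType

Begin Resource
RESOURCENAME   TYPE      INTERVAL   INCREASING   DESCRIPTION
verilog        Numeric   60         N            (Verilog licenses)
fileserver     Boolean   ()         ()           (File server host)
End Resource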
Defining participating clusters and valid master hosts
To enable MultiCluster, define all participating clusters in the Cluster section of the LSF_TOP/conf/lsf.shared file.
- For ClusterName, specify the name of each participating cluster. On UNIX, each cluster name is defined by LSF_CLUSTER_NAME in the install.config file.
- For Servers, specify one or more candidate master hosts for the cluster (these are the first hosts listed in the Host section of lsf.cluster.cluster_name). A cluster will not participate in MultiCluster resource sharing unless its current master host is listed here.
Begin Cluster
ClusterName   Servers
Cluster1      (hostA hostB)
Cluster2      (hostD)
End Cluster
In this example, hostA should be the master host of Cluster1 (the first host listed in the lsf.cluster.cluster1 Host section) with hostB as the backup, and hostD should be the master host of Cluster2. If the master host fails in Cluster1, MultiCluster will still work because the backup master is also listed here. However, if the master host fails in Cluster2, MultiCluster will not recognize any other host as the master, so Cluster2 will no longer participate in MultiCluster resource sharing.
EGO_PREDEFINED_RESOURCES in lsf.conf
When Platform EGO is enabled in the LSF cluster (LSF_ENABLE_EGO=Y), you can also set several EGO parameters related to LIM, PIM, and ELIM in either lsf.conf or ego.conf.
All clusters must have the same value of EGO_PREDEFINED_RESOURCES in lsf.conf to enable the nprocs, ncores, and nthreads host resources in remote clusters to be usable.
See Administering Platform LSF for more information about configuring Platform LSF on EGO.
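A minimal sketch of the relevant lsf.conf lines, assuming you want the predefined EGO resources enabled; whichever value you choose, set it identically in every participating cluster:
# lsf.conf -- use the same value in every cluster
LSF_ENABLE_EGO=Y
EGO_PREDEFINED_RESOURCES=Y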
Non-Uniform Name Spaces
By default, LSF assumes a uniform user name space within a cluster and between clusters.
To support the execution of batch jobs across non-uniform user name spaces between clusters, LSF allows user account mapping.
See Account mapping between clusters.
By default, LSF uses lsrcp for file transfer (the bsub -f option). The lsrcp utility depends on a uniform user ID in different clusters.
Account mapping between clusters
By default, LSF assumes a uniform user name space within a cluster and between clusters. To support the execution of batch jobs across non-uniform user name spaces between clusters, LSF allows user account mapping.
For a job submitted by one user account in one cluster to run under a different user account on a host that belongs to a remote cluster, both the local and remote clusters must have the account mapping properly configured. System-level account mapping is configured by the LSF administrator, while user-level account mapping can be configured by LSF users.
System-level account mapping
You must be an LSF administrator to configure system-level account mapping.
System-level account mapping is defined in the UserMap section of lsb.users. The submission cluster proposes a set of user mappings (defined using the keyword export) and the execution cluster accepts a set of user mappings (defined using the keyword import). For a user's job to run, the mapping must be both proposed and accepted.
lsb.users on cluster1:
Begin UserMap
LOCAL    REMOTE                            DIRECTION
user1    user2@cluster2                    export
user3    (user4@cluster2 user6@cluster2)   export
End UserMap
lsb.users on cluster2:
Begin UserMap
LOCAL           REMOTE           DIRECTION
user2           user1@cluster1   import
(user6 user8)   user3@cluster1   import
End UserMap
Cluster1 configures user1 to run jobs as user2 in cluster2, and user3 to run jobs as user4 or user6 in cluster2.
Cluster2 configures user1 from cluster1 to run jobs as user2, and user3 from cluster1 to run jobs as user6 or user8.
Only mappings configured in both clusters work. The common account mappings are for user1 to run jobs as user2, and for user3 to run jobs as user6. Therefore, these mappings work, but the mappings of user3 to user4 and user8 are only half-configured and so do not work.
User-level account mapping
To set up your own account mapping, set up a .lsfhosts file in your home directory with owner read-write permissions only. Do not give other users and groups permissions on this file.
Account mapping can specify cluster names in place of host names.
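For example, to give the file owner read-write permissions only:
% chmod 600 ~/.lsfhosts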
You have two accounts: user1 on cluster1, and user2 on cluster2. To run jobs in either cluster, configure .lsfhosts as shown.
On each host in cluster1:
% cat ~user1/.lsfhosts
cluster2 user2
On each host in cluster2:
% cat ~user2/.lsfhosts
cluster1 user1
You have the account user1 on cluster1, and want to run jobs on cluster2 under the lsfguest account. Configure .lsfhosts as shown.
On each host in cluster1:
% cat ~user1/.lsfhosts
cluster2 lsfguest send
On each host in cluster2:
% cat ~lsfguest/.lsfhosts
cluster1 user1 recv
You have a uniform account name (user2) on all hosts in cluster2, and a uniform account name (user1) on all hosts in cluster1 except hostX. On hostX, you have the account name user99. To use both clusters transparently, configure .lsfhosts in your home directories on different hosts as shown.
On hostX in cluster1:
% cat ~user99/.lsfhosts
cluster1 user1
hostX user99
cluster2 user2
On every other host in cluster1:
% cat ~user1/.lsfhosts
cluster2 user2
hostX user99
On each host in cluster2:
% cat ~user2/.lsfhosts
cluster1 user1
hostX user99
Restricted Awareness of Remote Clusters
By default, all the clusters in a MultiCluster environment are aware of all the other clusters. This makes it possible for clusters to share resources or information when you configure MultiCluster links between them.
You can restrict awareness of remote clusters at the cluster level, by listing which of the other clusters in the MultiCluster environment are allowed to interact with the local cluster. In this case, the local cluster cannot display information about unrecognized clusters and does not participate in MultiCluster resource sharing with unrecognized clusters.
How it works
By default, the local cluster can obtain information about all other clusters specified in lsf.shared. The default behavior of RES is to accept requests from all the clusters in lsf.shared.
If the RemoteClusters section in lsf.cluster.cluster_name is defined, the local cluster has a list of recognized clusters, and is only aware of those clusters. The local cluster is not aware of the other clusters in the MultiCluster environment:
- The cluster does not forward jobs to unrecognized clusters, even if a local queue is configured to do so.
- The cluster does not borrow resources from unrecognized clusters, even if the remote cluster has exported the resources.
- The cluster does not export resources to unrecognized clusters, even if the local resource export section is configured to do so.
- The cluster does not receive jobs from unrecognized clusters, even if a local queue is configured to do so.
- The cluster cannot view information about unrecognized clusters.
However, remote clusters might still be aware of this cluster:
- Unrecognized clusters can view information about this cluster.
- Unrecognized clusters can send MultiCluster jobs to this cluster (they will be rejected, even if a local queue is configured to accept them).
- Unrecognized clusters can export resources to this cluster (this cluster will not use the resources, even if a local queue is configured to import them).
This example illustrates how the RemoteClusters list works.
The MultiCluster environment consists of 4 clusters with a common lsf.shared:
CLUSTERS
cluster1
cluster2
cluster3
cluster4
In addition, cluster2 is configured with a RemoteClusters list in lsf.cluster.cluster_name:
Begin RemoteClusters
CLUSTERNAME
cluster3
cluster4
End RemoteClusters
Because of the RemoteClusters list, local applications in cluster2 are aware of cluster3 and cluster4, but not cluster1. For example, if you view information or configure queues using the keyword all, LSF behaves as if you specified the list of recognized clusters instead of all clusters in lsf.shared.
Adding or modifying RemoteClusters list
You must have cluster administrator privileges in the local cluster to perform this task.
- Open lsf.cluster.cluster_name of the local cluster.
- If it does not already exist, create the RemoteClusters section as shown:
Begin RemoteClusters
CLUSTERNAME
...
End RemoteClusters
- Edit the RemoteClusters section. Under the heading CLUSTERNAME, specify the names of the remote clusters that you want the local cluster to recognize.
These clusters must also be listed in lsf.shared, so the RemoteClusters list is always a subset of the clusters list in lsf.shared.
Security of Daemon Communication
With MultiCluster, LSF daemons can be configured to communicate over non-privileged ports (by default, LSF daemons in a MultiCluster environment use privileged port authentication).
If disabling privileged port authentication raises concerns about the security of daemon communication, you can use an eauth program to enable any method of authentication for secure communication between clusters. See Authentication Between Clusters.
- Configure all clusters to use non-privileged ports for LSF daemon communication.
- If you use a firewall, it must accept incoming communication from non-privileged source ports when the destination port is the LIM port (configured by LSF_LIM_PORT in lsf.conf) or the mbatchd port (configured by LSB_MBD_PORT in lsf.conf). Example rules appear after this procedure.
- If you use a firewall, it must allow outgoing communication from non-privileged source ports to non-privileged destination ports.
- To make LSF daemons use non-privileged ports, edit lsf.conf in every cluster as shown:
LSF_MC_NON_PRIVILEGED_PORTS=Y
- To make the changes take effect, restart the master LIM and MBD in every cluster. For example, if a cluster's master host is hostA, run the following commands in that cluster:
lsadmin limrestart hostA
badmin mbdrestart
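As an illustration only, and assuming Linux iptables with the default ports LSF_LIM_PORT=7869 and LSB_MBD_PORT=6881, rules along these lines satisfy the two firewall requirements above:
# Accept incoming traffic (from any non-privileged source port) to the LIM and mbatchd ports
iptables -A INPUT -p udp --dport 7869 -j ACCEPT
iptables -A INPUT -p tcp --dport 7869 -j ACCEPT
iptables -A INPUT -p tcp --dport 6881 -j ACCEPT
# Allow outgoing traffic from non-privileged source ports to non-privileged destination ports
iptables -A OUTPUT -p tcp --sport 1024:65535 --dport 1024:65535 -j ACCEPT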
Authentication Between Clusters
For extra security, you can use any method of external authentication between any two clusters in the MultiCluster grid.
Because this is configured for individual clusters, not globally, different cluster pairs can use different systems of authentication. You use a different eauth program for each different authentication mechanism.
If no common external authentication method has been configured, two clusters communicate with the default security, which is privileged port authentication.
eauth executables
Contact Platform Professional Services for more information about the eauth programs that Platform distributes to allow LSF to work with different security mechanisms. If you already have an eauth that works with LSF for daemon authentication within the cluster, use a copy of it.
If different clusters use different methods of authentication, set up multiple eauth programs.
- Copy the corresponding eauth program to LSF_SERVERDIR (see the example below).
- Name the eauth program eauth.method_name.
If you happen to use the same eauth program for daemon authentication within the cluster, you should have two copies, one named eauth (used by LSF) and one named eauth.method_name (used by MultiCluster).
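For example, to install a hypothetical Kerberos-based eauth under the method name KRB (the source path is an assumption; the method name matches the configuration example later in this section):
% cp /path/to/eauth.krb5 $LSF_SERVERDIR/eauth.KRB
% chmod 700 $LSF_SERVERDIR/eauth.KRB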
MultiCluster configuration
- Edit the RemoteClusters section of lsf.cluster.cluster_name.
If the cluster does not already include a RemoteClusters list, you must add it now. To maintain existing compatibility, specify all remote clusters in the list, even if the preferred method of authentication is the default method.
- If necessary, add the AUTH column to the RemoteClusters section.
- For each remote cluster, specify the preferred authentication method. Set AUTH to method_name (using the same method name that identifies the corresponding eauth program). For default behavior, specify a dash (-).
- To make the changes take effect in a working cluster, run the following commands:
lsadmin limrestart master_host
badmin mbdreconfig
Repeat the steps for each cluster that will use external authentication, making sure that the configurations of paired-up clusters match.
Configuration example
In this example, Cluster1 and Cluster2 use Kerberos authentication with each other, but not with Cluster3. It does not matter how Cluster3 is configured, because there is no extra authentication unless the configurations of both clusters agree.
lsf.cluster.cluster1:
Begin RemoteClusters
CLUSTERNAME   EQUIV   CACHE_INTERVAL   RECV_FROM   AUTH
cluster2      Y       60               Y           KRB
cluster3      N       30               N           -
End RemoteClusters
LSF_SERVERDIR in Cluster1 includes an eauth executable named eauth.KRB.
lsf.cluster.cluster2:
Begin RemoteClusters
CLUSTERNAME   EQUIV   CACHE_INTERVAL   RECV_FROM   AUTH
cluster1      Y       60               Y           KRB
cluster3      N       30               N           -
End RemoteClusters
LSF_SERVERDIR in Cluster2 includes an eauth executable named eauth.KRB.
Resource Use Updating for MultiCluster Jobs
Upon installation, the default configuration of LSF is that clusters share information about the resources used by other clusters, and the information is updated every 5 minutes by the execution or provider cluster. You can disable the feature or modify how often MultiCluster resource use is updated. Depending on load, updating the information very frequently can affect the performance of LSF.
Configuring resource use updating for MultiCluster jobs
To change the timing of resource usage updating between clusters, set MC_RUSAGE_UPDATE_INTERVAL in lsb.params in the execution or provider cluster. Specify how often to update the information in the submission or consumer cluster, in seconds.
To disable LSF resource usage updating between clusters, specify zero:
MC_RUSAGE_UPDATE_INTERVAL=0
You must configure this parameter manually; you cannot use LSF GUI tools to add or modify this parameter.
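For example, to have the execution cluster send updates every 10 minutes instead of the default 5 (the 600-second value is only an illustration), lsb.params might contain:
Begin Parameters
MC_RUSAGE_UPDATE_INTERVAL=600
End Parameters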
MultiCluster Information Cache
When a local cluster requests load or host information from a remote cluster, the information is cached. If the local cluster is required to display the same information again, LSF displays the cached information, unless the cache has expired.
The expiry period for cached information is configurable, so you can view more up-to-date information if you don't mind connecting to the remote cluster more often.
It is more efficient to get information from a local cluster than from a remote cluster. Caching remote cluster information locally minimizes excessive communication between clusters.
Cache thresholds
The cache threshold is the maximum time that remote cluster information can remain in the local cache.
There are two cache thresholds, one for load information, and one for host information. The threshold for host information is always double the threshold for load information.
By default, cached load information expires after 60 seconds and cached host information expires after 120 seconds.
How it works
When a local cluster requests load or host information from a remote cluster, the information is cached by the local master LIM.
When the local cluster is required to display the same information again, LSF evaluates the age of the information in the cache.
- If the information has been stored in the local cluster for longer than the specified time, LSF contacts the remote cluster again, updates the cache, and displays current information.
- If the age of the cached information is less than the threshold time, LSF displays the cached information.
Configuring cache threshold
Set CACHE_INTERVAL in the RemoteClusters section of lsf.cluster.cluster_name, and specify the number of seconds to cache load information.
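For example, to cache load information from cluster2 for 30 seconds (host information is then cached for 60 seconds), the RemoteClusters section of the local lsf.cluster.cluster_name might contain:
Begin RemoteClusters
CLUSTERNAME   CACHE_INTERVAL
cluster2      30
End RemoteClusters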
Date Modified: March 13, 2009
Platform Computing: www.platform.com
Platform Support: support@platform.com
Platform Information Development: doc@platform.com
Copyright © 1994-2009 Platform Computing Corporation. All rights reserved.