Tuning the Cluster
Tuning LIM
LIM provides critical services to all LSF components. In addition to the timely collection of resource information, LIM provides host selection and job placement policies. If you are using Platform MultiCluster, LIM determines how different clusters should exchange load and resource information. You can tune LIM policies and parameters to improve performance.
LIM uses load thresholds to determine whether to place remote jobs on a host. If one or more LSF load indices exceeds the corresponding threshold (too many users, not enough swap space, etc.), then the host is regarded as busy and LIM will not recommend jobs to that host. You can also tune LIM load thresholds.
You can also change default LIM behavior and pre-select hosts to be elected master to improve performance.
Adjusting LIM Parameters
There are two main goals in adjusting LIM configuration parameters: improving response time, and reducing interference with interactive use. To improve response time, tune LSF to correctly select the best available host for each job. To reduce interference, tune LSF to avoid overloading any host.
LIM policies are advisory information for applications. Applications can either use the placement decision from LIM, or make further decisions based on information from LIM.
Most of the LSF interactive tools use LIM policies to place jobs on the network. LSF uses load and resource information from LIM and makes its own placement decisions based on other factors in addition to load information.
Files that affect LIM are lsf.shared and lsf.cluster.cluster_name, where cluster_name is the name of your cluster.
RUNWINDOW parameter
LIM thresholds and run windows affect the job placement advice of LIM. Job placement advice is not enforced by LIM.
The RUNWINDOW parameter defined in lsf.cluster.cluster_name specifies one or more time windows during which a host is considered available. If the current time is outside all the defined time windows, the host is considered locked and LIM will not advise any applications to run jobs on the host.
Load Thresholds
Load threshold parameters define the conditions beyond which a host is considered busy by LIM and are a major factor in influencing performance. No jobs will be dispatched to a busy host by LIM's policy. Each of these parameters is a load index value, so that if the host load goes beyond that value, the host becomes busy.
Thresholds can be set for any load index supported internally by the LIM, and for any external load index.
If a particular load index is not specified, LIM assumes that there is no threshold for that load index. Define looser values for load thresholds if you want to aggressively run jobs on a host.
See Load Thresholds for more details.
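The busy rule described above can be sketched in a few lines of Python. This is only an illustration, not LIM's implementation: the function names are hypothetical, the load values are plain numbers (megabytes for swp and mem), and the split into indices that count as overloaded when they rise (such as pg) and indices that count as overloaded when they fall (such as swp and mem) is an assumption made for the example.

```python
# Assumption: for these indices a larger value means more load.
# All other indices (for example swp, mem) are treated as "lower is worse".
INCREASING = {"r15s", "r1m", "r15m", "ut", "pg", "io", "ls"}


def exceeded(load, thresholds):
    """Return the sorted names of load indices that are past their threshold."""
    over = []
    for name, limit in thresholds.items():
        if limit is None or name not in load:
            # No threshold configured: this index never makes the host busy.
            continue
        value = load[name]
        if name in INCREASING:
            if value > limit:
                over.append(name)
        elif value < limit:
            over.append(name)
    return sorted(over)


def host_status(load, thresholds):
    """A host is busy as soon as one index is past its threshold."""
    return "busy" if exceeded(load, thresholds) else "ok"


# Illustrative values only; None stands for "-" (no threshold).
thresholds = {"r1m": 3.5, "ut": None, "pg": 15, "swp": 2, "mem": 1}
host_a = {"r1m": 2.1, "ut": 0.47, "pg": 69.6, "swp": 96, "mem": 60}
host_d = {"r1m": 0.0, "ut": 0.0, "pg": 0.0, "swp": 32, "mem": 10}

print(host_status(host_a, thresholds), exceeded(host_a, thresholds))
print(host_status(host_d, thresholds), exceeded(host_d, thresholds))
```

Loosening a threshold in this model (for example, raising the pg limit) directly reduces how often a host is classed as busy, which is the effect the tuning advice in this section relies on.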
In this section
- Load indices that affect LIM performance
- Comparing LIM load thresholds
- If LIM often reports a host as busy
- If interactive jobs slow down response
- Multiprocessor systems
Load indices that affect LIM performance
For more details on load indices see Load Indices.
Comparing LIM load thresholds
To tune LIM load thresholds, compare the output of lsload to the thresholds reported by lshosts -l.
The lsload and lsmon commands display an asterisk (*) next to each load index that exceeds its threshold.
Example
Consider the following output from lshosts -l and lsload:
lshosts -l
    HOST_NAME:  hostD
    ...
    LOAD_THRESHOLDS:
      r15s   r1m  r15m    ut    pg    io    ls    it   tmp   swp   mem
         -   3.5     -     -    15     -     -     -     -    2M    1M

    HOST_NAME:  hostA
    ...
    LOAD_THRESHOLDS:
      r15s   r1m  r15m    ut    pg    io    ls    it   tmp   swp   mem
         -   3.5     -     -    15     -     -     -     -    2M    1M
lsload
    HOST_NAME  status  r15s   r1m  r15m    ut     pg    ls    it   tmp   swp   mem
    hostD      ok       0.0   0.0   0.0    0%    0.0     6     0   30M   32M   10M
    hostA      busy     1.9   2.1   1.9   47%  *69.6    21     0   38M   96M   60M
In this example, the hosts have the following characteristics:
- hostD is ok.
- hostA is busy. The pg (paging rate) index is 69.6, above the threshold of 15.
If LIM often reports a host as busy
If LIM often reports a host as busy when the CPU utilization and run queue lengths are relatively low and the system is responding quickly, the most likely cause is the paging rate threshold. Try raising the pg threshold.
Different operating systems assign subtly different meanings to the paging rate statistic, so the threshold needs to be set at different levels for different host types. In particular, HP-UX systems need to be configured with significantly higher pg values; try starting at a value of 50.
There is a point of diminishing returns. As the paging rate rises, eventually the system spends too much time waiting for pages and the CPU utilization decreases. Paging rate is the factor that most directly affects perceived interactive response. If a system is paging heavily, it feels very slow.
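To confirm which index is responsible before changing a threshold, the asterisk markers in saved lsload output can be collected per host. The following Python sketch is an illustration only: the function name is hypothetical, and it assumes whitespace-separated columns with the asterisk attached directly to the value, as in the lsload example earlier in this section.

```python
def flagged_indices(lsload_text):
    """Map each host to the load indices that lsload marked with '*'."""
    rows = [line.split() for line in lsload_text.splitlines() if line.strip()]
    header, hosts = rows[0], rows[1:]
    result = {}
    for row in hosts:
        host = row[0]
        # Columns after HOST_NAME and status line up with the header names.
        result[host] = [
            name
            for name, value in zip(header[2:], row[2:])
            if value.startswith("*")
        ]
    return result


SAMPLE = """\
HOST_NAME status r15s r1m r15m ut pg ls it tmp swp mem
hostD ok 0.0 0.0 0.0 0% 0.0 6 0 30M 32M 10M
hostA busy 1.9 2.1 1.9 47% *69.6 21 0 38M 96M 60M
"""

print(flagged_indices(SAMPLE))
```

If pg is the only index flagged on hosts that otherwise respond quickly, that supports raising the pg threshold rather than changing the run queue thresholds.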
If interactive jobs slow down response
If you find that interactive jobs slow down system response too much while LIM still reports your host as ok, reduce the CPU run queue length thresholds (r15s, r1m, r15m). Likewise, increase the CPU run queue length thresholds if hosts become busy at low loads.
Multiprocessor systems
On multiprocessor systems, the CPU run queue length thresholds (r15s, r1m, r15m) are compared to the effective run queue lengths as displayed by the lsload -E command.
CPU run queue length thresholds should be configured as the load limit for a single processor. Sites with a variety of uniprocessor and multiprocessor machines can use a standard value for r15s, r1m, and r15m in the configuration files, and the multiprocessor machines will automatically run more jobs.
Note that the normalized run queue length displayed by lsload -N is scaled by the number of processors. See Load Indices for the concept of effective and normalized run queue lengths.
Changing Default LIM Behavior to Improve Performance
You may want to change the default LIM behavior in the following cases:
- In very large sites. As the size of the cluster becomes large (500 hosts or more), reconfiguration of the cluster causes each LIM to re-read the configuration files. This can take quite some time.
- In sites where each host in the cluster cannot share a common configuration directory or exact replica.
In this section
- Default LIM behavior
- Change default LIM behavior
- Reconfiguration and LSF_MASTER_LIST
- How LSF works with LSF_MASTER_LIST
- Considerations
Default LIM behavior
By default, each LIM running in an LSF cluster must read the configuration files lsf.shared and lsf.cluster.cluster_name to obtain information about resource definitions, host types, host thresholds, etc. This includes master and slave LIMs.
This requires that each host in the cluster share a common configuration directory or an exact replica of the directory.
Change default LIM behavior
The parameter LSF_MASTER_LIST in lsf.conf allows you to identify for the LSF system which hosts can become masters. Hosts not listed in LSF_MASTER_LIST are treated as slave-only hosts and are never considered for election as master.
Set LSF_MASTER_LIST (lsf.conf)
- Edit lsf.conf and set the parameter LSF_MASTER_LIST to indicate hosts that are candidates to become the master host. For example:
  LSF_MASTER_LIST="hostA hostB hostC"
  The order in which you specify hosts in LSF_MASTER_LIST is the preferred order for selecting hosts to become the master LIM.
- Save your changes.
- Reconfigure the cluster:
  lsadmin reconfig
  badmin mbdrestart
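Before reconfiguring, it can help to confirm which hosts the edited file actually names and in what order. The Python sketch below reads LSF_MASTER_LIST from the text of an lsf.conf file; the function name is hypothetical, and the sketch assumes a simple NAME="value" line format with # comments. It is not an LSF tool.

```python
def master_candidates(lsf_conf_text):
    """Return the hosts named in LSF_MASTER_LIST, in preference order."""
    for raw in lsf_conf_text.splitlines():
        line = raw.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "LSF_MASTER_LIST":
            # Drop the surrounding quotes, then split on whitespace.
            return value.strip().strip('"').split()
    return []


SAMPLE_CONF = '''\
# sample lsf.conf fragment
LSF_MASTER_LIST="hostA hostB hostC"
'''

candidates = master_candidates(SAMPLE_CONF)
print("master candidates:", candidates)
print("preferred master:", candidates[0] if candidates else None)
```

The first entry in the returned list is the preferred master, which matches the ordering rule described in the steps above.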
Reconfiguration and LSF_MASTER_LIST
If you change LSF_MASTER_LIST
Whenever you change the parameter LSF_MASTER_LIST, reconfigure the cluster with lsadmin reconfig and badmin mbdrestart.
If you change lsf.cluster.cluster_name or lsf.shared
If you make changes that do not affect load report messages, such as adding or removing slave-only hosts, you only need to restart the LIMs on all master candidates with the command lsadmin limrestart and the specific host names.
For example:
lsadmin limrestart hostA hostB hostC
If you make changes that affect load report messages, such as load indices, you must restart all the LIMs in the cluster. Use the command lsadmin reconfig.
How LSF works with LSF_MASTER_LIST
The files lsf.shared and lsf.cluster.cluster_name are shared only among LIMs listed as candidates to be elected master with the parameter LSF_MASTER_LIST.
The preferred master host is no longer the first host in the cluster list in lsf.cluster.cluster_name, but the first host in the list specified by LSF_MASTER_LIST in lsf.conf.
Whenever you reconfigure, only master LIM candidates read lsf.shared and lsf.cluster.cluster_name to get updated information. The elected master LIM sends configuration information to slave LIMs.
The order in which you specify hosts in LSF_MASTER_LIST is the preferred order for selecting hosts to become the master LIM.
Considerations
Generally, the files lsf.cluster.cluster_name and lsf.shared for hosts that are master candidates should be identical.
When the cluster is started up or reconfigured, LSF rereads configuration files and compares lsf.cluster.cluster_name and lsf.shared for hosts that are master candidates.
In some cases in which identical files are not shared, files may be out of sync. This section describes situations that may arise should lsf.cluster.cluster_name and lsf.shared for hosts that are master candidates not be identical to those of the elected master host.
LSF_MASTER_LIST defined
When LSF_MASTER_LIST is defined, LSF only rejects candidate master hosts listed in LSF_MASTER_LIST from the cluster if the number of load indices in lsf.cluster.cluster_name or lsf.shared for master candidates is different from the number of load indices in the lsf.cluster.cluster_name or lsf.shared files of the elected master.
A warning is logged in the log file lim.log.master_host_name and the cluster continues to run, but without the hosts that were rejected.
If you want the hosts that were rejected to be part of the cluster, ensure the number of load indices in lsf.cluster.cluster_name and lsf.shared are identical for all master candidates and restart LIMs on the master and all master candidates:
lsadmin limrestart hostA hostB hostC
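The rejection rule can be modeled with a small Python sketch. It is an illustration under stated assumptions: the function name is hypothetical, and the number of load indices seen by each master candidate is supplied by the caller rather than parsed from lsf.shared or lsf.cluster.cluster_name.

```python
def rejected_candidates(index_counts, master):
    """Return master candidates whose load-index count differs from the master's.

    index_counts maps each master candidate (including the elected master)
    to the number of load indices defined in its configuration files.
    """
    expected = index_counts[master]
    return sorted(
        host
        for host, count in index_counts.items()
        if host != master and count != expected
    )


# Hypothetical counts: hostC has one extra load index defined.
counts = {"hostA": 11, "hostB": 11, "hostC": 12}
print(rejected_candidates(counts, master="hostA"))
```

An empty result means every candidate agrees with the elected master, which is the state to reach before restarting the LIMs.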
LSF_MASTER_LIST defined, and master host goes down
If LSF_MASTER_LIST is defined and the elected master host goes down, and if the number of load indices in lsf.cluster.cluster_name or lsf.shared for the new elected master is different from the number of load indices in the files of the master that went down, LSF will reject all master candidates that do not have the same number of load indices in their files as the newly elected master. LSF will also reject all slave-only hosts. This could cause a situation in which only the newly elected master is considered part of the cluster.
A warning is logged in the log file lim.log.new_master_host_name and the cluster continues to run, but without the hosts that were rejected.
To resolve this, from the current master host, restart all LIMs:
lsadmin limrestart all
All slave-only hosts will be considered part of the cluster. Master candidates with a different number of load indices in their lsf.cluster.cluster_name or lsf.shared files will be rejected.
When the master that was down comes back up, you will have the same situation as described in LSF_MASTER_LIST defined. You will need to ensure load indices defined in lsf.cluster.cluster_name and lsf.shared for all master candidates are identical and restart LIMs on all master candidates.
Improving performance of mbatchd query requests on UNIX
You can improve mbatchd query performance on UNIX systems using the following methods:
- Multithreading: On UNIX platforms that support thread programming, you can change default mbatchd behavior to use multithreading and increase performance of query requests when you use the bjobs command. Multithreading is beneficial for busy clusters with many jobs and frequent query requests. This may indirectly increase overall mbatchd performance.
- Hard CPU affinity: You can specify the master host CPUs on which mbatchd child query processes can run. This improves mbatchd scheduling and dispatch performance by binding query processes to specific CPUs so that higher priority mbatchd processes can run more efficiently.
In this section
- How mbatchd works without multithreading
- Configure mbatchd to use multithreading
- Set a query-dedicated port for mbatchd
- Specify an expiry time for child mbatchds (optional)
- Specify hard CPU affinity
- Configure mbatchd to push new job information to child mbatchd
How mbatchd works without multithreading
Ports
By default, mbatchd uses the port defined by the parameter LSB_MBD_PORT in lsf.conf or looks into the system services database for port numbers to communicate with LIM and job request commands.
It uses this port number to receive query requests from clients.
Servicing requests
For every query request received, mbatchd forks a child mbatchd to service the request. Each child mbatchd processes the request and then exits.
Configure mbatchd to use multithreading
When mbatchd has a dedicated port specified by the parameter LSB_QUERY_PORT in lsf.conf, it forks a child mbatchd which in turn creates threads to process query requests.
As soon as mbatchd has forked a child mbatchd, the child mbatchd takes over and listens on the port to process more query requests. For each query request, the child mbatchd creates a thread to process it.
The child mbatchd continues to listen to the port number specified by LSB_QUERY_PORT and creates threads to service requests until the job status changes, a new job is submitted, or until the time specified in MBD_REFRESH_TIME in lsb.params has passed.
MBD_REFRESH_TIME specifies the time interval, in seconds, at which mbatchd forks a new child mbatchd to service query requests, which keeps the information sent back to clients up to date. A child mbatchd processes query requests by creating threads.
MBD_REFRESH_TIME has the following syntax:
MBD_REFRESH_TIME=seconds [min_refresh_time]
where min_refresh_time defines the minimum time (in seconds) that the child mbatchd will stay to handle queries. The valid range for MBD_REFRESH_TIME is 0 to 300 seconds, and the default is 5 seconds.
- If MBD_REFRESH_TIME < min_refresh_time, the child mbatchd exits at MBD_REFRESH_TIME even if a job changes status or a new job is submitted before MBD_REFRESH_TIME expires.
- If MBD_REFRESH_TIME > min_refresh_time:
  - the child mbatchd exits at min_refresh_time if a job changes status or a new job is submitted before min_refresh_time
  - the child mbatchd exits after min_refresh_time when a job changes status or a new job is submitted
- If MBD_REFRESH_TIME > min_refresh_time and no job changes status and no new job is submitted, the child mbatchd exits at MBD_REFRESH_TIME
The default for min_refresh_time is 10 seconds.
If you use the bjobs command and do not get up-to-date information, you may want to decrease the value of MBD_REFRESH_TIME or MIN_REFRESH_TIME in lsb.params to make it likely that successive job queries could get the newly-submitted job information.
note:
Lowering the value of MBD_REFRESH_TIME or MIN_REFRESH_TIME increases the load on mbatchd and might negatively affect performance.
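The exit rules for a child mbatchd listed above can be condensed into one expression. The Python sketch below is a model of those rules, not mbatchd code: the function name is hypothetical, times are seconds since the child was forked, and the case where both values are equal is not covered by the rules, so the sketch assumes the child exits at that shared value.

```python
def child_exit_time(refresh, min_refresh, event_time=None):
    """Model when a child mbatchd exits, in seconds after it was forked.

    refresh      -- MBD_REFRESH_TIME
    min_refresh  -- min_refresh_time
    event_time   -- time of the first job status change or new job
                    submission, or None if nothing happens
    """
    if event_time is None:
        # Nothing changed: the child lives for the full refresh interval.
        return refresh
    # An event cannot end the child before min_refresh, and the child
    # never outlives refresh.
    return min(refresh, max(event_time, min_refresh))


print(child_exit_time(5, 10, event_time=2))    # refresh < min_refresh
print(child_exit_time(30, 10, event_time=3))   # event before min_refresh
print(child_exit_time(30, 10, event_time=20))  # event after min_refresh
print(child_exit_time(30, 10))                 # no event
```

In this model a smaller min_refresh_time lets an event retire the child sooner, which is why lowering it makes fresh job information more likely at the cost of more frequent forks.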
- Specify a query-dedicated port for the mbatchd by setting LSB_QUERY_PORT in lsf.conf.
  See Set a query-dedicated port for mbatchd.
- Optional: Set an interval of time to indicate when a new child mbatchd is to be forked by setting MBD_REFRESH_TIME in lsb.params. The default value of MBD_REFRESH_TIME is 5 seconds, and valid values are 0-300 seconds.
  See Specify an expiry time for child mbatchds (optional).
- Optional: Use NEWJOB_REFRESH=Y in lsb.params to enable a child mbatchd to get up-to-date new job information from the parent mbatchd.
  See Configure mbatchd to push new job information to child mbatchd.
Set a query-dedicated port for mbatchd
To change the default mbatchd behavior so that mbatchd forks a child mbatchd that can create threads, specify a port number with LSB_QUERY_PORT in lsf.conf.
tip:
This configuration only works on UNIX platforms that support thread programming.
- Log on to the host as the primary LSF administrator.
- Edit lsf.conf.
- Add the LSB_QUERY_PORT parameter and specify a port number that will be dedicated to receiving requests from hosts.
- Save the lsf.conf file.
- Reconfigure the cluster:
  badmin mbdrestart
Specify an expiry time for child mbatchds (optional)
Use MBD_REFRESH_TIME in lsb.params to define how often mbatchd forks a new child mbatchd.
- Log on to the host as the primary LSF administrator.
- Edit lsb.params.
- Add the MBD_REFRESH_TIME parameter and specify a time interval in seconds to fork a child mbatchd.
  The default value for this parameter is 5 seconds. Valid values are 0 to 300 seconds.
- Save the lsb.params file.
- Reconfigure the cluster as follows:
  badmin reconfig
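A value can be checked against the documented syntax and range before it goes into lsb.params. The following Python sketch is an illustration only; the function name is hypothetical, and the fallback of 10 seconds for an omitted min_refresh_time is taken from the default stated earlier in this section.

```python
def parse_mbd_refresh_time(value, default_min=10):
    """Parse 'seconds [min_refresh_time]' and return (seconds, min_refresh_time)."""
    parts = value.split()
    if not 1 <= len(parts) <= 2:
        raise ValueError("expected: seconds [min_refresh_time]")
    seconds = int(parts[0])
    if not 0 <= seconds <= 300:
        # Valid values for MBD_REFRESH_TIME are 0 to 300 seconds.
        raise ValueError("MBD_REFRESH_TIME must be between 0 and 300 seconds")
    min_refresh = int(parts[1]) if len(parts) == 2 else default_min
    return seconds, min_refresh


print(parse_mbd_refresh_time("20"))
print(parse_mbd_refresh_time("30 5"))
```

Rejecting an out-of-range value before running badmin reconfig avoids finding the mistake only after the cluster has been reconfigured.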
Specify hard CPU affinity
You can specify the master host CPUs on which mbatchd child query processes can run (hard CPU affinity). This improves mbatchd scheduling and dispatch performance by binding query processes to specific CPUs so that higher priority mbatchd processes can run more efficiently.
When you define this parameter, LSF runs mbatchd child query processes only on the specified CPUs. The operating system can assign other processes to run on the same CPU, however, if utilization of the bound CPU is lower than utilization of the unbound CPUs.
- Identify the CPUs on the master host that will run mbatchd child query processes.
  - Linux: To obtain a list of valid CPUs, view the file /proc/cpuinfo (for example, with cat /proc/cpuinfo)
  - Solaris: To obtain a list of valid CPUs, run the command psrinfo
- In the file lsb.params, define the parameter MBD_QUERY_CPUS.
  For example, if you specify:
  MBD_QUERY_CPUS=1 2
  the mbatchd child query processes will run only on CPU numbers 1 and 2 on the master host.
  You can specify CPU affinity only for master hosts that use one of the following operating systems:
  - Linux 2.6 or higher
  - Solaris 8 or higher
  If failover to a master host candidate occurs, LSF maintains the hard CPU affinity, provided that the master host candidate has the same CPU configuration as the original master host. If the configuration differs, LSF ignores the CPU list and reverts to default behavior.
- Verify that the mbatchd child query processes are bound to the correct CPUs on the master host.
  - Start up a query process by running a query command such as bjobs.
  - Check to see that the query process is bound to the correct CPU.
    - Linux: Run the command taskset -p <pid>
    - Solaris: Run the command ps -AP
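On Linux, the verification step can also be scripted. The Python sketch below compares a process's allowed CPU set with the MBD_QUERY_CPUS value; the function names are hypothetical, and check_pid relies on os.sched_getaffinity, which the Python standard library provides on Linux only. It is an alternative to reading taskset output by hand, not an LSF utility.

```python
import os


def parse_query_cpus(value):
    """Turn an MBD_QUERY_CPUS value such as '1 2' into a set of CPU numbers."""
    return {int(token) for token in value.split()}


def unexpected_cpus(allowed, actual):
    """Return CPUs the process may run on that are outside the allowed set."""
    return set(actual) - set(allowed)


def check_pid(pid, mbd_query_cpus):
    """Linux only: return CPUs in the process affinity mask not listed in MBD_QUERY_CPUS."""
    return unexpected_cpus(parse_query_cpus(mbd_query_cpus), os.sched_getaffinity(pid))


# Pure-Python illustration with hypothetical affinity sets.
print(unexpected_cpus(parse_query_cpus("1 2"), {1, 2}))
print(unexpected_cpus(parse_query_cpus("1 2"), {0, 1, 2, 3}))
```

An empty set from check_pid for the process ID of a child query process indicates that the process is confined to the configured CPUs.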
Configure mbatchd to push new job information to child mbatchd
Prerequisites: LSB_QUERY_PORT must be defined in lsf.conf.
If you have enabled multithreaded mbatchd support, the bjobs command may not display up-to-date information if two consecutive query commands are issued before a child mbatchd expires, because child mbatchd job information is not updated. Use NEWJOB_REFRESH=Y in lsb.params to enable a child mbatchd to get up-to-date new job information from the parent mbatchd.
When NEWJOB_REFRESH=Y, the parent mbatchd pushes new job information to a child mbatchd. Job queries with bjobs display new jobs submitted after the child mbatchd was created.
- Log on to the host as the primary LSF administrator.
- Edit lsb.params.
- Add NEWJOB_REFRESH=Y.
  You should set MBD_REFRESH_TIME in lsb.params to a value greater than 10 seconds.
- Save the lsb.params file.
- Reconfigure the cluster as follows:
  badmin reconfig
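The prerequisite and the recommended refresh time can be checked together before reconfiguring. This Python sketch is an illustration only: the function name is hypothetical, the settings are passed in as plain dictionaries rather than read from lsf.conf and lsb.params, and the port number in the example is an arbitrary placeholder.

```python
def newjob_refresh_warnings(lsf_conf, lsb_params):
    """Return warnings about settings that NEWJOB_REFRESH=Y depends on."""
    warnings = []
    if lsb_params.get("NEWJOB_REFRESH", "N").upper() != "Y":
        return warnings  # Feature not enabled, nothing to check.
    if "LSB_QUERY_PORT" not in lsf_conf:
        warnings.append("LSB_QUERY_PORT must be defined in lsf.conf")
    # MBD_REFRESH_TIME defaults to 5 seconds when it is not set.
    refresh = int((lsb_params.get("MBD_REFRESH_TIME") or "5").split()[0])
    if refresh <= 10:
        warnings.append("MBD_REFRESH_TIME should be greater than 10 seconds")
    return warnings


# Hypothetical settings; 6891 is only a placeholder port number.
print(newjob_refresh_warnings(
    {"LSB_QUERY_PORT": "6891"},
    {"NEWJOB_REFRESH": "Y", "MBD_REFRESH_TIME": "20"},
))
print(newjob_refresh_warnings({}, {"NEWJOB_REFRESH": "Y"}))
```

An empty list means both conditions from this section are met for the supplied settings.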