Architecture

LSF License Scheduler manages license tokens rather than controlling the licenses directly. With LSF License Scheduler, a job must receive a license token before it starts the application. The number of tokens available from LSF corresponds to the number of licenses available from FLEXnet, so if no token is available, the job does not start. In this way, the number of licenses requested by running jobs does not exceed the number of available licenses.

When a job starts, the application is not aware of LSF License Scheduler. The application checks out licenses from FLEXnet in the usual manner.
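
The token gate described above can be pictured with a minimal sketch. This is not License Scheduler code: the TokenPool class and its methods are hypothetical names introduced only to show the counting rule, assuming one token per license.

```python
class TokenPool:
    """Hypothetical token counter for a single license feature."""

    def __init__(self, total_licenses: int) -> None:
        # Assumption: one token exists for each license the server offers.
        self.total = total_licenses
        self.in_use = 0

    def available(self) -> int:
        return self.total - self.in_use

    def try_acquire(self) -> bool:
        """Grant a token if one is free; otherwise the job must stay pending."""
        if self.available() <= 0:
            return False
        self.in_use += 1
        return True

    def release(self) -> None:
        """Return a token when a job finishes."""
        if self.in_use > 0:
            self.in_use -= 1


# Two licenses, three jobs: the third job does not receive a token.
pool = TokenPool(total_licenses=2)
started = [pool.try_acquire() for _ in range(3)]
```

In this sketch, started holds one True or False per job, and a job that receives False stays pending until another job releases its token.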

Non-LSF jobs

Jobs that start outside of LSF do not receive a license token, but they can still check out a license. LSF automatically adjusts the total number of licenses managed to compensate for the licenses that have been taken by non-LSF jobs.
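
As a rough illustration of this adjustment, the sketch below subtracts licenses held by non-LSF jobs from the total before tokens are counted. The function name and its inputs are assumptions made for this example; the source does not describe how License Scheduler performs the adjustment internally.

```python
def tokens_for_lsf(total_licenses: int, non_lsf_checkouts: int) -> int:
    """Hypothetical helper: licenses left for LSF jobs to receive as tokens.

    Assumption: each license checked out by a non-LSF job reduces the
    managed total by exactly one, and the result is never negative.
    """
    return max(0, total_licenses - non_lsf_checkouts)


# Ten licenses, three taken outside LSF: seven remain for LSF jobs.
remaining = tokens_for_lsf(total_licenses=10, non_lsf_checkouts=3)
```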

LSF ELIM not needed

With License Scheduler, you do not need to configure custom resources or write an ELIM. LSF automatically sets up license tokens as LSF resources, which makes an ELIM redundant.

No lsf.shared setup

You do not need to define the license as a shared LSF resource in lsf.shared.

No lsf.cluster.cluster_name setup

You do not need to define the license as a shared LSF resource in lsf.cluster.cluster_name; just configure license projects, which you include in the distribution policy for that license.

No ELIM

Do not write an ELIM to monitor the license. With License Scheduler configured, LSF is aware of the actual license availability. If the job can receive a license token, it is guaranteed to receive an actual license when required.

LSF License Scheduler and host reliability

You can define a list of candidate hosts for LSF License Scheduler in case of a host failure. The LSF License Scheduler daemon (bld) runs on the candidate hosts and maintains a connection between each candidate and the LSF License Scheduler host.

Failover in a LAN

If the LSF License Scheduler host fails, the first candidate host listed in the License Scheduler host list takes over license scheduling until the master host restarts. To take over, the candidate host must be running the LSF License Scheduler daemon (bld).
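
The selection rule can be sketched as a walk through an ordered host list. The function and host names below are hypothetical. The sketch assumes the list starts with the usual License Scheduler host followed by the candidates, and that a later candidate is tried when an earlier one is unavailable, which the source does not state.

```python
from typing import Optional, Sequence, Set


def pick_active_host(
    hosts: Sequence[str],
    reachable: Set[str],
    running_bld: Set[str],
) -> Optional[str]:
    """Return the first listed host that is up and running bld (illustrative)."""
    for host in hosts:
        # A host qualifies only if it is reachable and already runs the daemon.
        if host in reachable and host in running_bld:
            return host
    return None


# hostA (the usual License Scheduler host) is down, so hostB takes over.
hosts = ["hostA", "hostB", "hostC"]
active = pick_active_host(hosts, {"hostB", "hostC"}, {"hostB", "hostC"})
```

When hostA is reachable again and running bld, the same call returns hostA, which mirrors the takeover lasting only until the failed host restarts.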

Failover in a WAN

If License Scheduler is managing licenses in a WAN configuration, and the connection between sites breaks, a candidate LSF License Scheduler host manages license scheduling locally until the WAN connection returns.

Using License Scheduler in a WAN

The following examples illustrate the benefits of using License Scheduler to manage license tokens in a WAN. In these examples, the license server in Design Center A can only serve licenses to jobs from Design Center A. The license server in Design Center B, however, can serve licenses to jobs from both centers.

Figure 1. Two design centers without LSF License Scheduler

In this example, the ELIM collects license information from the FLEXnet license server host (LAN or WAN) and reports it to the LSF cluster master batch daemon (mbatchd) through LIM. The LSF cluster decides whether to start jobs based on license availability. The jobs check out the licenses directly from the server.

Interactive jobs check out licenses directly from the server without any scheduling controls.

This example shows two potential problems:

  • Uncontrolled competition for license checkout can lead to a race condition that can result in job failure for some users.

  • There is no way to balance license usage among multiple projects or multiple sites.

Figure 2. Two design centers with LSF License Scheduler

In this example, LSF License Scheduler collects license information from the FLEXnet license server host (LAN or WAN). The LSF cluster daemon (mbatchd) receives tokens from License Scheduler and starts jobs. The jobs check out the license directly from the server.

  1. LSF License Scheduler collects the information related to licenses:
    • License availability and license usage from the FLEXnet license server hosts

    • License demand and license usage from LSF clusters and interactive users

  2. Based on the information it collects, and on its scheduling and distribution policies, License Scheduler makes license distribution and preemption decisions.

Because License Scheduler distributes each license to only one license project, there is no race condition among multiple users. Because License Scheduler is a central point of control, scheduling policies can include multiple LSF clusters and non-LSF users.
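
To make the idea of assigning each license to exactly one project concrete, the following sketch splits a token count among license projects in proportion to configured shares. The proportional rule, the function name, and the project names Lp1 and Lp2 are assumptions for illustration only; the actual distribution and preemption policies are richer than this.

```python
from typing import Dict


def distribute(total_tokens: int, shares: Dict[str, int]) -> Dict[str, int]:
    """Split tokens among projects by share; every token goes to one project."""
    share_sum = sum(shares.values())
    if share_sum <= 0:
        raise ValueError("at least one project needs a positive share")

    # Whole-number part of each project's proportional allocation.
    alloc = {p: total_tokens * s // share_sum for p, s in shares.items()}

    # Hand any leftover tokens to the largest fractional remainders,
    # breaking ties by project name so the result is deterministic.
    leftover = total_tokens - sum(alloc.values())
    by_remainder = sorted(
        shares, key=lambda p: (-(total_tokens * shares[p] % share_sum), p)
    )
    for project in by_remainder[:leftover]:
        alloc[project] += 1
    return alloc


allocation = distribute(10, {"Lp1": 2, "Lp2": 1})
```

Because the allocations always sum to the token count, no token is promised to two projects at once, which is the property that removes the checkout race.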

LSF scheduling policies

With LSF License Scheduler, LSF gathers information about the licensing requirements of pending jobs to efficiently distribute available licenses. Other LSF scheduling policies are independent from LSF License Scheduler policies.

When starting a job, basic LSF scheduling comes first.

  • LSF assigns a suitable host before it considers the requirements of any other resources, such as licenses. For example, a job must have a candidate LSF host on which to start before the LSF License Scheduler fairshare policy (for the license project the job belongs to) applies.

  • Other LSF fairshare policies are based on CPU time, run time, and usage. If LSF fairshare scheduling is configured, LSF determines which user or queue has the highest priority, then considers other resources. In this way, the other LSF fairshare policies take priority over LSF License Scheduler.
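
The ordering of these checks can be sketched as follows. The function name, its arguments, and the returned strings are hypothetical, and the sketch reduces each policy to a single test purely to show which check comes first.

```python
from typing import Sequence


def next_step(candidate_hosts: Sequence[str], free_tokens: int) -> str:
    """Illustrative ordering: host selection first, license token second."""
    # Basic LSF scheduling: without a candidate host, license policy never applies.
    if not candidate_hosts:
        return "pending: no suitable host"
    # Only now is the license requirement considered.
    if free_tokens <= 0:
        return "pending: no license token"
    return "dispatch"


decision = next_step(candidate_hosts=[], free_tokens=5)
```

Here the job stays pending for lack of a host even though tokens are free, so the license fairshare policy is never consulted for it.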

Offline behavior

  • mbatchd

    If mbatchd is offline, either while LSF is being reconfigured or because of an unexpected failure of LSF software, tokens distributed to license projects in the unavailable cluster are redistributed to other projects. When mbatchd comes back online, it immediately receives updated information about the number of tokens currently distributed to the projects in its cluster.

    From Platform LSF Version 7 Update 5 onwards, bld restarts when LSF is reconfigured (badmin reconfig).

  • LSF License Scheduler

    If mbatchd cannot contact LSF License Scheduler, it does not receive any updated information about the number of tokens dynamically distributed to the projects in its cluster, so it continues to run using the most recent data available.
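
The fallback to the most recent data can be sketched as a small cache. The ClusterTokenView class is a hypothetical stand-in for the cluster's view of its token distribution, and passing None models a failed contact with License Scheduler.

```python
from typing import Dict, Optional


class ClusterTokenView:
    """Hypothetical cache of the last token distribution a cluster received."""

    def __init__(self) -> None:
        self.tokens: Dict[str, int] = {}

    def refresh(self, update: Optional[Dict[str, int]]) -> Dict[str, int]:
        """Apply an update if one arrived; otherwise keep the last known data."""
        if update is not None:
            self.tokens = dict(update)
        return self.tokens


view = ClusterTokenView()
view.refresh({"Lp1": 4, "Lp2": 2})   # normal update from License Scheduler
current = view.refresh(None)          # contact lost: last known values are kept
```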