



Writing an External Scheduler Plugin




About External Scheduler Plugin

The default scheduler plugin modules provided by LSF may not satisfy all the particular scheduling policies you need. You can use the LSF scheduler plugin API to customize existing scheduling policies or implement new ones that can operate with existing LSF scheduler plugin modules.

Sample plugin code

Sample code for an example external scheduler plugin, along with information about writing, building, and configuring your own custom scheduler plugin, is located in:

LSF_TOP/7/misc/examples/external_plugin/



Writing an External Scheduler Plugin

Scheduling policies can be applied in two phases of a scheduling cycle: the match phase and the allocation phase.

Match/sort phase

In the match phase, the scheduler prepares candidate hosts for jobs. All jobs with the same resource requirements share the same candidate hosts. In this phase, the plugin decides which hosts remain eligible for further consideration. A host that is not eligible for the job is removed from the candidate host list, and the plugin associates a pending reason with the removed host. The bjobs command displays this pending reason.

Finally, the plugin can decide which candidate hosts should be considered first.

In this phase, the plugin provides two functions:

Match():

Filters the candidate hosts.

Sort():

Orders the candidate hosts.
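As an illustration of the ordering step only, the following minimal sketch sorts an array of candidate hosts so that the host with the most free slots comes first. The struct demo_cand_host type, the free_slots field, and the ordering rule are all hypothetical and are not part of the LSF API; the real Sort() prototype and data types are declared in sched_api.h.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical candidate host record; NOT an LSF type. */
struct demo_cand_host {
    const char *name;
    int free_slots;
};

/* qsort comparator: more free slots sorts earlier. */
static int demo_by_free_slots_desc(const void *a, const void *b)
{
    const struct demo_cand_host *ha = a;
    const struct demo_cand_host *hb = b;

    return (hb->free_slots > ha->free_slots) - (hb->free_slots < ha->free_slots);
}

/* Orders the candidate hosts in place (assumed ordering rule). */
static void demo_sort(struct demo_cand_host *hosts, size_t count)
{
    assert(hosts != NULL || count == 0);
    if (count > 1)
        qsort(hosts, count, sizeof hosts[0], demo_by_free_slots_desc);
}
```

A real Sort() function would apply whatever ordering the plugin's policy requires; free slots are used here only because they make the effect easy to see.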

Input and output of match phase

The input and output of this phase are candHostGroupList and PendingReasonTable. Candidate hosts are divided into several groups (candHostGroup). A job can only use hosts from one candHostGroup in the candHostGroupList.

The plugin filters each candHostGroup in candHostGroupList, removes the ineligible hosts from the group, and sets the pending reason for each removed host in the PendingReasonTable.
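The filtering described above can be sketched as follows. This is a minimal, self-contained illustration and not the LSF implementation: struct demo_host_group and struct demo_reason_table are hypothetical stand-ins for candHostGroup and PendingReasonTable, and the rule used here (drop one named host) is arbitrary. The real types and functions, including lsb_reason_set(), are declared in sched_api.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_HOSTS 8
#define DEMO_MAX_NAME  32

/* Hypothetical stand-in for one candidate host group; NOT an LSF type. */
struct demo_host_group {
    char   hosts[DEMO_MAX_HOSTS][DEMO_MAX_NAME];
    size_t count;
};

/* Hypothetical stand-in for the pending reason table; NOT an LSF type. */
struct demo_reason_entry {
    char host[DEMO_MAX_NAME];
    int  reason_id;
};

struct demo_reason_table {
    struct demo_reason_entry entries[DEMO_MAX_HOSTS];
    size_t count;
};

/* Removes the host named `excluded` from the group and records
 * `reason_id` for it. Returns the number of hosts removed. */
static size_t demo_match(struct demo_host_group *group,
                         struct demo_reason_table *reasons,
                         const char *excluded, int reason_id)
{
    size_t kept = 0;
    size_t removed = 0;

    assert(group != NULL && reasons != NULL && excluded != NULL);

    for (size_t i = 0; i < group->count; i++) {
        if (strcmp(group->hosts[i], excluded) != 0) {
            /* Eligible host: compact it toward the front of the group. */
            if (kept != i)
                memcpy(group->hosts[kept], group->hosts[i], DEMO_MAX_NAME);
            kept++;
            continue;
        }
        /* Ineligible host: record why it was removed, if there is room. */
        if (reasons->count < DEMO_MAX_HOSTS) {
            struct demo_reason_entry *e = &reasons->entries[reasons->count++];
            memcpy(e->host, group->hosts[i], DEMO_MAX_NAME);
            e->reason_id = reason_id;
        }
        removed++;
    }
    group->count = kept;
    return removed;
}
```

The group is filtered in place, so the same structure serves as both input and output, which mirrors how this phase is described.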

Plugin Invocation

Each plugin matches and sorts hosts according to particular kinds of resource requirements, which determine which hosts qualify and which should be considered first. The scheduler groups Match() and Sort() into a handler for each resource requirement.

After the handler is created, the plugin only needs to register it with the scheduler framework. The framework is then responsible for calling each handler to match and sort hosts for its specific resource requirement.

When the plugin registers the handler, a resource criteria type is associated with it. The criteria type indicates which kind of resource requirement the handler handles.
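The registration pattern described above amounts to a table of callbacks keyed by criteria type. The sketch below models that idea with hypothetical names (struct demo_handler, demo_register(), demo_run_cycle()); it does not reproduce the signature of lsb_resreq_registerhandler() or any other LSF function, for which sched_api.h is the reference.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_HANDLERS 4

/* Hypothetical handler holding the two match-phase callbacks. */
struct demo_handler {
    int (*match_fn)(void *data);
    int (*sort_fn)(void *data);
};

/* Hypothetical framework-side registry keyed by criteria type. */
struct demo_registry {
    int criteria_type[DEMO_MAX_HANDLERS];
    const struct demo_handler *handler[DEMO_MAX_HANDLERS];
    size_t count;
};

static const struct demo_handler *demo_lookup(const struct demo_registry *reg,
                                              int type)
{
    for (size_t i = 0; i < reg->count; i++)
        if (reg->criteria_type[i] == type)
            return reg->handler[i];
    return NULL;
}

/* Plugin side: returns 0 on success, -1 if the registry is full or the
 * criteria type already has a handler. */
static int demo_register(struct demo_registry *reg, int type,
                         const struct demo_handler *h)
{
    assert(reg != NULL && h != NULL);
    if (reg->count == DEMO_MAX_HANDLERS || demo_lookup(reg, type) != NULL)
        return -1;
    reg->criteria_type[reg->count] = type;
    reg->handler[reg->count] = h;
    reg->count++;
    return 0;
}

/* Framework side: calls match, then sort, for one criteria type. */
static int demo_run_cycle(const struct demo_registry *reg, int type, void *data)
{
    const struct demo_handler *h = demo_lookup(reg, type);

    if (h == NULL)
        return -1;
    h->match_fn(data);
    h->sort_fn(data);
    return 0;
}

/* Example callbacks: `data` points to int[2] call counters. */
static int demo_count_match(void *data) { ((int *)data)[0]++; return 0; }
static int demo_count_sort(void *data)  { ((int *)data)[1]++; return 0; }
```

The point of the pattern is that the plugin never calls its own callbacks: it hands them over once, and the framework decides when to invoke them.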

Handler functions

In addition to Match() and Sort(), there are two other handler functions:

New()

Gets the user-specific resource requirement string, parses it, creates the handler-specific data, and attaches the data to the related resource requirement.

Free()

Frees the handler-specific data when it is no longer needed.

See sched_api.h for details.

Implementing match phase

See sch.mod.matchexample.c for details.

Step 1.

Define the resource criteria type, the handler-specific data, and a user-specific pending reason.

The criteria type indicates the kind of resource requirement the handler handles. Usually, an external plugin handler handles only the external resource requirement string, which is specified with the -extsched option of the bsub command.

In order to use -extsched, you must set LSF_ENABLE_EXTSCHEDULER=y in lsf.conf.

The New() function parses the external resource requirement string and stores the parsed result in the handler-specific data.

Handler-specific data is a container that stores any data the handler needs.
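As a rough sketch of what a New()/Free() pair does, the code below parses a string of the form KEY=value, such as the EXAMPLE_MATCH_OPTIONS=goedel string used later in this chapter, into a small heap-allocated container and releases it again. The names demo_new(), demo_free(), and struct demo_handler_data are hypothetical, and the KEY=value format is an assumption based on that one example; the real New() and Free() prototypes are in sched_api.h.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical handler-specific data: the value parsed from KEY=value. */
struct demo_handler_data {
    char *value;
};

/* Parses `extsched` as "<key>=<value>". Returns newly allocated data,
 * or NULL if the string is missing, has a different key, or memory
 * cannot be allocated. */
static struct demo_handler_data *demo_new(const char *extsched, const char *key)
{
    struct demo_handler_data *data;
    const char *value;
    size_t key_len;

    assert(key != NULL);
    if (extsched == NULL)
        return NULL;

    key_len = strlen(key);
    if (strncmp(extsched, key, key_len) != 0 || extsched[key_len] != '=')
        return NULL;
    value = extsched + key_len + 1;

    data = malloc(sizeof *data);
    if (data == NULL)
        return NULL;
    data->value = malloc(strlen(value) + 1);
    if (data->value == NULL) {
        free(data);
        return NULL;
    }
    strcpy(data->value, value);
    return data;
}

/* Releases everything demo_new() allocated; accepts NULL. */
static void demo_free(struct demo_handler_data *data)
{
    if (data != NULL) {
        free(data->value);
        free(data);
    }
}
```

Returning NULL for a string the handler does not recognize lets the caller treat the job as having no requirement of this kind.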

If the plugin needs to set a user-specific pending reason, a pending reason ID must be defined. See lsb_reason_set() in sched_api.h for more information.

Step 2.

Implement handler functions: New(), Free(), Match(), and Sort().

Step 3.

Implement sched_init(). This function is the plugin initialization function, which is called when the plugin is loaded.

  1. Create the handler and register it with the scheduler framework (lsb_resreq_registerhandler()).

Allocation phase

In the allocation phase, the scheduler makes allocation decisions for each job. It assigns host slots, memory, and other resources to the job. It also checks whether the allocation satisfies all constraints defined in the configuration, such as queue slot limits and job deadlines.

Your plugin at this phase can modify allocation decisions made by another LSF module.

Limitations on allocation modifications

  1. An external plugin can only change the host slot distribution: it can reduce or increase the slot usage on a host, or add hosts to the allocation. Modifying other resource usage is not currently supported.
  2. An external plugin cannot remove a host from an allocation.
  3. An external plugin cannot change a reservation in an allocation.
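A plugin author may find it useful to check a proposed change against these rules before applying it. The sketch below is one way to do that, using a hypothetical struct demo_host_slots that is not an LSF type. It checks only the first two rules, and it treats "removing a host" as the host no longer appearing in the proposed distribution; whether reducing a host to zero slots also counts as removal is not stated here, so the sketch does not decide that case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-host slot count; NOT an LSF type. */
struct demo_host_slots {
    const char *host;
    int slots;
};

/* Returns the index of `host` in `list`, or -1 if it is absent. */
static int demo_find(const struct demo_host_slots *list, size_t n,
                     const char *host)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(list[i].host, host) == 0)
            return (int)i;
    return -1;
}

/* Returns 1 if `after` keeps every host that appears in `before`
 * (slot counts may go up or down and hosts may be added), else 0. */
static int demo_change_allowed(const struct demo_host_slots *before, size_t nb,
                               const struct demo_host_slots *after, size_t na)
{
    assert(before != NULL && after != NULL);

    for (size_t i = 0; i < nb; i++)
        if (demo_find(after, na, before[i].host) < 0)
            return 0;   /* a host was removed: not permitted */
    return 1;
}
```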

Input and output of allocation phase

INPUT:

job: the job for which the allocation is being made.

candHostGroupList: (see Input and output of match phase)

pendingReasonTable: (see Input and output of match phase)

INPUT/OUTPUT:

alloc: the LSF allocation decision is passed in; the plugin modifies it to make its own allocation decision on top of it.

Invocation

In the allocation phase, the plugin needs to provide a callback function, AllocatorFn, which adjusts allocation decisions made by LSF. This function must be registered with the scheduler framework. The scheduler framework calls it after LSF makes a decision for the job.

In addition to AllocatorFn(), the plugin may also need to provide a New() function in the handler for the user-specific resource criteria, if there are any. If there is no such user-specific resource requirement, AllocatorFn() is applied to all jobs.

Implementing allocation phase

See sch.mod.allocexample.c for details.

Step 1.

Optional.

Define the criteria type for external resource requirements.

Step 2.

Optional.

Implement the New() function in the handler for the resource criteria type.

Step 3.

Implement callback AllocatorFn():

  1. Check whether the allocation has the type SCH_MOD_DECISION_DISPATCH (lsb_alloc_type()). If not, return.
  2. Optional. Get the external message and decide whether to continue (lsb_job_getextresreq()).
  3. Get the current slot distribution in the allocation and the availability information for all candidate hosts (lsb_alloc_gethostslot()).
  4. Modify the allocation (lsb_alloc_modify()).

Use lsb_alloc_modify() to make small, incremental changes rather than one large change, because lsb_alloc_modify() may return FALSE if the change conflicts with other scheduling policies, such as user slot limits on a host.

In sch.mod.allocexample.c, slots are adjusted in small steps.
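The reasoning behind small steps can be shown with a toy model. In the sketch below, demo_modify() is a hypothetical stand-in that rejects any change exceeding a limit imposed by some other policy; it is not lsb_alloc_modify() and does not share its signature. Growing one slot at a time keeps every slot that was accepted, whereas a single large request is rejected outright and gains nothing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical allocation on one host; NOT an LSF type. */
struct demo_alloc {
    int slots;          /* slots currently allocated */
    int policy_limit;   /* limit enforced by another policy (assumed) */
};

/* Stand-in for a modify call: applies `delta` and returns 1, or leaves
 * the allocation unchanged and returns 0 if the result would conflict
 * with the limit. */
static int demo_modify(struct demo_alloc *alloc, int delta)
{
    int next = alloc->slots + delta;

    if (next < 0 || next > alloc->policy_limit)
        return 0;
    alloc->slots = next;
    return 1;
}

/* Grows the allocation toward `target` one slot at a time and stops at
 * the first rejected step. Returns the number of slots added. */
static int demo_grow(struct demo_alloc *alloc, int target)
{
    int added = 0;

    assert(alloc != NULL);
    while (alloc->slots < target && demo_modify(alloc, 1))
        added++;
    return added;
}
```

With a starting allocation of 2 slots and a limit of 5, asking for 8 slots in one call fails and leaves 2, while the stepwise loop reaches 5 before it is refused.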

Step 4.

Implement sched_init(). This function is the plugin initialization function, which is called when the plugin is loaded.

  1. Optional. Create a handler for resource requirement processing, and register it with the scheduler framework (lsb_resreq_registerhandler()).
  2. Register the allocation callback AllocatorFn() (lsb_alloc_registerallocator()).



Building the External Scheduler Plugin

Step 1.

Set INCDIR and LIBDIR in the makefile to point to the appropriate directories for the LSF include files and libraries.

Step 2.

Create a Make.def for the platform on which you want to build the plugin. The Make.def should be located in the LSF_MISC directory, at the same level as Make.misc.

The Make.def templates for each platform are in the config directory. For example, to build the examples on Solaris 2.6, use the following command to create Make.def:

ln -s config/Make.def.sparc-sol2 Make.def

You can also change the file, if necessary.

Step 3.

Run make in the current directory.



Enabling and Using the External Scheduler Plugin

The following steps use sch.mod.matchexample.c as an example.

  1. Copy schmod_matchexample.so to LSF_LIBDIR (defined in lsf.conf).
  2. Configure the plugin in lsb.modules; add the following line after all LSF modules:
    schmod_matchexample     ()           ()
    
  3. Run badmin mbdrestart to restart mbatchd.
  4. Use bsub to submit a job.

    If an external message is needed, use the -extsched option.

    For example:

    bsub -n 2 -extsched "EXAMPLE_MATCH_OPTIONS=goedel" -R "type==any" sleep 1000
    

    In order to use -extsched, you must set LSF_ENABLE_EXTSCHEDULER=y in lsf.conf.

  5. Use bjobs to look at the external message and the customized pending reason.
--------------------------------------------------------------------------
./bjobs -lp

Job <224>, User <yhu>, Project <default>, Status <PEND>, Queue <short>, Job Pri
                     ority <500>, Command <sleep 1000>
Thu Nov 29 15:08:05: Submitted from host <goedel> with hold, CWD <$HOME/LSF4_1/
                     utopia/lsbatch/cmd>, Requested Resources <type==any>;
 PENDING REASONS:
 Load information unavailable: pauli, varley, peano, bongo;
 Closed by LSF administrator: curie, togni;
 Customized pending reason number 20002: goedel;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 SCHEDULING PARAMETERS:
           r15s   r1m  r15m   ut      pg    io   ls    it    tmp    swp    mem
 loadSched   -     -     -     -       -     -    -     -     -      -      -  
 loadStop    -     -     -     -       -     -    -     -     -      -      -  

          total_jobs mbd_size 
 loadSched        -        -  
 loadStop         -        -  

 EXTERNAL MESSAGES:
 MSG_ID FROM       POST_TIME      MESSAGE                             ATTACHMENT
 0          -             -                        -                      -     
 1      yhu        Nov 29 15:08   EXAMPLE_MATCH_OPTIONS=goedel            N     
                                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
---------------------------------------------------------------------------



Scheduler API Reference Summary

See the following API man pages for details:



Debugging the External Scheduling Plugin

  1. The mbschd log file (mbschd.log.goedel in this example) shows which plugins were loaded successfully. If loading fails, the error message is also logged.
  2. Use a debugger such as gdb or dbx to debug plugins. Attach to mbschd and set breakpoints in the plugin functions.





      Date Modified: March 13, 2009
Platform Computing: www.platform.com

Platform Support: support@platform.com
Platform Information Development: doc@platform.com

Copyright © 1994-2009 Platform Computing Corporation. All rights reserved.