Platform LSF batch hosts

LSF batch execution hosts execute jobs in the LSF batch system.

lsb_hostinfo()

LSBLIB provides lsb_hostinfo() to get information about the server hosts in LSF batch. This includes configured static and dynamic information. Examples of host information include: host name, status, job limits and statistics, dispatch windows, and scheduling parameters.

The example program in this section uses lsb_hostinfo():

struct hostInfoEnt *lsb_hostinfo(hosts, numHosts)

lsb_hostinfo() gets information about LSF batch server hosts. On success, it returns an array of hostInfoEnt structures which hold the host information and sets *numHosts to the size of the array. On failure, lsb_hostinfo() returns NULL and sets lsberrno to indicate the error.

lsb_hostinfo() has the following parameters:

char  **hosts;                Array of names of hosts of interest
int   *numHosts;               Number of names in hosts

To get information on all hosts, set *numHosts to 0. When lsb_hostinfo() returns successfully, it sets *numHosts to the actual number of hostInfoEnt structures in the returned array.

If *numHosts is 1 and hosts is NULL, lsb_hostinfo() returns information on the local host.
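The three calling modes described above (all hosts, named hosts, local host) can be sketched without an LSF installation. The following is only an illustration of the in/out contract of numHosts: demo_hostinfo(), struct demoHost, the host names, and the choice of "hostA" as the local host are all hypothetical stand-ins, not part of LSBLIB. Real code would call lsb_hostinfo() after lsb_init().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct hostInfoEnt: only the name is kept. */
struct demoHost {
    const char *host;
};

/* Hypothetical cluster table; the first entry plays the local host. */
static struct demoHost demoCluster[] = { {"hostA"}, {"hostB"}, {"hostC"} };
#define DEMO_CLUSTER_SIZE 3

/*
 * Mimics the calling convention described for lsb_hostinfo():
 *   *numHosts == 0              -> all hosts, *numHosts set to the count
 *   *numHosts == 1, hosts NULL  -> the local host
 *   otherwise                   -> the named hosts, in request order
 * Returns NULL if a name is unknown (the real call also sets lsberrno).
 */
struct demoHost *demo_hostinfo(char **hosts, int *numHosts)
{
    static struct demoHost result[DEMO_CLUSTER_SIZE];
    int i, j, found = 0;

    assert(numHosts != NULL);

    if (*numHosts == 0) {
        *numHosts = DEMO_CLUSTER_SIZE;
        return demoCluster;
    }
    if (*numHosts == 1 && hosts == NULL)
        return &demoCluster[0];
    if (hosts == NULL || *numHosts < 0 || *numHosts > DEMO_CLUSTER_SIZE)
        return NULL;

    for (i = 0; i < *numHosts; i++) {
        for (j = 0; j < DEMO_CLUSTER_SIZE; j++) {
            if (strcmp(hosts[i], demoCluster[j].host) == 0) {
                result[found++] = demoCluster[j];
                break;
            }
        }
        if (j == DEMO_CLUSTER_SIZE)
            return NULL;    /* unknown host name */
    }
    return result;
}
```

The point of the sketch is that numHosts is both an input (how many names are supplied) and an output (how many entries came back), so a caller that asks for all hosts must read *numHosts again after the call before walking the array.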

hostInfoEnt structure

The hostInfoEnt structure is defined in lsbatch.h as

struct hostInfoEnt {
    char   *host;
    int    hStatus;        /* Host status */
    int    *busySched;     /* Host loadSched busy reason */
    int    *busyStop;      /* Host loadStop busy reason */
    float  cpuFactor;
    int    nIdx;           /* Number of load indices */
    float  *load;          /* Load for scheduling batch jobs */
    float  *loadSched;     /* Stop scheduling new jobs if over this load */
    float  *loadStop;      /* Stop jobs if over this load */
    char   *windows;       /* ASCII description of run windows */
    int    userJobLimit;   /* Number of jobs per user allowed to run */
    int    maxJobs;        /* Maximum number of jobs allowed to run */
    int    numJobs;        /* Total number of jobs */
    int    numRUN;         /* Number of running jobs */
    int    numSSUSP;       /* Number of system suspended jobs */
    int    numUSUSP;       /* Number of user suspended jobs */
    int    mig;            /* Number of minutes suspended before migration */
    int    attr;           /* Host attributes */
#define H_ATTR_CHKPNTABLE  0x1
#define H_ATTR_CHKPNT_COPY 0x2
    float  *realLoad;      /* Effective load of the host */
    int    numRESERVE;     /* Number of slots reserved for pending jobs */
    int    chkSig;         /* If attr has H_ATTR_CHKPNT_COPY set, the signal
                              that triggers the checkpoint and copy
                              operation; otherwise, the signal that triggers
                              the checkpoint operation on the host */
    float  cnsmrUsage;     /* Number of resources used by consumer */
    float  prvdrUsage;     /* Number of resources used by provider */
    float  cnsmrAvail;     /* Number of resources available for consumer */
    float  prvdrAvail;     /* Number of resources available for provider */
    float  maxAvail;       /* Maximum number of resources available */
    float  maxExitRate;    /* Job exit rate threshold on the host */
    float  numExitRate;    /* Job exit rate on the host */
    char   *hCtrlMsg;      /* AdminAction - host control message */
};

The host information returned by ls_gethostinfo() differs from that returned by lsb_hostinfo(): ls_gethostinfo() returns general information about the hosts, whereas lsb_hostinfo() returns LSF batch specific information about hosts.

For a complete description of the fields in the hostInfoEnt structure, see the lsb_hostinfo(3) man page.
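The attr and chkSig fields of hostInfoEnt are read together: the meaning of chkSig depends on whether H_ATTR_CHKPNT_COPY is set in attr. The sketch below shows one way to interpret the pair. It repeats the two attribute values from the structure definition so that it compiles without lsbatch.h; struct demoChkInfo and both helper functions are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Attribute bits as listed in the hostInfoEnt definition. */
#define H_ATTR_CHKPNTABLE  0x1
#define H_ATTR_CHKPNT_COPY 0x2

/* Hypothetical subset of hostInfoEnt with only the fields used here. */
struct demoChkInfo {
    int attr;      /* host attributes */
    int chkSig;    /* signal that triggers the checkpoint operation */
};

/* Returns 1 if chkSig triggers checkpoint and copy, 0 if plain checkpoint. */
int chk_sig_is_copy(const struct demoChkInfo *h)
{
    assert(h != NULL);
    return (h->attr & H_ATTR_CHKPNT_COPY) != 0;
}

/* Prints a one-line summary of the checkpoint settings. */
void print_chk_info(const struct demoChkInfo *h)
{
    assert(h != NULL);
    printf("checkpointable: %s, signal %d triggers %s\n",
           (h->attr & H_ATTR_CHKPNTABLE) ? "yes" : "no",
           h->chkSig,
           chk_sig_is_copy(h) ? "checkpoint and copy" : "checkpoint");
}
```

With a real hostInfoEnt entry, the same tests would be applied to hInfo->attr and hInfo->chkSig directly.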

Example

The following example takes a host name as an argument and displays information about the named host. It is a simplified version of the LSF batch bhosts command.

/******************************************************
* LSBLIB -- Examples
*
* simbhosts
* Display information about the batch server host with
* the given name in the cluster.
******************************************************/
#include <stdio.h>
#include <stdlib.h>
#include <lsf/lsbatch.h>

int main(int argc, char *argv[])
{
    struct hostInfoEnt *hInfo;   /* array holding the host info entries */
    char *hostname;              /* given host name */
    int numHosts = 1;            /* number of hosts of interest */

    /* check that the input has the right format: "./simbhosts HOSTNAME" */
    if (argc != 2) {
        printf("Usage: %s hostname\n", argv[0]);
        exit(-1);
    }
    hostname = argv[1];

    /* initialize LSBLIB and get the configuration environment */
    if (lsb_init(argv[0]) < 0) {
        lsb_perror("simbhosts: lsb_init() failed");
        exit(-1);
    }

    /* get host info */
    hInfo = lsb_hostinfo(&hostname, &numHosts);
    if (hInfo == NULL) {
        lsb_perror("simbhosts: lsb_hostinfo() failed");
        exit(-1);
    }

    /* display the host information (name, status, job limit,
       number of RUN/SSUSP/USUSP jobs) */
    printf("HOST_NAME          STATUS    JL/U  NJOBS  RUN  SSUSP USUSP\n");
    printf("%-18.18s", hInfo->host);

    /* status checks are ordered from most to least severe */
    if (hInfo->hStatus & HOST_STAT_UNLICENSED)
        printf(" %-9s", "unlicensed");
    else if (hInfo->hStatus & HOST_STAT_UNAVAIL)
        printf(" %-9s", "unavail");
    else if (hInfo->hStatus & HOST_STAT_UNREACH)
        printf(" %-9s", "unreach");
    else if (hInfo->hStatus & (HOST_STAT_BUSY |
                               HOST_STAT_WIND |
                               HOST_STAT_DISABLED |
                               HOST_STAT_LOCKED |
                               HOST_STAT_FULL |
                               HOST_STAT_NO_LIM))
        printf(" %-9s", "closed");
    else
        printf(" %-9s", "ok");

    /* INFINIT_INT means that no per-user job limit is set */
    if (hInfo->userJobLimit < INFINIT_INT)
        printf("%4d", hInfo->userJobLimit);
    else
        printf("%4s", "-");

    printf("%7d  %4d  %4d  %4d\n", hInfo->numJobs, hInfo->numRUN,
           hInfo->numSSUSP, hInfo->numUSUSP);
    exit(0);
} /* main */

The example output from the above program follows:

% a.out hostB
HOST_NAME    STATUS    JL/U  NJOBS  RUN  SSUSP USUSP
hostB           ok        -     2     1     1     0

hStatus is the status of the host. It is the bitwise inclusive OR of some of the following constants defined in lsbatch.h:

Host Status Name

Host Status Description

HOST_STAT_BUSY

The host load is greater than a scheduling threshold. In this status, no new batch job is scheduled to run on this host.

HOST_STAT_WIND

The host dispatch window is closed. In this status, no new batch job is accepted.

HOST_STAT_DISABLED

The host has been disabled by the Platform LSF administrator and will not accept jobs. In this status, no new batch job will be scheduled to run on this host.

HOST_STAT_LOCKED

The host is locked by the LSF administrator. In this status, no new batch job is scheduled to run on this host.

HOST_STAT_FULL

The host has reached its job limit. In this status, no new batch job is scheduled to run on this host.

HOST_STAT_UNREACH

The sbatchd on this host is unreachable.

HOST_STAT_UNAVAIL

The LIM and sbatchd on this host are unreachable.

HOST_STAT_UNLICENSED

The host does not have an LSF license.

HOST_STAT_NO_LIM

The host is running an sbatchd but not a LIM.

HOST_STAT_EXCLUSIVE

The host is locked by an exclusive job. In this status, no new batch job is scheduled to run on this host.

HOST_STAT_LOCKED_MASTER

The host is locked by the master LIM.

HOST_STAT_REMOTE_DISABLED

The remote leased host is disabled by the Platform LSF administrator and will not accept new jobs. This status is used with HOST_STAT_LOCKED.

HOST_STAT_LEASE_INACTIVE

The remote host is closed while the lease is renewed or terminated.

HOST_STAT_DISABLED_RES

The host is closed because RES is unavailable. This status occurs only when LSF_HPC_EXTENSIONS="LSB_HCLOSE_BY_RES" is set in lsf.conf.

HOST_STAT_DISABLED_RMS

The host is closed because RMS is unavailable.

HOST_STAT_LOCKED_BY_EGO

The host is locked by EGO.

HOST_STAT_CLOSED_BY_ADMIN

The host is closed by the Platform LSF administrator.

HOST_STAT_CU_EXCLUSIVE

The host is locked by a compute unit exclusive job. In this status, no new batch job is scheduled to run on this host.

If none of the above holds, hStatus is set to HOST_STAT_OK to indicate that the host is ready to accept and run jobs.
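Because hStatus is a bitwise OR, several of the conditions listed above can hold at once, and a program may want to report all of them rather than a single summary word. The sketch below shows one way to do that. To stay compilable without lsbatch.h it uses DEMO_STAT_* stand-ins whose numeric values are assumptions made for this illustration; real code should test against the HOST_STAT_* macros from lsbatch.h. describe_status() and the short flag names are likewise hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Stand-in bit values so the sketch compiles without lsbatch.h.
 * The values are assumptions; use the HOST_STAT_* macros in real code.
 */
#define DEMO_STAT_BUSY        0x001
#define DEMO_STAT_WIND        0x002
#define DEMO_STAT_DISABLED    0x004
#define DEMO_STAT_LOCKED      0x008
#define DEMO_STAT_FULL        0x010
#define DEMO_STAT_UNREACH     0x020
#define DEMO_STAT_UNAVAIL     0x040
#define DEMO_STAT_UNLICENSED  0x080
#define DEMO_STAT_NO_LIM      0x100

struct demoStatName {
    int bit;
    const char *name;
};

static const struct demoStatName demoStatNames[] = {
    { DEMO_STAT_BUSY,       "BUSY" },
    { DEMO_STAT_WIND,       "WIND" },
    { DEMO_STAT_DISABLED,   "DISABLED" },
    { DEMO_STAT_LOCKED,     "LOCKED" },
    { DEMO_STAT_FULL,       "FULL" },
    { DEMO_STAT_UNREACH,    "UNREACH" },
    { DEMO_STAT_UNAVAIL,    "UNAVAIL" },
    { DEMO_STAT_UNLICENSED, "UNLICENSED" },
    { DEMO_STAT_NO_LIM,     "NO_LIM" }
};

/*
 * Writes a '|'-separated list of the flags set in hStatus into buf,
 * or "OK" if none of the known flags is set.
 * Returns the number of known flags that were set.
 */
int describe_status(int hStatus, char *buf, size_t size)
{
    size_t i;
    size_t n = sizeof(demoStatNames) / sizeof(demoStatNames[0]);
    int count = 0;

    assert(buf != NULL && size > 0);
    buf[0] = '\0';

    for (i = 0; i < n; i++) {
        if (hStatus & demoStatNames[i].bit) {
            if (count > 0)
                strncat(buf, "|", size - strlen(buf) - 1);
            strncat(buf, demoStatNames[i].name, size - strlen(buf) - 1);
            count++;
        }
    }
    if (count == 0)
        strncat(buf, "OK", size - 1);
    return count;
}
```

A table-driven loop keeps the decoder easy to extend when further status bits, such as the exclusive or EGO-related ones, need to be reported.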

The constant INFINIT_INT defined in lsf.h is used to indicate that there is no limit set for userJobLimit.
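The "no limit" convention can be isolated in a small formatting helper, mirroring how the example program prints the JL/U column. The following is a minimal sketch: DEMO_INFINIT_INT is a stand-in for INFINIT_INT whose value (INT_MAX) is an assumption made so the code compiles without lsf.h, and format_job_limit() is a hypothetical helper name.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Stand-in for INFINIT_INT; the value is an assumption for this sketch. */
#define DEMO_INFINIT_INT INT_MAX

/*
 * Formats a per-user job limit for display: the number itself when a
 * limit is set, or "-" when the value signals that no limit is set.
 * Returns buf for convenience.
 */
char *format_job_limit(int limit, char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    if (limit < DEMO_INFINIT_INT)
        snprintf(buf, size, "%d", limit);
    else
        snprintf(buf, size, "-");
    return buf;
}
```

In real code the comparison would be against INFINIT_INT itself, applied to hInfo->userJobLimit.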