Learn more about Platform products at http://www.platform.com




Common LSF Functions




Job Related Functions

Deleting a job

To delete a job, either send it a KILL signal with lsb_signaljob() or call lsb_deletejob().

int lsb_deletejob(jobId, times, options)
LS_LONG_INT jobId;
int times;
int options;                          Set to 0

lsb_deletejob() deletes the job after a specific number of runs. The times parameter gives the number of runs.

Viewing job output

The output from an LSF job is normally not available until the job is finished. However, LSBLIB provides lsb_peekjob() to retrieve the name of a job file for the job specified by jobId.

To get the job output and job error files, append .out or .err to the end of the base job file name from lsb_peekjob().

Only the job owner can use lsb_peekjob() to see job output.

char *lsb_peekjob(jobId)
LS_LONG_INT  jobId;                   Job ID

On success, lsb_peekjob() returns the job file name. On failure, it returns NULL and sets lsberrno to indicate the error.

The next call reuses the storage for the file name.
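Because the returned storage is reused, a caller would typically copy the base name into its own buffer before building the output and error file names. The following is a minimal sketch of that step only. The helper build_job_file_path() is hypothetical and not part of LSBLIB, and the lsb_peekjob() call itself is left out so that the fragment compiles without the LSF headers.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of LSBLIB): copies the base job file name
 * returned by lsb_peekjob() into caller-owned storage and appends a suffix
 * such as ".out" or ".err". Returns 0 on success, or -1 if an argument is
 * missing or the result does not fit in buf. */
int build_job_file_path(const char *base, const char *suffix,
                        char *buf, size_t buflen)
{
    int n;

    if (base == NULL || suffix == NULL || buf == NULL || buflen == 0)
        return -1;

    n = snprintf(buf, buflen, "%s%s", base, suffix);
    if (n < 0 || (size_t)n >= buflen)
        return -1;              /* output was truncated: report an error */
    return 0;
}
```

A caller would pass the pointer returned by lsb_peekjob(jobId) as base, for example build_job_file_path(base, ".out", path, sizeof(path)), and check for NULL from lsb_peekjob() first.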

Moving jobs from one host to another

Use lsb_mig() to migrate a job from one host to another.

int lsb_mig(mig, badHostIdx)
struct submig *mig;           Job to be migrated
int    *badHostIdx;           Index of an unknown host in askedHosts

If the call fails because a requested host is unknown, *badHostIdx is the index of the entry in the askedHosts array that is not a host known to the LSF system.

lsbatch.h defines the struct submig to hold the details of the job to be migrated. It has the following fields:

struct submig {
    LS_LONG_INT jobId;           Job ID to be migrated
    int         options;
    int         numAskedHosts;   Number of hosts supplied for migration
    char        **askedHosts;    Array of pointers to the hosts
};

For the values of options, see the options field of struct submit used in the lsb_submit() function call.

On success, lsb_mig() returns 0. On failure, it returns -1 and sets lsberrno to indicate the error.
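As an illustration of how the bad-host index relates to the request, the sketch below looks up the rejected host name with bounds checking. It assumes the index refers to the askedHosts array of the same struct submig. The typedef and struct are local stand-ins that mirror the definition above so the fragment compiles without lsbatch.h, and rejected_host() is a hypothetical helper, not an LSBLIB function.

```c
#include <stddef.h>

/* Stand-in declarations so this sketch compiles without lsbatch.h.
 * In a real program, include the LSF header instead of these.
 * The width of LS_LONG_INT is an assumption. */
typedef long long LS_LONG_INT;

struct submig {
    LS_LONG_INT jobId;          /* job ID to be migrated */
    int         options;
    int         numAskedHosts;  /* number of hosts supplied */
    char        **askedHosts;   /* array of host name pointers */
};

/* Hypothetical helper: returns the name of the host that lsb_mig()
 * reported through badHostIdx, or NULL if the index is out of range. */
const char *rejected_host(const struct submig *mig, int badHostIdx)
{
    if (mig == NULL || mig->askedHosts == NULL)
        return NULL;
    if (badHostIdx < 0 || badHostIdx >= mig->numAskedHosts)
        return NULL;
    return mig->askedHosts[badHostIdx];
}
```

The range check matters because the index is only meaningful when the failure was caused by an unknown host; for other errors its value should not be trusted.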

External job message and data exchange

lsb_postjobmsg() sends an external message/status to a job. It can also transfer an attached data file through a TCP connection. The posted messages and attached data files can be read from mbatchd by invoking lsb_readjobmsg().

int lsb_postjobmsg(jobExternalMsgReq, fileName)
struct jobExternalMsgReq *jobExternalMsgReq;
char *fileName;                              Data file to be attached

int lsb_readjobmsg(jobExternalMsgReq, jobExternalMsgReply)
struct jobExternalMsgReq *jobExternalMsgReq;
struct jobExternalMsgReply *jobExternalMsgReply;

Use struct jobExternalMsgReq as a parameter in both lsb_postjobmsg() and lsb_readjobmsg(). It contains all the details on the external message or status to be read or posted.

struct jobExternalMsgReq {
    int         options;    Indicates which operation is to be performed
#define EXT_MSG_POST 0x01   Post external message
#define EXT_ATTA_POST 0x02  Post external data file
#define EXT_MSG_READ 0x04   Read external message
#define EXT_ATTA_READ 0x08  Read external data file
#define EXT_MSG_REPLAY 0x10 Replay external message
    LS_LONG_INT jobId;      Message of the job to be posted/read
    char        *jobName;   Name of the job if jobId is undefined (<=0)
    int         msgIdx;     Index in the list
    char        *desc;      Text description of the message
    int         userId;     Author of the message
    long        dataSize;   Size of the data file
    time_t      postTime;   Message sending time
};

The struct jobExternalMsgReply holds information on external message/status requested by the user. It is defined in lsbatch.h as follows:

struct jobExternalMsgReply {
    LS_LONG_INT jobId;       Message of the job to be read
    int         msgIdx;      Index in the message list
    char        *desc;       Text description of the message
    int         userId;      Author of the message
    long        dataSize;    Size of the data file
    time_t      postTime;    Message sending time
    int         dataStatus;  Status of the attached data
#define EXT_DATA_UNKNOWN 0   Data transfer for the message is in progress
#define EXT_DATA_NOEXIST 1   Message without data attached
#define EXT_DATA_AVAIL 2     Data of the message is available
#define EXT_DATA_UNAVAIL 3   Data of the message is corrupt
}; 
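When reporting a reply to a user, the dataStatus codes can be mapped to short descriptions. The sketch below does this for the four values listed above. The helper ext_data_status_text() is hypothetical, and the #define lines are fallbacks that only take effect when lsbatch.h has not already been included.

```c
/* Fallback definitions, copied from the listing above, so the sketch
 * compiles without lsbatch.h. They are skipped when the header is present. */
#ifndef EXT_DATA_UNKNOWN
#define EXT_DATA_UNKNOWN 0
#define EXT_DATA_NOEXIST 1
#define EXT_DATA_AVAIL   2
#define EXT_DATA_UNAVAIL 3
#endif

/* Hypothetical helper: maps jobExternalMsgReply.dataStatus to a short
 * human-readable description. */
const char *ext_data_status_text(int dataStatus)
{
    switch (dataStatus) {
    case EXT_DATA_UNKNOWN: return "data transfer in progress";
    case EXT_DATA_NOEXIST: return "no data attached";
    case EXT_DATA_AVAIL:   return "data available";
    case EXT_DATA_UNAVAIL: return "data corrupt";
    default:               return "unrecognized status";
    }
}
```

A reader of messages would usually attempt to fetch the attached file only when the status is EXT_DATA_AVAIL.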



User and Host Related Functions

User information

The lsb.users configuration file defines job slot limits for users and LSF user groups.

LSBLIB provides the function lsb_userinfo() for getting information on LSF users and user groups.

struct userInfoEnt *lsb_userinfo(users, numUsers)
    char **users;            User names
    int  *numUsers;          Number of user names

To get information about all users, set *numUsers = 0; *numUsers is updated to the actual number of users when lsb_userinfo() returns. To get information on the invoker, set users = NULL and *numUsers = 1.

The function returns an array of userInfoEnt structures containing user information. The structure is defined in lsbatch.h as follows:

struct userInfoEnt { 
    char  *user;              Name of the user or user group
    float procJobLimit;       Max number of started jobs on each processor
    int   maxJobs;            Max number of started or running jobs allowed
    int   numStartJobs;       Number of started jobs of the user/group
    int   numJobs;            Number of jobs the user/group submitted
    int   numPEND;            Number of pending jobs of the user/group
    int   numRUN;             Number of running jobs of the user/group
    int   numSSUSP;           Number of system-suspended jobs
    int   numUSUSP;           Number of user-suspended jobs
    int   numRESERVE;         Number of job slots reserved for pending jobs
}; 

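lsb_userinfo() gets the job slot limits of each user or user group, together with the current numbers of started, pending, running, and suspended jobs and of job slots reserved for pending jobs.

The maximum number of job slots is defined in the lsb.users LSF configuration file. The reserved user name default, also defined in lsb.users, matches users not already listed in lsb.users who have no jobs started in the system.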

On success, lsb_userinfo() returns an array of userInfoEnt structures and sets *numUsers to the number of userInfoEnt structures returned. The next call overwrites the returned array.

On failure, lsb_userinfo() returns NULL and sets lsberrno to indicate the error. If lsberrno is LSBE_BAD_USER, users[*numUsers] is not a user known to the LSF system. Otherwise, if *numUsers is less than its original value, *numUsers is the actual number of users found.
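Once the array is available, a caller walks it using the count stored in *numUsers. The sketch below sums the pending jobs over the returned entries. The struct is a local stand-in that mirrors the definition above so the fragment compiles without lsbatch.h, and total_pending_jobs() is a hypothetical helper.

```c
#include <stddef.h>

/* Stand-in for the lsbatch.h definition shown above; in a real program,
 * include the LSF header instead. */
struct userInfoEnt {
    char  *user;
    float procJobLimit;
    int   maxJobs;
    int   numStartJobs;
    int   numJobs;
    int   numPEND;
    int   numRUN;
    int   numSSUSP;
    int   numUSUSP;
    int   numRESERVE;
};

/* Hypothetical helper: adds up numPEND over the entries returned by
 * lsb_userinfo(). numUsers is the value left in *numUsers by the call. */
int total_pending_jobs(const struct userInfoEnt *ents, int numUsers)
{
    int i, total = 0;

    if (ents == NULL)
        return 0;
    for (i = 0; i < numUsers; i++)
        total += ents[i].numPEND;
    return total;
}
```

Because the next lsb_userinfo() call overwrites the array, any values needed later should be extracted or copied, as done here, before calling it again.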

Getting information on host groups or user groups

lsb_hostgrpinfo() and lsb_usergrpinfo() get membership of LSF host or user groups.

struct groupInfoEnt *lsb_hostgrpinfo(groups, numGroups, options)
struct groupInfoEnt *lsb_usergrpinfo(groups, numGroups, options)
    char  **groups;            Array of group names
    int   *numGroups;          Number of group names
    int   options;             Flags selecting what to return

struct groupInfoEnt {
    char   *group;                 Group name
    char   *memberList;            ASCII list of member names
    int    numUserShares;          Number of users with shares
    struct userShares *userShares; User shares representation
};

struct userShares {
    char   *user;             User name
    int    shares;            Number of shares assigned to the user
};

options is the bitwise inclusive OR of some of the following flags:

USER_GRP

Get information on user groups.

HOST_GRP

Get information on host groups.

GRP_RECURSIVE

Expand the group membership recursively. That is, if a member of a group is itself a group, give the names of its members recursively, rather than its name, which is the default.

GRP_ALL

Get membership of all groups.

GRP_SHARES

Display the information in the long format.

lsb_hostgrpinfo() gets LSF host group membership; lsb_usergrpinfo() gets LSF user group membership.

lsb.users(5) and lsb.hosts(5) define LSF user and host groups, respectively.

On success, lsb_hostgrpinfo() and lsb_usergrpinfo() return an array of groupInfoEnt structures which hold the group name and the list of names of its members. If a member of a group is itself a group (i.e., a subgroup), then a '/' is appended to the name to indicate this. *numGroups is the number of groupInfoEnt structures returned.

On failure, lsb_hostgrpinfo() and lsb_usergrpinfo() return NULL and set lsberrno to indicate the error. If lsberrno is LSBE_BAD_GROUP, groups[*numGroups] is not a group known to the LSF system. Otherwise, if *numGroups is less than its original value, *numGroups is the actual number of groups found.
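The trailing '/' convention makes it possible to tell subgroups from ordinary members when scanning memberList. The sketch below counts the subgroup entries in such a list. It assumes the names are separated by whitespace, which the text above does not state, and count_subgroups() is a hypothetical helper rather than an LSBLIB function.

```c
#include <stddef.h>
#include <ctype.h>

/* Hypothetical helper: counts the members in a groupInfoEnt.memberList
 * string whose names end in '/', which marks them as subgroups.
 * Assumption: member names are separated by whitespace. */
int count_subgroups(const char *memberList)
{
    int count = 0;
    const char *p = memberList;

    if (p == NULL)
        return 0;

    while (*p != '\0') {
        const char *end;

        /* skip separators before the next name */
        while (*p != '\0' && isspace((unsigned char)*p))
            p++;
        if (*p == '\0')
            break;

        /* find the end of the current name */
        end = p;
        while (*end != '\0' && !isspace((unsigned char)*end))
            end++;

        if (end[-1] == '/')
            count++;
        p = end;
    }
    return count;
}
```

When GRP_RECURSIVE is set, subgroups are expanded into their members, so a count like this would be expected to be zero for such a call.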

Host partition in fairshare scheduling

To configure host partition fairshare, define a host partition in lsb.hosts. Use lsb_hostpartinfo() to get information on the defined host partitions.

struct hostPartInfoEnt *lsb_hostpartinfo (hostParts, 
                                         numHostParts)
    char **hostParts;         Host partition names
    int  *numHostParts;       Number of host partition names

To get information on all host partitions, set hostParts to NULL; *numHostParts is set to the actual number of host partitions when lsb_hostpartinfo() returns.

The next call reuses the storage for the array of hostPartInfoEnt structures.

lsb_hostpartinfo() returns an array of hostPartInfoEnt structures describing the host partitions:

struct hostPartInfoEnt {
    char hostPart[MAX_LSB_NAME_LEN]; Name of the host partition
    char *hostList;                 Names of hosts in the partition
    int  numUsers;                  Number of users sharing the partition
    struct hostPartUserInfo *users; Description of user in the partition
};

The string variable hostList contains the names of the hosts in the partition; each name has a forward slash character (/) appended. (See lsb_groupinfo(3).)

The struct hostPartUserInfo holds information on a specific user in the host partition.

struct hostPartUserInfo {
    char user[MAX_LSB_NAME_LEN]; User Name 
    int   shares;                Number of shares assigned to the user
    float priority;              Priority of user to use the host partition
    int   numStartJobs;          Number of started jobs on host partition
    float histCpuTime;           Normalized CPU time of finished jobs
    int   numReserveJobs;        Number of reserved job slots for pending
                                 jobs
    int   runTime;               Time unfinished jobs spend in RUN state 
}; 

For priority, larger values represent higher priorities. When there is contention for resources in the host partition, jobs belonging to the user or user group with the highest priority are considered first for dispatch. In general, a user or user group with more shares, fewer numStartJobs, and less histCpuTime has higher priority.
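To make the ordering concrete, the sketch below scans an array of hostPartUserInfo entries and returns the index of the one with the largest priority value. The struct is a local stand-in mirroring the definition above, the fallback value of MAX_LSB_NAME_LEN is an assumption, and highest_priority_user() is a hypothetical helper, not part of LSBLIB.

```c
#include <stddef.h>

/* Fallback so the sketch compiles without lsbatch.h.
 * The value 60 is an assumption; use the header's definition in practice. */
#ifndef MAX_LSB_NAME_LEN
#define MAX_LSB_NAME_LEN 60
#endif

/* Stand-in for the lsbatch.h definition shown above. */
struct hostPartUserInfo {
    char  user[MAX_LSB_NAME_LEN];
    int   shares;
    float priority;
    int   numStartJobs;
    float histCpuTime;
    int   numReserveJobs;
    int   runTime;
};

/* Hypothetical helper: returns the index of the entry with the largest
 * priority, or -1 if the array is empty. On ties, the first entry wins. */
int highest_priority_user(const struct hostPartUserInfo *users, int numUsers)
{
    int i, best = -1;

    if (users == NULL)
        return -1;
    for (i = 0; i < numUsers; i++) {
        if (best < 0 || users[i].priority > users[best].priority)
            best = i;
    }
    return best;
}
```

For a partition returned by lsb_hostpartinfo(), the arguments would be the users and numUsers fields of the hostPartInfoEnt entry.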

On success, lsb_hostpartinfo() returns an array of hostPartInfoEnt structures which hold information on the host partitions, and sets *numHostParts to the number of hostPartInfoEnt structures.

On failure, lsb_hostpartinfo() returns NULL and sets lsberrno to indicate the error. If lsberrno is LSBE_BAD_HPART, hostParts[*numHostParts] is not a host partition known to the LSF system. Otherwise, if *numHostParts is less than its original value, *numHostParts is the actual number of host partitions found.

Controlling hosts and daemons

Hosts and daemons are controlled through lsb_hostcontrol() and lsb_reconfig().

lsb_hostcontrol() opens or closes a host, or restarts or shuts down the slave batch daemon, sbatchd.

int  lsb_hostcontrol (struct hostCtrlReq *);
struct hostCtrlReq {
    char  *host;             Host to be controlled
    int    opCode;           Option for host control
    char  *message;          Message attached by the admin
};

If host is NULL, the local host is assumed.

lsbatch.h defines the opCode parameter containing the following control selection flags:

HOST_CLOSE

Closes the host so that no jobs can be dispatched to it.

HOST_OPEN

Opens the host to accept jobs.

HOST_REBOOT

Restarts the sbatchd on the host. The sbatchd receives a request from the mbatchd and re-executes itself. This permits the sbatchd binary to be updated. This operation fails if no sbatchd is running on the specified host.

HOST_SHUTDOWN

Shuts down the sbatchd on the host; the sbatchd exits.

HOST_CLOSE_REMOTE

MultiCluster only: closes a leased host on the submission cluster.

In order to use updated batch LSF configuration files, the user can use lsb_reconfig() to restart the master batch daemon, mbatchd.

int  lsb_reconfig (struct mbdCtrlReq *);
struct mbdCtrlReq {
    int    opCode;          Options for configuration
    char   *name;           Reserved for future use
    char  *message;         Message attached by the admin
};

The parameter opCode is defined in lsbatch.h and should be one of the following:

MBD_RESTART

Restarts mbatchd.

MBD_RECONFIG

Rereads the configuration files.

MBD_CKCONFIG

Checks the validity of the mbatchd configuration files.

Depending on opCode, lsb_reconfig() restarts mbatchd, rereads the configuration files, or checks the validity of the configuration files.

On success, both lsb_hostcontrol() and lsb_reconfig() return 0. On failure, they return -1 and set lsberrno to indicate the error.



      Date Modified: March 13, 2009
Platform Computing: www.platform.com

Platform Support: support@platform.com
Platform Information Development: doc@platform.com

Copyright © 1994-2009 Platform Computing Corporation. All rights reserved.