bacct

Displays accounting statistics about finished jobs.

Synopsis

bacct [-b | -l] [-d] [-e] [-w] [-x] [-app application_profile_name] [-C time0,time1] [-D time0,time1] [-f logfile_name] [-Lp ls_project_name ...] [-m host_name ... | -M host_list_file] [-N host_name | -N host_model | -N cpu_factor] [-P project_name ...] [-q queue_name ...] [-sla service_class_name ...] [-S time0,time1] [-u user_name ... | -u all] [job_ID ...] [-U reservation_ID ... | -U all]
bacct [-h | -V]

Description

Displays a summary of accounting statistics for all finished jobs (with a DONE or EXIT status) submitted by the user who invoked the command, on all hosts, projects, and queues in the Platform LSF system. bacct displays statistics for all jobs logged in the current Platform LSF accounting log file: LSB_SHAREDIR/cluster_name/logdir/lsb.acct.

CPU time is not normalized.

All times are in seconds.

Statistics not reported by bacct but of interest to individual system administrators can be generated by directly using awk or perl to process the lsb.acct file.
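As a minimal sketch of that kind of direct processing, the following Python counts accounting records by their leading keyword. It assumes that each record occupies one line and begins with a quoted event keyword such as "JOB_FINISH" or "JOB_RESIZE"; the function name and the abbreviated sample records are hypothetical, and real records carry many more fields.

```python
from collections import Counter

def count_record_types(lines):
    """Count records by the keyword that starts each line."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Assumption: the first whitespace-delimited token is the quoted keyword.
        keyword = line.split(None, 1)[0].strip('"')
        counts[keyword] += 1
    return counts

# Hypothetical, heavily abbreviated records used only for illustration.
sample_records = [
    '"JOB_FINISH" remaining-fields-omitted',
    '"JOB_RESIZE" remaining-fields-omitted',
    '"JOB_FINISH" remaining-fields-omitted',
]

record_counts = count_record_types(sample_records)
print(record_counts)
```

To run it against a real log, pass an open file object for lsb.acct instead of the sample list.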

Throughput calculation

The throughput (T) of the Platform LSF system, certain hosts, or certain queues is calculated by the formula:

T = N/(ET-BT)

where:

  • N is the total number of jobs for which accounting statistics are reported

  • BT is the Start time: when the first job was logged

  • ET is the End time: when the last job was logged
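The formula can be sketched in Python as follows. The function name is hypothetical, and the conversion to jobs per hour is an assumption based on the unit shown in the SUMMARY output. The figures are taken from the default format example in this document: 178 finished jobs (60 done plus 118 exited) over 266.18 hours.

```python
def throughput_per_hour(num_jobs, begin_seconds, end_seconds):
    """Compute T = N / (ET - BT) and express it in jobs per hour."""
    elapsed_hours = (end_seconds - begin_seconds) / 3600.0
    if elapsed_hours <= 0:
        raise ValueError("ET must be later than BT")
    return num_jobs / elapsed_hours

# 266.18 hours expressed in seconds, measured from an arbitrary BT of 0.
period_seconds = 266.18 * 3600
rate = throughput_per_hour(178, 0, period_seconds)
print(round(rate, 2))
```

Rounded to two decimal places, the result agrees with the Total throughput value of 0.67 in that example.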

You can use the option -C time0,time1 to specify the Start time as time0 and the End time as time1. In this way, you can examine throughput during a specific time period.

Jobs involved in the throughput calculation are only those being logged (that is, with a DONE or EXIT status). Jobs that are running, suspended, or that have never been dispatched after submission are not considered, because they are still in the Platform LSF system and not logged in lsb.acct.

The total throughput of the Platform LSF system can be calculated by specifying -u all without any of the -m, -q, -S, -D or job_ID options. The throughput of certain hosts can be calculated by specifying -u all without the -q, -S, -D or job_ID options. The throughput of certain queues can be calculated by specifying -u all without the -m, -S, -D or job_ID options.

bacct does not show local pending batch jobs killed using bkill -b. bacct shows MultiCluster jobs and local running jobs even if they are killed using bkill -b.

Options

-b

Brief format.

-d

Displays accounting statistics for successfully completed jobs (with a DONE status).

-e

Displays accounting statistics for exited jobs (with an EXIT status).

-l

Long format. Displays detailed information for each job in a multiline format.

If the job was submitted with bsub -K, the -l option displays Synchronous Execution.

-w

Wide field format.

-x

Displays jobs that have triggered a job exception (overrun, underrun, idle, runtime_est_exceeded). Use with the -l option to show the exception status for individual jobs.

-app application_profile_name

Displays accounting information about jobs submitted to the specified application profile. You must specify an existing application profile configured in lsb.applications.

-C time0,time1

Displays accounting statistics for jobs that completed or exited during the specified time interval. Reads lsb.acct and all archived log files (lsb.acct.n) unless -f is also used.

The time format is the same as in bhist.

-D time0,time1

Displays accounting statistics for jobs dispatched during the specified time interval. Reads lsb.acct and all archived log files (lsb.acct.n) unless -f is also used.

The time format is the same as in bhist.

-f logfile_name

Searches the specified job log file for accounting statistics. Specify either an absolute or relative path.

Useful for offline analysis.

The specified file path can contain up to 4094 characters for UNIX, or up to 512 characters for Windows.

-Lp ls_project_name ...

Displays accounting statistics for jobs belonging to the specified License Scheduler projects. If a list of projects is specified, project names must be separated by spaces and enclosed in quotation marks (") or (').

-M host_list_file

Displays accounting statistics for jobs dispatched to the hosts listed in the specified file (host_list_file). The host list file has the following format:

  • Multiple lines are supported

  • Each line includes a list of hosts separated by spaces

  • The length of each line must be less than 512 characters

-m host_name ...

Displays accounting statistics for jobs dispatched to the specified hosts.

If a list of hosts is specified, host names must be separated by spaces and enclosed in quotation marks (") or ('), and maximum length cannot exceed 1024 characters.
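When bacct is launched from a script, the quoting rule means that each list travels as a single argument. The sketch below builds such an argument vector in Python without running the command. The helper name build_bacct_args is hypothetical; only the -u, -m, and -q options described in this document are used.

```python
import shlex

MAX_HOST_LIST_LENGTH = 1024  # limit stated for the -m host list

def build_bacct_args(users=None, hosts=None, queues=None):
    """Build an argument vector in which each list is one space-separated argument."""
    args = ["bacct"]
    if users:
        args += ["-u", " ".join(users)]
    if hosts:
        host_list = " ".join(hosts)
        if len(host_list) > MAX_HOST_LIST_LENGTH:
            raise ValueError("host list exceeds 1024 characters")
        args += ["-m", host_list]
    if queues:
        args += ["-q", " ".join(queues)]
    return args

args = build_bacct_args(users=["all"], hosts=["hostA", "hostB"])
# shlex.join shows the equivalent quoted shell command line.
command_line = shlex.join(args)
print(command_line)
```

Passing the list to subprocess.run(args) would hand the host list to bacct as one argument, which is what the quotation marks achieve on an interactive shell.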

-N host_name | -N host_model | -N cpu_factor

Normalizes CPU time by the CPU factor of the specified host or host model, or by the specified CPU factor.

If you use bacct offline by indicating a job log file, you must specify a CPU factor.

-P project_name ...

Displays accounting statistics for jobs belonging to the specified projects. If a list of projects is specified, project names must be separated by spaces and enclosed in quotation marks (") or ('). You cannot use one double quote and one single quote to enclose the list.

-q queue_name ...

Displays accounting statistics for jobs submitted to the specified queues.

If a list of queues is specified, queue names must be separated by spaces and enclosed in quotation marks (") or (').

-S time0,time1

Displays accounting statistics for jobs submitted during the specified time interval. Reads lsb.acct and all archived log files (lsb.acct.n) unless -f is also used.

The time format is the same as in bhist.

-sla service_class_name

Displays accounting statistics for jobs that ran under the specified service class.

If a default system service class is configured with ENABLE_DEFAULT_EGO_SLA in lsb.params but not explicitly configured in lsb.applications, bacct -sla service_class_name displays accounting information for the specified default service class.

-U reservation_id ... | -U all

Displays accounting statistics for the specified advance reservation IDs, or for all reservation IDs if the keyword all is specified.

A list of reservation IDs must be separated by spaces and enclosed in quotation marks (") or (').

The -U option also displays historical information about reservation modifications.

When combined with the -U option, -u is interpreted as the user name of the reservation creator. For example:

bacct -U all -u user2

shows all the advance reservations created by user user2.

Without the -u option, bacct -U shows all advance reservation information about jobs submitted by the user.

In a MultiCluster environment, advance reservation information is only logged in the execution cluster, so bacct displays advance reservation information for local reservations only. You cannot see information about remote reservations. You cannot specify a remote reservation ID, and the keyword all only displays information about reservations in the local cluster.

-u user_name ...|-u all

Displays accounting statistics for jobs submitted by the specified users, or by all users if the keyword all is specified.

If a list of users is specified, user names must be separated by spaces and enclosed in quotation marks (") or ('). You can specify both user names and user IDs in the list of users.

job_ID ...

Displays accounting statistics for jobs with the specified job IDs.

If the reserved job ID 0 is used, it is ignored.

-h

Prints command usage to stderr and exits.

-V

Prints Platform LSF release version to stderr and exits.

Default output format (SUMMARY)

Statistics on jobs. The following fields are displayed:

  • Total number of done jobs

  • Total number of exited jobs

  • Total CPU time consumed

  • Average CPU time consumed

  • Maximum CPU time of a job

  • Minimum CPU time of a job

  • Total wait time in queues

  • Average wait time in queue

  • Maximum wait time in queue

  • Minimum wait time in queue

  • Average turnaround time (seconds/job)

  • Maximum turnaround time

  • Minimum turnaround time

  • Average hog factor of a job (cpu time/turnaround time)

  • Maximum hog factor of a job

  • Minimum hog factor of a job

  • Total throughput

  • Beginning time: the completion or exit time of the first job selected

  • Ending time: the completion or exit time of the last job selected

The total, average, minimum, and maximum statistics are on all specified jobs.

The wait time is the elapsed time from job submission to job dispatch.

The turnaround time is the elapsed time from job submission to job completion.

The hog factor is the amount of CPU time consumed by a job divided by its turnaround time.

The throughput is the number of completed jobs divided by the time period to finish these jobs (jobs/hour).
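These definitions can be checked against job 1743 from the job exception example in this document. The sketch below is illustrative only: the function name is hypothetical, and the timestamps and CPU time are copied from that sample output.

```python
from datetime import datetime

def job_metrics(submit, dispatch, complete, cpu_seconds):
    """Return (wait, turnaround, hog factor) as defined for the SUMMARY fields."""
    wait = (dispatch - submit).total_seconds()
    turnaround = (complete - submit).total_seconds()
    hog_factor = cpu_seconds / turnaround if turnaround > 0 else 0.0
    return wait, turnaround, hog_factor

submit = datetime(2009, 8, 11, 18, 16, 17)
dispatch = datetime(2009, 8, 11, 18, 17, 22)
complete = datetime(2009, 8, 11, 18, 18, 54)

wait, turnaround, hog_factor = job_metrics(submit, dispatch, complete, 0.19)
print(wait, turnaround, round(hog_factor, 4))
```

The computed wait time of 65 seconds, turnaround time of 157 seconds, and hog factor of 0.0012 match the values reported for that job.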

Brief format (-b)

In addition to the default format SUMMARY, displays the following fields:

U/UID

Name of the user who submitted the job. If LSF fails to get the user name by getpwuid, the user ID is displayed.

QUEUE

Queue to which the job was submitted.

SUBMIT_TIME

Time when the job was submitted.

CPU_T

CPU time consumed by the job.

WAIT

Wait time of the job.

TURNAROUND

Turnaround time of the job.

FROM

Host from which the job was submitted.

EXEC_ON

Host or hosts to which the job was dispatched to run.

JOB_NAME

The job name assigned by the user, or the command string assigned by default at job submission with bsub. If the job name is too long to fit in this field, then only the latter part of the job name is displayed.

The displayed job name or job command can contain up to 4094 characters.

Long format (-l)

Also displays host-based accounting information (CPU_T, MEM, and SWAP) for completed jobs when LSF_HPC_EXTENSIONS="HOST_RUSAGE" in lsf.conf.

In addition to the fields displayed by default in SUMMARY and by -b, displays the following fields:

JOBID

Identifier that LSF assigned to the job.

PROJECT_NAME

Project name assigned to the job.

STATUS

Status that indicates the job was either successfully completed (DONE) or exited (EXIT).

DISPAT_TIME

Time when the job was dispatched to run on the execution hosts.

COMPL_TIME

Time when the job exited or completed.

HOG_FACTOR

Average hog factor, equal to "CPU time" / "turnaround time".

MEM

Maximum resident memory usage of all processes in a job. By default, memory usage is shown in MB. Use LSF_UNIT_FOR_LIMITS in lsf.conf to specify a larger unit for display (MB, GB, TB, PB, or EB).

CWD

Current working directory of the job.

SWAP

Maximum virtual memory usage of all processes in a job. By default, swap space is shown in MB. Use LSF_UNIT_FOR_LIMITS in lsf.conf to specify a larger unit for display (MB, GB, TB, PB, or EB).

INPUT_FILE

File from which the job reads its standard input (see bsub -i input_file).

OUTPUT_FILE

File to which the job writes its standard output (see bsub -o output_file).

ERR_FILE

File in which the job stores its standard error output (see bsub -e err_file).

EXCEPTION STATUS

Possible values for the exception status of a job include:

idle

The job is consuming less CPU time than expected. The job idle factor (CPU time/runtime) is less than the configured JOB_IDLE threshold for the queue and a job exception has been triggered.

overrun

The job is running longer than the number of minutes specified by the JOB_OVERRUN threshold for the queue and a job exception has been triggered.

underrun

The job finished sooner than the number of minutes specified by the JOB_UNDERRUN threshold for the queue and a job exception has been triggered.

runtime_est_exceeded

The job is running longer than the number of minutes specified by the runtime estimation and a job exception has been triggered.

SYNCHRONOUS_EXECUTION

Job was submitted with the -K option. LSF submits the job and waits for the job to complete.

JOB_DESCRIPTION

The job description assigned by the user at job submission with bsub. This field is omitted if no job description has been assigned.

The displayed job description can contain up to 4094 characters.

Advance reservations (-U)

Displays the following fields:

RSVID

Advance reservation ID assigned by the brsvadd command

TYPE

Type of reservation: user or system

CREATOR

User name of the advance reservation creator, who submitted the brsvadd command

USER

User name of the advance reservation user, who submitted the job with bsub -U

NCPUS

Number of CPUs reserved

RSV_HOSTS

List of hosts for which processors are reserved, and the number of processors reserved

TIME_WINDOW

Time window for the reservation.

  • A one-time reservation displays fields separated by slashes (month/day/hour/minute). For example:

    11/12/14/0-11/12/18/0

  • A recurring reservation displays fields separated by colons (day:hour:minute). For example:

    5:18:0 5:20:0
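A one-time window in this format can be split into its start and end times. The following sketch is a hypothetical helper, not part of LSF. Because the field carries no year, the year is supplied by the caller as an assumption. The window used here comes from the advance reservation example in this document.

```python
from datetime import datetime

def parse_one_time_window(window, year):
    """Parse 'month/day/hour/minute-month/day/hour/minute' into two datetimes."""
    start_text, end_text = window.split("-")

    def to_datetime(text):
        month, day, hour, minute = (int(part) for part in text.split("/"))
        return datetime(year, month, day, hour, minute)

    return to_datetime(start_text), to_datetime(end_text)

# The year 2009 is an assumption; TIME_WINDOW itself does not include it.
start, end = parse_one_time_window("9/16/17/36-9/16/17/38", year=2009)
duration_seconds = (end - start).total_seconds()
print(duration_seconds)
```

The resulting 120 seconds corresponds to the Total duration time of 2 minutes shown for that reservation.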

Termination reasons displayed by bacct

When LSF detects that a job is terminated, bacct -l displays one of the following termination reasons. The corresponding integer value logged to the JOB_FINISH record in lsb.acct is given in parentheses.

  • TERM_ADMIN: Job killed by root or LSF administrator (15)

  • TERM_BUCKET_KILL: Job killed with bkill -b (23)

  • TERM_CHKPNT: Job killed after checkpointing (13)

  • TERM_CWD_NOTEXIST: Current working directory is not accessible or does not exist on the execution host (25)

  • TERM_CPULIMIT: Job killed after reaching LSF CPU usage limit (12)

  • TERM_DEADLINE: Job killed after deadline expires (6)

  • TERM_EXTERNAL_SIGNAL: Job killed by a signal external to LSF (17)

  • TERM_FORCE_ADMIN: Job killed by root or LSF administrator without time for cleanup (9)

  • TERM_FORCE_OWNER: Job killed by owner without time for cleanup (8)

  • TERM_LOAD: Job killed after load exceeds threshold (3)

  • TERM_MEMLIMIT: Job killed after reaching LSF memory usage limit (16)

  • TERM_OWNER: Job killed by owner (14)

  • TERM_PREEMPT: Job killed after preemption (1)

  • TERM_PROCESSLIMIT: Job killed after reaching LSF process limit (7)

  • TERM_REQUEUE_ADMIN: Job killed and requeued by root or LSF administrator (11)

  • TERM_REQUEUE_OWNER: Job killed and requeued by owner (10)

  • TERM_RUNLIMIT: Job killed after reaching LSF run time limit (5)

  • TERM_SWAP: Job killed after reaching LSF swap usage limit (20)

  • TERM_THREADLIMIT: Job killed after reaching LSF thread limit (21)

  • TERM_UNKNOWN: LSF cannot determine a termination reason; 0 is logged but TERM_UNKNOWN is not displayed (0)

  • TERM_WINDOW: Job killed after queue run window closed (2)

  • TERM_ZOMBIE: Job exited while LSF is not available (19)

Tip:

The integer values logged to the JOB_FINISH record in lsb.acct and termination reason keywords are mapped in lsbatch.h.
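For scripts that read the integer directly from lsb.acct, the list above can be transcribed into a lookup table. This is a sketch built only from the values in this document, not from lsbatch.h, and the function name is hypothetical.

```python
# Integer values and keywords transcribed from the termination reason list.
TERM_REASONS = {
    0: "TERM_UNKNOWN",
    1: "TERM_PREEMPT",
    2: "TERM_WINDOW",
    3: "TERM_LOAD",
    5: "TERM_RUNLIMIT",
    6: "TERM_DEADLINE",
    7: "TERM_PROCESSLIMIT",
    8: "TERM_FORCE_OWNER",
    9: "TERM_FORCE_ADMIN",
    10: "TERM_REQUEUE_OWNER",
    11: "TERM_REQUEUE_ADMIN",
    12: "TERM_CPULIMIT",
    13: "TERM_CHKPNT",
    14: "TERM_OWNER",
    15: "TERM_ADMIN",
    16: "TERM_MEMLIMIT",
    17: "TERM_EXTERNAL_SIGNAL",
    19: "TERM_ZOMBIE",
    20: "TERM_SWAP",
    21: "TERM_THREADLIMIT",
    23: "TERM_BUCKET_KILL",
    25: "TERM_CWD_NOTEXIST",
}

def term_reason(code):
    """Return the keyword for a logged integer, or None if it is not listed here."""
    return TERM_REASONS.get(code)

print(term_reason(5))
```

Values that do not appear in the list, such as 4, return None rather than a guessed keyword.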

Example: Default format

bacct 
Accounting information about jobs that are: 
  - submitted by users user1. 
  - accounted on all projects.
  - completed normally or exited.
  - executed on all hosts.
  - submitted to all queues.
  - accounted on all service classes.
-------------------------------------------------------------
SUMMARY:      ( time unit: second ) 
 Total number of done jobs:      60      Total number of exited jobs:   118
 Total CPU time consumed:    1011.5      Average CPU time consumed:     5.7
 Maximum CPU time of a job:   991.4      Minimum CPU time of a job:     0.0
 Total wait time in queues: 134598.0
 Average wait time in queue:  756.2
 Maximum wait time in queue: 7069.0      Minimum wait time in queue:    0.0
 Average turnaround time:      3585 (seconds/job)
 Maximum turnaround time:     77524      Minimum turnaround time:         6
 Average hog factor of a job:  0.00 ( cpu time / turnaround time )
 Maximum hog factor of a job:  0.56      Minimum hog factor of a job:  0.00
 Total throughput:             0.67 (jobs/hour)  during  266.18 hours
 Beginning time:       Aug  8 15:48      Ending time:          Aug 19 17:59
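Scripts that post-process this output can pull the numeric fields out of the SUMMARY block. The sketch below is a hypothetical helper that assumes the "label: number" layout shown in the example above; labels followed by a date, such as Beginning time, are skipped.

```python
import re

# A label starts with a capital letter and runs up to the colon;
# the value is the first number after the colon.
FIELD = re.compile(r"([A-Z][^:]*):\s+(\d+(?:\.\d+)?)")

def parse_summary(text):
    """Return a dict mapping SUMMARY labels to numeric values."""
    fields = {}
    for line in text.splitlines():
        for match in FIELD.finditer(line):
            fields[match.group(1).strip()] = float(match.group(2))
    return fields

# Lines copied from the default format example.
sample_summary = """\
 Total number of done jobs:      60      Total number of exited jobs:   118
 Total CPU time consumed:    1011.5      Average CPU time consumed:     5.7
 Total throughput:             0.67 (jobs/hour)  during  266.18 hours
 Beginning time:       Aug  8 15:48      Ending time:          Aug 19 17:59
"""

summary = parse_summary(sample_summary)
print(summary)
```

The layout of bacct output may differ between LSF versions, so the pattern should be checked against the actual output before it is relied on.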

Example: Jobs that have triggered job exceptions

bacct -x -l
Accounting information about jobs that are: 
  - submitted by users user1, 
  - accounted on all projects.
  - completed normally or exited
  - executed on all hosts.
  - submitted to all queues.
  - accounted on all service classes.
---------------------------------------------------------
Job <1743>, User <user1>, Project <default>, Status <DONE>, Queue <normal>, Command <sleep 30>
Mon Aug 11 18:16:17 2009: Submitted from host <hostB>, CWD <$HOME/jobs>, Output File </dev/null>;
Mon Aug 11 18:17:22 2009: Dispatched to <hostC>;
Mon Aug 11 18:18:54 2009: Completed <done>.
 EXCEPTION STATUS:  underrun 
Accounting information about this job:
     CPU_T     WAIT     TURNAROUND   STATUS     HOG_FACTOR    MEM    SWAP
      0.19       65            157     done         0.0012     4M      5M
-------------------------------------------------------------
Job <1948>, User <user1>, Project <default>, Status <DONE>, Queue <normal>, Command <sleep 550>, Job Description <This job is a test job.>
Tue Aug 12 14:15:03 2009: Submitted from host <hostB>, CWD <$HOME/jobs>, Output File </dev/null>;
Tue Aug 12 14:15:15 2009: Dispatched to <hostC>;
Tue Aug 12 14:25:08 2009: Completed <done>.
 EXCEPTION STATUS:  overrun  idle 
Accounting information about this job:
     CPU_T     WAIT     TURNAROUND   STATUS     HOG_FACTOR    MEM    SWAP
      0.20       12            605     done         0.0003     4M      5M
-------------------------------------------------------------
Job <1949>, User <user1>, Project <default>, Status <DONE>, Queue <normal>, Command <sleep 400>
Tue Aug 12 14:26:11 2009: Submitted from host <hostB>, CWD <$HOME/jobs>, Output File </dev/null>;
Tue Aug 12 14:26:18 2009: Dispatched to <hostC>;
Tue Aug 12 14:33:16 2009: Completed <done>.
 EXCEPTION STATUS:  idle 
Accounting information about this job:
     CPU_T     WAIT     TURNAROUND   STATUS     HOG_FACTOR    MEM    SWAP
      0.17        7            425     done         0.0004     4M      5M
Job <719[14]>, Job Name <test[14]>, User <user1>, Project <default>, Status <EXIT>, Queue <normal>, Command </home/user1/job1>, Job Description <This job is another test job.>
Mon Aug 18 20:27:44 2009: Submitted from host <hostB>, CWD <$HOME/jobs>, Output File </dev/null>;
Mon Aug 18 20:31:16 2009: [14] dispatched to <hostA>;
Mon Aug 18 20:31:18 2009: Completed <exit>.
 EXCEPTION STATUS:  underrun 
Accounting information about this job:
     CPU_T     WAIT     TURNAROUND   STATUS     HOG_FACTOR    MEM    SWAP
      0.19      212            214     exit         0.0009     2M      4M
-------------------------------------------------------------
SUMMARY:      ( time unit: second ) 
 Total number of done jobs:      45      Total number of exited jobs:    56
 Total CPU time consumed:    1009.1      Average CPU time consumed:    10.0
 Maximum CPU time of a job:   991.4      Minimum CPU time of a job:     0.1
 Total wait time in queues: 116864.0
 Average wait time in queue: 1157.1
 Maximum wait time in queue: 7069.0      Minimum wait time in queue:    7.0
 Average turnaround time:      1317 (seconds/job)
 Maximum turnaround time:      7070      Minimum turnaround time:        10
 Average hog factor of a job:  0.01 ( cpu time / turnaround time )
 Maximum hog factor of a job:  0.56      Minimum hog factor of a job:  0.00
 Total throughput:             0.59 (jobs/hour)  during  170.21 hours
 Beginning time:       Aug 11 18:18      Ending time:          Aug 18 20:31

Example: Advance reservation accounting information

bacct -U user1#2
Accounting for:
  - advance reservation IDs: user1#2
  - advance reservations created by user1
-------------------------------------------------------------
RSVID       TYPE      CREATOR    USER    NCPUS       RSV_HOSTS     TIME_WINDOW
user1#2     user        user1   user1      1           hostA:1    9/16/17/36-9/16/17/38
SUMMARY:
Total number of jobs:               4
Total CPU time consumed:      0.5 second
Maximum memory of a job:     4.2 MB
Maximum swap of a job:         5.2 MB
Total duration time:                 0 hour    2 minute    0 second

Example: LSF job termination reason logging

When a job finishes, LSF reports the last job termination action it took against the job and logs it into lsb.acct.

If a running job exits because of node failure, LSF sets the correct exit information in lsb.acct, lsb.events, and the job output file.

Use bacct -l to view job exit information logged to lsb.acct:
bacct -l 7265
Accounting information about jobs that are: 
  - submitted by all users.
  - accounted on all projects.
  - completed normally or exited
  - executed on all hosts.
  - submitted to all queues.
  - accounted on all service classes.
-------------------------------------------------------------
Job <7265>, User <lsfadmin>, Project <default>, Status <EXIT>, Queue <normal>, Command <srun sleep 100000>, Job Description <This job is also a test job.>
Thu Sep 16 15:22:09 2009: Submitted from host <hostA>, CWD <$HOME>;
Thu Sep 16 15:22:20 2009: Dispatched to 4 Hosts/Processors <4*hostA>;
Thu Sep 16 15:23:21 2009: Completed <exit>; TERM_RUNLIMIT: job killed after reaching LSF run time limit.
Accounting information about this job:
     Share group charged </lsfadmin>
     CPU_T     WAIT     TURNAROUND   STATUS     HOG_FACTOR    MEM    SWAP
      0.04       11             72     exit         0.0006     0K      0K
-------------------------------------------------------------
SUMMARY:      ( time unit: second ) 
 Total number of done jobs:       0      Total number of exited jobs:     1
 Total CPU time consumed:       0.0      Average CPU time consumed:     0.0
 Maximum CPU time of a job:     0.0      Minimum CPU time of a job:     0.0
 Total wait time in queues:    11.0
 Average wait time in queue:   11.0
 Maximum wait time in queue:   11.0      Minimum wait time in queue:   11.0
 Average turnaround time:        72 (seconds/job)
 Maximum turnaround time:        72      Minimum turnaround time:        72
 Average hog factor of a job:  0.00 ( cpu time / turnaround time )
 Maximum hog factor of a job:  0.00      Minimum hog factor of a job:  0.00

Example: Resizable job information

Use bacct -l to view resizable job information logged to lsb.acct:
  • The autoresizable attribute of a job and the resize notification command if bsub -ar and bsub -rnc resize_notification_command are specified.

  • Job allocation changes whenever a JOB_RESIZE event is logged to lsb.acct.

When an allocation grows, bacct shows:
Additional allocation on num_hosts Hosts/Processors host_list
When an allocation shrinks, bacct shows:
Release allocation on num_hosts Hosts/Processors host_list

For example, assume a job is submitted as

bsub -n 1,5 -ar myjob

and the initial allocation is on hostA and hostB. The first resize request is allocated on hostC and hostD. A second resize request is allocated on hostE. bacct -l displays:

bacct -l 1150
Accounting information about jobs that are:
  - submitted by all users.
  - accounted on all projects.
  - completed normally or exited
  - executed on all hosts.
  - submitted to all queues.
  - accounted on all service classes.
-------------------------------------------------------------
Job <1150>, User <user2>, Project <default>, Status <DONE>, Queue <normal>, Command <sleep 10>, Job Description <This job is a test job.>
Mon Jun  2 11:42:00 2009: Submitted from host <hostA>, CWD <$HOME>;
Mon Jun  2 11:43:00 2009: Dispatched to 2 Hosts/Processors <hostA> <hostB>;
Mon Jun  2 11:43:52 2009: Additional allocation on 2 Hosts/Processors <hostC> <hostD>;
Mon Jun  2 11:44:55 2009: Additional allocation on <hostE>;
Mon Jun  2 11:51:40 2009: Completed <done>.
...

Files

Reads lsb.acct, lsb.acct.n.

See also

bhist, bsub, bjobs, lsb.acct, brsvadd, brsvs, bsla, lsb.serviceclasses