Fixed Bugs for Platform LSF™ Version 7 Update 5

Release Date: March 2009

 

The following bugs have been fixed in the March 2009 update (LSF 7 Update 5) since the October 2008 update (LSF 7 Update 4):

 

122405

Date

2009-03-20

 

Description

Customer job remains in RUNNING state after execution host crashes and restarts.

 

Component

sbatchd

 

Platform

All

 

Impact

Incorrect job status and host underutilization.

 

117599

Date

2009-03-20

 

Description

Following a brequeue, the job fails due to missing job buffer files.

 

Component

sbatchd

 

Platform

Cray

 

Impact

brequeue command doesn't work.

 

114627

Date

2009-03-20

 

Description

According to the documentation, three parameters are required to turn on dynamic adding of hosts:

  • LSF_MASTER_LIST
  • LSF_HOST_ADDR_RANGE
  • LSF_DYNAMIC_HOST_WAIT_TIME

In reality, only the first two parameters are required to add hosts dynamically.

 

Component

lim

 

Platform

All

 

Impact

Customers are unintentionally adding hosts to the cluster dynamically.
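For bug 114627, a site could audit whether dynamic host addition is effectively enabled by checking only the two parameters that the description says are sufficient. The following is a minimal sketch, not an LSF tool: the function names and sample values are hypothetical, and it assumes the relevant settings have already been gathered from the cluster configuration files as simple NAME=value lines.

```python
# Parameters that, per the bug description, are enough to enable dynamic hosts.
SUFFICIENT = ("LSF_MASTER_LIST", "LSF_HOST_ADDR_RANGE")


def parse_settings(text):
    """Parse NAME=value lines, skipping blank lines and # comments."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        params[name.strip()] = value.strip().strip('"')
    return params


def dynamic_hosts_enabled(params):
    """True when every sufficient parameter has a non-empty value."""
    return all(params.get(name) for name in SUFFICIENT)


# Hypothetical settings: LSF_DYNAMIC_HOST_WAIT_TIME is deliberately absent.
sample = '''
LSF_MASTER_LIST="hosta hostb"
LSF_HOST_ADDR_RANGE=*.*.*.*
'''
print(dynamic_hosts_enabled(parse_settings(sample)))
```

With the sample settings the check reports that dynamic hosts are enabled even though LSF_DYNAMIC_HOST_WAIT_TIME is not set, which is the situation the Impact line describes.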

 

121696

Date

2009-03-18

 

Description

Customer job remains in the RUN state without running if the pre-exec fails once.

 

Component

sbatchd

 

Platform

All

 

Impact

Job doesn't run and host resources are wasted.

 

118519

Date

2009-03-16

 

Description

When jobs allocate memory and fork child processes, the memory usage shown by bjobs -l is incorrect because shared memory is counted more than once.

 

Component

 

Platform

All

 

Impact

The exact physical memory usage information is incorrect for jobs using shared memory between parent and child processes.
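The double counting in bug 118519 can be illustrated with a small model. This is a sketch under stated assumptions, not how LSF collects usage: each process is represented by a hypothetical dictionary holding its private memory and the shared segments it maps, keyed by a made-up segment name.

```python
def naive_usage(processes):
    """Sum private and shared memory per process (counts shared segments repeatedly)."""
    return sum(p["private_kb"] + sum(p["shared"].values()) for p in processes)


def deduplicated_usage(processes):
    """Sum private memory per process, but count each shared segment only once."""
    private = sum(p["private_kb"] for p in processes)
    segments = {}
    for p in processes:
        segments.update(p["shared"])  # same segment name collapses to one entry
    return private + sum(segments.values())


# Hypothetical parent and forked child mapping the same 500 KB segment.
job = [
    {"private_kb": 100, "shared": {"seg1": 500}},
    {"private_kb": 50, "shared": {"seg1": 500}},
]
print(naive_usage(job), deduplicated_usage(job))
```

The naive total includes the shared segment once for the parent and once for the child, while the deduplicated total includes it a single time.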

 

120639

Date

2009-03-13

 

Description

After installing the latest IBM service pack for POE over InfiniBand, User Space POE jobs always fail in LSF due to negative node numbers.

 

Component

 

Platform

AIX

 

Impact

Users cannot run POE jobs.

 

122383

Date

2009-03-09

 

Description

lsload reports incorrect utilization values.

 

Component

lim

 

Platform

Solaris

 

Impact

CPU utilization cannot be reported correctly.

 

122240

Date

2009-03-09

 

Description

Parameter MAX_JOB_PREEMPT does not control the number of times a License Scheduler job can be preempted.

 

Component

mbschd, License Scheduler package mbatchd

 

Platform

All

 

Impact

Jobs using limited licenses may be preempted many times.

 

120018

Date

2009-03-09

 

Description

The bjobs command reports that a job is in the RUNNING state although the job has already exited, leaving the application res stuck on a Windows 2003 server.

 

Component

res

 

Platform

Windows

 

Impact

Job does not finish.

 

118801

Date

2009-03-09

 

Description

lspasswd times out on receiving a reply from the lim.

 

Component

lim

 

Platform

All

 

Impact

lspasswd fails and jobs aren't submitted.

 

113707

Date

2009-03-09

 

Description

MPI jobs produce different results when run within LSF than when run outside LSF.

 

Component

intelmpi_wrapper, mpich2_wrapper

 

Platform

Linux

 

Impact

The MPI local option is set to global, and the program may run with the wrong arguments.

 

111018

Date

2009-03-09

 

Description

The command bjobs in SAS LSF Version 7 Update 2 does not work well on Windows.

 

Component

bjobs

 

Platform

All

 

Impact

Cannot use bjobs.

 

110814

Date

2009-03-09

 

Description

SGI-MPI mpirun options are not recognized by pam.

 

Component

pam

 

Platform

Linux IA64/x86-64

 

Impact

Cannot use SGI-MPI mpirun options on pam command line when using pam SGI-MPI integration.

 

115193

Date

2009-03-08

 

Description

Cannot transfer files locally on a Windows machine or to a shared directory.

 

Component

lsrcp.exe

 

Platform

Windows

 

Impact

File transfer fails.

 

119899

Date

2009-03-06

 

Description

bmod -bn fails to remove the specified begin time on Windows-64 machines.

 

Component

mbatchd

 

Platform

All 64-bit architectures

 

Impact

Job continues to pend.

 

121605

Date

2009-03-05

 

Description

LSF_TMPDIR defined in lsf.conf isn't effective on Windows systems.

 

Component

res

 

Platform

Windows

 

Impact

Temporary files are saved to the directory defined in the registry instead of the directory set in LSF_TMPDIR.

 

117812

Date

2009-03-05

 

Description

Perl and Python user programs fail when linked with LSF APIs; the C code calling the APIs is fine.

 

Component

liblsf.so, libbat.so

 

Platform

All

 

Impact

Customer programs cannot be used.

 

117247

Date

2009-03-05

 

Description

When a License Scheduler project name is misspelled or has mismatched capitalization, the job runs under the default project or pends if the default project is not defined.

 

Component

 

Platform

All

 

Impact

Under-utilization of License Scheduler tokens. Jobs running under the default project accidentally have the wrong user share allotment.
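For bug 117247, a submission wrapper could catch misspelled or mis-capitalized project names before the job is submitted. The sketch below is purely illustrative: the function name, the project names, and the idea of a client-side check are assumptions, not part of License Scheduler.

```python
def resolve_project(requested, configured):
    """Return (project, None) on an exact match, else (None, error message)."""
    if requested in configured:
        return requested, None
    # Look for a name that differs only in capitalization.
    by_lower = {name.lower(): name for name in configured}
    match = by_lower.get(requested.lower())
    if match:
        return None, f"unknown project {requested!r}; did you mean {match!r}?"
    return None, f"unknown project {requested!r}"


# Hypothetical configured project names.
projects = ["Lp1", "Lp2"]
print(resolve_project("lp1", projects))
```

Rejecting the name up front avoids the silent fallback to the default project that the description mentions.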

 

99327

Date

2009-03-04

 

Description

Jobs pend indefinitely with reason “Unable to determine user account for execution” if the user account cannot be resolved.

 

Component

mbatchd

 

Platform

All

 

Impact

Jobs pend.

 

118421

Date

2009-03-04

 

Description

Excessive lim debug messages are logged.

 

Component

lim, pim

 

Platform

All

 

Impact

When debug is on, the lim log is excessive.

 

117368

Date

2009-03-04

 

Description

Jobs are not dispatched and show the pending reason "System is unable to schedule the job".

 

Component

mbschd

 

Platform

All

 

Impact

Jobs pend.

 

116145

Date

2009-03-04

 

Description

Jobs pend with reason "New job is waiting for scheduling" even when the dependent condition is satisfied.

 

Component

mbschd, mbatchd

 

Platform

All

 

Impact

Jobs pend.

 

115276

Date

2009-03-04

 

Description

Sourcing profile.lsf returns "Cannot detect the binary type" from profile.perf.

 

Component

profile.perf

 

Platform

Linux

 

Impact

Binary type detection fails, and PMC does not run properly because some environment variables are not updated.

 

120817

Date

2009-03-03

 

Description

Jobs submitted to LSF 7 Update 4 from older versions of an LSF client (either pre-7.0.4 or applications built with a pre-7.0.4 LSF library) can have the following problems:

  • If the job is submitted from a subdirectory under HOME with a relative path (e.g. ./job), the job goes to EXIT.
  • If the job is submitted with -k "chkpnt dir", the chkpnt dir is changed to HOME (although checkpoint still works).

 

Component

libbatch.so, libbat.a, mbatchd, bmod, bsub

 

Platform

All

 

Impact

Jobs may fail and the chkpnt dir is wrong.

 

120745

Date

2009-03-03

 

Description

The SUBMIT_TIME displayed by bjobs occupies 13 characters (although it only needs 12). The extra space results in one blank line following every record on terminals with a column width.

 

Component

bjobs

 

Platform

All

 

Impact

Extra blank lines in output.
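The effect in bug 120745 can be reproduced with simple arithmetic: one extra character in a fixed-width field pushes a record past the terminal width, and the wrapped remainder is a single space that looks like a blank line. The sketch below assumes a hypothetical 80-column terminal and a made-up 68-character prefix standing in for the other bjobs columns.

```python
import math


def format_record(prefix, submit_time, time_width):
    """Append the submit time, padded to the given field width."""
    return prefix + submit_time.ljust(time_width)


def terminal_rows(line, width):
    """Number of terminal rows a line occupies when the terminal wraps it."""
    return max(1, math.ceil(len(line) / width))


WIDTH = 80                       # hypothetical terminal width
prefix = "1234 user1 RUN".ljust(68)  # stand-in for the preceding columns
stamp = "Mar  3 10:15"           # 12 characters

print(terminal_rows(format_record(prefix, stamp, 12), WIDTH))
print(terminal_rows(format_record(prefix, stamp, 13), WIDTH))
```

With a 12-character field the record fits on one row; with 13 characters it spills one trailing space onto a second row.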

 

120163

Date

2009-03-03

 

Description

When XC_LIBLIC is configured and working with permanent licenses, the client host cannot get the license that converts from lsf_base.

 

Component

lim

 

Platform

All

 

Impact

Client host cannot obtain a license.

 

119722

Date

2009-03-03

 

Description

Duplicate definitions of license names in the lsf.licensescheduler and lsf.shared files cause jobs to pend.

 

Component

mbatchd

 

Platform

All

 

Impact

License Scheduler jobs cannot run.

 

119595

Date

2009-03-03

 

Description

Port number is missing in LSF daemon log files.

 

Component

liblsf.so, nios, libbat.a, liblsf.a, res, lim, libbat.so, mbd, sbd

 

Platform

All

 

Impact

The unknown communication port number makes debugging harder.

 

119550

Date

2009-03-03

 

Description

Unable to determine user account for execution.

 

Component

 

Platform

All

 

Impact

Customers need to kill the job and resubmit it for the job to run.

 

121747

Date

2009-03-02

 

Description

The data purger purges all data with a 10-digit TIME_STAMP_GMT if some tables (CONSUMER_RESOURCELIST, CONSUMER_DEMAND, LSF_BHOSTS, RESOURCE_METRICS, HOST_GROUP, etc.) contain both 10-digit and 13-digit TIME_STAMP_GMT values.

 

Component

 

Platform

All

 

Impact

Affects data when updating from LSF 7.0 to LSF 7 Update 4.
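One plausible reading of bug 121747 is that the two TIME_STAMP_GMT widths are 10-digit epoch seconds and 13-digit epoch milliseconds, so a cutoff expressed in one unit makes every value in the other unit look very old. That interpretation is an assumption, as are the function names and sample values below; the sketch only shows why normalizing the unit before comparing avoids purging recent rows.

```python
def to_millis(ts):
    """Treat values of up to 10 digits as seconds; longer values as milliseconds."""
    return ts * 1000 if len(str(abs(ts))) <= 10 else ts


def rows_to_purge(timestamps, cutoff_millis):
    """Select only timestamps that are older than the cutoff after normalization."""
    return [ts for ts in timestamps if to_millis(ts) < cutoff_millis]


cutoff = 1235000000000  # hypothetical purge cutoff, in milliseconds
stamps = [1236000000, 1234000000, 1236000000000, 1234000000000]

# A raw comparison would purge both 10-digit values, including the recent one.
print([ts for ts in stamps if ts < cutoff])
print(rows_to_purge(stamps, cutoff))
```

The normalized comparison keeps the recent 10-digit row and purges only the two rows that are actually older than the cutoff.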

 

121155

Date

2009-03-02

 

Description

Interactive jobs with bsub -Is fail with a "broken pipe" error in the LDAP environment.

 

Component

res

 

Platform

All

 

Impact

In the LDAP environment, interactive jobs are killed by the SIGPIPE signal.

 

122151

Date

2009-03-01

 

Description

Usernames and passwords are exposed in catalina.out when Webgui debug logging is turned on by editing log4j.properties and a user tries to log on.

 

Component

 

Platform

All

 

Impact

Security risk.

 

96667

Date

2009-02-27

 

Description

Space bar doesn't work when using bpeek | more.

 

Component

bpeek.exe, bpeek

 

Platform

Windows

 

Impact

bpeek usability is compromised.

 

121865

Date

2009-02-26

 

Description

Some parameter values such as LSF_EGO_DAEMON_CONTROL are contained in quotes while others are not.

 

Component

 

Platform

All

 

Impact

Inconsistent parameter definition formats.

 

120092

Date

2009-02-26

 

Description

${GUI_TOP}/${GUI_VERSION}/tomcat/bin/profile.ocs does not exist, but it is used by pmc_daemons.sh at lines 23, 28, and 33.

 

Component

 

Platform

All

 

Impact

Cannot automatically start the PMC when the OS boots, and cannot use the service PMC command.

 

119197

Date

2009-02-18

 

Description

LSF Windows installer deletes license.dat during the upgrade process.

 

Component

 

Platform

Windows

 

Impact

LSF license file is lost.

 

117717

Date

2009-02-18

 

Description

LSF Version 7 Update 4 Windows quiet installation fails.

 

Component

 

Platform

Windows

 

Impact

Unsuccessful installation.

 

115968

Date

2009-02-18

 

Description

Jobs containing multiple launches of mpirun.lsf can fail because resources are not cleaned up correctly after the previous launch. Job-level post-exec does not help.

 

Component

mpirun.lsf

 

Platform

All

 

Impact

Job fails due to a cleanup issue; job post-exec does not help.

 

118924

Date

2009-02-17

 

Description

A customer using RPMs to install LSF on Linux machines found many references to /bin/sh5, which break the installation (using 'rpm -Uvh --force').

 

Component

 

Platform

All

 

Impact

Unsuccessful installation.
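For bug 118924, the packaged scripts can be screened for an interpreter line that points at /bin/sh5 before building or installing. The sketch below works on script text held in a dictionary so that it stays self-contained; the function names and script contents are hypothetical.

```python
def shebang_interpreter(script_text):
    """Return the interpreter path from a #! first line, or None if there is none."""
    lines = script_text.splitlines()
    first = lines[0] if lines else ""
    if not first.startswith("#!"):
        return None
    fields = first[2:].split()
    return fields[0] if fields else None


def scripts_using(scripts, interpreter):
    """Names of scripts whose first line requests the given interpreter."""
    return sorted(
        name for name, text in scripts.items()
        if shebang_interpreter(text) == interpreter
    )


# Hypothetical script contents.
scripts = {
    "preinstall": "#!/bin/sh5\necho start\n",
    "postinstall": "#! /bin/sh\necho done\n",
    "notes.txt": "no interpreter line here\n",
}
print(scripts_using(scripts, "/bin/sh5"))
```

Any script reported by the check would need its interpreter line changed to a shell that exists on the target Linux hosts.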

 

117477

Date

2009-02-17

 

Description

lsf7Update3_win32.msi has errors in the install.bat script created when installing LSF 7.0.4 on 32-bit Windows XP.

 

Component

lsf7Update3_win32.msi

 

Platform

Windows XP 32-bit

 

Impact

install.bat cannot be used for batch installation.

 

117473

Date

2009-02-17

 

Description

%CLUSTERID% is set in install.bat, but is not handed over by the LSF 7.0.4 install package.

 

Component

install package

 

Platform

All

 

Impact

The CLUSTERID parameter is lost when it is set during installation.

 

114605

Date

2009-02-13

 

Description

When ssh is replaced with blaunch, some mpich_mx job task processes disappear after running bstop and then bresume.

 

Component

res

 

Platform

All

 

Impact

With some job tasks gone, job results are incorrect.

 

122152

Date

2009-02-11

 

Description

Jobs submitted through PMC result in output files with incorrect permission settings.

 

Component

PMC

 

Platform

All

 

Impact

Output files may have incorrect permission settings.

 

119639

Date

2009-02-11

 

Description

lim exits without an error message when users detach the processor from the host while running on Solaris.

 

Component

lim

 

Platform

Solaris

 

Impact

Cannot execute jobs.

 

118208

Date

2009-02-11

 

Description

blcollect fails to parse lmstat output.

 

Component

blcollect

 

Platform

UNIX

 

Impact

A license without a handle from the lmstat output is ignored by blcollect.

 

117721

Date

2009-02-11

 

Description

The description of USE_SUSP_SLOTS is incorrect.

 

Component

bparams

 

Platform

All

 

Impact

Misleading USE_SUSP_SLOTS parameter.

 

117290

Date

2009-02-11

 

Description

If the new job refresh feature is turned on, the job group information shown by bjobs is lost.

 

Component

mbatchd

 

Platform

All

 

Impact

The new job refresh feature cannot be used with job groups.

 

117215

Date

2009-02-11

 

Description

blcollect does not parse lmstat output correctly, leading to an incorrect license count.

 

Component

blcollect

 

Platform

UNIX

 

Impact

blcollect reports incorrect token usage to bld.

 

116604

Date

2009-02-11

 

Description

LSF 6.2 Windows installer fails on an Intel64 machine.

 

Component

lsf6.2_win.exe

 

Platform

Windows

 

Impact

LSF 6.2 Windows installer fails on an Intel64 machine.

 

116600

Date

2009-02-11

 

Description

bread gives the wrong output when using bkill for Session Scheduler jobs.

 

Component

ssched_real

 

Platform

All

 

Impact

bread cannot get correct information about session scheduler jobs.

 

115124

Date

2009-02-11

 

Description

sbatchd adds LSF_BINDIR into PATH even if it is inside PATH already.

 

Component

install

 

Platform

UNIX

 

Impact

PATH length increases unnecessarily.
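The behavior expected after the fix for bug 115124 amounts to an idempotent prepend: add the directory only when it is not already an entry. A minimal sketch follows; the function name and directory values are hypothetical, and it does not reflect how sbatchd is implemented.

```python
import os


def prepend_once(path_value, directory, sep=os.pathsep):
    """Prepend directory to a PATH-style string unless it is already an entry."""
    entries = [entry for entry in path_value.split(sep) if entry]
    if directory in entries:
        return path_value  # already present: leave PATH untouched
    return sep.join([directory] + entries)


# Hypothetical LSF_BINDIR value; sep is given explicitly for a UNIX-style PATH.
bindir = "/opt/lsf/bin"
print(prepend_once("/usr/bin:/bin", bindir, sep=":"))
print(prepend_once("/usr/bin:/opt/lsf/bin", bindir, sep=":"))
```

Calling the function repeatedly leaves PATH the same length, which is the property the Impact line says was missing.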

 

114379

Date

2009-02-11

 

Description

Users of remote desktops using Windows 2008 to run lsadmin reconfig see an error message.

 

Component

bstop, bkill, lsadmin, bsub

 

Platform

Windows

 

Impact

Cannot use remote desktop connections to run a program under Windows 2008 to control LSF.

 

118014

Date

2009-02-09

 

Description

License Scheduler fails to install a second binary type when using standalone mode.

 

Component

setup

 

Platform

UNIX

 

Impact

Customer can manually create the necessary directories and re-run the setup script as a workaround.

 

118474

Date

2009-02-04

 

Description

Detailed reasons are not recorded in the log file.

 

Component

res.exe

 

Platform

Windows

 

Impact

Difficult to troubleshoot cmd.exe permission issues.

 

101754

Date

2009-01-23

 

Description

Fairshare queues will reject a job submission if the fairshare user group has all members defined in it and there is another user group defined with the submission user as a specific member.

 

Component

mbatchd

 

Platform

All

 

Impact

Some users cannot submit jobs to a queue.

 

116122

Date

2009-01-22

 

Description

The sbatchd service shuts down when the job array index of a submitted job array is greater than 232830.

 

Component

sbatchd.exe

 

Platform

Windows

 

Impact

sbatchd shuts down.

 

115297

Date

2009-01-21

 

Description

With MultiCluster enabled and a RemoteClusters section defined, "bhosts remote_cluster_name" can shut down the master lim.

 

Component

lim

 

Platform

All

 

Impact

Local master lim shuts down.

 

113134

Date

2009-01-21

 

Description

With MultiCluster enabled, if a job is suspended and then resumed after it is forwarded to a remote cluster, the job can run twice: once on the submission cluster and once on the remote cluster.

 

Component

mbschd, bhist, bjobs, mbatchd

 

Platform

All

 

Impact

Job slots are misused.

 

116305

Date

2009-01-19

 

Description

sbatchd sets a null entry "::" in LD_LIBRARY_PATH if EGO is disabled.

 

Component

sbatchd

 

Platform

All

 

Impact

User applications (scripts) that use LD_LIBRARY_PATH may fail due to the null entry.
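The null entry in bug 116305 is what a plain join produces when one of the components is empty, for example when no EGO library directory is set. The sketch below shows a join that skips empty components; the function name and directories are hypothetical and this is not sbatchd code.

```python
def join_library_path(*parts, sep=":"):
    """Join PATH-style components, dropping empty components and empty entries."""
    entries = []
    for part in parts:
        if part:
            entries.extend(entry for entry in part.split(sep) if entry)
    return sep.join(entries)


# Hypothetical components; the middle one is empty, as when EGO is disabled.
lsf_lib, ego_lib, user_lib = "/opt/lsf/lib", "", "/usr/local/lib"

print(":".join([lsf_lib, ego_lib, user_lib]))          # naive join
print(join_library_path(lsf_lib, ego_lib, user_lib))   # filtered join
```

The naive join yields a value containing "::", while the filtered join yields only the two real directories.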

 

115955

Date

2009-01-19

 

Description

Condensed pending reason behavior changed in LSF 7.0 (individual host-based reasons are not used).

 

Component

schmod_default.so, mbschd, bparams, mbatchd

 

Platform

All

 

Impact

Customer scripts that monitor pending reasons break on upgrade.

 

113919

Date

2009-01-19

 

Description

Sourcing the environment file on RHEL4U7 x86 does not set the correct PATH for the LSF binaries, because /proc/xen exists.

 

Component

install

 

Platform

Linux

 

Impact

Sourcing the environment file does not set the correct PATH for the LSF binaries.

 

118814

Date

2009-01-15

 

Description

In LSF 7.0.3, a backfill job can be preempted even when PREEMPT_JOBTYPE=BACKFILL is not set.

 

Component

mbatchd

 

Platform

All

 

Impact

Backward compatibility broken.

 

118276

Date

2009-01-14

 

Description

LSF batch job ignores SIGHUP so SIGHUP cannot be delivered to the job.

 

Component

sbatchd, res

 

Platform

UNIX

 

Impact

Customer applications cannot catch SIGHUP and make use of it.

 

117769

Date

2009-01-13

 

Description

sbatchd tries to transfer the job input file even when the file is shared.

 

Component

sbatchd

 

Platform

All

 

Impact

Unnecessary error messages in the job error file.

 

117377

Date

2009-01-09

 

Description

Inconsistent allocation/deallocation for a job with a job group may cause scheduling problems.

 

Component

schmod_limit.so, schmod_preemption.so

 

Platform

All

 

Impact

Error messages in the mbschd log report inconsistent allocation/deallocation, which may cause scheduling problems.

 

115963

Date

2009-01-09

 

Description

mbatchd spends a lot of time checking for dependencies, which degrades mbatchd performance.

 

Component

mbatchd

 

Platform

All

 

Impact

Performance impact for mbatchd event replay.

 

114194

Date

2009-01-09

 

Description

The CPU time displayed by bjobs is incorrect.

 

Component

pim

 

Platform

All

 

Impact

The CPU time of dead PGIDs is not accumulated for LSF jobs. Jobs may be incorrectly considered idle.

 

115080

Date

2009-01-08

 

Description

Condensed host list not working with bjobs.

 

Component

bjobs

 

Platform

All

 

Impact

Condensed host group output from bjobs is not available.

 

113880

Date

2009-01-08

 

Description

Rerunnable jobs cannot rerun after being dispatched to a host, if a network glitch such as a temporary loss of communication occurs.

 

Component

mbschd

 

Platform

All

 

Impact

Job pends.

 

76955

Date

2009-01-06

 

Description

Dynamic slave lim calls the master lim 20 times if the slave lim host name is not resolvable, adding to the master lim load.

 

Component

lim

 

Platform

UNIX

 

Impact

Master lim performance is affected.

 

117065

Date

2009-01-04

 

Description

sbatchd automatically turns off rusage debugging, which makes debugging inconvenient.

 

Component

sbatchd

 

Platform

All

 

Impact

Difficult to capture debug data.

 

116800

Date

2009-01-04

 

Description

A duration string that exceeds the one defined at the queue level returns the message 'Bad resource requirement. Job not submitted'.

 

Component

mbatchd

 

Platform

All

 

Impact

Cannot submit jobs with desired resource duration.

 

116333

Date

2009-01-04

 

Description

Job status from bjobs -Al is PEND when the job is complete.

 

Component

bjobs

 

Platform

All

 

Impact

Confusing job status.

 

115419

Date

2008-12-31

 

Description

When using Kerberos for authentication but not using the Platform Kerberos integration, sbatchd deletes the /tmp file named in the KRB5CCNAME environment variable after the batch job completes.

 

Component

sbatchd, res

 

Platform

Linux

 

Impact

Customers must manually unset the KRB5CCNAME variable in a job starter to run jobs in LSF.

 

114282

Date

2008-12-30

 

Description

The command bjobs -u displays all users' jobs in LSF 7.0.2 and later versions although LSB_SECURE_JOBINFO_USERS=Y is set in lsf.conf.

 

Component

bjobs

 

Platform

All

 

Impact

LSB_SECURE_JOBINFO_USERS does not work as expected in LSF 7.0.2 and later versions.

 

116466

Date

2008-10-29

 

Description

Customer gets a Java exception after logging into the PMC console.

 

Component

install

 

Platform

All

 

Impact

Customer is unable to use PMC.

 

115542

Date

2008-09-27

 

Description

bparams returns an error code.

 

Component

bparams, mbatchd

 

Platform

All

 

Impact

Cannot use bparams.

 

109128

Date

2008-09-16

 

Description

No email is sent for job idle exceptions.

 

Component

mbatchd

 

Platform

Windows

 

Impact

LSF admin is not notified for idle job exceptions.

Technical Support

support@platform.com

www.platform.com

 

North America: +1 905 948 4297

Europe: +44 1256 370 530

Asia: +86 10 6238 1125

Toll-free: 1-877-444-4573

 

Platform Support

Platform Computing Corporation

3760 14th Avenue

Markham, Ontario

Canada L3R 3T7

Copyright

© 1994 - 2009 Platform Computing Corporation

All Rights Reserved.

Although the information in this document has been carefully reviewed, Platform Computing Corporation (“Platform”) does not warrant it to be free of errors or omissions. Platform reserves the right to make corrections, updates, revisions or changes to the information in this document.

UNLESS OTHERWISE EXPRESSLY STATED BY PLATFORM, THE PROGRAM DESCRIBED IN THIS DOCUMENT IS PROVIDED “AS IS” AND WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT WILL PLATFORM COMPUTING BE LIABLE TO ANYONE FOR SPECIAL, COLLATERAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING WITHOUT LIMITATION ANY LOST PROFITS, DATA, OR SAVINGS, ARISING OUT OF THE USE OF OR INABILITY TO USE THIS PROGRAM.

 

Document redistribution policy: This document is protected by copyright and you may not redistribute or translate it into another language, in part or in whole. You may only redistribute this document internally within your organization (for example, on an intranet).

Trademarks

LSF is a registered trademark of Platform Computing Corporation in the United States and in other jurisdictions.

 

ACCELERATING INTELLIGENCE, THE BOTTOM LINE IN DISTRIBUTED COMPUTING, PLATFORM COMPUTING, CLUSTERWARE, PLATFORM ACTIVECLUSTER, IT INTELLIGENCE, SITEASSURE, PLATFORM SYMPHONY, PLATFORM JOBSCHEDULER, PLATFORM INTELLIGENCE, PLATFORM INFRASTRUCTURE INSIGHT, PLATFORM WORKLOAD INSIGHT, and the PLATFORM and LSF logos are trademarks of Platform Computing Corporation in the United States and in other jurisdictions.

 

UNIX is a registered trademark of The Open Group in the United States and in other jurisdictions.

 

Microsoft is either a registered trademark or a trademark of Microsoft Corporation in the United States and/or other countries.

Windows is a registered trademark of Microsoft Corporation in the United States and other countries.

 

Other products or services mentioned in this document are identified by the trademarks or service marks of their respective owners.