Fixed Bugs for Platform LSF™ Version 7 Update 3

Release Date:   May 2008

 

The following bugs have been fixed in the May 2008 update (LSF 7 Update 3) since the November 2007 update (LSF 7 Update 2):

 

95571

Date

2008-04-29

 

Description

brsvdel returns exit code 0 when a user tries to delete a reservation ID that is not valid

 

Component

brsvdel

 

Platform

All

 

Impact

Affects customer scripts that expect a nonzero return value.
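
Scripts affected by this bug typically branch on the exit status of brsvdel. The following Python sketch shows one way such a check might look. The helper name run_ok and the reservation ID in the closing comment are illustrative assumptions, not part of LSF, and the stand-in commands exist only so the sketch runs without an LSF installation.

```python
import subprocess
import sys


def run_ok(command):
    """Run a command and report whether it exited with status 0."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode == 0


# Stand-in commands: a Python child process that exits with a chosen status.
succeeds = [sys.executable, "-c", "import sys; sys.exit(0)"]
fails = [sys.executable, "-c", "import sys; sys.exit(1)"]

print(run_ok(succeeds))
print(run_ok(fails))

# Hypothetical use on an LSF host (the reservation ID is made up):
# if not run_ok(["brsvdel", "user1#0"]):
#     print("reservation was not deleted")
```

A check of this kind only works if the command reports failure through a nonzero exit status, which is why an exit code of 0 for an invalid reservation ID breaks it.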

 

105479

Date

2008-04-29

 

Description

bjobs shows "exit" status, bjobs -l shows "zombi" status

 

Component

bjobs mbatchd

 

Platform

All

 

Impact

Cannot tell which status is correct, which may affect other running jobs

 

104503

Date

2008-04-29

 

Description

The report of Active Job States Statistics by Queue is inaccurate.

 

Component

mbatchd

 

Platform

All

 

Impact

Inaccurate data in LSF reports

 

91838

Date

2008-04-28

 

Description

User job failed to query registry key of HKEY_CURRENT_USER

 

Component

sbatchd.exe

 

Platform

Windows

 

Impact

User job may fail

 

 

72032

Date

2008-04-28

 

Description

On AIX, lsrcp does not work properly when the source and target files are identical.

 

Component

libbase.so liblsf.a res

 

Platform

AIX

 

Impact

Identical source and target files are not detected correctly on AIX

 


 

103932

Date

2008-04-28

 

Description

Advance reservation files are not created under LSB_LOCALDIR when the duplicate event logging feature is turned on

 

Component

mbatchd

 

Platform

All

 

Impact

Users may lose advance reservation definitions

 

102929

Date

2008-04-28

 

Description

brequeue executes slowly because mbatchd always handles pending signals for other jobs first and keeps retrying them

 

Component

mbatchd

 

Platform

All

 

Impact

Jobs cannot be requeued with brequeue because they are blocked by other jobs whose pending signals are being retried.

 

99763

Date

2008-04-25

 

Description

Both LS and LSF must make scheduling decisions to launch a job that requires an LS-managed token. Often there is contention for more than one type of resource (licenses, slots, and memory are the usual ones). In one possible scenario, the job can satisfy all resource requirements except the required LS token. In this case, mbatchd asks bld to reallocate tokens to meet the demand. By the time bld allocates sufficient tokens for the job, the scheduler may already have consumed all slots, so the pending reason changes to "not enough slots" and mbatchd removes the token demand. bld then deallocates the tokens for lack of demand, and the process can start all over when slots free up.

 

Component

mbatchd

 

Platform

UNIX

 

Impact

Jobs cannot be dispatched
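
The cycle described for bug 99763 can be illustrated with a toy model. The sketch below is not LSF or License Scheduler code: the function name, the one-cycle delay before bld reacts to demand, and the assumption that slots are free only on every second cycle are all invented for illustration.

```python
def simulate(cycles):
    """Toy model of the token demand loop; returns the pending reason per cycle."""
    demand = False           # has mbatchd registered token demand with bld?
    token_allocated = False  # has bld allocated the token for this job?
    history = []
    for cycle in range(cycles):
        # Assumption: slots are free on even cycles and consumed by
        # other jobs on odd cycles.
        slots_free = cycle % 2 == 0
        # bld reacts to the demand state left by the previous cycle:
        # it allocates when there is demand and deallocates when there is none.
        token_allocated = demand
        if slots_free and token_allocated:
            history.append("dispatched")
            break
        if slots_free:
            # Only the token is missing, so demand is raised.
            history.append("no token")
            demand = True
        else:
            # Slots are gone, the pending reason changes, demand is removed.
            history.append("not enough slots")
            demand = False
    return history


print(simulate(6))
```

Under these assumptions the token and the slots are never available in the same cycle, so the job alternates between the two pending reasons and is never dispatched.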

 

105542

Date

2008-04-25

 

Description

mbatchd silently shuts down sbatchd connection

 

Component

mbatchd

 

Platform

All

 

Impact

Reduced cluster usage and LSF mbatchd performance

 

104394

Date

2008-04-24

 

Description

Project name is not assigned to user group name for job array when ENFORCE_FAIRSHARE_PROJ is enabled

 

Component

mbatchd

 

Platform

All

 

Impact

User jobs may be charged with unexpected SAAP

 


 

102799

Date

2008-04-24

 

Description

LSF integration with Linux-PAM to set resource limit for individual user/usergroups

 

Component

bqueues bjobs sbatchd mbatchd

 

Platform

All

 

Impact

None

 

100453

Date

2008-04-24

 

Description

lim under Xen always reports hardware information through Xen command

 

Component

lim

 

Platform

All

 

Impact

Information reported by lim may be wrong

 

104141

Date

2008-04-23

 

Description

"lsadmin limstartup" cannot start Symphony multi-head cluster

 

Component

badmin lsadmin

 

Platform

All

 

Impact

Cannot start multi-head cluster

 

104026

Date

2008-04-23

 

Description

Misleading application profile error message from bsub

 

Component

n/a

 

Platform

All

 

Impact

Unclear error message

 

106144

Date

2008-04-22

 

Description

Interactive job is terminated after sbatchd restart in LSF HPC/SLURM integration.

 

Component

sbatchd

 

Platform

linux2.4-glibc2.3-ia64-slurm linux2.4-glibc2.3-x86-slurm linux2.4-glibc2.3-x86_64-slurm linux2.6-glibc2.3-ia64-slurm linux2.6-glibc2.3-x86_64-slurm

 

Impact

Interactive jobs are terminated prematurely after sbatchd restarts

 

105498

Date

2008-04-22

 

Description

Code in xdr_parameterInfo() breaks backward compatibility

 

Component

bparams mbatchd

 

Platform

All

 

Impact

bparams may fail in a partially upgraded cluster

 


 

104510

Date

2008-04-21

 

Description

mbatchd is killed by SIGKILL but no relevant messages are logged.

 

Component

mbatchd

 

Platform

All

 

Impact

The lack of relevant mbatchd log messages makes troubleshooting difficult and time-consuming

 

106243

Date

2008-04-20

 

Description

With duplicate event logging enabled, mbatchd data replication child is slow.

 

Component

sbatchd

 

Platform

All

 

Impact

Batch system not available

 

105385

Date

2008-04-18

 

Description

Queue level and application level order[] string should not be ignored by any resource requirement

 

Component

mbatchd

 

Platform

All

 

Impact

Incorrect resource requirements processing

 

91148

Date

2008-04-17

 

Description

LSF dispatches a job submitted with a RUNLIMIT even though the job run time overlaps with an existing advance reservation.

 

Component

bsub

 

Platform

All

 

Impact

Advance reservation cannot be used reliably.
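
The condition behind bug 91148 is an interval overlap: a job with a run limit occupies its host from its start time until start time plus RUNLIMIT, and it conflicts with an advance reservation when that window intersects the reservation window. The following is a minimal sketch of that check, assuming half-open time windows. The function name and example times are illustrative, and this is not LSF scheduler code.

```python
from datetime import datetime, timedelta


def overlaps_reservation(job_start, run_limit, rsv_start, rsv_end):
    """Return True if [job_start, job_start + run_limit) intersects [rsv_start, rsv_end)."""
    job_end = job_start + run_limit
    return job_start < rsv_end and job_end > rsv_start


# Example values (made up): a reservation from 11:00 to 13:00.
rsv_start = datetime(2008, 4, 17, 11, 0)
rsv_end = datetime(2008, 4, 17, 13, 0)
job_start = datetime(2008, 4, 17, 10, 0)

# A 2-hour run limit would run into the reservation; a 30-minute one would not.
print(overlaps_reservation(job_start, timedelta(hours=2), rsv_start, rsv_end))
print(overlaps_reservation(job_start, timedelta(minutes=30), rsv_start, rsv_end))
```

With half-open windows, a job whose run limit ends exactly when the reservation starts does not count as overlapping.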

 

96936

Date

2008-04-16

 

Description

ntblstatus command exists on a system not equipped with Federation switches.

 

Component

poejob

 

Platform

AIX 5-32 AIX 5-64

 

Impact

Job fails because LSF detects wrong integration type

 

105445

Date

2008-04-15

 

Description

mbatchd hangs when duplicate event switching is turned on

 

Component

mbatchd

 

Platform

Linux 2.6

 

Impact

mbatchd cannot start

 


 

92592

Date

2008-04-10

 

Description

mbschd schedules jobs slowly with more than 300 resource requirements

 

Component

schmod_default.so mbschd

 

Platform

All

 

Impact

Job dispatch is slow

 

105549

Date

2008-04-10

 

Description

lim is unlicensed on HP-XC on linux2.6-glibc2.3-x86_64

 

Component

lim

 

Platform

linux2.6-glibc2.3-x86_64

 

Impact

lim is unlicensed

 

103073

Date

2008-04-10

 

Description

Replace ssh with blaunch and the job fails right away.

 

Component

blaunch

 

Platform

All

 

Impact

After replacing ssh with blaunch, some jobs fail

 

102687

Date

2008-04-08

 

Description

Redirect command "cleartool setview -exec "sbatchd -d /sw/platform/lsf/conf -s 8:6 -2" view" stdout/stderr into /tmp/daemons.wrap.log

 

Component

daemons.wrap daemon.wrap

 

Platform

UNIX and Linux

 

Impact

None

 

99043

Date

2008-04-06

 

Description

res periodically posts requests to master lim, causing master lim to slow down

 

Component

res

 

Platform

All

 

Impact

Master lim slows down

 

96946

Date

2008-04-01

 

Description

Cannot limit the job range for fairshare queue dynamic user

 

Component

sbatchd mbatchd

 

Platform

All

 

Impact

Customer enhancement

 


 

103436

Date

2008-03-31

 

Description

The lsbevents loader, which reads lsb.stream, core dumps when parsing events for parallel jobs.

 

Component

LSF reporting

 

Platform

All

 

Impact

lsb.stream is not read by the lsbevents loader. All job-related reports have no data.

 

105555

Date

2008-03-30

 

Description

lim logs incorrect license requirement when more than 10,000 licenses needed

 

Component

lim

 

Platform

All

 

Impact

Message in LIM log with incorrect license count

 

99731

Date

2008-03-27

 

Description

A single parameter value consisting of multiple space-separated strings is split into multiple parameter values in the mpichp4 integration

 

Component

mpirun.lsf mpichp4_wrapper

 

Platform

All

 

Impact

User application failure due to wrong parameters passed in

 

95661

Date

2008-03-27

 

Description

With SLOTS_PER_PROCESSOR set in lsb.resources, a job using an advance reservation cannot be dispatched after suspending another normal job for its slot during the active period of the reservation.

 

Component

schmod_advrsv.so

 

Platform

All

 

Impact

Idle slots are unusable and the job pends.

 

68729

Date

2008-03-27

 

Description

When an advance reservation is active and non-advance-reservation jobs are suspended, the queue-level slots defined by "QJOB_LIMIT" are not released, so users still cannot use the CPUs reserved by the advance reservation.

 

Component

 

 

Platform

All

 

Impact

Users cannot use slots reserved by an advance reservation.

 

87837

Date

2008-03-25

 

Description

mbatchd log fills up with "rusageJob: Job fails in getJobData()" if a job is forcibly killed

 

Component

bkill mbatchd

 

Platform

All

 

Impact

mbatchd logs many error messages and job remains running in sbatchd

 

103632

Date

2008-03-24

 

Description

Support memory limit on both process level and job level in HPC cluster

 

Component

bapp libbatch.so libbat.a sbatchd mbatchd

 

Platform

All

 

Impact

Customer enhancement

 

96563

Date

2008-03-20

 

Description

NTBL_JOB_KEY in poejob is not generated correctly -- could be out of bounds

 

Component

poejob

 

Platform

AIX 5-32, AIX 5-64

 

Impact

Job fails

 

96209

Date

2008-03-20

 

Description

InfiniBand integration on AIX needs to handle the case where no ports are available, that is, when the IBM API returns zero ports

 

Component

lsnrt_windows

 

Platform

AIX 5-32, AIX 5-64

 

Impact

Job fails if no ports are available for POE jobs over InfiniBand

 

95121

Date

2008-03-19

 

Description

Network ID configured to be a large number with IPV6 enabled on AIX POE over InfiniBand becomes 0 when loading nrt windows

 

Component

lsnrt_windows

 

Platform

AIX 5.3-32, AIX 5.3-64

 

Impact

nrt windows cannot be loaded correctly

 

105152

Date

2008-03-17

 

Description

rla may die if it cannot delete a cpuset.

 

Component

rla

 

Platform

cpuset integration

 

Impact

Minimal impact observed by the user since a new rla is started by the sbatchd.

 

101658

Date

2008-03-12

 

Description

In a MultiCluster environment, a user submits a job to the remote cluster. After the job is done in the remote cluster, the user can still see it from the local cluster. Job usage information is not sent back to the submission cluster.

 

Component

mbatchd

 

Platform

All

 

Impact

The job is not returned from the remote execution cluster. It is gone from the remote cluster but still shows running status in the submission cluster. bkill and bkill -r do not remove the job.

"badmin mbdrestart" must be run on the remote cluster to remove the job. The job then returns to pending status in the submission cluster (and is dispatched again) and can be killed.

 

97668

Date

2008-03-11

 

Description

LSF 7.0 LIM appends lsf.conf parameters in ego.conf without any warning or notification

 

Component

LIM

 

Platform

All

 

Impact

Customer is not aware of changes made to ego.conf

 

102521

Date

2008-03-11

 

Description

Job information is not cleaned out of fairshare queue if EXPIRED_HOURS is greater than or equal to CLEAN_PERIOD

 

Component

mbatchd

 

Platform

All

 

Impact

Incorrect job information

 

103166

Date

2008-03-10

 

Description

Output file can be written to the root directory, violating permissions

 

Component

sbatchd

 

Platform

AIX

 

Impact

Security risk.

 

78745

Date

2008-03-10

 

Description

Job name with wildcard characters does not match all matching jobs if JOB_DEP_LAST_SUB is set.

 

Component

bparams mbatchd

 

Platform

All

 

Impact

Scripts that depend on the correct behavior do not work.

 

100261

Date

2008-03-10

 

Description

Code bug in mbatchd causing confusing debugging data

 

Component

mbatchd

 

Platform

All

 

Impact

Hard for support to do troubleshooting

 

102671

Date

2008-03-04

 

Description

bsub may reject job submission even with a valid resource requirement string

 

Component

bsub

 

Platform

All

 

Impact

Job fails to submit even with a valid resource requirement string

 


 

101470

Date

2008-03-03

 

Description

LSF HPC pam cannot start without setting LSF_LIBDIR.

 

Component

pam

 

Platform

All

 

Impact

LSF HPC jobs fail

 

102499

Date

2008-02-29

 

Description

A non-admin user with no permission for a specific queue can use bmod -q to move jobs to it if the jobs are forwarded from another cluster in a MultiCluster environment

 

Component

mbatchd

 

Platform

All

 

Impact

Non-admin user can use bmod -q to modify the jobs forwarded from submission cluster to a queue to which the user has no permission

 

102218

Date

2008-02-29

 

Description

bjobs -l and bacct -l show jobs consuming more memory and/or swap space than the maximum available on the given host.

 

Component

res

 

Platform

Windows

 

Impact

Host is not available, affecting scheduling of other jobs.

 

102690

Date

2008-02-28

 

Description

Submitting a JSDL job with non-TTY mode fails

 

Component

bsub

 

Platform

All

 

Impact

Job fails

 

96667

Date

2008-02-25

 

Description

bpeek output on other machines is slower than bpeek output on the job execution host

 

Component

bpeek

 

Platform

All

 

Impact

Performance issue

 

101311

Date

2008-02-21

 

Description

No RMS distribution tar file available on Platform FTP site

 

Component

All

 

Platform

rms2.82-linux2.6-glibc2.3-ia64

 

Impact

Customers cannot upgrade to LSF7.0 EP2

 


 

99696

Date

2008-02-19

 

Description

PAM HP MPI integration does not work

 

Component

pam

 

Platform

HP-UX, Linux2.6-glibc2.3-x86_64

 

Impact

If host IP addresses contain 0, pam loses track of remote processes and job is shut down prematurely.

 

102059

Date

2008-02-02

 

Description

LIM failed to start with "Run time error R6034" on Windows XP-x64 and Windows 2003-x64

 

Component

lsf6.2_win.exe

 

Platform

win2003-x64 win2003-ia64

 

Impact

LIM cannot start on win2003-x64 and win2003-ia64

 

100456

Date

2008-02-01

 

Description

First character of output is missing for jobs submitted with bsub -Ip/Is option

 

Component

res

 

Platform

Linux 2.4

 

Impact

Cannot get exact output of the job.

 

94763

Date

2008-01-31

 

Description

Cannot use lsrcp on files larger than 2 GB across AIX and HP-UX machines

 

Component

lsrcp res

 

Platform

AIX 5-32, AIX 5-64, HP/UX 11-32, HP/UX 11-64

 

Impact

lsrcp does not work

 

99364

Date

2008-01-28

 

Description

lstcsh missing in x86_64 or ia64 packages

 

Component

lstcsh

 

Platform

linux2.6-glibc2.3-ia64 linux2.6-glibc2.3-x86_64

 

Impact

lstcsh not available

 

100443

Date

2008-01-28

 

Description

With short jobs, EGO-SLA cannot get enough slots even when there are free slots

 

Component

mbatchd

 

Platform

All

 

Impact

SLA performance is poor.

 


 

99662

Date

2008-01-27

 

Description

Jobs submitted to HP-UX IA64 exit with code 9

 

Component

sbatchd

 

Platform

HP-UX 11.31 IA64

 

Impact

Jobs fail

 

96900

Date

2008-01-25

 

Description

bsub on floating clients does not retry during lim restart and returns a wrong error message

 

Component

lim

 

Platform

All

 

Impact

bsub on floating clients does not retry during lim restart

 

96047

Date

2008-01-25

 

Description

Customer gets a confusing email when job is killed because "bsub -t" time constraint is earlier than "submission time +RUNLIMIT"

 

Component

mbatchd

 

Platform

All

 

Impact

Inaccurate email message

 

93768

Date

2008-01-17

 

Description

Adapter windows are not cleaned up because of a signal sent to the cleanup program by LSF and conflicts in strtok() library calls

 

Component

ntbl_api lsntbl_api poe_w poejob lsnbl_api

 

Platform

AIX 5-32, AIX 5-64

 

Impact

Jobs fail or are not dispatched because of lack of adapter windows

 

101146

Date

2008-01-16

 

Description

brsvs fails for system advance reservations on a host with MXJ=0

 

Component

mbatchd

 

Platform

All

 

Impact

brsvs does not work.

 

99700

Date

2008-01-15

 

Description

Job does not dispatch after job is requeued several times

 

Component

mbschd

 

Platform

All

 

Impact

Jobs are not dispatched

 


 

100544

Date

2008-01-09

 

Description

After a job is DONE in a fairshare queue and EXPIRED_HOURS is set, mbatchd goes down and does not restart.

 

Component

mbatchd

 

Platform

All

 

Impact

Batch system not available

 

100642

Date

2008-01-08

 

Description

Chunk jobs submitted using job arrays stay in WAIT status. When job ID is equal to mbatchd PID on the remote execution host, sbatchd loses track of the job.

 

Component

sbatchd

 

Platform

All

 

Impact

Chunk job feature not available

 

97704

Date

2008-01-02

 

Description

Run limit that bhist displays on execution host is changed

 

Component

bhist

 

Platform

All

 

Impact

Incorrect run time is displayed by bhist

 

99424

Date

2007-12-26

 

Description

LSF does not schedule jobs according to host preference when resource_reserve and host partition fairshare are defined but not used

 

Component

schmod_reserve.so schmod_parallel.so

 

Platform

All

 

Impact

Jobs may be scheduled incorrectly to low preference host

 

98150

Date

2007-12-13

 

Description

pem/vemkd memory usage jumps into the gigabyte range

 

Component

pem, vemkd

 

Platform

All

 

Impact

Machines running out of memory

 

97588

Date

2007-11-20

 

Description

Incorrect dependency syntax may cause mbatchd to hang.

 

Component

mbatchd

 

Platform

All

 

Impact

mbatchd system is down

 


 

95120

Date

2007-11-08

 

Description

poejob should export MP_MSG_API

 

Component

poejob

 

Platform

AIX 5.3-32, AIX 5.3-64

 

Impact

Job runs over IP instead of InfiniBand

 

90702

Date

2007-10-19

 

Description

Job starter cannot register task start event to pam due to host name resolution issue

 

Component

taskstarter pam

 

Platform

All

 

Impact

When task registration fails, pam kills the job

 


Technical Support

support@platform.com

www.platform.com

 

North America: +1 905 948 4297

Europe: +44 1256 370 530

Asia: +86 10 6238 1125

Toll-free: 1-877-444-4573

 

Platform Support

Platform Computing Corporation

3760 14th Avenue

Markham, Ontario

Canada L3R 3T7

Copyright

© 1994 - 2008 Platform Computing Corporation

All Rights Reserved.

Although the information in this document has been carefully reviewed, Platform Computing Corporation (“Platform”) does not warrant it to be free of errors or omissions. Platform reserves the right to make corrections, updates, revisions or changes to the information in this document.

UNLESS OTHERWISE EXPRESSLY STATED BY PLATFORM, THE PROGRAM DESCRIBED IN THIS DOCUMENT IS PROVIDED “AS IS” AND WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT WILL PLATFORM COMPUTING BE LIABLE TO ANYONE FOR SPECIAL, COLLATERAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING WITHOUT LIMITATION ANY LOST PROFITS, DATA, OR SAVINGS, ARISING OUT OF THE USE OF OR INABILITY TO USE THIS PROGRAM.

 

Document redistribution policy: This document is protected by copyright and you may not redistribute or translate it into another language, in part or in whole. You may only redistribute this document internally within your organization (for example, on an intranet).

Trademarks

LSF is a registered trademark of Platform Computing Corporation in the United States and in other jurisdictions.

 

ACCELERATING INTELLIGENCE, THE BOTTOM LINE IN DISTRIBUTED COMPUTING, PLATFORM COMPUTING, CLUSTERWARE, PLATFORM ACTIVECLUSTER, IT INTELLIGENCE, SITEASSURE, PLATFORM SYMPHONY, PLATFORM JOBSCHEDULER, PLATFORM INTELLIGENCE, PLATFORM INFRASTRUCTURE INSIGHT, PLATFORM WORKLOAD INSIGHT, and the PLATFORM and LSF logos are trademarks of Platform Computing Corporation in the United States and in other jurisdictions.

 

UNIX is a registered trademark of The Open Group in the United States and in other jurisdictions.

 

Microsoft is either a registered trademark or a trademark of Microsoft Corporation in the United States and/or other countries.

Windows is a registered trademark of Microsoft Corporation in the United States and other countries.

 

Other products or services mentioned in this document are identified by the trademarks or service marks of their respective owners.