PQ79004: WITH SERVICE LEVEL W401505 AND HIGHER, SM EUI ACTIVATE MAY FAIL WITH EDC5136I DIRECTORY NOT EMPTY.

APAR status
Closed as program error.

Error description
Systems Management code in WebSphere service level W401505 and
higher uses the readdir() function to copy and clean up the
parts of the WebSphere configuration HFS that are affected by
the SM EUI Activate step during application deployment or
redeployment.  The Activate step may fail with the following
messages in the Systems Management server region (BBOSMSS) job
output or in the CTRACE entries for the SM SR:
.
Trace: 2003/08/28 04:07:18.345 01 t=9E1288 c=25.A28 key=P8
(08100019)
  Description: Error removing directory
  Method name: CbHFSManager::cleanupHFS(const char*)
  Filename: ./bbomhfsm.cpp
  Linenum: 621
  Path name: /WebSphere390/CB390/apps/<servername>/<appname>/...
  error: EDC5136I Directory not empty.
  errno: 136
  errno2: 1532166416
Trace: 2003/08/28 05:00:26.712 01 t=9E1288 c=25.1143 key=P8
(00000006)
  Description: Throw CORBA user exception
  exception id: Ism_J2EEApplication::UnableToCleanupHFS
  from file: ./bbomib80.cpp
  at line: 1086
Trace: 2003/08/28 05:00:26.826 01 t=9E1288 c=25.1143 key=P8
(08030050)
  Description: CompleteBuild
  File: ./bbomsbo2.cpp
  Failed to cleanup HFS: 3357
.
The WebSphere code that calls readdir() in a loop to process the
files in a directory needs to call rewinddir() to ensure that
readdir() lists every file name, so that all files can be
removed before the directory itself is removed.  This APAR
addresses the case where not all files in a directory are
cleaned up before the attempt to delete the directory itself.
.
This problem may be limited to a shared HFS, or to a shared HFS
where the system doing the activate has the HFS mounted as a
"remote" system.  With a non-shared HFS, or a shared HFS that is
mounted "local", the problem this APAR fixes may not occur, and
readdir() may return the complete list of directory file names.
.
Local fix
Unmounting and remounting the WebSphere configuration HFS has
been seen to clear up the problem.
.
Depending on your current WebSphere service level, it may be
possible for us to provide 2 DLLs to work around the problem.
Please contact WebSphere L2 for these DLLs and to see if they
will work on your service level.  Customers using this DLL
workaround MUST remove the DLLs when the PTF containing this
APAR fix is applied.
Problem summary
****************************************************************
* USERS AFFECTED: All users of WebSphere Application Server    *
*                 version 4.0.1 for z/OS and OS/390.           *
****************************************************************
* PROBLEM DESCRIPTION: Application deployment fails because    *
*                      not all files were successfully deleted *
*                      during conversation activation.         *
****************************************************************
* RECOMMENDATION:                                              *
****************************************************************
During activation, deployed application files are deleted from a
temporary directory.  In the case where the number of files in
the directory is large (or the file names are long), not all the
files are deleted from the temporary directory, which causes the
activation to fail.

The user may see errors similar to the following -
BBON0488E Activate for conversation XX failed.
BBON1141I The following activation steps already succeeded:
BBON1142I Environment files for conversation XX are successfully
written.
BBON1143I Conversation XX is now the active conversation.
BBON1144I Homes successfully queued for registration.
BBON1151I Failed to remove file /WebSphere390/CB390/apps/XX/L...
Problem conclusion
In W401505, code was added to return more error information when
deleting files.  Previously, an "rm -rf" command was executed to
delete all the files from a directory.  This command reports
only success or failure, not which file failed or why.  The
"rm -rf" command was replaced with an "opendir()", "readdir()"
and "unlink()" loop, followed by a "closedir()".  Any file that
could not be removed was logged with a more specific failure
code.  This made problem determination easier when an error
occurs.

Unfortunately, the loop of readdir() and unlink() calls does not
delete every file in a Shared HFS environment when the requests
are sent from the client HFS.  The reason is explained in detail
in the following paragraphs.  The solution was to call
rewinddir() after the loop and re-execute the loop, repeating
until all the files in the directory are deleted.

Several conditions must hold for this problem to occur.  First,
the directory's file list must be larger than the 1 KByte buffer
allocated by opendir().  Because not all the file names fit in
the buffer, when readdir() tries to read beyond what was stored
in the buffer, another request for directory file names is sent
to the File System.  If the File System is a non-shared HFS, the
next set of names is returned to the buffer and everything works
successfully: all the files are deleted.

However, with a Shared HFS, when the request for additional file
names is sent to a remote system, the XPFS specifies the file
name index of the last name in the original buffer, so the next
set of names starts from that index.  Because files have been
deleted in the meantime, that index no longer corresponds to the
file name at the end of the buffer, and the returned file name
list skips some names.  As a result, the readdir() loop does not
delete every file.
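
The skipping effect described above can be illustrated with a
small in-memory model.  This is only a sketch under simplifying
assumptions, not WebSphere or HFS code: the directory is modeled
as a list that compacts when entries are deleted, and each buffer
refill returns a fixed number of names (CHUNK, a hypothetical
value) rather than 1 KByte worth of names.  The names one_pass()
and passes_to_empty() are invented for this illustration.

```c
#include <assert.h>
#include <stdio.h>

#define CHUNK 4   /* hypothetical number of names returned per refill */

/* Model one readdir()/unlink() pass over a directory holding n entries.
 * Each refill asks for names starting at a position (index), not at a
 * file name.  Deleting a chunk shifts the later entries down, so the
 * stale index steps over entries that were never read.
 * Returns the number of entries still present after the pass. */
int one_pass(int n)
{
    assert(n >= 0);
    int index = 0;                 /* position the next refill starts at */
    while (index < n) {
        int got = (n - index < CHUNK) ? n - index : CHUNK;
        n -= got;                  /* delete the chunk; the rest shift down */
        index += got;              /* stale position, past the shifted names */
    }
    return n;
}

/* Repeat passes, each starting again from position 0 (the effect of
 * rewinddir()), until the model directory is empty.
 * Returns the number of passes used, or -1 if max_passes was not enough. */
int passes_to_empty(int n, int max_passes)
{
    for (int pass = 1; pass <= max_passes; pass++) {
        n = one_pass(n);
        if (n == 0)
            return pass;
    }
    return -1;
}
```

In this model, a directory of 16 entries is left with 8 after
the first pass, 4 after the second, and none after the third,
which matches the "roughly half per iteration" behavior that the
APAR text describes.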

In the C/C++ Reference Guide under "General Description" of
readdir() is the following recommendation: "If the contents of
a directory have changed since the directory was opened (files
added or removed); a call should be made to rewinddir() so that
subsequent readdir() requests can read the new contents."

Although calling rewinddir() inside the readdir() loop would
solve the problem, for performance reasons the implemented
solution calls rewinddir() after the readdir() loop and then
re-executes the loop to catch any files that were missed.  This
avoids the expense of a rewinddir() for every file in the
directory.  The process is repeated until the directory is
emptied or a maximum of 20 iterations is reached (for the case
where the directory cannot be deleted for other reasons).
Since roughly half the directory is emptied on every iteration,
20 iterations would easily handle over a million file names in
a directory (2 to the 20th power is 1,048,576).
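
The approach described above can be sketched in POSIX C as
follows.  This is a minimal illustration, not the WebSphere
source: the function name cleanup_dir(), the PATH_BUF size, the
decision to skip subdirectories, and the "stop when a pass
removes nothing" test are all assumptions made for the example.
Only the rewinddir()-after-the-loop idea and the cap of 20
iterations come from the text.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define MAX_PASSES 20    /* iteration cap quoted in the APAR text */
#define PATH_BUF   4096  /* hypothetical path buffer size */

/* Delete the non-directory entries of dir_path, rewinding and rescanning
 * until a pass removes nothing or MAX_PASSES is reached, then remove the
 * directory itself.  Returns 0 on success, -1 on failure (errno is set). */
int cleanup_dir(const char *dir_path)
{
    assert(dir_path != NULL);

    DIR *dir = opendir(dir_path);
    if (dir == NULL)
        return -1;

    for (int pass = 0; pass < MAX_PASSES; pass++) {
        int removed = 0;
        struct dirent *entry;

        while ((entry = readdir(dir)) != NULL) {
            char path[PATH_BUF];
            struct stat st;
            int len;

            if (strcmp(entry->d_name, ".") == 0 ||
                strcmp(entry->d_name, "..") == 0)
                continue;
            len = snprintf(path, sizeof path, "%s/%s",
                           dir_path, entry->d_name);
            if (len < 0 || (size_t)len >= sizeof path)
                continue;          /* path does not fit; leave the entry */
            if (lstat(path, &st) == 0 && S_ISDIR(st.st_mode))
                continue;          /* this sketch does not recurse */
            if (unlink(path) == 0)
                removed++;
            else                   /* log the specific failure per file */
                fprintf(stderr, "unlink %s: %s\n", path, strerror(errno));
        }
        if (removed == 0)
            break;                 /* nothing left that this loop can delete */
        rewinddir(dir);            /* restart the listing to catch skipped names */
    }

    closedir(dir);
    return rmdir(dir_path);        /* fails if any entries remain */
}
```

On a file system that returns every name in one pass, the second
pass finds nothing to remove and the loop ends after one
rewinddir() call, so the extra cost in the common case is a
single additional scan of an empty directory.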

APAR PQ79004 is associated with SERVICE LEVEL W401604 of
WebSphere Application Server version 4.0.1 for z/OS and OS/390.
Temporary fix
Comments
APAR information
APAR number PQ79004
Reported component name WASKBASE
Reported component ID 5655A9801
Reported release 401
Status CLOSED PER
PE YesPE
HIPER NoHIPER
Submitted date 2003-09-29
Closed date 2003-10-22
Last modified date 2003-11-02

APAR is sysrouted FROM one or more of the following:

APAR is sysrouted TO one or more of the following:

Modules/Macros
BBOUBINF          

Fix information
Fixed component name WASKBASE
Fixed component ID 5655A9801

Applicable component levels
R401 PSY UQ81361    UP03/10/28 P F310

Document Information


Product categories: Software > Application Servers > Distributed Application & Web Servers > WebSphere Application Server for z/OS
Operating system(s):
Software version: 401
Software edition:
Reference #: PQ79004
IBM Group: Software Group
Modified date: Nov 2, 2003