IBM Spectrum Scale APARs Resolved in 5.1.6.x

This document describes the authorized program analysis reports (APARs) resolved in IBM Spectrum Scale 5.1.6.x releases.

This document was last updated on 3 February 2023.

APAR

Severity

Description

Resolved in

Feature Tags

IJ44909

afmRecoveryVer2 cannot trigger a policy scan on the remote contact node for an AFM fileset with an NSD backend. The policy scan code is disabled from running on NSD backend filesets.

Symptom	Unexpected Results
Environment	Linux
Trigger	Trying to run afmRecoveryVer2 on an NSD backend AFM fileset.
Workaround	Do not set the afmRecoveryVer2 tunable on an NSD backend AFM fileset.

5.1.6.1

AFM

IJ44890

Signal 11 occurs when an attempt is made to create a dependent fileset under an AFM HPT independent fileset.

Symptom	Abend/Crash
Environment	ALL
Trigger	Creation of dependent fileset under an AFM HPT independent fileset
Workaround	None

5.1.6.1

AFM (HPT)

IJ44887

The mmfsd daemon asserts with logAssertFailed: dataBlockNum != lastDataBlock || !newDA.isALLOC() || newDA.getNSubblocks() == inode.getLastBlockSubblocks(isWideDAFS), which crashes the mmfsd daemon process.

Symptom	Abend/Crash
Environment	ALL
Trigger	Parallel writes to the same file, where one write updates the last data block.
Workaround	None

5.1.6.1

All Scale Users

IJ44867

Kernel lockup due to dentries being added to the lookup cache before initialization is complete.

Symptom	Kernel deadlock
Environment	Linux
Trigger	Multiple threads looking up the same file, which does not exist. This scenario is required for the race condition to occur but does not guarantee that it is reproduced.
Workaround	Disable deferred negative dentry invalidation with mmchconfig deferNegativeDcacheInvalidation=0 --force

5.1.6.1

All Scale Users

IJ44857

The cp command fails to copy data from an AFM uncached file on RHEL 9.1 because the command tries to locate data by using lseek (SEEK_DATA), which fails on AFM uncached files.

Symptom	Unexpected results
Environment	ALL
Trigger	Using the cp command to copy AFM uncached files on RHEL 9.1
Workaround	Use the dd command, or any other command that does not seek to data sections, to copy the data.

5.1.6.1

AFM
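
For IJ44857, the difference between a copy that relies on lseek (SEEK_DATA) and one that reads sequentially can be sketched as follows. This is a minimal illustration, not the cp or dd implementation; the function names seek_data_works and sequential_copy are hypothetical, and the probe only reports whether SEEK_DATA succeeds on a given file.

```python
import os


def seek_data_works(path):
    """Return True if lseek(SEEK_DATA) at offset 0 succeeds on the file.

    Note: the call also fails (ENXIO) on an empty or fully sparse file,
    so False does not by itself prove that the file is AFM uncached.
    """
    if not hasattr(os, "SEEK_DATA"):
        return False  # platform does not expose SEEK_DATA
    fd = os.open(path, os.O_RDONLY)
    try:
        os.lseek(fd, 0, os.SEEK_DATA)
        return True
    except OSError:
        return False
    finally:
        os.close(fd)


def sequential_copy(src_path, dst_path, chunk_size=1024 * 1024):
    """Copy by reading the source in order, without seeking for data sections."""
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            copied += len(chunk)
    return copied
```

A caller could probe a file with seek_data_works first and fall back to sequential_copy when the probe returns False.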

IJ44856

A lookup operation found a dependent create operation and pushed the create operation to completion. This caused a deadlock because the lookup had already acquired the mutex on the file and the create tried to stat the same file.

Symptom	Waiters
Environment	Linux
Trigger	Waiters are seen and the fileset is stuck without showing progress.
Workaround	None

5.1.6.1

AFM-COS

IJ44839

A node (kernel) crash can occur when the vinfoLockOnWrite config option is enabled.

Symptom	Crash
Environment	ALL
Trigger	Timing hole when the undocumented config option vinfoLockOnWrite is enabled, likely triggered by using snapshots
Workaround	Do not enable the undocumented vinfoLockOnWrite config option

5.1.6.1

Core GPFS

IJ44838

The special .afmctl file at home/secondary loses its Control attribute and is treated as a normal file. The cache expects home/secondary to treat the file as a control (CTL) file, but a buffer of size 2048 is returned instead, which overflows the buffer of size 1100 that the cache provides.

Symptom	Crash
Environment	Linux (AFM Gateway nodes)
Trigger	Invalid .afmctl control file at home.
Workaround	Manually disable and re-enable mmafmconfig at the home/secondary, and then stop and start the cache fileset to pick up the new changes from home.

5.1.6.1

AFM

IJ44837

After the mmrestorefs command (or an mmafmctl command that internally calls the file system or fileset restore functionality) successfully restores a file system or an independent fileset, a segmentation fault in the tsapolicy process can at times be observed if the cipherList configuration variable is set to AUTHONLY or to a real cipher value.

Symptom	Error output/message. After the mmrestorefs command runs, the system error log (dmesg on Linux or errpt on AIX) may display messages like this: "[16342239.952383] tsapolicy[1627940]: segfault at a ip 00007f2fda7915f5 sp 00007ffe171250c8 error 4 in libc-2.28.so[7f2fda634000+1bc000]"
Environment	AIX, Linux
Trigger	Running mmrestorefs with multiple nodes when the cipherList configuration variable is set to AUTHONLY or to a real cipher value
Workaround	1) Ignore this problem, because it has no impact on the mmrestorefs functionality. 2) Run mmrestorefs with -N.

5.1.6.1

mmrestorefs
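
To check whether a Linux node has logged the IJ44837 symptom, the kernel log can be scanned for tsapolicy segfault lines. The following is a minimal sketch based only on the sample message quoted above; the function name find_tsapolicy_segfaults is hypothetical, and other kernel versions may format the line differently.

```python
import re

# Pattern modeled on the sample dmesg line quoted in the APAR text.
SEGFAULT_RE = re.compile(
    r"\[\s*(?P<timestamp>\d+\.\d+)\]\s+tsapolicy\[(?P<pid>\d+)\]: "
    r"segfault at (?P<address>[0-9a-f]+) ip (?P<ip>[0-9a-f]+)"
)


def find_tsapolicy_segfaults(log_text):
    """Return (timestamp, pid, faulting address) for each tsapolicy segfault line."""
    return [
        (float(m["timestamp"]), int(m["pid"]), m["address"])
        for m in SEGFAULT_RE.finditer(log_text)
    ]
```

The input would typically be the captured output of dmesg on the node where mmrestorefs ran.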

IJ44836

Since the GPFS 5.1.2 release, memory from a token management subpool may leak on some token manager nodes.
This can be observed in the output of mmfsadm dump malloc:
Statistics for MemoryPool id 3 ("UNPINNED_TM") at 0xF1000012C00246C8:
...
Memory subpool 'HolderList' at 0xF1000012C00258B0
objSize 16 spObjectsPerChunk 65536 expandInProgress 0
inUse 140052583 free 63385 total 140115968 limit 2147483647
The "inUse" field increases gradually.

Symptom	Out-of-memory, Unexpected Results/Behavior
Environment	ALL
Trigger	During token management, one type of object is not freed when the token is destroyed.
Workaround	None

5.1.6.1

All Scale Users
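
For IJ44836, the gradual increase of "inUse" can be tracked by parsing two captures of the mmfsadm dump malloc output and comparing them. The sketch below assumes the subpool layout shown in the sample above; the function names parse_subpools and in_use_growth are hypothetical.

```python
import re

SUBPOOL_RE = re.compile(r"Memory subpool '(?P<name>[^']+)'")
OBJSIZE_RE = re.compile(r"objSize (?P<size>\d+)")
USAGE_RE = re.compile(r"inUse (?P<in_use>\d+) free (?P<free>\d+) total (?P<total>\d+)")


def parse_subpools(dump_text):
    """Map each subpool name to its objSize, inUse, free and total counters."""
    subpools = {}
    name = None
    for raw_line in dump_text.splitlines():
        line = raw_line.strip()
        match = SUBPOOL_RE.search(line)
        if match:
            name = match["name"]
            subpools[name] = {}
            continue
        if name is None:
            continue  # counters before the first subpool header are ignored
        match = OBJSIZE_RE.match(line)
        if match:
            subpools[name]["obj_size"] = int(match["size"])
            continue
        match = USAGE_RE.match(line)
        if match:
            subpools[name].update(
                in_use=int(match["in_use"]),
                free=int(match["free"]),
                total=int(match["total"]),
            )
    return subpools


def in_use_growth(before, after, name):
    """Return how many more objects are in use in the later capture."""
    return after[name]["in_use"] - before[name]["in_use"]
```

In the sample above, inUse plus free equals total (140052583 + 63385 = 140115968). A HolderList value that keeps growing between captures on a token manager node matches the described symptom.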

IJ44832

The mmwatch plugin to mmhealth can print or log excessive error messages if a file system is offline.

Symptom	Error output/message
Environment	Linux
Trigger	Running mmhealth when there is an unmountable filesystem defined.
Workaround	The mmwatch plugin to mmhealth can be disabled.

5.1.6.1

Admin Commands

IJ44678

Remote error 2 occurs while replicating a link operation if the parent directory is deleted before the create/link operation is replicated.

Symptom	The AFM queue is dropped and the fileset goes to the resync state.
Environment	Linux
Trigger	Create/Link/Parent dir remove operation in queue with Fast Create config option enabled.
Workaround	None

5.1.6.1

AFM

IJ44629

Due to a race condition between the RDMA software layer and IBM Spectrum Scale, it is possible that an application running on an IBM Spectrum Scale client may read incorrect data from files stored on GPFS under certain conditions.

Symptom	Unexpected Results/Behavior
Environment	Linux
Trigger	Race condition between the RDMA software layer and IBM Spectrum Scale.
Workaround	Disable RDMA.

5.1.6.1

RDMA

IJ44611

With a GPFS backend, cleanup takes handlerListLockExclusive on SGPanic while, at the same time, a handler that holds HandlerMutex tries to set up (setupctl) the fileset mount path and waits for the SG cleanup.

Symptom	Long waiters
Environment	Linux
Trigger	Waiters are seen and the fileset is stuck without showing progress.
Workaround	None

5.1.6.1

AFM with GPFS backend

IJ44607

GNR RPCs fail when a GPFS daemon at version 5.1.3 or later receives them from a GPFS daemon older than version 5.1.3. Kernel assert: privVfsP != NULL

Symptom	Hang in the command
Environment	ALL
Trigger	Any GNR-related command.
Workaround	None

5.1.6.1

GNR
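
The IJ44607 condition depends on a mix of daemons older than 5.1.3 and daemons at 5.1.3 or later. A minimal sketch of that check follows; the node names and the way versions are collected are hypothetical, and only the 5.1.3 boundary comes from the APAR text.

```python
BOUNDARY = (5, 1, 3)  # version boundary named in the APAR description


def parse_version(text):
    """Turn a dotted version such as '5.1.6.1' into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def has_mixed_levels(node_versions):
    """Return True if some nodes are older than 5.1.3 and others are not."""
    older = [n for n, v in node_versions.items() if parse_version(v) < BOUNDARY]
    newer = [n for n, v in node_versions.items() if parse_version(v) >= BOUNDARY]
    return bool(older) and bool(newer)
```

For example, has_mixed_levels({"nodeA": "5.1.2.4", "nodeB": "5.1.6.0"}) returns True, while a cluster entirely at 5.1.6.0 returns False.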

IJ44574

afmRecoveryVer2 requires the latest 5.1.6 release at both the cache/primary and the home/secondary. The code that checks whether the home/secondary supports afmRecoveryVer2 has no effect, which results in error 121 when recovery is run against that home/secondary.

Symptom	Unexpected Behavior
Environment	Linux Only (acting as AFM Gateway nodes)
Trigger	Running afmRecoveryVer2 from the cache with a home/secondary site that does not yet support afmRecoveryVer2.
Workaround	Make sure that the home/secondary also runs an afmRecoveryVer2-compatible version when afmRecoveryVer2 is enabled at the cache/primary.

5.1.6.1

AFM

IJ44492

The GPFS daemon could fail unexpectedly with assert: regP->owner!=fromNode, in allocM.C. This could happen when a file system is unmounted on a node because of an error.

Symptom	Abend/Crash
Environment	ALL
Trigger	File system unmounted because of an error
Workaround	Disable the assert via disableAssert configuration

5.1.6.1

All Scale Users

IJ44491

The ptrash local bit setting code is enabled or disabled through the afmRevalOpWaitTimeout configuration option.

Symptom	Unexpected Behavior
Environment	Linux (serving as AFM Gateway nodes)
Trigger	Setting afmRevalOpWaitTimeout to a non-default value, which causes the ptrash local bit setting code to not take effect.
Workaround	Set afmRevalOpWaitTimeout to its default value of 180 to ensure that ptrash is set to local.

5.1.6.1

AFM

IJ44489

A 4U102 IOM failure does not currently call home (MAPS), but it should.

Symptom	Unexpected Results/Behavior
Environment	Linux
Trigger	N/A
Workaround	Customer must create a ticket manually.

5.1.6.1

ESS/GNR

IJ44459

Updates to CLUSTER_PERF_SENSOR_CANDIDATES were not taken into account and certain failure scenarios did not trigger a CLUSTER_PERF_SENSOR failover.

Symptom	Component Level Outage
Environment	ALL
Trigger	Using the CLUSTER_PERF_SENSOR_CANDIDATES nodeclass to control which nodes are considered for that role and then updating that nodeclass. A failover scenario such as mmshutdown on the current CLUSTER_PERF_SENSOR node.
Workaround	Running "mmsysmoncontrol restart" also forces updates and pending failover actions

5.1.6.1

perfmon (Zimon)

IJ44441

Signal 11 occurs in the mmfsd process when a file system whose original version is <= 11.01 is upgraded to the latest version. The mmfsd daemon crashes and the file system becomes inaccessible.

Symptom	Abend/Crash
Environment	ALL
Trigger	Upgrade file system with original version <= 11.01 to the latest version.
Workaround	None

5.1.6.1

All Scale Users

IJ44440

NVMe and SSD disks were placed in the same declustered array (DA) during recovery group (RG) creation. Because the disk sizes are almost the same and both disk types have spin = 0, GNR treats them as the same type of disk.

Symptom	Unexpected Results/Behavior
Environment	Linux
Trigger	NVMe and SSD disks have the same size.
Workaround	None

5.1.6.1

ESS/GNR

IJ44322

When there are more than 64 IP addresses on the node, an assert occurs when the daemon starts up.

Symptom	Abend/Crash
Environment	ALL
Trigger	Having more than 64 IP addresses on the node
Workaround	Remove some IP addresses from the node

5.1.6.1

All Scale Users
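
For IJ44322, a node can be checked against the 64-address threshold by counting the addresses that the operating system reports. The sketch below parses the one-line-per-address output of the Linux command ip -o addr show; the function names are hypothetical, and whether the daemon counts loopback or IPv6 addresses toward the limit is an assumption that the APAR text does not confirm.

```python
import re

# Each line of `ip -o addr show` starts with "<index>: <ifname> inet|inet6 <addr>/<prefix>".
ADDRESS_RE = re.compile(r"^\d+:\s+\S+\s+(inet6?)\s+(\S+)", re.MULTILINE)


def list_addresses(ip_output):
    """Return (family, address) pairs found in `ip -o addr show` output."""
    return [(family, address) for family, address in ADDRESS_RE.findall(ip_output)]


def exceeds_address_limit(ip_output, limit=64):
    """Return True if the node reports more addresses than the limit."""
    return len(list_addresses(ip_output)) > limit
```

The input could be captured with subprocess.run(["ip", "-o", "addr", "show"], capture_output=True, text=True).stdout on the node in question.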

IJ44067

System monitoring collects all information about a cluster by sending it to relevant nodes. In doing so, it ignores cluster boundaries, which does not work and creates spurious error messages in the logs.

Symptom	Unexpected Results/Behavior, Erroneous Log entries
Environment	Linux
Trigger	Setup with remote cluster integration
Workaround	Ensure the nodes in the home cluster are at a level >= code level of the remote clusters

5.1.6.1

System Health (mmfs.log.latest)

IJ43790

Commands like mmcrcluster or mmaddnode may hang in the GSKIT layer on AMD EPYC family 25 processors. A particular model from family 25 that is known to hang in the GSKIT layer is the AMD EPYC 7343.

Symptom	Admin commands hang
Environment	Linux
Trigger	This problem affects AMD EPYC family 25 processors.
Workaround	Add the line "ICC_SHIFT=3" to the /usr/lpp/mmfs/lib/gsk8/Cicc/icclib/ICCSIG.txt file on problem nodes.

5.1.6.1

Admin Commands, gskit
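
To find the nodes that IJ43790 might affect, the processor vendor and family can be read from /proc/cpuinfo on Linux. The following is a minimal sketch; the function names are hypothetical, and it only identifies AMD family 25 processors without testing whether a given model actually hangs.

```python
def parse_first_cpu(cpuinfo_text):
    """Return the key/value fields of the first processor block in /proc/cpuinfo."""
    fields = {}
    for line in cpuinfo_text.splitlines():
        if not line.strip():
            if fields:
                break  # a blank line ends the first processor block
            continue
        key, separator, value = line.partition(":")
        if separator:
            fields[key.strip()] = value.strip()
    return fields


def is_amd_family_25(cpuinfo_text):
    """Return True if the first CPU reports vendor AuthenticAMD and cpu family 25."""
    cpu = parse_first_cpu(cpuinfo_text)
    return cpu.get("vendor_id") == "AuthenticAMD" and cpu.get("cpu family") == "25"
```

On a node, the text can be obtained with open("/proc/cpuinfo").read().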

IJ44219

Files are not replicated on create after failoverToSecondary.

Symptom	Unexpected Results/Behavior
Environment	Linux
Trigger	After failoverToSecondary, creating and writing files and then running changeSecondary to sync with the old primary.
Workaround	None

5.1.6.0

AFM-DRPFS