IBM Spectrum Scale APARs Resolved in 5.0.5.x

IJ38923

While updating the symlink target path on an AFM enabled fileset, the inode is not copied to the previous snapshot causing the assert. (show details)

Symptom	Crash
Environment	Linux
Trigger	AFM caching with symlinks and snapshots.
Workaround	None

5.0.5.14

AFM

IJ38924

Issuing io_uring IORING_OP_READ_FIXED requests to read data into preallocated buffers fails with an error. (show details)

Symptom	IO error
Environment	Linux
Trigger	No pre-conditions are necessary.
Workaround	When using io_uring, use IORING_OP_READ instead of IORING_OP_READ_FIXED. This would require changing the application issuing the requests and might come at a performance penalty.

5.0.5.14

Core GPFS

IJ38925

Today there is no command to bring an AFM Inactive fileset to active. (show details)

Symptom	AFM fileset moving to Inactive/dropped states.
Environment	All
Trigger	Fileset moving to Inactive state and needing recovery for any reason.
Workaround	Wait for an IO operation on the fileset or touch a file inside the fileset to simulate an incoming IO and trigger recovery on the fileset in question.

5.0.5.14

AFM

IJ38926

When the handler for AFM replication is created on the gateway node, the handler create time, the last replay time, and the last sync time are all initialized to now time. If for some reason the handler couldn't be mounted and replicate to home, this leads to AFM printing the last replay time as the same time as handler create time and gives a misconception that replication has actually happened. (show details)

Symptom	Error output
Environment	Linux
Trigger	Checking AFM replication handler for last replay and sync time, when there's a recovery pending and not happening on the fileset.
Workaround	None

5.0.5.14

AFM

IJ38927

When running IO through KNFS and file audit logging enabled, an invalid pointer might be accessed. (show details)

Symptom	Abend/Crash
Environment	Linux
Trigger	Certain patterns of KNFS IO with file audit logging enabled
Workaround	None

5.0.5.14

File audit logging

IJ38928

32bit GPFS API library not available in default path on Ubuntu. (show details)

Symptom	Error output/message
Environment	Linux (x86_64)
Trigger	Build an application with 32bit GPFS API library on Ubuntu.
Workaround	Modify the build process of the application to search for the 32bit GPFS API library in a different directory.

5.0.5.14

GPFS API

IJ38949

The SUID and SGID bits are not cleared after a successful write/truncate to a file by a non-owner. (show details)

Symptom	Unexpected Results/Behavior
Environment	Linux
Trigger	Create a file with the SUID and SGID bits set. As a non-root user or a non-group member user, write to the file with the write() system call or truncate the file with the truncate() system call.
Workaround	Ensure that only owners can write to an executable binary file that has the SUID/SGID bit set.

5.0.5.14

Core GPFS

IJ36560

If a workload involves opening and creating lots of files concurrently under the same directory, some of the opens may suffer high open times. (show details)

Symptom	Performance Impact/Degradation.
Environment	Windows (x86_64)
Trigger	Workload that creates and opens many file concurrently in the same directory path.
Workaround	None

5.0.5.14

Core GPFS

IJ39127

An error 22 is hit when trying to get the valid data blocks on a file in resync. (show details)

Symptom	Unexpected Behavior
Environment	Linux (AFM gateway nodes)
Trigger	Running Resync with Uncached (possibly evicted) files at the SW cache site.
Workaround	None

5.0.5.14

AFM

IJ38848

NFS mount point is not getting killed if home fileset is unresponsive or hung. This is causing multiple NFS mount to be created for the same fileset. (show details)

Symptom	Too much memory consumption on the NFS mount point.
Environment	Linux
Trigger	Gateway node is getting more memory consumption on the NFS mount due to existing multiple mount points of the fileset.
Workaround	None

5.0.5.14

AFM DR

IJ39144

DMAPI read event is generated on AFM deferred deletion files causing unnecessary recalls if there exists only AFM recovery snapshot. (show details)

Symptom	Unexpected results
Environment	Linux
Trigger	Deletion of migrated files on AFM fileset.
Workaround	None

5.0.5.14

AFM

IJ39384

AFM fails to upload the object if the name starts with a '-' character. (show details)

Symptom	Deadlock
Environment	Linux
Trigger	AFM+COS caching with special file names.
Workaround	None

5.0.5.14

AFM

IJ39388

If the system pool is also used for data, auto recovery mis-calculates avaiable metadata fg count and might trigger tsrestripefs -r wrongly. (show details)

Symptom	Unexpected Results/Behavior
Environment	Linux
Trigger	If the system pool is used for both data and metadata in a FPO cluster and if a disk/node failure causes the good failure group count to become less than the default metadata replication.
Workaround	Do not use system pool for data.

5.0.5.14

FPO

IJ39411

The IBM Spectrum Scale admin commands and handling of file system encryption keys require the use of more robust settings. (show details)

Symptom	None
Environment	All
Trigger	None
Workaround	None

5.0.5.14

Admin commands

IJ37872

Missing sqlite-3 packages on IBM Spectrum Scale Erasure Code Edition environments can cause admin command hangs. (show details)

Symptom	Hang
Environment	All
Trigger	Problem occurs in an IBM Spectrum Scale Erasure Code Edition environment when the sqlite-3 package is installed on some nodes but not on others.
Workaround	None

5.0.5.13

Admin commands

IJ37873

SGNotQuiesced assertion in dbshLockInode during file system quiesce (show details)

Symptom	Abend/Crash
Environment	All
Trigger	Operations which do file system quiesce.
Workaround	None

5.0.5.13

Snapshots

IJ37875

An error message "Could not retrieve minReleaseVersion" is logged in the systemhealth monitor log file (mmsysmonitor.log). (show details)

Symptom	Error output/message
Environment	All (with performance monitoring installed)
Trigger	The error message is logged whenever an mmperfmon query is executed.
Workaround	None; The error message can be ignored.

5.0.5.13

System health

IJ36358

mmap reads from lots of threads might cause a deadlock in DeclareResourceUsage. (show details)

Symptom	Hang/Deadlock/Unresponsiveness/Long Waiters
Environment	All
Trigger	mmap reads from lots of threads.
Workaround	Disable mmap pagepoolresource usage declaration with the "mmchconfig mmapDeclarePageUsage=false" command.

5.0.5.13

Core GPFS

IJ37909

When there are multiple threads trying to flush the same file and the file is large with many blocks, there could be mutex contention which can lead to performance degradation. (show details)

Symptom	Performance Impact/Degradation
Environment	All
Trigger	Multiple threads trying to flush the same large file.
Workaround	Reduce the number of worker threads.

5.0.5.13

Core GPFS

IJ37910

SGNotQuiesced assertion in dbshLockInode during file system quiesce. (show details)

Symptom	Abend/Crash
Environment	All
Trigger	Operations which do file system quiesce.
Workaround	None

5.0.5.13

Snapshots

IJ37978

Inodes are not reclaimed after the hardlinks are corrected during the AFM prefetch. This causes more inodes to be in-use than actual number of files present in the fileset. (show details)

Symptom	Unexpected results
Environment	Linux
Trigger	AFM prefetch
Workaround	None

5.0.5.13

AFM

IJ37107

AFM fileset resync failed with EINVAL error (22) (show details)

Symptom	I/O error
Environment	Linux
Trigger	AFM fileset resync operation (mmafmctl command with resync subcommand)
Workaround	None

5.0.5.13

AFM

IJ37979

When SGPanic occurs, the dealloc queue subblocks count could be wrong and cause "(deallocHighSeqNum - deallocFlushedSeqNum) >= deallocQueueSubblocks" assertion failure. (show details)

Symptom	Abend/Crash
Environment	All
Trigger	In rare cases, the block deallocation around SGPanic time might cause this assertion.
Workaround	None

5.0.5.13

Core GPFS

IJ38041

When a fileset is in chmodAndUpdateAcl permission change mode, creating a file with the open() system call under a parent directory with inherit entries and changing permission's of the newly created file via NFS results in duplicated and incorrect entries in the file's NFSv4 ACL. (show details)

Symptom	Unexpected Results/Behavior
Environment	All
Trigger	Have a fileset in chmodAndUpdateAcl permission change mode and a parent directory with inherit entries. Using NFS, create a file with the open() system call and change the permissions of the file with chmod.
Workaround	Use chmodAndSetAcl permission change mode for filesets and avoid having inherit entries in the parent directory.

5.0.5.13

NFS

IJ38052

Due to a change in procps output in Cygwin version 3.3, IBM Spectrum Scale fails to start. (show details)

Symptom	Unexpected Results/Behavior
Environment	Windows (x86_64)
Trigger	IBM Spectrum Scale startup
Workaround	Downgrade Cygwin.

5.0.5.13

Admin commands

IJ38077

mmvdisk recovery group conversion might conflict with settings for nsdRAIDSmallBufferSize from the previous deployment scripts. mmvdisk will apply a value of -1 to this setting, which conflicts with the original value of 256KiB. The result is that the Daemon will print a warning message on start up, warning the user that nsdRAIDSmallBufferSize has been reduced to a value of 4KiB. This might impact performance. (show details)

Symptom	Error output/message, Performance Impact/Degradation
Environment	Linux
Trigger	mmvdisk recovery group conversion from the pre-2020 server config settings.
Workaround	Delete the old nsdRAIDSmallBufferSize setting of 256K in SDRFS, or delete any -1 values that were part of the mmvdisk rg conversion override.

5.0.5.13

ESS, GNR

IJ38081

AFM Prefetch with --dir-list-file option where the list contains encoded directory names is not being processed and queued. (show details)

Symptom	Unexpected Behavior
Environment	Linux (AFM gateway nodes)
Trigger	Running prefetch (with or without --metadata-only option) using a list of encoded directory names - like the one generated from checkUncached (during mchfileset command run).
Workaround	Decode the directory list by hand and feed it to prefetch.

5.0.5.13

AFM

IJ38148

Given a parent directory with the SGID bit set, a file created with the SGID bit specified by a user who does not belong to the same group as the directory can still have the SGID bit set. (show details)

Symptom	Unexpected Results/Behavior
Environment	All
Trigger	Create a file with the SGID bit specified as a non-member group user in a directory with the SGID bit set.
Workaround	Remove the SGID bit from the directory.

5.0.5.13

Core GPFS

IJ36547

A newly mounting node either due to user mount or an expelled node rejoining the cluster can fail assert 'llfP->lockRangeNode != NodeAddr(-1U, 0, NodeAddr::naNormal)' if it happens in the middle of a, mmrestripefs, mmaddisk, mmdeldisk, or mmfsck operation. (show details)

Symptom	Node expel/Lost Membership
Environment	All
Trigger	Mounted node failure in the middle of an mmrestripefs operation
Workaround	None

5.0.5.12

Core GPFS

IJ34995

mmlsquota is reporting wrong results with:

1. extra output lines with "no limits" for users or groups that don't have usage on the fileset

2. extra output lines, all showing "no limits" when no limits (quotas) are set for a user or group in the fileset

(show details)

Symptom	The mmlsquota output for a user is showing "no limits" when there is quota usage but no limits set.
Environment	All
Trigger	Existing code error in mmlsquota when perfileset-quota is enabled.
Workaround	None

5.0.5.12

Quotas

IJ36553

When the last block of a file is not a full GPFS block, replica compare function could report false replica mismatch. (show details)

Symptom	Error output/message
Environment	All
Trigger	Running replica compare with mmrestripefs or mmrestripefile
Workaround	None

5.0.5.12

Core GPFS

IJ36556

If there are node failures during burst of file create or delete activity, then it is possible for the cached free inode counters on the file system manager to become out of date. (show details)

Symptom	Error output/message
Environment	All
Trigger	Node failures in the middle of large number of file creates or deletes.
Workaround	Run 'mmfsadm test imapWork <fs> inodeManager' or 'mmchmgr <fs> <another node>'

5.0.5.12

Core GPFS

IJ36855

The IBM Spectrum Scale HDFS Transparency connector version 3.1.0-9, 3.1.1.7 and 3.3.0-0 contain Apache Log4j libraries that are affected by the security vulnerabilities CVE-2019-17571 and CVE-2021-4104. (show details)

Symptom	NA
Environment	All
Trigger	The IBM Spectrum Scale HDFS Transparency connector is not vulnerable in default configurations.
Workaround	Manually patch the affected log4j libraries.

5.0.5.12

HDFS Connector

IJ36338

"More than 22 minutes searching for a free buffer in the pagepool" assertion failure. (show details)

Symptom	Abend/Long Waiters
Environment	All
Trigger	This problem is more likely to occur in a cluster which has file systems with both large block size and small block size (compared to scatter buffer size).
Workaround	Change 'scatterBufferSize' config to a smaller size.

5.0.5.12

Core GPFS

IJ36861

GPFS daemon could assert while running mmadddisk. This can only happen if a new storage pool is being created as a result of running mmadddisk and a storage pool had been deleted in the past by using mmdeldisk. (show details)

Symptom	Abend/Crash
Environment	All
Trigger	Creating a new storage pool by using mmadddisk command.
Workaround	Increase the number of disks being added by using mmadddisk command or avoid creating a new storage pool.

5.0.5.12

Core GPFS

IJ36862

When split reads are spawned out to helper gateway nodes and there is a remote error at the home site causing the fileset to be evasively put to unmounted state - there is a window where putting the fileset to unmounted can conflict with an ongoing split read message. (show details)

Symptom	Deadlock
Environment	Linux
Trigger	AFM fileset having to move to unmounted state when there are split reads in progress on the helper gateway node.
Workaround	None

5.0.5.12

AFM

IJ34184

Daemon assert going off when generating DMAPI event: addr.isReserved() || addr.getClusterIdx() == clusterIdx in file cfgmgr.h, resulting in a daemon crash. (show details)

Symptom	Abend/Crash
Environment	All
Trigger	DMAPI is enabled and a remote cluster is used while a DMAPI event is being generated after a remote client node left the cluster.
Workaround	None

5.0.5.12

DMAPI

IJ36967

AFM gateway node crashes during the fileset recovery because invalid file handle are used to get inodes in the kernel. (show details)

Symptom	Crash
Environment	Linux
Trigger	AFM fileset recovery
Workaround	None

5.0.5.12

AFM

IJ36969

Certain characters such as newline (\n) or backslash (\), etc. were not escaped correctly resulting in invalid JSON. JSON parsers are not be able to read the event correctly. (show details)

Symptom	Unexpected Results/Behavior
Environment	Linux
Trigger	Filenames, ACLs, or xattrs with escape characters
Workaround	You can programmatically escape existing events to create valid JSON before the parser tries to ingest the event.

5.0.5.12

File audit logging, Watch folder

IJ36970

While trying to set extended attributes, SetXAttrHandlerThread could deadlock with itself trying to obtain a WW lock on the buffer while holding XW lock. (show details)

Symptom	Hang/Deadlock/Unresponsiveness/Long Waiters
Environment	All
Trigger	Changing extended attributes on a file or directory
Workaround	None

5.0.5.12

Core GPFS

IJ36558

When running file audit logging, signal 11 is possible at FileMetadata::set_mtimeUpdate(unsigned int) (show details)

Symptom	Signal 11
Environment	Linux
Trigger	Daemon crash
Workaround	None

5.0.5.12

File audit logging

IJ36557

If the number of quorum nodes in the cluster is not greater than the minQuorumNodes configure setting, the mmchconfig command fails without a clear message. (show details)

Symptom	Error message
Environment	All
Trigger	Problem arises when minQuorumNodes configuration value is greater than or equal to the number of quorum nodes in the cluster.
Workaround	If setting tiebreakerDisks parameter fails because the number of quorum nodes in the cluster is not great than minQuorumNodes, use the mmchconfig command to set minQuorumNodes to the default value or a value lower than the number of quorum nodes in the cluster.

5.0.5.12

Admin commands

IJ36536

Running workloads with many lookups done to GPFS in a highly concurrent way have a performance impact. (show details)

Symptom	Performance Impact/Degradation
Environment	Linux
Trigger	Lots of lookups to the file system. One known case is setting LD_LIBRARY_PATH to directories on a GPFS file system on zLinux. The zLinux dynamic linker issues a much higher number of lookups for each entry in LD_LIBRARY_PATH, making this scenario more likely to occur.
Workaround	Reduce the number of concurrent lookups.

5.0.5.12

Core GPFS

IJ36842

GPFS daemon could assert while mounting the file system on a client node with code level prior to V5.1.1.0. This can only happen if a new storage pool is being created by mmadddisk and the storage pool had been deleted in the past by using mmdeldisk. (show details)

Symptom	Abend/Crash
Environment	All
Trigger	Creating a new storage pool via mmadddisk command. Then mount the fs from a client node with code level prior to V5.1.1.0.
Workaround	Increase the number of disks being added by using the mmadddisk command or avoid creating a new storage pool.

5.0.5.12

Core GPFS

IJ37027

NULL pointer dereference in kxGanesha (show details)

Symptom	Abend/Crash
Environment	Linux
Trigger	It can happen on the Ganesha NFS server or the kernel NFS server over a GPFS file system.
Workaround	None

5.0.5.12

NFS

IJ34842

If the NFSv4 client holds a file lock for read/write operations, then client may report I/O error after CES-IP failover. (show details)

Symptom	I/O error
Environment	All
Trigger	If NFSv4 client holds a file lock for write operation, then CES-IP failover from current active NFS server(lets say protocol node1) to other server (protocol node2) may cause I/O failure on client.
Workaround	None

5.0.5.12

NFS

IJ34927

logAssertFailed: exclLockWord == 0 (show details)

Symptom	assert
Environment	POWER
Trigger	NA
Workaround	Disable the assert.

5.0.5.11

Core GPFS

IJ35248

One thread is trying to initialize the AFM relationship with the remote site and in the meanwhile another thread has initiated an unlink on the same fileset. The relationship initialization thread doesn't give up causing unlink thread to wait forever, causing the deadlock. (show details)

Symptom	Hang/Deadlock/Unresponsiveness/Long Waiters
Environment	Linux (AFM gateway nodes)
Trigger	Unlinking an AFM fileset while the AFM relationship for this fileset is being initialized.
Workaround	Make sure the fileset is in a Dirty/Active state when unlinking an AFM Fileset.

5.0.5.11

AFM

IJ35257

Eviction tries to create a pruned list file in the current working directory instead of putting it into internal IBM Spectrum Scale directories on /var. If the PWD is an RO directory, then the eviction command fails with a "Could not open file" error. (show details)

Symptom	Error output
Environment	AIX, Linux
Trigger	Run mmafmctl eviction subcommand with list-file option from inside a Read-Only current working directory.
Workaround	Keep the current working directory as some RW permission directory so that eviction can create the temporary eviction pruned list file in the PWD and proceed.

5.0.5.11

AFM

IJ35258

Reading an evicted file in a snapshot at the AFM cache site should return an EIO and fail always. (show details)

Symptom	Unexpected Behavior
Environment	AIX, Linux
Trigger	Evict a file from the fileset, Create a local snapshot, and then try to read the evicted file from the snapshot.
Workaround	Set the afmSnapUncachedRead config to yes at the cluster level and then read the file in the snapshot to get the EIO error.

5.0.5.11

AFM

IJ35259

mmrepquota command failure with message size too big due to empty quota entries. (show details)

Symptom	mmrepquota fails with message 'size too big'
Environment	All
Trigger	The problem is triggered when there are too many un-used quota entries accumulated in the quota file.
Workaround	None

5.0.5.11

Quotas

IJ35285

Running mmfsck command with --status-report will fail with error "Option '-N' is incorrect.", if the cluster has the config parameter defaultHelperNodes set on the cluster. (show details)

Symptom	Error message
Environment	All
Trigger	Running mmfsck command with --status-report on cluster that has the config parameter defaultHelperNodes set on the cluster.
Workaround	1) Set mmchconfig defaultHelperNodes=DEFAULT and re-execute the command. 2) Run "mmfsadm dump fsck" on all nodes running mmfsck.

5.0.5.11

FSCK

IJ35440

Drives on an ESS 3000 may not show up after a boot or reboot of a canister. You can detect these errors using lspci -s 0x87 | grep DpcSta | grep Trigger+ or lspci -s 0x3c | grep DpcSta | grep Trigger+ (show details)

Symptom	Component Level Outage
Environment	Linux (x86_64)
Trigger	ESS 3000 boot or reboot canister (very rare)
Workaround	You can use the setpci utility to manually clear the DPC error flag of the 0x87 and 0x3c busses. This will force the devices to attempt to train. If the drive still does not train, there is some other issue.

5.0.5.11

ESS, GNR

IJ35441

When offline fsck is run in repair mode (-y) on a file system having inode size less than 4K the following assert is possible:

"logAssertFailed: (dacIb[i] == (*DiskAddr::invalidDiskAddrP)) || (dacIb[i] == (*DiskAddr::dittoDiskAddrP)) || (dacIb[i] == (*DiskAddr::cdittoDiskAddrP)) || (dacIb[i] == (*DiskAddr::invalidZDiskAddrP)) || (dacIb[i] == (*DiskAddr::brokenZDiskAddrP)) || (dacIb[i] == (*DiskAddr::brokenDiskAddrP))"

(show details)

Symptom	Overwrite disk addresses/Node assert
Environment	All
Trigger	This issue will hit if offline fsck -y is run on a file system having inode size less than 4K (4096 bytes).
Workaround	Disable the patch queue feature by running the following command before running offline fsck -y: # mmdsh -N all mmfsadm test fsck usePatchQueue 0

5.0.5.11

FSCK

IJ33526

A mechanism is needed that, when mmfsd takes on a fatal signal (sigsegv, etc), the signal handler can either complete in finite time, or the daemon needs to be forcibly terminated. Lack of such a mechanism can result in a deadlock. (show details)

Symptom	Deadlock
Environment	Linux
Trigger	GPFS calls non-signal-safe function in signal handler
Workaround	None

5.0.5.11

Core GPFS

IJ35535

Race condition during daemon shutdown could lead to kernel crash if DIO workload is running. (show details)

Symptom	Abend/Crash
Environment	All
Trigger	GPFS daemon shutdown while DIO workload is still running
Workaround	Unmount all file systems before shutting down the GPFS daemon.

5.0.5.11

Core GPFS

IJ35687

If a sufficiently large number of inodes are deleted on a node, then it is possible for the background deletion process to miss processing some of the inodes. (show details)

Symptom	Unexpected Behavior
Environment	All
Trigger	Large number of file deletions on a node
Workaround	Run 'mmfsadm test imapWork <fs> fullDeletion' or 'mmchmgr <fs> <another node>'

5.0.5.11

Core GPFS

IJ34384

Online fsck can report lost blocks that are false positives. Repairing this can allow the block to be used by multiple files at the same time causing corruption. (show details)

Symptom	Operation failure due to FS corruption
Environment	All
Trigger	Multi-pass fsck scan during file operations that update inode metadata.
Workaround	Run offline fsck.

5.0.5.11

Core GPFS

IJ35688

Copying an inode block which contains a bad deleted inode could trigger SIGFPE signal, then crash the mmfsd daemon. (show details)

Symptom	daemon crash
Environment	All
Trigger	This issue only happens on a bad deleted inode when snapshot is being used.
Workaround	None

5.0.5.11

Snapshots

IJ35689

ACL changed when running AFM failover in SW. (show details)

Symptom	Fileset root ACL gets changed
Environment	Linux
Trigger	Some fileset is being recreated or the fileset root metadata is changed while running failover.
Workaround	None

5.0.5.11

AFM

IJ35919

GPFS API calls from 32bit application fail on SLES15 SP3. (show details)

Symptom	Error output/message
Environment	Linux (x86_64 and s390x)
Trigger	Running on SLES15 SP3 and an application trying to issue 32bit GPFS API calls.
Workaround	1. Apply the fix manually by editing the file /usr/lpp/mmfs/src/gpl-linux/ss.c to remove the checks for HAVE_COMPAT_IOCTL 2. Run mmbuildgpl again. 3. Restart GPFS.

5.0.5.11

GPFS API

IJ35921

IBM Spectrum Scale ships several ilm samples. One of them is a mmfind tool and to use the tool, findUtil_processOutputFile.c needs to be compiled. But the compilation of findUtil_processOutputFile.c fails on some Linux distros. (show details)

Symptom	Unexpected Results/Behavior
Environment	All
Trigger	Compiling mmfindUtil_processOutputFile.c
Workaround	Modify mmfindUtil_processOutputFile.c before compiling it.

5.0.5.11

Admin commands

IJ34331

When the mmchmgr command is used to assign a new file system manager, it could fail with "No log available" message after the current file system panics with "No log available" error. This can happen if file system is not externally mounted on any node. (show details)

Symptom	Error output/message
Environment	All
Trigger	Using the mmchmgr command to assign a new file system manager
Workaround	Mount the file system before issuing the mmchmgr command.

5.0.5.10

Core GPFS

IJ34346

If FIPS is enabled, call home uploads fail; manual call home uploads crash with an error, mentioning FIPS. (show details)

Symptom	Component Level Outage
Environment	Linux
Trigger	Enabling FIPS
Workaround	Disable FIPS.

5.0.5.10

Call home

IJ34351

On a cluster with two quorum nodes and tiebreaker disks, an unexpected quorum loss can be seen on the challenger node when the current cluster manager sees a mmshutdown (or node reboot). (show details)

Symptom	File System Outage (unexpected GPFS file system unmount for about 30 seconds)
Environment	All
Trigger	GPFS shutdown (mmshutdown) or node reboot of current cluster manager
Workaround	Move the cluster manager role by hand using the 'mmchmgr -c ' command.

5.0.5.10

Cluster Manager

IJ34354

When an application reads with an IO size that is a multiple of the GPFS block, prefetching doesn't start until the application issue a second read request unless the read starts at the beginning of the file or prefetchAggressiveness is set to prefetchOnFirstAccess. This can cause slow read performance when read IO size is very large. (show details)

Symptom	Performance Impact/Degradation
Environment	All
Trigger	Application issue read with IO size that is much larger than the GPFS block size.
Workaround	Set prefetchAggressiveness configuration to prefetchOnFirstAccess or reduce the read IO size to the GPFS block size.

5.0.5.10

Core GPFS

IJ34355

mmkeyserv client register, deregister or rkm change command will fail if the new RKM.conf contains expired certificates. (show details)

Symptom	Error output/message Unexpected Results/Behavior
Environment	Windows (x86_64), Linux
Trigger	This occurs when there is a client that is registered to multiple tenants and the certificate has expired or when there are multiple clients that are registered to at least one tenant and the their certificates have expired.
Workaround	Use the mmkeyserv client update to update the client certificates. Otherwise, shut down GPFS and retry the command.

5.0.5.10

Admin commands, Encryption

IJ34356

When a file system has a high number of block allocation regions, the processing of the allocation manager RPC could be slower than expected. (show details)

Symptom	Hang/Deadlock/Unresponsiveness/Long Waiters
Environment	All
Trigger	Running mmdf
Workaround	Avoid running mmdf.

5.0.5.10

Core GPFS

IJ34357

With thousands of client nodes mounted in the file system, adding some more disks serviced by ESS 3000 nodes can cause long waiters trying to get NSD disk information on each client node. (show details)

Symptom	Stuck mmadddisk command
Environment	ESS 3000, ECE
Trigger	Create new NSD disks from an ESS 3000 or an ECE cluster and add them to a file system before starting GPFS service.
Workaround	Restart the GPFS service on the ESS 3000 or ECE nodes, or fail over the RG master from one node to the other one.

5.0.5.10

Admin commands

IJ34381

The timestamps displayed by "mmdiag --iohist" on Windows nodes may show incorrect values, especially for the decimal part of the seconds. This may also cause misreporting of the duration of the affected I/O operations. (show details)

Symptom	Unexpected Results/Behavior
Environment	Windows (x86_64)
Trigger	Running "mmdiag --iohist" on Windows nodes
Workaround	None

5.0.5.10

Admin commands

IJ34391

Running online fsck in repair mode (-o -y) can cause it to detect and repair false positive lost blocks (i.e. blocks that are assigned to files) and mark it as free, and doing this can lead to duplicate block corruptions. (show details)

Symptom	Data corruption due to duplicate blocks
Environment	All
Trigger	Running online fsck in repair mode (-o -y)
Workaround	Use offline fsck to fix corruptions.

5.0.5.10

Online FSCK

IJ34783

Kernel assert: Signal 11 at SharedHashTab::htInit (show details)

Symptom	Abend/Crash
Environment	All
Trigger	A special combination of maxStatCache and maxFilesToCache configuration
Workaround	Change the maxStatCache/maxFilesToCache configurations.

5.0.5.10

Core GPFS

IJ34784

File data not synced after a recovery (show details)

Symptom	Recovery silently completed without syncing the data with the remote.
Environment	Linux
Trigger	When tspcachescan gets an error code of more than 255.
Workaround	None

5.0.5.10

AFM

IJ34785

File data not synced after a recovery (show details)

Symptom	Assert is logged.
Environment	Linux
Trigger	When write operation is happening in the NSD backend and checking the remotefs quiesce or not.
Workaround	None

5.0.5.10

AFM

IJ34805

Assert: SGNotQuiesced sgmrpc.C (show details)

Symptom	Scale mmfsd daemon process crash
Environment	All
Trigger	Snapshot create or delete operations
Workaround	None

5.0.5.10

Core GPFS

IJ34609

Deadlock while queueing PIO read if there is one active gateway node and home becomes unresponsive. (show details)

Symptom	Deadlock
Environment	Linux (AFM gateway nodes)
Trigger	PIO read with unresponsive home.
Workaround	Restart GPFS at the gateway node.

5.0.5.10

AFM

IJ34822

When multiple nodes are creating files in the same directory, creates can slow down during recovery. (show details)

Symptom	Long Waiters
Environment	All
Trigger	File system crash
Workaround	None

5.0.5.10

Core GPFS

IJ34813

Hard lockup between 2 pemsmod kernel threads can panic the kernel. A kernel panic will mean system down time and maybe quorum loss for the customer. Stack trace at vmcore-dmesg.txt will have something like this:

[88432.803601] CPU: 27 PID: 14563 Comm: pemsRollUpQueue Kdump: loaded Tainted: G

(show details)

Symptom	Kernel crash
Environment	Linux (x86_64)
Trigger	System running heavy I/O workload can hit this issue.
Workaround	None

5.0.5.10

ESS, GNR

IJ34943

AFM gateway node crashes if the home is not responding while mounting the fileset target path. (show details)

Symptom	Crash
Environment	Linux
Trigger	AFM caching with unresponsive home.
Workaround	None

5.0.5.10

AFM

IJ33370

If the disks for a file system are not ready to be used yet and the command "mmfsadm dump deferreddeletions" is run at the same time, the command will fail with the side effect of causing a long waiter 'waiting for SG cleanup' when the file system is deleted and recreated. (show details)

Symptom	Long Waiters
Environment	All
Trigger	None
Workaround	None

5.0.5.9

Core GPFS

IJ33371

GPFS allows the length of NSD names to be up to 255 characters and there are no rules that say it must contain an alpha. If there are NSD names with all digits and long enough, this can be a problem. With long digit names, two NSDs can incorrectly be identified as the same NSD. (show details)

Symptom	Error output/message Unexpected Results/Behavior
Environment	All
Trigger	Long NSD names of all digits
Workaround	Add an alphabetic character in to the NSD name.

5.0.5.9

Admin commands

IJ33372

During reconnect in the middle of a write operation, the below error may be reported:

2021-03-30_12:59:35.050-0400: [W] Encountered first checksum error on network I/O from NSD Client 10.10.10.10

(show details)

Symptom	IO error
Environment	Linux (s390x)
Trigger	Network is not good which can lead to TCP connections reconnect.
Workaround	None

5.0.5.9

Core GPFS

IJ33386

Command: err 46: tsunlinkfileset -f after mmunlinkfileset commands are invoked. (show details)

Symptom	Unable to unlink / delete the fileset which encountered this error
Environment	All
Trigger	Invoking the mmunlinkfileset command
Workaround	Reboot the node and retry the mmunlinkfileset command.

5.0.5.9

Filesets

IJ33392

With the introduction of the 5-level page tables, supported by Intel's Ice Lake processor generation, user space memory gets expanded by a factor of 512. This resulted in the change of kernel base address and due to this GPFS asserts with message "logAssertFailed: (UIntPtr)(vmallocStart)" while validating kernel addresses. (show details)

Symptom	Assert
Environment	Linux (x86_64)
Trigger	Systems that attempt to install IBM Spectrum Scale on a newer Intel x86_64 processor with 5-level page.
Workaround	Disable 5-level page table setting by adding no5lvl to the kernel command line and then rebooting the node. Check the documentation of the Linux distribution used for details on how to apply this change. For example, on RHEL8: # grubby --update-kernel=ALL --args="no5lvl" # cat /proc/cmdline BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-240.10.1.el8_3.x86_64 root=/dev/mapper/rhel-root ro crashkernel=auto resume=/dev/mapper/rhel-swap rd.lvm.lv=rhel/root rd.lvm.lv= rhel/swap rhgb quiet net.ifnames=0 biosdevname=0 no5lvl

5.0.5.9

Core GPFS

IJ33393

When users run the mmlsfileset command, it doesn't show the junction paths of some fileset randomly. (show details)

Symptom	Unexpected results
Environment	All
Trigger	One fileset's root directory has been corrupted for an unknown reason.
Workaround	None

5.0.5.9

Filesets

IJ33394

Assert "(verify == 0) || (ofP == __null) || (ofP->sgP == __null) || ofP->isRoSnap() || (ofP->metadata.getInodeStatus() != 1) || (!ofP->sgP->isFileIncludedInSnapshot(ofP->getInodeNum(), ofP->getSnapId(), getInodeStatus())) || (ofP->assertInodeWasCopiedToPrevSnapshot()) || (ofP->isBeingRestriped() || ofP->beenRestriped)". (show details)

Symptom	Daemon crash
Environment	All
Trigger	Operations triggering a statlite call on a node without sufficient stat file token
Workaround	Disable the statlite config parameter by "mmfschconfig statliteMaxAttrAge=0 -i".

5.0.5.9

Core GPFS

IJ33410

--safe-limit option is ignored when eviction is invoked manually using the "mmafmctl device evict" command. (show details)

Symptom	Unexpected Results
Environment	All
Trigger	AFM manual eviction with safe limit
Workaround	None

5.0.5.9

AFM

IJ33627

When a thread performing shutdown and a thread initiating startup run concurrently, it is possible that it could result in a kernel crash. (show details)

Symptom	Abend/Crash
Environment	Linux
Trigger	Very small race window in GPFS cleanup process
Workaround	None

5.0.5.9

Core GPFS

IJ33680

If a Linux node is overloaded and the thread cannot be scheduled quickly could result in a kernel panic: RIP list_del_entry_valid.cold (show details)

Symptom	Abend/Crash
Environment	Linux
Trigger	mmshutdown on a busy Linux node
Workaround	None

5.0.5.9

Core GPFS

IJ33702

cNFS does not work on RHEL8.x. This is due to a change in pidof commnand in RHEL8. (show details)

Symptom	Unexpected Results/Behavior Node Reboot
Environment	Red Hat Enterprise Linux 8.x
Trigger	Enable cNFS on RHEL8.x nodes.
Workaround	Downgrade or upgrade procps-ng to procps-ng-3.3.15-3.el8.

5.0.5.9

cNFS

IJ33704

Fileset path is taking the chars count to read the actual mount path from a given directory path. If the directory mount has the same chars till the count then prefetch starts processing successfully. (show details)

Symptom	dirpath is checked properly
Environment	All
Trigger	Prefetch starts working on an invalid path where the dir path doesn't belong to the same fileset.
Workaround	None

5.0.5.9

AFM

IJ33715

GPFS has fileset level permissions which can deny setting the mode or EAs on the fileset entities depending on which mode this targets. AFM doesn't consider this flag on the fileset and we end up getting E_PERM from the home which causes the queue to get stalled. Normal queue goes fine, but its mostly the recovery or resync queue that hits this. (show details)

Symptom	Unexpected Results
Environment	Linux
Trigger	Set the Same Fileset level permissions flag (setAclOnly, ChmodOnly, chmodAndUpdateAcl etc.) on both Cache/Primary and/or Home/Secondary sites and perform IO to the fileset and then run Recovery or Resync.
Workaround	Drop those operations that stall the queue when the Fileset level permissions are enabled only at one of the 2 sites.

5.0.5.9

AFM

IJ33740

AFM Prefetch is not generating the prefetch end callback event registered through the afmPrepopEnd event. (show details)

Symptom	Unexpected Results/Behavior
Environment	Linux
Trigger	Register for afmPrepopEnd callback event, and run AFM prefetch with list file or directory option.
Workaround	None

5.0.5.9

AFM

IJ33741

The automatic restart of NFS (remedy action) is blocked by an open unmounted_fs_check event which is not relevant for NFS/SMB exports. (show details)

Symptom	Performance Impact/Degradation
Environment	Linux (CES nodes running NFS)
Trigger	File systems with the automount flag and an unmounted file system
Workaround	Remove the "automount" flag from the "testFS" file system.

5.0.5.9

System health

IJ33759

The Mellanox firmware manager was called frequently (around every minute) by the system health monitor. That caused a high CPU load. (show details)

Symptom	Performance Impact/Degradation
Environment	All
Trigger	The Mellanox firmware check is executed too frequently by the system health monitor. There is no need for so much checking.
Workaround	None

5.0.5.9

System health

IJ33778

RAS event dir_sharedroot_perm_problem was received by mmhealth sometimes without a need and sometimes with a need, but the description of the event does not say what is wrong with the permissions and which permissions should be provided. (show details)

Symptom	Error output
Environment	Linux
Trigger	cesSharedRoot does not have permissions 'rx' for 'group' and 'others'.
Workaround	Provide the necessary permissions for cesSharedRoot ('rx' for 'group' and 'others').

5.0.5.9

System health

IJ33860

cNFS does not work on RHEL8.x. This is due to a change in pidof commnand in RHEL8. (show details)

Symptom	Unexpected Results/Behavior Node Reboot
Environment	Red Hat Enterprise Linux 8.x
Trigger	Enable cNFS on RHEL8.x nodes.
Workaround	Downgrade or upgrade procps-ng to procps-ng-3.3.15-3.el8.

5.0.5.9

cNFS

IJ33861

If a file system is set to maintenance mode then it is listed as 'SUSPENDED', but only a 'unmounted_fs_check' event is shown as the reason. It should say 'maintenance state' instead. (show details)

Symptom	Error output/message
Environment	All
Trigger	The 'fs_maintenance_mode' event is only at info-level, since it is a user intended state. Info-level events are in general not reported by 'mmhealth node show' since they do not indicate an issue or error state. A code change was done to allow the 'fs_maintenance_mode' event to be listed as a reason.
Workaround	None

5.0.5.9

System health

IJ33862

Ganesha fails to open files when over 1 million files are open. (show details)

Symptom	Check for logs "Futility count exceeded. Client load is opening FDs faster than the LRU thread can close them." and values of current_open and former_open.
Environment	Linux
Trigger	Whenever a client opens more than 1 million files.
Workaround	None

5.0.5.9

CES NFS

IJ32972

Manual procedure to decommission a DataNode is not supported. (show details)

Symptom	NA
Environment	All
Trigger	NA
Workaround	None

5.0.5.9

HDFS Transparency

IJ31047

Assertion (!OWNED_BY_CALLER(lockWordCopy, lockWordCopy) Failure at line 1275 in file dSynch.C when accessing snapshot files (show details)

Symptom	Daemon Crash
Environment	All
Trigger	Data prefetch is triggered after sequentially accessing snapshot files.
Workaround	None

5.0.5.8

Snapshots

IJ32501

When getting the stats of a file, users could run into the assert:

"Assert exp((verify == 0) || (ofP == __null) || (ofP->sgP == __null) || ofP->isRoSnap() || (ofP->metadata.getInodeStatus() != 1) || !ofP->sgP->isFileIncludedInSnapshot(ofP->getInodeNum(), ofP->getSnapId(), getInodeStatus())) || (ofP->assertInodeWasCopiedToPrevSnapshot()) || (ofP->isBeingRestriped() || ofP->beenRestriped)"

if there are writes to the same file from other nodes. (show details)

Symptom	Daemon Crash
Environment	All
Trigger	Getting the lite stat of a file while writes are in progress from other nodes.
Workaround	Run the mmchconfig command to reset the configuration "statliteMaxAttrAge=0", which will disable the statlite and avoid this problem, but it may also impact the writes performance on the other nodes as well.

5.0.5.8

gpfs_statlite API

IJ32365

AFM prefetch fails with "too many open files" error. (show details)

Symptom	Unexpected results
Environment	All
Trigger	AFM prefetch
Workaround	None

5.0.5.8

AFM

IJ32345

The systemhealth monitor did not detect all paths for RDMA support (libibverbs.so library) on Ubuntu Machines. Therefore, it reports a "ib_rdma_libs_wrong_path" issue. (show details)

Symptom	Error output/messages
Environment	Ubuntu Linux
Trigger	The issue shows up on Ubuntu machines with RDMA in use.
Workaround	None

5.0.5.8

System health

IJ32361

After convert legacy recovery group to mmvdisk managed recovery group, poor write performance observed from an application and the gpfs daemon did not come up because of OOM issue on the some nodes. (show details)

Symptom	Abend/Crash Performance Impact/Degradation
Environment	Linux
Trigger	When you convert legacy recovery group to mmvdisk managed recovery group by the following command: mmvdisk recoverygroup convert --recovery-group RgName[,RgName] --node-class NcName
Workaround	Use the following command to reset the pagepool to 60%: mmvdisk server change --node-class NcName --pagepool 60% --recycle one

5.0.5.8

ESS, GNR

IJ32375

Application performance degradation while running on AFM filesets. (show details)

Symptom	Performance Impact/Degradation
Environment	Linux
Trigger	AFM replication
Workaround	None

5.0.5.8

AFM, AFM DR

IJ32503

GPFS daemon could assert with: Assert exp(start + offsetToRef(elen) <= dhP->hashTabRef) when operate on a corrupted directory block. The assert also prevent repairs using mmfsck. (show details)

Symptom	Abend/Crash
Environment	All
Trigger	Corruption in the directory block
Workaround	None

5.0.5.8

Core GPFS

IJ32504

AFM recovery may incorrectly delete the files at home or secondary if there is any network issues while doing the home readdir. (show details)

Symptom	Unexpected Results
Environment	Linux
Trigger	AFM recovery
Workaround	Resync the fileset if there are any missing files at home.

5.0.5.8

AFM, AFM DR

IJ32581

When doing preallocation and writes (e.g., Spectrum Protect Plus copy restore), the block usage of the file system is more than the total data size of these files. (show details)

Symptom	More disk space usage than expected.
Environment	All
Trigger	Preallocate the data blocks of the file, and then write as much data as the file size.
Workaround	Use the following command: mmchattr --compact=fragment

5.0.5.8

Disk space preallocation of files

IJ32601

mmhealth reports degraded network with reason "ib_rdma_nic_unrecognized" even though all RDMA ports are operational. (show details)

Symptom	Error output/message
Environment	Linux
Trigger	Uninitialized data can lead to invalid input for mmhealth.
Workaround	None

5.0.5.8

RDMA

IJ32653

AFM prefetch fails with error 238 if the prefetch list file contains symlinks and if their target paths do not exist as part of the same fileset. (show details)

Symptom	Unexpected results
Environment	Linux
Trigger	AFM prefetch
Workaround	None

5.0.5.8

AFM

IJ32796

Operations requiring allocation of full metadata blocks. Examples: Expand number of allocated inode Create new independent fileset. (show details)

Symptom	Performance Impact/Degradation
Environment	All
Trigger	Operations requiring allocation of full metadata blocks. Examples: Expand number of allocated inode or create new independent fileset
Workaround	Add more disks to the system pool.

5.0.5.8

Core GPFS

IJ32797

The Linux fallocate(2) API doesn't work correctly on Spectrum Scale file systems when punching a hole beyond the end of file. (show details)

Symptom	Punching a hole beyond the end of a file fails with EINVAL(22) error.
Environment	Linux
Trigger	Punching a hole through Linux fallocate(2) API
Workaround	None

5.0.5.8

fallocate(2)

IJ32813

Issuing a "mmchnode --daemon-interface" attempts to change the cluster configuration repository (CCR). When this mmchnode is issued from a Windows node, CCR gets committed with invalid IPv4 information, rendering the cluster in a non-working state. (show details)

Symptom	The mmchnode command fails with a message trail resembling: 'mmchnode: Unable to commit new changes.' 'mmchnode: [E] The command was unable to reach the CCR service on any quorum node. Ensure the CCR service (mmfsd or mmsdrserv daemon) is running on all quorum nodes and the communication port is not blocked by the firewall.' 'mmchnode: 6027-1271 Unexpected error from function setRunningCommand. Return code: 149'
Environment	Windows (x86_64)
Trigger	Issuing "mmchnode --daemon-interface" command on a Windows node specifying an alternate IPv4 address.
Workaround	None. A manual CCR restore (mmsdrrestore --ccr-repair) may be necessary to restore the cluster to a working state.

5.0.5.8

CCR

IJ32814

Offline fsck will not be able to repair all corruptions when using the option of applying patch file ((i.e mmfsck FSchk -v --patch-file path-towrite-patchfile --patch) to repair the corruptions When repairing corruption by applying patch file the fsck output would show the below messages indicating the issue:

---------------- Invalid BlockType Inode. Skipping patch. ---------------

(show details)

Symptom	Error output/message and all corruptions not fixed
Environment	All
Trigger	Offline fsck repairing corruptions by applying patch file.
Workaround	Run fsck repair with the regular option -y to fix the corruptions.

5.0.5.8

FSCK

IJ32859

When the mmdf command is run from a directory where the current working directory has become stale (directory was deleted after going to it), the command states it was run from an invalid directory. (show details)

Symptom	Command states it was run from an invalid directory. But the command fails with various additional errors.
Environment	All
Trigger	Running the mmdf command from a directory that is stale (directory was deleted after going to it).
Workaround	Only use MM commands in a valid current working directory. Move to a directory that still exists within the nodes file systems.

5.0.5.8

Core GPFS

IJ33000

In the current implementation of Eviction on a file, the eviction program acquires a DMAPI lock first on the file first and punches a hole on it. The program can be terminated at any point without the DMAPI lock to be released - causing a lock leak and hence later DMAPI lock acquire on the file can deadlock and the only way to come out of this is to bounce the mmfsd. (show details)

Symptom	Deadlock
Environment	AIX, Linux
Trigger	Try to evict a file or list of files, and the eviction getting killed midway through.
Workaround	None

5.0.5.8

AFM

IJ32892

On an AIX node, in some occasions, including the /var file system becoming full, mmfsd is unable to run child processes, and that results in different failures, depending on the process which mmfsd attempts to run. Among the operations which have been seen to fail: - mmadddisk - mmauth Once the problem is triggered, it will remain until the mmfsd daemon is restarted. If the problem is initiated by the /var file system getting full, freeing up space on that file system is not enough to solve the problem. An indication that problem is taking place is in the output of command /usr/lpp/mmfs/bin/tslsfs nonexistent_FS (that is, passing the name of a nonexistent file system as parameter to the command above).

In a system where the problem is occurring, the output will be mmcommon getEFOptions nonexistent_FS failed. Return code 1.

While on a system without the problem, the output will be mmcommon: File system nonexistent_FS is not known to the GPFS cluster.

(show details)

Symptom	Unexpected Results/Behavior
Environment	AIX
Trigger	A likely trigger for the problem is the /var file system being filled, possibly around the time an operation is taking place that results in information being produced to the mmfs.log file.
Workaround	Once the issue in /var is resolved, restarte mmfsd.

5.0.5.8

Core GPFS

IJ32893

When a dependent fileset is created and linked under AFM independent fileset, ACLs form the home dependent fileset are not fetched and set at the cache dependent fileset. This happens only for the dependent fileset root path. (show details)

Symptom	Unexpected Results
Environment	Linux
Trigger	AFM caching with dependent filesets
Workaround	None

5.0.5.8

AFM

IJ32906

DEADLOCK PROBECLUSTERTHREAD WAITING FOR SG CLEANUP (show details)

Symptom	Deadlock
Environment	Linux (PPC64)
Trigger	Generating file audit logging or watch folder events on a ppc64 node and there are x86 or ppc64le nodes also using the same file system.
Workaround	None

5.0.5.8

File audit logging, Watch folder

IJ32929

The mmfs.log may be filled with netstat: not found messages on systems running SLES 15. This is the result of running mmdiag --network command explicitly or through mmhealth monitoring service which uses the netstat command. (show details)

Symptom	Error output/message
Environment	SLES 15
Trigger	When the mmdiag command is invokes with the --network option on SLES15 systems directly on the command line or through system health monitoring.
Workaround	Install the deprecated net-tools package.

5.0.5.8

GUI, System health

IJ30160

When mmbackup or tsapolicy is called to scan files, it could report "no such file or directory" for existing files. (show details)

Symptom	Unexpected behavior
Environment	All
Trigger	Running mmbackup or tsapolicy operation while there are file deletions in progress.
Workaround	Rerun the mmbackup or the tsapolicy command.

5.0.5.7

mmbackup, tsapolicy, GPFS API, DMAPI, AFM

IJ30308

mmcrfs fails to create file systems when the cluster is configured with minQuorumNodes greater than one and tiebreakerDisks are in use. (show details)

Symptom	Error Output/Message Unexpected Results/Behavior
Environment	All
Trigger	CCR Cluster minQuorumNodes parameter is set to more than one and tiebreakerDisks are in use.
Workaround	Either set minQuorumNodes back to the default or disable tiebreakerDisks.

5.0.5.7

Admin commands

IJ31134

mmbackup does not honor --max-backup-size in a snapshot backup. (show details)

Symptom	Performance Impact/Degradation
Environment	All
Trigger	Running snapshot backup with the --max-backup-size option.
Workaround	None

5.0.5.7

mmbackup

IJ30673

When aioSyncDelay config is enabled, the buffer steal and the aio writes that need to be done as buffered I/O may race with each other and causes log assert isSGPanicked in clearBuffer. (show details)

Symptom	Abend/Crash
Environment	Linux
Trigger	Enabling aioSyncDelay config and do aio writes.
Workaround	Issue mmchconfig aioSyncDelay=0 to disable it.

5.0.5.7

Core GPFS

IJ30700

GPFS command reports incorrect default for nsdRAIDMaxRecoveryRetries. (show details)

Symptom	Unexpected Results/Behavior
Environment	All
Trigger	Command output
Workaround	None

5.0.5.7

Admin commands

IJ31060

State of Physical disk shown as "unknown" in mmhealth and GUI for ECE. (show details)

Symptom	Unexpected Results/Behavior
Environment	Linux
Trigger	ECE, this does not affect GNR/ESS
Workaround	None

5.0.5.7

System health, GUI

IJ31785

mmhealth has issues with high inode consumption. (show details)

Symptom	Error output/messages
Environment	All
Trigger	One fileset out of multiple filesets for the same file system ran out of inode space.
Workaround	If the file system contains multiple filesets (even the root file system) then check if any of them exceeds the defined limits and fix this issue. Use mmdf and mmlsfileset commands for more diagnostic.

5.0.5.7

System health

IJ31208

Daemon assert: (ofP == NULL) || (getPseudoIbdP() == ibdP) || (ibdP->assignedIndDA.isALLOC()) in file Metadata.h, resulting in mmfsd daemon crash. (show details)

Symptom	Abend/Crash
Environment	All
Trigger	Concurrent random writes from multiple nodes with new disk space allocation, and some of these writing nodes fail.
Workaround	None

5.0.5.7

Core GPFS

IJ30797

GPFS daemon could fail with logAssertFailed: getDeEntType() == detUnlucky when reading a directory block that contain unexpected data due to corruption. (show details)

Symptom	Abend/Crash
Environment	All
Trigger	Unexpected data corruption in directory block
Workaround	None

5.0.5.7

Core GPFS

IJ32008

The free space stats report through the "df" command or statfs API is very out of date on client nodes. (show details)

Symptom	Stale free space report on client nodes
Environment	All
Trigger	NA
Workaround	Run a mmdf command on the problematic client node or issue the "df" command from the fs manager node.

5.0.5.7

Core GPFS

IJ29239

logAssertFailed: ofP->mnodeStatusIs(0x4) (show details)

Symptom	Abend/Crash
Environment	All
Trigger	Have many processes/threads on different nodes repeatedly writing the same data to the inode file.
Workaround	None

5.0.5.7

Core GPFS

IJ31851

While running offline fsck the node asserts with signal 11 when checking log files. (show details)

Symptom	Abend/Crash
Environment	All
Trigger	Running offline fsck while deleting the system pool disk.
Workaround	Patch the SGdesc using tsdbfs and set numlogitems to 0: # tsdbfs fs patch desc numlogitems 0

5.0.5.7

FSCK

IJ31852

Assert exp(isValidSocket(sock)) in line 2722 of file thread.C (show details)

Symptom	Abend/Crash
Environment	All
Trigger	mmhealth daemon is down.
Workaround	None

5.0.5.7

Core GPFS

IJ32009

GNR rebalance unable to complete after many days. (show details)

Symptom	Unexpected Results/Behavior
Environment	All
Trigger	Add an enclosure to an existing RG or MES upgrade.
Workaround	None

5.0.5.7

ESS, ECE, GNR

IJ31853

The provided improvements result in a more robust functionality of the MM command interface. (show details)

Symptom	Unexpected Results/Behavior
Environment	All
Trigger	NA
Workaround	None

5.0.5.7

Core GPFS

IJ31880

GPFS daemon assert: retryCount <= 300 (show details)

Symptom	Abend/Crash
Environment	All
Trigger	Running mmrestripefs command and disk related commands such as mmchdisk, mmdeldisk.
Workaround	Avoid running mmrestripefs and disk related commands such as mmchdisk, mmdeldisk at the same time.

5.0.5.7

Core GPFS

IJ29447

When performing IO on a very large file, contention for InodeCacheObjMutex could occur as a number of buffers for the file increases. This is more likely to happen on file systems with smaller block sizes. (show details)

Symptom	Performance Impact/Degradation
Environment	All
Trigger	Running IO to a large file.
Workaround	None

5.0.5.7

Core GPFS

IJ31902

There is a small possibility for both replica to be placed in the same failure group when there is a disk configuration change and one failure group is low on free space. (show details)

Symptom	Unexpected Results/Behavior
Environment	All
Trigger	Disk configuration change via mmdeldisk, mmchdisk.
Workaround	Avoid delete disk when one failure group is low on free space.

5.0.5.7

Core GPFS

IJ31927

GPFS daemon could fail unexpectedly with assert: commitRanges[i].valid || commitRanges[i].numBytes <= lfP->sgP->getWriteCacheThreshold(). This could happen after mmchfs command was issued to reduce write cache threshold while applications are actively writing to the file system. (show details)

Symptom	Abend/Crash
Environment	All
Trigger	Reducing the write cache threshold using the mmchfs command.
Workaround	Do not reduce write cache threshold while there is write workload on the file system.

5.0.5.7

HAWC

IJ32035

Assert in mmfsd "Signal 6 at verbs::parseConfigVerbsPorts, at verbsInit.C:4407", resulting in a Spectrum Scale crash at start up. (show details)

Symptom	Abend/Crash
Environment	Linux
Trigger	RDMA is active.
Workaround	Disable RDMA.

5.0.5.7

RDMA

IJ32038

There appears to be an issues at the systemd layer that causes startup service to fail with connection time out during reboot. If auto load is set to yes, GPFS may not be able to start up or it may get stuck waiting for the environment to be initialized. (show details)

Symptom	GPFS does not start after a reboot.
Environment	Linux
Trigger	This issue affects clusters with auto load set to yes and hitting systemd connection time out during reboot.
Workaround	Manually restart GPFS.

5.0.5.7

GPFS startup, CCR, systemd

IJ30432

When mmdelnode is issued against a node whose mmfsd daemon is still up, several of the nodes in the cluster can fail with messages such as the following:

[E] Deleted node 169.28.113.36 (nodeX) is still up. [E] Node 169.28.113.36 (nodeX) has been deleted from the cluster

(show details)

Symptom	Abend/Crash
Environment	All
Trigger	Issuing mmdelnode against a node whose mmfsd daemon is still up on the target node, or against a node whose status cannot be determined.
Workaround	Ensure that the mmfsd is down on a given node before issuing a mmdelnode command against that node. If the status of the target node cannot be determined, ensure that the node gets powered down.

5.0.5.6

Cluster Membership

IJ30346

Some processes may not be woken up as they should during a cluster manager change. That might lead to potential deadlocks.

(show details)

Symptom	Long Waiters
Environment	All
Trigger	Cluster manager change
Workaround	None

5.0.5.6

Core GPFS

IJ30393

The GPFS daemon could fail with logAssertFailed: fromNode != regP->owner. This could occur when a file system's disk configuration is changed just as a new file system manager is taking over.

(show details)

Symptom	Abend/Crash
Environment	All
Trigger	Disk configuration change by using commands such as mmadddisk, mmdeldisk and mmchdisk
Workaround	Avoid disk related commands right after a new file system has been appointed.

5.0.5.6

Core GPFS

IJ30402

"Disk in use" error when using not partitioned DASD devices.

DASD '/dev/dasdk' is in use. Unmount it first! mmcrnsd: Unexpected error from fdasd -a /dev/dasd. Return code: 1 mmcrnsd: [E] Unable to partition DASD device /dev/disk/by-path/ccw-0.0.0500 mmcrnsd: Failed while processing disk stanza on node node01.abc.de %nsd: device=/dev/disk/by-path/ccw-0.0.0500 nsd=scale_data01 servers=node01.abc.de usage=dataAndMetadata

(show details)

Symptom	Upgrade/Install failure
Environment	Linux (s390x)
Trigger	Running the mmcrnsd command on not partitioned DASD devices.
Workaround	Before running the mmcrnsd command, partition the DASD devices with fdasd -a, refresh the partition table on all server nodes, and specify the partitions of the devices in the mmcrnsd stanza file. Then, run the mmcrnsd command.

5.0.5.6

Installation toolkit

IJ30408

AIO operations on encrypted files are handled as buffered IO, further decreasing the performance of the AIO operation in addition to the crytographic overhead introduced by the encryption of files in the file system.

(show details)

Symptom	Performance Impact/Degradation
Environment	AIX (POWER)
Trigger	Using AIO
Workaround	None

5.0.5.6

Encryption

IJ30451

When a user starts the mmrestripefile command against a big file with the -b option, it could take a long time (For example, more than 20 minutes) to return but no data movement is seen between disks. This is because the big file is already balanced.

(show details)

Symptom	Performance Impact
Environment	All
Trigger	Using mmrestripefile to rebalance big files.
Workaround	None

5.0.5.6

mmrestripefile command

IJ30409

Kernel v4.7 changes the inode ACLs cache mechanism, and GPFS (5.0.5.2+, 4.2.3.23+) does not adapt to the new kernel behaviors. The following two typical issues are observed:

1. normal user can access one file, and root removes the file access privilege from the user by chmod command => the user can still access the file

2. normal user cannot access one file, and root grants the file access privilege for the user by chmod command => the user cannot access the file either.

(show details)

Symptom	Unexpected Results/Behavior
Environment	Linux with kernel v4.7 or later
Trigger	GPFS does not adapt to the new kernel behavior.
Workaround	Remount the file system.

5.0.5.6

Core GPFS

IJ30458

skipRecall config does not work.

(show details)

Symptom	Unexpected Results/Behavior
Environment	All
Trigger	When skipRecall config is used.
Workaround	None

5.0.5.6

DMAPI

IJ30427

The mmfs.log shows several "sdrServ: Communication error" messages.

(show details)

Symptom	Error output/messages
Environment	All
Trigger	Running the systemhealth monitor.
Workaround	Restart the systemhealth monitor (mmsysmoncontrol restart).

5.0.5.6

System health

IJ30461

mmbackup could backup files unnecessary after failure.

(show details)

Symptom	Performance Impact/Degradation
Environment	All
Trigger	mmbackup failing during IBM Spectrum Protect client and server backup phase
Workaround	None

5.0.5.6

mmbackup

IJ30462

Memory leak on file system manager node during quota revoke storm

(show details)

Symptom	Cluster/File System Outage
Environment	All
Trigger	Quota usage reaches limit setting
Workaround	Restart GPFS.

5.0.5.6

Quotas

IJ30463

While migrating a file to the cloud, the gpfs daemon might hit a signal in StripeGroup::decNumAccessRights() (show details)

Symptom	Daemon hits a signal during file migration to the cloud.
Environment	Linux
Trigger	Migrating a file to the cloud.
Workaround	None

5.0.5.6

TCT

IJ30429

mmfsd crashed due to signal 11 when verifying the file system descriptor.

(show details)

Symptom	Abend/Crash
Environment	All
Trigger	Descriptor verification
Workaround	None

5.0.5.6

Core GPFS

IJ30466

mmsmb exportacl list doesn't show "@" of the SMB share name.

(show details)

Symptom	Error output/messages
Environment	Linux
Trigger	Using an '@' character in SMB share name.
Workaround	Do not use an '@' character in SMB share names.

5.0.5.6

SMB

IJ30397

mmvdisk throws an exception for a list operation when the daemon node name is not identical to the admin node name.

(show details)

Symptom	Error output/message
Environment	Linux
Trigger	Daemon node name doesn't match admin node name.
Workaround	None

5.0.5.6

GNR, ESS

IJ30465

Assert failure luEnclosureSlotP == __null

(show details)

Symptom	Abend/Crash
Environment	Linux
Trigger	A surprise disk pull
Workaround	None

5.0.5.6

GNR, ESS

IJ30352

logAssertFailed (*respPP != __null) cacheops.C (show details)

Symptom	Abend/Crash
Environment	All
Trigger	Writing to the file in AFM read-only mode to a fileset which is exported using a ganesha NFS server.
Workaround	Export the AFM read-only fileset path in read-only mode from the ganesha NFS server.

5.0.5.6

AFM

IJ30493

The administrator is unable to change the page pool setting on the GNR recovery group server. The problem is seen only on recovery groups not managed by mmvdisk. The mmchconfig command will fail, and the following error message is displayed:

The --force-rg-server flag must be used to change the pagepool

(show details)

Symptom	Error output/message Unexpected Results/Behavior
Environment	Linux
Trigger	mmchconfig pagepool=newValue on a GNR server not administered by the mmvdisk command set.
Workaround	None

5.0.5.6

GNR, ESS

IJ30143

After dm_punch_hole call, a dm_get_allocinfo could return improper results for the information of the data blocks allocation.

(show details)

Symptom	Unexpected Results/Behavior
Environment	All
Trigger	Calling dm_punch_hole to create hole in a file.
Workaround	None

5.0.5.6

DMAPI

IJ30621

Running "mmces events list" stdout prints many trailing white characters (empty spaces), unnecessarily.

(show details)

Symptom	Unexpected Results/Behavior
Environment	Linux
Trigger	Running mmces events list.
Workaround	None

5.0.5.6

CES

IJ30634

An RPC message could be handled twice when TCP reconnect happens This could cause log assertion, FS struct error or be silently handled depending on the type of RPC.

(show details)

Symptom	Abend/Crash
Environment	All
Trigger	Network is not good which leads to TCP connection reconnect.
Workaround	None

5.0.5.6

Core GPFS

IJ30785

If the mmvdisk command takes more than 60 seconds to complete, mmhealth reports all pdisks as vanished. On larger systems with many I/O nodes and pdisks, 60 second timeouts are not enough.

(show details)

Symptom	Unnecessary events shown in mmhealth / GUI
Environment	Linux, ESS I/O nodes
Trigger	Running mmvdisk command on a system with many I/O nodes and pdisks.
Workaround	Manually increase the time out of the mmvdisk command execution in the mmhealth monitoring.

5.0.5.6

System health, GUI

IJ30864

In a mixed AIX/Linux cluster, the mmbackup command could fail with gskit/ssl errors after upgrading IBM Spectrum Protect code to 8.1.11, which introduced new rpm for gskit 8.0-55.17 that is not compatible with gpfs.gskit version.

(show details)

Symptom	Performance Impact/Degradation
Environment	AIX and Linux mixed cluster
Trigger	Run mmbackup in a AIX/Linux mixed cluster with IBM Spectrum Protect 8.1.11.
Workaround	None

5.0.5.6

mmbackup

IJ29444

After Node-B successfully reestablishes a broken connection to Node-A, Node-A still shows the reconnect_start state (DEGRADED).

(show details)

Symptom	Error output/messages
Environment	All
Trigger	Reconnecting broken connections
Workaround	Restart the systemhealth monitor (mmsysmoncontrol restart).

5.0.5.6

System health

IJ30878

AFM gateway node crashes if the home is not responding and multiple threads are trying to read the same file. (show details)

Symptom	Crash
Environment	Linux
Trigger	Reading the same file from multiple threads when the home is not responding.
Workaround	Use the undocumented config option "mmfsadm afm readbypass -1" on the gateway node.

5.0.5.6

AFM

IJ30973

AFM gateway node asserts if the home is not responding and multiple threads are trying to read the same file. (show details)

Symptom	Crash
Environment	Linux
Trigger	Reading the same file from multiple threads when the home is not responding.
Workaround	Use the undocumented config option "mmfsadm afm readbypass -1" on the gateway node.

5.0.5.6

AFM

IJ30675

Daemon (AFM) assert goes off: getReqP->r_length <= ksP->r_bufSize (show details)

Symptom	Unexpected Results
Environment	Linux
Trigger	AFM caching mode and a read of an uncached file.
Workaround	Increase the afmReadBufferSize config value.

5.0.5.6

AFM

IJ30976

Revalidation on a AFM fileset fails on a RHEL 8.3 gateway node and home changes may not be detected causing the data or metadata mismatch between cache and home. (show details)

Symptom	Unexpected Results
Environment	Red Hat Enterprise Linux 8.3
Trigger	AFM caching, resync and failover.
Workaround	None

5.0.5.6

AFM, AFM DR

IJ30778

With async refresh enabled, file system quiesce is blocked during the remote operation and it might result in a deadlock if the remote is not responding. (show details)

Symptom	Deadlock
Environment	Linux
Trigger	AFM caching mode with async refresh
Workaround	None

5.0.5.6

AFM

IJ29433

The systemhealth monitor reported a gpfs_down event and triggered a failover even though the system was fine. (show details)

Symptom	Unexpected Results/Behavior
Environment	All
Trigger	Unknown
Workaround	None

5.0.5.5

System health

IJ29517

When an uncached file is renamed in the local-updates mode, the file is not copied to the previous snapshot causing the setInodeDirtyAndVerify assert. (show details)

Symptom	Crash
Environment	All
Trigger	Rename of a uncached file in local-updates mode fileset
Workaround	None

5.0.5.5

AFM

IJ29434

While the GPFS daemon is shutting down, there is chance that a specific trace will be logged and it may crash the kernel.

(show details)

Symptom	Abend/Crash
Environment	Linux
Trigger	mmshutdown when trace is enabled and there is still a workload which accesses the GPFS file system
Workaround	Stop the workload before mmshutdown, or disable mmtrace before mmshutdown.

5.0.5.5

Core GPFS

IJ29530

IBM Spectrum Scale has core dump triggered in dAssignSharedBufferSpace() due to segmentation fault hit by the mmfsd or the lxtrace daemon.

(show details)

Symptom	Daemon crash
Environment	Linux
Trigger	Enabling overwrite tracing and using internal lxtrace command to recycle the trace data.
Workaround	You should not directly use the lxtrace command. Instead, you should use mmtrace or mmtracectl command to recycle the trace data.

5.0.5.5

Trace

IJ29435

On zLinux, while running an mmap workload with traceIoData configuration enabled, the trace code may trigger a page fault and cause the kernel to crash.

(show details)

Symptom	Abend/Crash
Environment	Linux (s390x only)
Trigger	Running mmap workload On zLinux with traceIoData configuration enabled.
Workaround	Do not enable traceIoData.

5.0.5.5

mmap

IJ25754

Quota clients request quota shares based on the workload and most of the time the quota shares given to an active client is much larger than the previously pre-defined amount (e.g. 20 file system blocks). The unused or excess quota shares are returned to the quota manager periodically. At the quota manager side, when the quota usage exceeds the established soft quota limits, the grace period is triggered. At this event, the quota shares are reclaimed and the quota share distribution falls back to a more conservative fashion (based on predetermined amount). In certain workloads, when the partial quota shares are returned to the manager along with the usage updates and as a result it triggers the soft quota limit exceeded event, some amount of quota shares are lost due to mismanagement of quota shares between the client and the manager, leading to a permanent loss of quota shares correctable by using the mmcheckquota command.

(show details)

Symptom	Quota shares loss, thus increasing the in-doubt values, caused by the soft quota exceeded events. The loss of shares can't be reclaimed without running the mmcheckquota command.
Environment	All
Trigger	Timing and workload specific caused by when the quota usage exceeds the soft limits.
Workaround	None. Establishing soft quota limits such that the usage is less likely to trigger repeatedly the "soft quota limit exceeded" events minimizes the timing window of hitting this issue.

5.0.5.5

Quotas

IJ29453

In cases which have small pagepool size and large file system block size, GPFS may wait for page reservation unnecessarily because GPFS tends to reserve more pages than necessary.

(show details)

Symptom	Hang/Deadlock/Unresponsiveness/Long Waiters
Environment	Linux
Trigger	Workloads which access a file both in mmap and regular read/write method
Workaround	None

5.0.5.5

mmap

IJ29535

After mmimgrestore, the mmfsd could assert when handling the mmlsfileset command for a dependent fileset: logAssertFailed: fsOfP->getDirLayoutP() != __null (show details)

Symptom	Operation failure due to file system corruption
Environment	All
Trigger	Running mmimgrestore
Workaround	None

5.0.5.5

DMAPI

IJ29490

Under heavy workload (especially with file creation/deletion involved) with quota function enabled, some race issues are exposed such that the filesetId is not handled correctly, causing a GPFS daemon assert.

(show details)

Symptom	Cluster/File System Outage
Environment	All
Trigger	Heavy workload with file creation/deletion involved
Workaround	None

5.0.5.5

Quotas

IJ29495

mmchnode fails when more than the current number of quorum nodes becomes quorum nodes again.

(show details)

Symptom	Cluster/File System Outage
Environment	All
Trigger	Running the 'mmchnode --quorum -N' command with number of new quorum nodes greater than the number of current quorum nodes.
Workaround	Restart the GPFS daemon on the nodes which should become quorum nodes before attempting the 'mmchnode --quorum ...' command.

5.0.5.5

mmchnode --quorum, CCR

IJ29502

If the cluster is configured with a separate daemon and admin interfaces, the -Y output of mmgetstate only shows the admin node name.

(show details)

Symptom	Command output
Environment	All
Trigger	Running mmgetstate -Y in a cluster with separate daemon and admin interfaces.
Workaround	Get the daemon node name from other commands such as mmlscluster.

5.0.5.5

Admin commands

IJ29678

logAssertFailed: !"Cleanup hit contended Fileset lock."

(show details)

Symptom	Abend/Crash
Environment	All
Trigger	File system specific sync, fileset cleanup actions like unlink fileset, or unmount the file system on a given node.
Workaround	None

5.0.5.5

Filesets

IJ29679

The mmkeyserv command displays the latest expiration date from the KMIPT certificate chain. It should display the expiration date of the end-entity certificate.

(show details)

Symptom	Error output/message
Environment	Linux, AIX
Trigger	mmkeyserv command "show" output
Workaround	Use openssl to display and view each certificate in the chain to determine the correct expiration date.

5.0.5.5

Admin commands, Encryption

IJ29682

When truncating a migrated immutable file with DMAPI interfaces, the data of the file becomes zero, although the file is immutable. (show details)

Symptom	Data of the file becomes zero
Environment	All
Trigger	Truncate operation against a migrated immutable file
Workaround	None

5.0.5.5

Immutable and append-only files

IJ29514

File system manager could assert with exp(isStoragePoolIdValid(poolId)) during log recovery if a node fails shortly after running mmdeldisk. (show details)

Symptom	Abend/Crash
Environment	All
Trigger	Node failure shortly after deleting a disk from the file system by using mmdeldisk.
Workaround	None

5.0.5.5

Core GPFS

IJ29515

Incorrect quota check result due to OpenFile reuse/updateShadowTab

(show details)

Symptom	Unexpected Results
Environment	All
Trigger	Running mmcheckquota
Workaround	None

5.0.5.5

Quotas

IJ29686

On clusters having minReleaseLevel at 5.0.1, with mixed version nodes available from 5.0.1.X through till 5.0.5.X nodes and where the gateway node is at level 5.0.5.X, the newer level of gateway nodes finds it hard to co-exist with the older level nodes causing a recovery failure repeatedly.

(show details)

Symptom	Unexpected Behavior
Environment	Linux
Trigger	Triggering AFM Fileset recovery on a mixed version cluster with gateway node at the 5.0.5.X level and minReleaseLevel on the cluster is at the 5.0.1 level.
Workaround	None

5.0.5.5

AFM

IJ29719

While reading the file, the file can be evicted and its captured checksum shows inconsistency for this opened file.

(show details)

Symptom	File gets evicted even though the file is opened for operations.
Environment	Linux
Trigger	Read/write file and run manual evict on the same file.
Workaround	None

5.0.5.5

AFM

IJ29533

AFM Prefetch doesn't print how many files/inodes it has completed processing for queueing. It would help understand the progress because the actual queueing happens when enough files/inodes have been accumulated and it takes longer to update any progress to the user.

(show details)

Symptom	Unexpected Behavior
Environment	Linux
Trigger	Running an AFM listfile or directory prefetch sub-command from the mmafmctl command.
Workaround	None

5.0.5.5

AFM

IJ29683

A NFS client might be blocked for a while after a failover before it continues I/O. (show details)

Symptom	Performance Impact/Degradation
Environment	Linux (CES nodes with Ganesha/NFS installed)
Trigger	A Ganesha/NFS failover
Workaround	None

5.0.5.5

System health

IJ29685

I/O hangs with mirrored disk while recovery group resigns repetitively due to vdisk fault tolerance exceeded.

(show details)

Symptom	I/O hang due to long waiters 'waiting for stateful NSD server error takeover (2)'
Environment	All
Trigger	Mirrored disks and a failed disk
Workaround	None

5.0.5.5

GNR

IJ29690

The systemhealth monitor reports file systems used with NFS/SMB exports as unmounted even when they are mounted and functional.

(show details)

Symptom	Unexpected Results/Behavior
Environment	Linux (CES nodes with Ganesha/NFS or SMB installed)
Trigger	mmhealth running and file systems having NFS/SMB exports
Workaround	Not available. To suppress the misleading events altogether try the following: Edit file /var/mmfs/mmsysmon/mmsysmonitor.conf In section [events], add (if not exist) the following "ignore" entry: ignore =local_exported_fs_unavail,nfs_exports_down,smb_exports_down Change that file for all CES nodes and then restart mmsysmon (mmsysmoncontrol restart)

5.0.5.5

System health

IJ29542

Several RAS events had inconsistent values in their SEVERITY and STATE. For instance, the event "network_bond_degraded", which STATE=DEGRADED, has SEVERITY=INFO. As a result, related failures were not propagated properly.

(show details)

Symptom	Unexpected Results/Behavior
Environment	Linux
Trigger	Conditions for raising of one of the following RAS events are met: - bond_degraded - ces_bond_degraded - ces_ips_all_unassigned - gnr_pdisk_unknown - heartbeat_missing - scale_mgmt_server_failed - tct_cs_disabled exports
Workaround	None. You need to look up the RAS event definition each time to understand if it is a problem.

5.0.5.5

System health, GUI

IJ29910

In IBM Spectrum Scale Erasure Code Edition, it is possible for all of the server's pdisks (physical disks) to become missing, either due to network failure, node failure, or through a planned "node suspend" maintenance procedure. When this happens, the system will continue to function if there is sufficient remaining fault tolerance. However, smaller configurations with less ECE nodes are exposed to a race condition where pdisk state changes can interrupt a system-wide descriptor update which causes the recovery group to resign. It is also possible to experience this problem with higher probability when using small ESS configurations, such as the GS1 or GS2 enclosures. For both ESS and ECE, a possible symptom may appear in the mmfs.log in this form when a pdisk state change is quickly followed by a resign message claiming VCD write failures before the system fault tolerance is exceeded: 2020-12-01_19:01:36.696-0400: [D] Pdisk n004p005 of RG rg1 state changed from ok/00000.180 to missing/ suspended/00050.180. 2020-12-01_19:01:36.697-0400: [E] Beginning to resign recovery group rg1 due to "VCD write failure", caller err 217 when "updating VCD: RGD"

Note that a "VCD write failure" with err 217 is a generic message issued when fault tolerance is exceeded during critical system updates, but in this case the race condition resigns the system when only a handful of missing disks are found.

(show details)

Symptom	Unexpected Results/Behavior
Environment	All
Trigger	pdisk state updates and system-wide descriptor updates on configurations with smaller recovery groups, such as a small number of nodes on ECE or a small Enclosure on ESS.
Workaround	Allow the system to resign and recover on its own. For smaller ECE configurations, avoid leaving a node suspended for long periods of time during maintenance tasks.

5.0.5.5

GNR, ESS

IJ29916

When the file system is in panic on a quota client node, the outstanding quota share is not relinquished. Quota share Indoubt value is reported and the shares can only be reclaimed by mmcheckquota.

(show details)

Symptom	Unexpected Results
Environment	All
Trigger	File system is in panic
Workaround	None

5.0.5.5

Quotas

IJ29919

Incorrect quota check results on small files with fragments

(show details)

Symptom	Unexpected Results
Environment	All
Trigger	Newly created file contains fragments during mmcheckquota execution
Workaround	None

5.0.5.5

Quotas

IJ28891

In a kernel >4.10 and file-sizes being a multiple of page sizes, a false error is returned once the read offset reaches file size.

(show details)

Symptom	Unexpected Results/Behavior
Environment	All
Trigger	Having a file size that is a multiple of the page size
Workaround	Do not use a file size that is a multiple of the page size.

5.0.5.5

Core GPFS

IJ29312

Node crash if tremendous parallel access to file with NFS

(show details)

Symptom	Abend/Crash
Environment	Linux
Trigger	File operation lease is broken.
Workaround	None

5.0.5.5

kNFS

IJ29826

AFM gateway nodes runs out of memory during resync glibc is known to use as many arenas as 8 times the number of CPU threads a systems has. This makes a multi-threaded program like AFM which allocates memory for queues to use a lot more memory than actually needed. (show details)

Symptom	Abend/Crash
Environment	Linux
Trigger	AFM resync under heavy workload
Workaround	None

5.0.5.5

AFM

IJ29939

Assertion 'exp(lfVersion != other.lfVersion || lfVersion == 0 || tailLsn == other.tailLsn)' in line 99 of file openlog.C (show details)

Symptom	Abend/Crash/Lost Membership Data/metadata corruption
Environment	Linux
Trigger	New cluster manager election started by a quorum node other than the current CC Mgr. This cluster manager election gets started by a so-called challenge written to the tiebreaker disks which must be answered by the current CC manager.
Workaround	None

5.0.5.5

File IO Cluster manager election

IJ29763

The tsfindinode utility incorrectly reports file path as not found for valid inodes.

(show details)

Symptom	Incorrect output
Environment	All
Trigger	There is no problem trigger as this is a tsfindinode issue.
Workaround	Use mmapplypolicy to get file paths using the following procedure: Create a policy rule file: RULE LIST 'paths' DIRECTORIES_PLUS WHERE INODE=inum1 OR INODE=inum2 OR ... Then, run the policy scan using the preceding rule: mmapplypolicy -f /tmp -P policy.rule -I defer -m 64 View the result in /tmp/list.files

5.0.5.5

tsfindinode

IJ29209

logAssertFailed: ofP->isInodeValid() at mnUpdateInode when doing stat() or gpfs_statlite()

(show details)

Symptom	Abend/Crash
Environment	All
Trigger	A file is actively written on one node and stat() or gpfs_statlite() are repeatedly called on the file on another node.
Workaround	None

5.0.5.5

Core GPFS

IJ30125

When there are many threads doing sync writes through the same file descriptor, a contention on inodeCacheOjbMutex between them could impact the performance of writes.

(show details)

Symptom	Sync write performance becomes bad after upgrading to 5.0.x.
Environment	All
Trigger	Multiple threads are doing sync writes on the same opened file concurrently.
Workaround	Enable the config parameter "prefetchAggressisvenessWrite" to work around the problem if the workload is only doing sync writes.

5.0.5.5

Sync writes

IJ30076

Inodes not getting freed after user deleted them

(show details)

Symptom	Unexpected Results/Behavior
Environment	All
Trigger	A lot of inodes are deleted on a node which overflow background deletion queues.
Workaround	mmfsadm dump deferreddeletions mmfsadm test imapWork fs fullDeletion mmfsadm dump deferreddeletions

5.0.5.5

Core GPFS

IJ29780

mmvdisk --replace command results in message: Location XXX contains multiple disk devices.

(show details)

Symptom	Error output/message
Environment	Linux
Trigger	The problem occurs when hardware errors cause the lower logical block addresses on a disk to become unreadable.
Workaround	Contact IBM support for a workaround disk replacement procedure, or upgrade to a code level with the fix.

5.0.5.5

GNR, ESS

IJ29106

The IBM Spectrum Scale HDFS Transparency connector version 3.1.0-6 contains 2 NullPointerExceptions in the HDFS NameNode service. The application accessing the data is not impacted, but these exceptions are seen in the NameNode log file.

(show details)

Symptom	Error output/message
Environment	All
Trigger	Running the HDFS Transparency workload
Workaround	None

5.0.5.5

HDFS Connector

IJ29133

The IBM Spectrum Scale HDFS Transparency connector version 3.1.0-6 modified the label for the open operation when the configuration is set to "Scale" for the ranger.enabled parameter. When retrieving the JMX stats, the open is reported as GetBlockLocations.

(show details)

Symptom	Open counts are reported under the label GetBlockLocations
Environment	All
Trigger	Retrieving JMX stats when ranger.enabled = scale
Workaround	Use GetBlockLocation values instead of Open.

5.0.5.5

HDFS Connector

IJ28470

The CLiC cryptographic engine by IBM Spectrum Scale has been sunset.

(show details)

Symptom	None
Environment	Linux (PPC64), AIX
Trigger	None
Workaround	None

5.0.5.4

Encryption

IJ26212

mmdumpkthreads is stuck in zombie state.

(show details)

Symptom	Unexpected Results/Behavior
Environment	All
Trigger	Run gpfs.snap or the mmdumpkthreads command.
Workaround	The zombie process is harmless. Running the "mmfsadm dump thread" command can clear these mmdumpkthreads zombie processes.

5.0.5.4

System health

IJ27098

mmfsd daemon asserting with Assert exp(cfg) in FSFlashDevice.C.

(show details)

Symptom	Abend/Crash
Environment	Linux
Trigger	On nodes with LROC-devices
Workaround	None

5.0.5.4

LROC

IJ27087

When QoS throttling is in use, and an application uses ionice with certain IO priorities, it is possible for not only that application to experience degraded performance due to the throttline, but also for other file system operations to be delayed or to fail.

(show details)

Symptom	I/O hang
Environment	All
Trigger	Application runs with lower I/O priority when QoS is being used.
Workaround	None

5.0.5.4

QoS

IJ27923

When a user turns off the file system maintenance mode, the file system cannot be mounted.

(show details)

Symptom	Can not mount the file system.
Environment	All
Trigger	Perform some file system operations while the maintenance mode is being turned off.
Workaround	Stop the attempts of file system operations while the maintenance mode is being turned off.

5.0.5.4

Core GPFS

IJ28498

Long waiters

(show details)

Symptom	Long waiters
Environment	All
Trigger	Large pagepool size with most of it used by a single file doing mixed buffered IO and DIO, with other buffered IOs needing to steal buffers.
Workaround	Use a smaller pagepool size. Try to avoid doing buffered IO on a file opened for DIO.

5.0.5.4

Core GPFS

IJ28505

Opening or closing parenthesis not being accepted in bind_dn password.

(show details)

Symptom	Error output/message
Environment	Linux
Trigger	Using '(' or ')' in the bind_dn password.
Workaround	Do not use '(' or ')' in the bind_dn password.

5.0.5.4

Authentication

IJ28604

Umount file system operation is not successful because of open files.

(show details)

Symptom	Umount is not successful on the remote file system.
Environment	Linux
Trigger	Using open control files on remote.
Workaround	Use the mmumount -C all_remote command.

5.0.5.4

AFM, AFM DR

IJ28605

mmlsmount with the --report and -Y options may not take into account nodes which do not have the file system mounted.

(show details)

Symptom	Unexpected results
Environment	All
Trigger	File system is mounted on some nodes and not on others.
Workaround	None

5.0.5.4

Core GPFS

IJ28606

mmhealth cluster show: faster heartbeat_missing

(show details)

Symptom	Network outage
Environment	Linux, AIX
Trigger	Network outage
Workaround	Running "mmhealth config interval high" would have a similar effect on heartbeat events but at a cost of a lot of system resources.

5.0.5.4

System health

IJ28434

mmgetstate -Y format is showing a negative value

(show details)

Symptom	Negative value in mmgetstate -Y
Environment	Linux
Trigger	mmgetstate -Y format
Workaround	None

5.0.5.4

AFM, AFM DR

IJ28611

AFM uses special control files to replicate ACLs, EAs and to check fileset mode at the home/secondary site. This special control file is not being recognized correctly thus affecting the EA and ACLs replication and fileset mode recognition. (show details)

Symptom	Unexpected results
Environment	Linux
Trigger	AFM and AFM DR replication.
Workaround	None

5.0.5.4

AFM, AFM DR

IJ28609

In mmcheckquota, when a quota entry is processed from a deleted fileset, the quota entry is correctly skipped, but this makes the mmcheckquota process exit with an error.

(show details)

Symptom	Error output/message
Environment	All
Trigger	Error handling
Workaround	Rerun mmcheckquota.

5.0.5.4

Core GPFS

IJ28584

File and memory leaks in the kernel (show details)

Symptom	Memory leak
Environment	Linux
Trigger	AFM migration on newer kernel versions >= 3.13
Workaround	None

5.0.5.4

AFM

IJ27905

Failback performance issues (show details)

Symptom	Failback performance issues
Environment	Linux
Trigger	FIO 2MSeqwrite Node suspend/resume. AZ CLI vm stop/start, spontaneous vm reboots and error injects
Workaround	None

5.0.5.4

CES

IJ28184

mmfsd daemon assert going off: Assert exp(rmsgP != __null) in file llcomm.C, resulting in a daemon crash.

(show details)

Symptom	Abend/Crash
Environment	Linux (CES nodes)
Trigger	The RPC component may hit this assert because of missing a memory barrier.
Workaround	None

5.0.5.4

Core GPFS

IJ27087

Application runs with I/O priority mapping into a not supported QoS class, which does have IOPS limitation with 1 IOPS, thus leading to I/Os being queued to wait for enough tokens to service the I/O operation. This causes long waiters.

(show details)

Symptom	I/O hang
Environment	Linux (CES nodes)
Trigger	Application runs with lower I/O priority when QoS is being used.
Workaround	None

5.0.5.4

QoS

IJ28608

If call home data collection process was interrupted because of a the power loss, the following data collection of the same schedule will fail due to the directory already existing.

(show details)

Symptom	Component level outage
Environment	Linux
Trigger	Power loss or restart of the Sysmonitor (mmsysmon.py) during a call home data collection
Workaround	Run "mmdsh -N all rm -rf /var/mmfs/tmp/gather//".

5.0.5.4

Call home

IJ28626

Read-only offline fsck reports references to down disk as corrupt which is not correct behavior.

(show details)

Symptom	Error output/message
Environment	All
Trigger	Read only offline fsck with a down disk
Workaround	Ignore fsck report about corrupt references to down disks.

5.0.5.4

FSCK

IJ27414

mmfsck --estimate-only panics at the end of fsck and a new stripe group manager disallows new mounts.

(show details)

Symptom	Unexpected Results/Behavior
Environment	All
Trigger	Run multi-node mmfsck --estimate-only while file system is mounted.
Workaround	Start and abort an online fsck for all the nodes that had run the mmfsck --estimate-only command.

5.0.5.4

FSCK

IJ28631

Given an IBM Spectrum Scale cluster with 'verbsRdmaCm' set to 'enable' and configured to use RDMA via RoCE, individual nodes will fail to establish a RDMA connection to other nodes when the IP addresses configured on the RDMA adapters include a non-link local IPv6 address.

(show details)

Symptom	Performance Impact/Degradation
Environment	Linux
Trigger	An IBM Spectrum Scale cluster with 'verbsRdmaCm' set to 'enable' configured to use RDMA via RoCE and RDMA adapters configured with non-link local IPv6 addresses.
Workaround	Remove all non-link local IPv6 addresses from all RDMA adapters which are configured to be used by IBM Spectrum Scale.

5.0.5.4

RDMA

IJ28610

While a node is tryiing to join a cluster, mmfsd start could encounter a null pointer reference and crash with a signal 11 with a backstack that looks like this:

[D] #0: 0x0000559601506BCE RGMaster::getNode FullDomainName(NodeAddr, char**) + 0xAE at ??:0 [D] #1: 0x000055960150CAA2 RGMaster::rgListServers(int, unsigned int) + 0x212 at ??:0 [D] #2: 0x000055960145F21C runTSLsRecoveryGroupV2 (int, StripeGroup*, int, char**) + 0xA8C at ??:0 [D] #3: 0x0000559601460371 runTSLsRecoveryGroup (int, StripeGroup*, int, char**) + 0xB1 at ??:0

(show details)

Symptom	Daemon crashes with signal 11
Environment	Linux
Trigger	A node trying to join a cluster.
Workaround	None

5.0.5.4

GNR

IJ28848

Can not create SMB share using utf8 chars through CLI.

(show details)

Symptom	The mmsmb share name is not created and the command fails.
Environment	Linux
Trigger	A user issuing the mmsmb create command on a non-CES node.
Workaround	You can create the SMB share name (using utf8 character set) on a CES node directly.

5.0.5.4

SMB

IJ28801

A bad extended encryption attribute is in a snapshot file, then attempt to delete that snapshot.

(show details)

Symptom	Unable to delete snapshot.
Environment	Linux
Trigger	A bad extended encryption attribute is in a snapshot file, then attempt to delete that snapshot.
Workaround	None

5.0.5.4

Snapshots, Encryption

IJ28849

On file system with HAWC enabled, data written to disk could be lost after a node failure when using the system call fdatasync() or Ganesha. (show details)

Symptom	Unexpected Results/Behavior
Environment	All
Trigger	Node failure such as file system panic, quorum loss, etc. with HAWC enabled file system
Workaround	Disable HAWC or avoid using the system call fdatasync() to flush the data to disk.

5.0.5.4

HAWC

IJ28877

When file audit logging or watch folder is enabled on a file system, unmounting the file system might result in a waiter that will not clear. This may cause other commands to hang.

(show details)

Symptom	A persistent waiter that causes commands to hang.
Environment	Linux
Trigger	Unmounting the file system that has file audit logging or watch folder enabled.
Workaround	Recycle the daemon.

5.0.5.4

File audit logging, Watch folder

IJ28889

mmhealth does not work on AIX. (show details)

Symptom	Component level outage
Environment	AIX (POWER)
Trigger	All new IBM Spectrum Scale installations on AIX 7.1 and AIX 7.2 are generally affected in a major way (Sysmonitor cannot startup). Upgraded installations have some degraded functions.
Workaround	1. Touch mmsysmon.json 2. Run mmccr fput mmsysmon.json mmsysmon.json 3. Run mmdsh -N all mmsysmoncontrol restart After this workaround, Sysmonitor will be able to startup, but some of its features (e.g. hiding events) will stay broken.

5.0.5.4

System health

IJ29002

If the default replication (-m or -r) setting for a file system is set to 1 and mmvdisk is used to add an additional vdisk set to the file system, an exception will be hit if the --failure-groups option is not used. (show details)

Symptom	Error output/message
Environment	Linux
Trigger	File systems using vdisk based NSDs which have either data or metadata replication set to 1.
Workaround	In some cases, using the --failure-groups option avoids this issue.

5.0.5.4

ESS, GNR

IJ29004

The systemhealth monitor reports data and name nodes as down for the HadoopConService. In fact, both were running.

(show details)

Symptom	Unexpected Results/Behavior
Environment	All
Trigger	Running the systemhealth monitor and HadoopConServices
Workaround	None

5.0.5.4

System health

IJ28890

AFM metadata prefetch does not handle hardlinks. (show details)

Symptom	Unexpected results
Environment	Linux
Trigger	AFM metadata prefetch
Workaround	None

5.0.5.4

AFM

IJ28897

When using --skip-inode-check option of offline fsck, it reports false positive extendedAcl corruption.

(show details)

Symptom	Unexpected Results/Behavior
Environment	All
Trigger	Offline fsck with --skip-inode-check option
Workaround	Do not use --skip-inode-check option of offline fsck.

5.0.5.4

FSCK

IJ29182

The --metadata-only option hit the assert Assert exp(!"Assert on Structure Error") in prefetch. (show details)

Symptom	Unexpected results
Environment	Linux
Trigger	Using the --metadata-only option with a prefetch list-file
Workaround	None

5.0.5.4

AFM, AFM DR

IJ29210

mmvdisk recovery group fails when creating log vdisks when creating a new recovery group in a cluster with preexisting recovery groups. An error message "Disk XXX is already registered for use by GPFS" will appear on the command console, and the recovery group creation will fail. Once the problem condition is hit, IBM support must be contacted to correct the conflicting cluster information.

(show details)

Symptom	Upgrade/Install failure
Environment	Linux
Trigger	Users who have used mmchrecoverygroup --servers or mmvdisk recoverygroup to change --primary --backup on a preexisting recovery groups in a cluster may be affected when adding a new recovery group to that cluster.
Workaround	Avoid using the "--servers" option in mmchrecoverygroup or the "--primary" and "--backup" options in mmvdisk recovery group change to invoke planned temporary recovery group reassignment for maintenance operations. Instead, use the "--active" option in mmvdisk or mmchrecoverygroup to specify temporary recovery group reassignment.

5.0.5.4

Admin commands, ESS, GNR

IJ29313

Running prefetch stats is failing with err 22.

(show details)

Symptom	Prefetch stats fail with err 22 (Invalid Argument)
Environment	Linux
Trigger	Prefetch stats are triggered on inactive cache fileset state.
Workaround	None

5.0.5.4

AFM

IJ29337

GPFS maintains EA (Extended Attribute) registry to verify the EA priority. Due to incorrect EA registry addition without SG format version check, policy and inode scans might fail in the mixed node cluster environment. This problem could occur while running policy or inode scans in a mixed node environment running with 5.0.5.2, 5.0.5.3, and 5.1.0.0 and other old version nodes as the file system manager. (show details)

Symptom	Unexpected results
Environment	All
Trigger	Running policy or inode scans in a mixed node cluster environment running with 5.0.5.2, 5.0.5.3, and 5.1.0.0 along with older version nodes as the file sytsem manager.
Workaround	None

5.0.5.4

AFM, Core GPFS

IJ28321

logAssertFailed:isValidSocket(sock) line 2661 of file thread.C (show details)

Symptom	mmfsd crash
Environment	All
Trigger	File system unmount operations and disk path rediscovery or NSD disk stats changes
Workaround	There is no stable workaround to avoid this, but stopping the unmount of the file system and a stable NSD server node with stable disks could help.

5.0.5.3

Core GPFS

IJ28305

When the fileset is in the stopped state (mmafmctl device stop -j fileset), modifying the file's metadata such as owner permissions and file times may not be replicated to the home/secondary site after the fileset is restarted (mmafmctl device start -j fileset). (show details)

Symptom	Unexpected Results/Behavior
Environment	All
Trigger	AFM fileset is in stopped state and recovery is run later.
Workaround	None

5.0.5.3

AFM, AFM DR

IJ28314

In a distributed IBM Spectrum Scale environment in the presence of repetitive node failures can result in the declustered array becoming stuck in-transition with long waiters. Long waiters may occur, and file system operations may become stalled.

(show details)

Symptom	Hang/Deadlock/Unresponsiveness/Long Waiters
Environment	Linux
Trigger	Occurs in IBM Spectrum Scale RAID configurations involving multiple log groups, such as IBM Spectrum Scale Erasure Code Edition or ESS 3000, that experience repetitive node shutdowns or network partitioning affecting a small subset of nodes in the cluster.
Workaround	Restart the daemon.

5.0.5.3

GNR

IJ28316

AFM list file prefetch hangs. (show details)

Symptom	Unresponsiveness
Environment	Linux
Trigger	AFM list file prefetch
Workaround	Split the list file and rerun prefetch.

5.0.5.3

AFM

IJ28162

AFM replicating extended attributes and ACLs can cause resync/recovery performance issues. (show details)

Symptom	Performance Impact/Degradation
Environment	Linux
Trigger	AFM resync and recovery; also occurs during the regular AFM replication.
Workaround	None

5.0.5.3

AFM, AFM DR

IJ28161

Changing the file system name via mmchfs -W option does not work if the file system manager is on another node that is not a Linux node. This is due to the new device name not being created in /dev at the time the daemon open the file system with the new device name.

(show details)

Symptom	Command failure error message in mmfs.log
Environment	AIX on Power, Windows on x86_64
Trigger	Change the device name of a file system when the file system manager is on another node and that node is not a Linux node.
Workaround	Ensure the next file system appointment is on a Linux node. This can be done by rearranging the list of quorum nodes. If there are no Linux nodes in the cluster, run the mmchfs command on the file system manager node.

5.0.5.3

Core GPFS

IJ27709

When a node becomes a quorum node (via mmchnode command) and this node was a quorum node sometime in the past and a mmfsd/mmsdrserv restart has not occurred on this node since; In that case, the mmchnode command will fail with output like "initialize (1, 'node-11', ('192.168.1.1', 1191)) failed (err 73) mmchnode: 6027-1639 Command failed. Examine previous error messages to determine cause." Also an assertion of the following type occurs on the node which should become a quorum node (GPFS log): "ccrmmfsd.C:806: assertion 'nodeId == ccrNodeId'"

(show details)

Symptom	mmchnode command fails GPFS daemon asserts on the node which should become a quorum node.
Environment	All
Trigger	Changing a node, which was a quorum node sometime in past and is now a nonquorum node to a quorum node again via the mmchnode command. During this entire time the GPFS daemon (mmfsd/mmsdrserv) hasn't seen a restart.
Workaround	Restart the mmfsd (if possible) on the node which should become a quorum node before the mmchnode command is attempted.

5.0.5.3

mmchnode, CCR

IJ27801

The GPFS kernel module exports an ioctl interface used by the mmfsd daemon and some of the mm* commands. The provided improvements result in a more robust functionality of the kernel module.

(show details)

Symptom	Unexpected Results/Behavior
Environment	Linux
Trigger	NA
Workaround	None

5.0.5.3

Core GPFS

IJ27929

When recycling the recovery group nodes, access to the recovery groups can be lost, even if the nodes are recycled one at a time.

(show details)

Symptom	File system outage
Environment	Linux
Trigger	Recycling active recovery group servers
Workaround	Recycle the nodes manually.

5.0.5.3

GNR

IJ27922

A race condition within GNR fast write logging mechanism could lead to a thread incorrectly considering itself as the candidate to start zone flushing operations, while there are still other threads that have not yet completed the fast (short) writing operation yet in this zone. The assert stopped the flushing operation. (show details)

Symptom	Assert will be triggered when this problem occurs: lbZoneP[zone].zHoldWord == 0
Environment	Linux
Trigger	The problem normally occurs when multiple short write requests are in process at the same time. Some threads with bigger request size, while others with smaller request size.
Workaround	None

5.0.5.3

GNR, ECE, ESS 3000

IJ27711

When a metanode receives many RPC requests from the clients nodes, it is possible for mutex contention to occur which in turn can lead to high CPU usage. This could happen when many nodes share access to the same file/directory such as the root directory of the file system.

(show details)

Symptom	Performance Impact/Degradation
Environment	All
Trigger	Many nodes share read access to same file and or directory
Workaround	Set the preferDesignatedMnode configuration variable to 1. Set the mnodeRemoteAccessThreadhold configuration variable to 0.

5.0.5.3

Core GPFS

IJ25651

GPFS shuts down with the following message "logAssertFailed: holdCount > 0" found in the GPFS log file

(show details)

Symptom	GPFS will abruptly shut down with the following message "logAssertFailed: holdCount > 0" found in the GPFS log file.
Environment	All
Trigger	Heavy pagepool activity, particularly with heavy page steal activity and also low memory configurations with compacted objects
Workaround	Reduce the pagepool contention by tuning the workload and increasing the pagepool size.

5.0.5.3

Core GPFS

IJ26952

logAssertFailed: isValidSocket(sock): This assertion goes off when the sock file descriptor is too large

(show details)

Symptom	Abend/Crash
Environment	AIX on Power, Linux
Trigger	Socket file descriptor leak
Workaround	None

5.0.5.3

Core GPFS

IJ26702

mmauth grant sets the local file system (which is remote for cache) access to Read-only access. So when you access the fileset over the control file, it returns a E_ROFS error from remote file system and it logs irrelevant information.

(show details)

Symptom	Logged the error messsage "AFM is not enabled".
Environment	All
Trigger	When the access grant for the file system is Read-only at home.
Workaround	None

5.0.5.3

AFM

IJ27144

When reconfiguring Object protocol authentication using the mmuserauth command, the command may occasionally hang while waiting for systemctl to shutdown a service. Looking at the process table may show systemctl waiting for the child process "pkttyagent" to complete.

(show details)

Symptom	Hang/Deadlock/Unresponsiveness/Long Waiters
Environment	Red Hat Enterprise Linux versions 7.7 and later
Trigger	This issue affects customers running IBM Spectrum Scale with Object protocol on Red Hat Enterprise Linux version 7 at versions 7.7 and later.
Workaround	Since the problem is intermittent, stopping the mmuserauth command and retrying it may result in the command successfully completing.

5.0.5.3

Object

IJ26697

When multiple applications on a single node perform readdir and lookups on the same directory in a loop, it could lead to token starvation on other nodes trying to perform rename/create operation on the same directory. This will show up as slow application performance on affected nodes.

(show details)

Symptom	Performance Impact/Degradation
Environment	All
Trigger	Running multiple application that repeatedly perform readdir/lookup (ls -l) on the same directory.
Workaround	None

5.0.5.3

Core GPFS

IJ26830

If a file is unlinked after opening it, fallocate(2) on that fd will fail with ENOENT. (show details)

Symptom	Unexpected Results/Behavior
Environment	All
Trigger	Unlinking a file after opening it, then call fallocate(2) on that fd.
Workaround	None

5.0.5.3

Core GPFS

IJ27150

Disable enableIpv6 by using mmchconfig enableIpv6=no does not work. The command treats the no value like yes value.

(show details)

Symptom	Error output/message
Environment	All
Trigger	Run mmchconfig enableIpv6=no
Workaround	Instead of mmchconfig enableIpv6=no, issue mmchconfig enableIpv6=default

5.0.5.3

Admin commands

IJ27241

On a Linux node defined to have the GPFS node role of snmp_collector, the GPFS control scripts might not properly determine if the GPFS Net-SNMP subagent is running. Consequently, multiple GPFS Net-SNMP subagent processes might be running at the same time. This might affect the ability of the snmp_collector node to respond to SNMP queries or send SNMP traps. Recurring error messages might be found in GPFS log files showing a GPFS Net-SNMP subagent is unable to register with the Net-SNMP agent (snmpd).

(show details)

Symptom	Error output/message
Environment	All
Trigger	GPFS Net-SNMP subagent processes left over from shutdown.
Workaround	Manually kill all left over GPFS Net-SNMP subagent processes on shutdown.

5.0.5.3

SNMP, Admin commands

IJ27249

The ECE disk inventory utility to list out all the disk slots in the system could hit an exception when one of the LSI adapters doesn't have a disk. This may result in the missing slot locations for the pdisks. The slot location is important to identify the disk drive in disk replacement, so it may cause the disk replacement to fail. If multiple disks fail without being replaced, it may risk the data reliability of the ECE storage system.

(show details)

Symptom	Error output/message Cluster/File System Outage
Environment	Red Hat Enterprise Linux
Trigger	NA
Workaround	When disk replacement fails due to this problem, contact IBM support to apply work around steps for disk replacement.

5.0.5.3

GNR, ECE

IJ26972

Grafana bridge returns "- ERROR - Metric mynode|GPFSNSDFS|exfld|gpfs_nsdfs_bytes_read cannot be found. Please check if the corresponding sensor is configured - "

(show details)

Symptom	Error message
Environment	All
Trigger	Using grafana bridge
Workaround	None

5.0.5.3

Core GPFS

IJ27264

The check for mismatched NSDs in a storage pool may fail if the regular NSD was created with a pool designation different than the one used in the mmadddisk stanza file.

(show details)

Symptom	Error output/message
Environment	All
Trigger	Adding regular NSDs to a filesystem which contains vdisk-based NSDs into a new pool which does not match the pool used when the NSD was created.
Workaround	Use the --force-nsd-mismatch option.

5.0.5.3

GNR, ESS

IJ27236

In certain conditions, the monitoring code will not work properly resulting in an erroneous state being shown.

(show details)

Symptom	Error output/message Unexpected Results/Behavior
Environment	All
Trigger	CES policy is "node-affinity" and ces-groups are declared.
Workaround	Switch CES Policy to "even-coverage".

5.0.5.3

System health

IJ27285

Remote mount is not responsive and the control file failed(-1) to set. Due to this, it is returning E_STALE and the lookup gets requeued with E_RESTART every time.

(show details)

Symptom	ls command hung and failed to return.
Environment	All
Trigger	ls command is executed on the fileset root.
Workaround	None

5.0.5.3

AFM

IJ27339

Notification messages for TLS socket disconnect as a result of a peer's idle disconnect were printed in the mmfs log file and the sys log, Creating confusion whether there is a real problem or not.

(show details)

Symptom	Error output/message
Environment	All
Trigger	Expiration of the idleSocketTimemout timer
Workaround	None

5.0.5.3

Authentication

IJ27366

With parallel IO enabled, it is possible that all the afmMaxWorkerThreads are used and there are no threads to handle the parallel IO responses. (show details)

Symptom	Deadlock
Environment	All
Trigger	AFM with parallel IO under heavy workload
Workaround	Disable parallel IO.

5.0.5.3

AFM, AFM DR

IJ27375

Windows offline bit is set on the directory after AFM replication.

(show details)

Symptom	Unexpected Results/Behavior
Environment	All
Trigger	AFM revalidation in modes such as IW/LU/RO
Workaround	None

5.0.5.3

AFM

IJ27003

AFM sets incorrect creation time on symlinks during the migration. (show details)

Symptom	Unexpected Results/Behavior
Environment	All
Trigger	AFM caching and migration
Workaround	None

5.0.5.3

AFM

IJ27038

AFM incorrectly sets secondary mode fileset permissions to primary mode fileset permissions during a resync operation. (show details)

Symptom	Unexpected Results/Behavior
Environment	All
Trigger	Running changeSecondary command when secondary and primary fileset have different permissions.
Workaround	Change the secondary fileset permissions to the primary fileset permissions before running the changeSecondary command.

5.0.5.3

AFM

IJ27646

mmadquery fails with error - size limit exceeded

(show details)

Symptom	Error output/message
Environment	Linux
Trigger	AD server having large number of entries (> 1000)
Workaround	None

5.0.5.3

Authentication

IJ27683

With parallel IO enabled, the PIO response handler thread on MDS might deadlock due to an EIO error from the helper gateway node. (show details)

Symptom	Deadlock
Environment	Linux
Trigger	AFM with parallel IO under heavy workload
Workaround	Disable parallel IO.

5.0.5.3

AFM, AFM DR

IJ16663

When multiple applicatons on a single node perform readdir and lookups on the same directory in a loop, it could lead token starvation on other nodes trying to operate on the same directory. This will result in as slow application performance on affected nodes.

(show details)

Symptom	Performance Impact/Degradation
Environment	All
Trigger	Running multiple application that repeatedly perform readdir/lookup (ls -l) on the same directory.
Workaround	None

5.0.5.2

Core GPFS

IJ22152

There is no automatic method to generate the GPL installable package for customized RHEL release.

(show details)

Symptom	NA
Environment	RHEL
Trigger	NA
Workaround	Follow the steps in /usr/lpp/mmfs/src/README.

5.0.5.2

Core GPFS

IJ26349

For a DMAPI enabled file system, migrating files into an external storage pool may cause problems with snapshot files. In some cases, it might assert "okToIncreaseIndLevel". In another case when DMAPI is not enabled, adding extended attributes to a sparse file may trigger the same assert. (show details)

Symptom	Daemon crash with assert "okToIncreaseIndLevel"
Environment	All
Trigger	Using snapshots with a DMAPI enabled file system and migrating files to external storage pools.
Workaround	None

5.0.5.2

Snapshots, DMAPI

IJ24387

For mmces related log files, set mode bits to 622.

(show details)

Symptom	Non root user can accidentally delete these files.
Environment	Linux
Trigger	NA
Workaround	Manually change the file access.

5.0.5.2

CES

IJ25803

open(2) with O_NOATIME flag may not work as expected. (show details)

Symptom	Unexpected Results/Behavior
Environment	All
Trigger	open(2) with O_NOATIME flag
Workaround	None

5.0.5.2

Core GPFS

IJ26355

Inode operations ->set_acl and ->get_acl are not supported in GPFS, and starting with kernel v3.14 some commands such as nfs4_setfacl may fail.

(show details)

Symptom	Error output/message
Environment	Linux with kernel version 3.14 or later
Trigger	Missing functions in the GPFS kernel portability layer
Workaround	None

5.0.5.2

Core GPFS

IJ26510

GPFS daemon crashes and file system gets unmounted. The GPFS daemon crashes because of daemon code hit an assert that indicates "the allocated disk address is not expected". (show details)

Symptom	GPFS daemon crashes with logAssertFailed: dataBlockNum == lastDataBlock \|\| !newDA.isALLOC(), file system unmounted
Environment	All
Trigger	The assert went off during file sync as it detected that a middle data block of a file has a fragment block address (DA) - it's illegal; only the last data block can be a fragment.
Workaround	None

5.0.5.2

Core GPFS

IJ26341

GPFS Access Control Lists (ACL) can only store limited types of Access Control Entries (ACE), specifically plain Access-Allowed-ACE, Access-Denied-ACE, System-Audit-ACE, and System-Alarm-ACE. GPFS does not support storing of any of the Object-specific-ACEs corresponding to ACL_REVISION_DS. An attempt to set an ACL (containing the unsupported ACE types, such as the Object-specific-ACEs), can result in a kernel bugcheck.

(show details)

Symptom	Abend/Crash
Environment	Windows
Trigger	The ACL being set on a GPFS file or directory contains an unsupported ACE type, such as Object-specific-ACE.
Workaround	None

5.0.5.2

Windows, ACLs

IJ26342

If incorrect arguments are passed to tschcarrier, a cleanup routine trys to remove some in-memory objects which are not created and this leads to a segmentation fault. (show details)

Symptom	Abend/Crash
Environment	All
Trigger	Customers running tschcarrier directly
Workaround	None

5.0.5.2

GNR

IJ26356

When a data management application receives a DMAPI postrename event, it fails to get the file handle for the renamed file with a "no such file" error. This is because Spectrum Scale is delivering a DMAPI postrename event before the Linux kernel updates its directory lookup cache for the file being renamed.

(show details)

Symptom	The data management application gets a ENOENT error if it is trying to get the file handle for the renamed file immediately after receiving a DMAPI postrename event.
Environment	All
Trigger	DMAPI enabled Spectrum Scale file system and a DMAPI postrename event
Workaround	Let your data management application take a short sleep and retry to get the file handle of the renamed file name.

5.0.5.2

DMAPI

IJ26022

After upgrading the Spectrum Scale version to 5.0.4.4 or later, hit this assert (logAssertFailed: status == ASumWriting) on the upgraded node when it takes over the file system manager role.

(show details)

Symptom	Abend/Crash
Environment	All
Trigger	Upgrade to 5.0.4.4 or later and move the file system manager to the newly upgraded node.
Workaround	None

5.0.5.2

Core GPFS

IJ26358

Eviction is failing to execute if there is a space character in the path name.

(show details)

Symptom	Data is not evicted from the site.
Environment	All
Trigger	A space character in the path name
Workaround	None

5.0.5.2

AFM

IJ26423

When adding a vdisk set which contains multiple node classes to a file system, some node classes may be omitted.

(show details)

Symptom	Unexpected result
Environment	Linux
Trigger	Vdisk sets which contain multiple node classes.
Workaround	Run the mmvdisk fs add command again.

5.0.5.2

GNR

IJ26438

For pre-4.1 file systems, after the mmquotaoff command deactivates user/group/fileset quota, the old quota file will be deinstalled and converted to a normal file. If the system pool cannot contain the data, the old quota file will need to be moved from the system pool to a data pool. If the file system has DMAPI enabled (-z yes), the deinstallation process will encounter assert exp(context != unknownOp) in moveDataBetweenPools.

(show details)

Symptom	Abend/Crash
Environment	All
Trigger	mmquotaoff, a pre-4.1 file system, DMAPI enabled
Workaround	Use mmchfs -z no to disable DMAPI first before running mmquotaoff command.

5.0.5.2

Quotas

IJ26436

The Spectrum Scale mmbackup command translates include/exclude options into policy rules for backup candidate file lists. If the path name specified in the include/exclude option contains any white space, mmbackup translates it incorrectly because space is used as the default delimiter in "dsmc query inclexcl" output.

(show details)

Symptom	Unexpected Results/Behavior
Environment	All
Trigger	Using path name with white space in Spectrum Protect include/exclude options
Workaround	Use customized policy rule.

5.0.5.2

mmbackup

IJ26348

A race condition between the RG master resign/recovery and the mdi operation on the worker side could lead to a bug in the RG master recovery and cause the the working index (WI) entry to be stuck in the Assigned state. This would further cause the integrity manager thread to block and cause long waiter "wait for working index entry to be committed" on the RG master node. Under this state, it could lead to data integrity issues when the next RG master resign/recovery event occurs. Note that this could only happen in ECE/ESS 3000 environment, and not in legacy ESS environment.

(show details)

Symptom	The immediate effect will be that on the RG (root owner node) a long waiter will be observed: "wait for working index entry to be committed". Under this state, if the metadata block related to this working index is updated, RG recovery could cause metadata roll back and cause data integrity issues, or the other possible outcome is recovery failure which renders a failure message as "MDI VCD recovery failure in build used metaslot bitmaps: 16". Note that this long waiter could be caused by other situations, and not all cases will result data integrity issue.
Environment	Linux
Trigger	This is normally exposed by a race condition when the RG master resign/recovery event happens in the middle of the worker side mdi operation, specifically when worker's RPC which modifies mdi state, interleaves with RG master's RPC which inquires about the mdi state.
Workaround	The immediate bug does not cause a problem right away, but it does expose a condition that additional events could lead to a recovery failure. When the recovery failure is detected, data has already been corrupted. It will require special steps to allow recovery to complete but in the meantime a certain amount of data will be lost.

5.0.5.2

GNR, ECE, ESS 3000

IJ26789

This is a deadlock issue between the file deletion thread and another thread that is finding the inode from the Linux kernel inode hash list while holding the scale file lock. The other thread is mostly doing DMAPI calls. (show details)

Symptom	Abend/Crash
Environment	Linux with kernel version 4 and later
Trigger	Deleting files managed by DMAPI
Workaround	None

5.0.5.2

DMAPI

IJ26835

Calling dm_path_to_handle API from DMAPI RENAME event handler could run into a deadlock on the i_mutex lock that is obtained when the file's parent is renamed on V3.0.x Linux kernel versions.

(show details)

Symptom	Deadlock
Environment	Linux
Trigger	Calling dm_path_to_handle against a file being renamed from a DMAPI RENAME event handler
Workaround	None

5.0.5.2

DMAPI

IJ26679

Rare deadlock causing mmrestripefs to hang under high load

(show details)

Symptom	Hang/Deadlock/Unresponsiveness/Long Waiters
Environment	All
Trigger	Running mmrestripefs under high load.
Workaround	Run mmrestripefs during off-hours.

5.0.5.2

Policy, ILM restripe, Rebalance

IJ25547

GPFS daemon crashes, file system gets unmounted on this node. The GPFS daemon crashes because of daemon code hit an assert which indicates the allocated disk address is not expected. (show details)

Symptom	GPFS daemon crashes with logAssertFailed: dataBlockNum == lastDataBlock \|\| !newDA.isALLOC(), file system unmounted.
Environment	All
Trigger	The assert went off during file sync as it detected that a middle data block of the file has a fragment block address (DA) - it's illegal, we only can have the last data block be a fragment.
Workaround	None

5.0.5.2

Core GPFS

IJ25843

If a multi-threaded program reads or writes to a file in regular or mmap mixed mode, it may assert with "logAssertFailed: TokenPermits(get_token_mode(), lm_want)"

(show details)

Symptom	Abend/Crash
Environment	All
Trigger	Multi-threaded program reads or writes to a file in regular or mmap mixed mode
Workaround	None

5.0.5.2

Core GPFS

IJ26434

The EA overflow block is a metadata block that should be read using a continuous buffer, but due to a code error, it is considered to be a data block, so a scatter buffer is used which causes a log assert failure.

(show details)

Symptom	Abend/Crash
Environment	All
Trigger	If the file's EA entries cannot be placed in the inode, GPFS will place it in a per file EA overflow block. The size of the EA overflow block depends on how many EA entries the file has. If the scatterBufferSize value is smaller than the EA overflow block size, GPFS may hit log assert when the user accesses the file.
Workaround	If the size of scatter buffer is bigger than or equal to the EA overflow block size, the log assert can be avoid. Before GPFS version 5.0.0, the default scatterBufferSize value is 32KB that is small and more likely to hit this log assert. Changing the scatterBufferSize value to 256KB could be a work around.

5.0.5.2

Core GPFS

IJ26921

When deleting an old snapshot or accessing snapshot files from an old snapshot, the operations could run into a out-of-stack space error if a snapshot file contains an overflow block. This causes the mmfsd process to crash with a memory fault error. (show details)

Symptom	Daemon crash
Environment	All
Trigger	Accessing snapshot files from older snapshots
Workaround	None

5.0.5.2

Snapshots

IJ25905

When a new fileset is created, the in-doubt quota shares for that fileset should start out at 0. In some situations, a fileset can be created and start out with a non-zero in-doubt value. This can occur when a fileset had been deleted previously, and then the new fileset re-uses the same ID that the deleted fileset had used.

(show details)

Symptom	NA
Environment	All
Trigger	NA
Workaround	Running mmcheckquota will reset all of the in-doubt values.

5.0.5.2

Quotas

IJ26301

mmadddisk hits assert exp(isStoragePoolIdValid(poolId)) when trying to open a disk due to stripe group descriptor update.

(show details)

Symptom	Abend/Crash
Environment	All
Trigger	Creating and or deleting storage pools
Workaround	None

5.0.5.2

Core GPFS

IJ25906

After an upgrade from 5.0.4.3 to 5.0.5.0, none of the CES IP addresses can be assigned. (show details)

Symptom	Hang of "mmces address add". Other internal CES IP processing could also hang.
Environment	Linux (protocol nodes)
Trigger	In 5.0.4.3: 2020-04-10_20:44:52.462-0400: [I] mmcesop: run: /usr/sbin/ip a add 192.168.100.254/24 dev enp11s0f1.2500:0 rc=0 In 5.0.5.0: 2020-06-09_17:36:34.014-0400: [I] mmcesop: run: /usr/sbin/ip a add 192.168.100.254/24 dev enp11s0f1.2500:0 label enp11s0f1.2500:0 RTNETLINK answers: Numerical result out of range rc=2
Workaround	None

5.0.5.2

CES

IJ34824

The Ganesha process crashes or the Ganesha work pool threads hang. The crash occurs when nfs4_acl_release_entry() calls hashtable_getlatch(). See the Symptom section of this APAR to view the stack traces for both scenarios. (show details)

Symptom	Abend/Crash The Ganesha process crashed with this stack trace: (gdb) bt #0 raise (sig=11) #1 0x0000000000443529 in crash_handler (signo=11, info=0x7f7a4cdc7570, ctx=0x7f7a4cdc7440) #2 #3 0x000000000052ac40 in UNALIGNED_LOAD64 #4 0x000000000052ac7b in Fetch64 #5 0x000000000052b46c in CityHash64 (s=0x0, len=20225737280) #6 0x00000000004fe949 in fsal_acl_hash_both (hparam=0x1c81700, key=0x7f7a4cdca2e0, index=0x7f7a4cdca264, rbthash=0x7f7a4cdca250) #7 0x00000000004f1f5a in compute (ht=0x1c81700, key=0x7f7a4cdca2e0, index=0x7f7a4cdca264, rbt_hash=0x7f7a4cdca250) #8 0x00000000004f2946 in hashtable_getlatch (ht=0x1c81700, key=0x7f7a4cdca2e0, val=0x7f7a4cdca2c0, may_write=true, latch=0x7f7a4cdca2a0) #9 0x00000000004ff415 in nfs4_acl_release_entry (acl=0x7f7e3c005600) #10 0x000000000053cda2 in fsal_release_attrs (attrs=0x7f7dec285798) #11 0x000000000053e2fa in mdcache_lru_clean (entry=0x7f7dec2856d0) #12 0x00000000005424d0 in mdcache_lru_get (sub_handle=0x7f7b9c9e2b10) #13 0x0000000000551884 in _mdcache_alloc_handle (export=0x1ceebc0, sub_handle=0x7f7b9c9e2b10, fs=0x1cd9120, reason=MDC_REASON_DEFAULT, func=0x5c2cb0 <__func__.24082> "mdcache_new_entry", line=709) #14 0x0000000000553563 in mdcache_new_entry (export=0x1ceebc0, sub_handle=0x7f7b9c9e2b10, attrs_in=0x7f7a4cdcaec0, attrs_out=0x0, new_directory=false, entry=0x7f7a4cdcae38, state=0x0, reason=MDC_REASON_DEFAULT) #15 0x0000000000547c6f in mdcache_alloc_and_check_handle (export=0x1ceebc0, sub_handle=0x7f7b9c9e2b10, new_obj=0x7f7a4cdcafd0, new_directory=false, attrs_in=0x7f7a4cdcaec0, attrs_out=0x0, tag=0x5c1a4c "lookup ", parent=0x7f7e3ccb4f80, name=0x7f7cf47b03c0 "ndjsmsdxscm05_2106250501_NfsStats.1.gz", invalidate=0x7f7a4cdcaebf, state=0x0) #16 0x000000000055572f in mdc_lookup_uncached (mdc_parent=0x7f7e3ccb4f80, name=0x7f7cf47b03c0 "ndjsmsdxscm05_2106250501_NfsStats.1.gz", new_entry=0x7f7a4cdcb140, attrs_out=0x0) #17 0x000000000055bdfb in mdcache_readdir_chunked (directory=0x7f7e3ccb4f80, whence=1853889278, dir_state=0x7f7a4cdcb2f0, cb=0x434608 , attrmask=122830, eod_met=0x7f7a4cdcb51f) #18 0x000000000054981f in mdcache_readdir (dir_hdl=0x7f7e3ccb4fb8, whence=0x7f7a4cdcb2d0, dir_state=0x7f7a4cdcb2f0, cb=0x434608 , attrmask=122830, eod_met=0x7f7a4cdcb51f) #19 0x0000000000434f1b in fsal_readdir (directory=0x7f7e3ccb4fb8, cookie=1853889278, nbfound=0x7f7a4cdcb510, eod_met=0x7f7a4cdcb51f, attrmask=122830, cb=0x47c32a , opaque=0x7f7a4cdcb4b0) #20 0x000000000047d9cb in nfs4_op_readdir (op=0x7f7b9c14ea10, data=0x7f7a4cdcbdd0, resp=0x7f7b9cb64440) #21 0x00000000004614f2 in nfs4_Compound (arg=0x7f7b9c74f868, req=0x7f7b9c74f160, res=0x7f7b9c1426f0) #22 0x000000000045e355 in nfs_rpc_process_request (reqdata=0x7f7b9c74f160) (gdb) The Ganesha work pool threads hang with this stack trace: (gdb) bt #0 futex_abstimed_wait (private=0, abstime=0x0, expected=3, futex_word=0x7feca00186dc) #1 __pthread_rwlock_wrlock_full (abstime=0x0, rwlock=0x7feca00186d0) #2 __GI___pthread_rwlock_wrlock (rwlock=0x7feca00186d0) #3 0x00000000004ff09e in nfs4_acl_release_entry (acl=0x7feca00186c0) #4 0x000000000053cda2 in fsal_release_attrs (attrs=0x7feb5064b688) #5 0x000000000053e2fa in mdcache_lru_clean (entry=0x7feb5064b5c0) #6 0x00000000005424d0 in mdcache_lru_get (sub_handle=0x7fea4c54d860) #7 0x0000000000551884 in _mdcache_alloc_handle (export=0xede340, sub_handle=0x7fea4c54d860, fs=0xeccc00, reason=MDC_REASON_SCAN, func=0x5c2cb0 <__func__.24082> "mdcache_new_entry", line=709) #8 0x0000000000553563 in mdcache_new_entry (export=0xede340, sub_handle=0x7fea4c54d860, attrs_in=0x7fe8c8cc9940, attrs_out=0x0, new_directory=false, entry=0x7fe8c8cc9818, state=0x0, reason=MDC_REASON_SCAN) #9 0x000000000055840b in mdc_readdir_chunk_object (name=0x7fe8c8cc9ae3 "jbonura1.core.21884.gz", sub_handle=0x7fea4c54d860, attrs_in=0x7fe8c8cc9940, dir_state=0x7fe8c8cc9ef0, cookie=30487614) #10 0x0000000000559530 in mdc_readdir_chunked_cb (name=0x7fe8c8cc9ae3 "jbonura1.core.21884.gz", sub_handle=0x7fea4c54d860, attrs=0x7fe8c8cc9940, dir_state=0x7fe8c8cc9ef0, cookie=30487614) #11 0x00007fece215d203 in read_dirents () #12 0x0000000000559d2f in mdcache_populate_dir_chunk (directory=0xec76f0, whence=28136498, dirent=0x7fe8c8cca160, prev_chunk=0x0, eod_met=0x7fe8c8cca15f) #13 0x000000000055b10b in mdcache_readdir_chunked (directory=0xec76f0, whence=28008299, dir_state=0x7fe8c8cca2f0, cb=0x434608 , attrmask=122830, eod_met=0x7fe8c8cca51f) #14 0x000000000054981f in mdcache_readdir (dir_hdl=0xec7728, whence=0x7fe8c8cca2d0, dir_state=0x7fe8c8cca2f0, cb=0x434608 , attrmask=122830, eod_met=0x7fe8c8cca51f) #15 0x0000000000434f1b in fsal_readdir (directory=0xec7728, cookie=28008299, nbfound=0x7fe8c8cca510, eod_met=0x7fe8c8cca51f, attrmask=122830, cb=0x47c32a , opaque=0x7fe8c8cca4b0) #16 0x000000000047d9cb in nfs4_op_readdir (op=0x7fea4c38d780, data=0x7fe8c8ccadd0, resp=0x7fea4cf0b6a0) #17 0x00000000004614f2 in nfs4_Compound (arg=0x7fea4cd3a338, req=0x7fea4cd39c30, res=0x7fea4cb18cf0) #18 0x000000000045e355 in nfs_rpc_process_request (reqdata=0x7fea4cd39c30)
Environment	Linux
Trigger	The issue mostly occurs when the system is loaded with heavy workloads and mdcache hits high watermark.
Workaround	None

5.0.5.2

NFS

IJ25321

Signal 11 occurs while mmdiag --threads is running. For example: [E] Signal 11 at location 0x55F6FF53A47D in process 7840 (show details)

Symptom	Abend/Crash
Environment	All
Trigger	Running mmdiag --threads
Workaround	None

5.0.5.1

Core GPFS

IJ25146

A race condition between disks having errors and recovery groups or log groups resigning could lead to a bug in the GNR log: 'vtrack recovery failed to scrub and repair the stale data on the disk'. It can further lead to data corruption if all good copies of the mirrored data are lost. (show details)

Symptom	This problem could occur either to the fast write log or the VCD log, and the impact will be failure to complete recovery. The log will indicate: 2020-05-29_16:25:21.693-0700: [E] Recovery failure: at least 3400 sectors of the fast-write log for LG root of RG RG1 could not be recovered. 2020-05-29_16:25:21.693-0700: [E] Beginning to resign log group root in recovery group RG1 due to "recovery failure", caller err 333 when "recovering log group worker" or 2020-04-29_09:04:42.567-0400: [E] Recovery failure: at least 41616 sectors of the VCD log for LG LG001 of RG rg1 could not be recovered. 2020-04-29_09:04:42.567-0400: [E] Beginning to resign log group LG001 in recovery group rg1 due to "recovery failure", caller err 333 when "recovering log group worker"
Environment	NA
Trigger	This is normally exposed by a race condition between a number of disks gradually hitting errors causing recovery groups or log groups to resign. Some strips of the log vtrack might not be marked as stale, and this could lead to the other strips copying the stale data by mistake.
Workaround	None

5.0.5.1

GNR

IJ25463

GPFS daemon assert: exp(this->mutexMagic == MUTEX_MAGIC_VALID) dSynch.C. This could occur during file system unmount. (show details)

Symptom	Abend/Crash
Environment	All
Trigger	Unmounting a file system
Workaround	None

5.0.5.1

Core GPFS

IJ25468

GPFS daemon assert: ofP->mnodeStatusIs(0x4) || ofP->mnodeStatusIs(0x2) && indAccessLock.isLockedExclusive() in sync.C (show details)

Symptom	Abend/Crash
Environment	All
Trigger	Multiple nodes accessing or updating the same file
Workaround	None

5.0.5.1

Core GPFS

IJ25714

Potential deadlock when a file is accessed concurrently via mmap and regular file access methods. (show details)

Symptom	Hang/Deadlock/Unresponsiveness/Long Waiters
Environment	All
Trigger	On a system with insufficient free memory, a multi-threaded application accesses the same file in both mmap and regular read/write methods
Workaround	None

5.0.5.1

Core GPFS

IJ25469

GPFS daemon failed to start with "Cannot allocate memory" error when prefetchThreads is set to less than 3. (show details)

Symptom	Abend/Crash
Environment	All
Trigger	Setting prefetchThreads to less than 3
Workaround	Set prefetchThreads to 3 or higher.

5.0.5.1

Core GPFS

IJ25478

GPFS daemon assert: !"Log file migrate check failed: need" in sgmdata.C. This could happen during mmrestripefs/mmdeldisk/mmrpldisk command. (show details)

Symptom	Abend/Crash
Environment	All
Trigger	File system panic on a client node while running mmrestripefs/mmdeldisk/mmrpldisk command
Workaround	Rerun the command.

5.0.5.1

Core GPFS

IJ25511

Under rare circumstances, all quorum nodes could be expelled in case the current cluster manager is expelled due to an error on network level on the current cluster manager node, which results in a cluster wide quorum loss. This applies only in the case RDMA has been activated and all GPFS RPCs are going over RDMA (verbsPorts,verbsRdma and verbsRdmaSend must be set). The current cluster manager will be expelled due to the network error (as expected) and the new elected cluster manager cannot make progress during its following group protocol, because it waits for a 10 seconds linger timeout down in the CCR, when a cached socket connection to the former cluster manager gets closed. This way all quorum nodes will be expelled. (show details)

Symptom	Node expel/Lost Membership/Quorum loss
Environment	Linux x86_64
Trigger	Error inject on network level (daemon IP address) of the current cluster manager node when RDMA is used for all GPFS RPCs
Workaround	None

5.0.5.1

RDMA, Cluster Membership, Cluster Manager

IJ22372

If the gpfsready script fails during GPFS startup, GPFS goes down automatically. This causes GPFS shutdown to hang for 5 minutes. (show details)

Symptom	Hang during mmfsd startup
Environment	Linux, AIX
Trigger	mmfsd startup and 'verifyGpfsReady' is set to 'yes'
Workaround	None

5.0.5.1

mmfsd startup

IJ25555

Input file with carriage return causes mmfileid to fail with arithmetic syntax error. (show details)

Symptom	Error output/message
Environment	All
Trigger	Carriage return characters in an input file
Workaround	Manually ensure that all input files are free of carriage return characters.

5.0.5.1

Core GPFS

IJ24043

The mmunlinkfileset command hangs and long waiter "waiting to quiesce" appears. A thread is hung and waiting inside the gpfs_s_delete_inode kernel extension routine. (show details)

Symptom	Hang/Deadlock/Unresponsiveness/Long Waiters
Environment	All
Trigger	Run the mmunlinkfileset command (or commands which need file system quiesce like mmcrsnapshot and mmdelsnapshot) and delete files.
Workaround	None

5.0.5.1

Filesets, Snapshots

IJ25557

fallocate(2) will set the wrong file size if the file has fragments and the end position of the fallocate range fits in the last block. (show details)

Symptom	Unexpected Results/Behavior
Environment	All
Trigger	fallocate(2) on a file which has fragments and the end position of the fallocate range fits in the last block
Workaround	None

5.0.5.1

Core GPFS

IJ25587

File system could panic with an error code 2 during the unmount process. This could happen if mmdelsnapshot command is running at the time of the unmount. (show details)

Symptom	Error output/message
Environment	All
Trigger	Unmounting the file system while the mmdelsnapshot command is still running
Workaround	Avoid unmounting the file system while the mmdelsnapshot command is still running.

5.0.5.1

Snapshots

IJ24352

LogAssert in file vnodeops.C (oldCount >= 0) (show details)

Symptom	Abend/Crash
Environment	AIX
Trigger	Unclear
Workaround	None

5.0.5.1

Snapshots

IJ25591

The time of the certificateExpiration field of mmkeyserv -Y output is not correct. (show details)

Symptom	Error output/message
Environment	All
Trigger	mmkeyserv -Y option
Workaround	None

5.0.5.1

Admin commands

IJ24499

When attempting to move or rename a file from a source to a destination, and the destination file already exists and has permissions set such that its deletion is not allowed, the move/rename operation wrongly ends up overwriting the destination file. (show details)

Symptom	Unexpected Results/Behavior
Environment	All
Trigger	The destination of the move/rename operation must already exist and have ACLs set such that its deletion is not allowed.
Workaround	None

5.0.5.1

ACLs

IJ24742

If root squash is enabled on a remote file system, the root user is remapped to a user specified UID and AFM fails to access the remote file system. (show details)

Symptom	Unexcepted results
Environment	Linux
Trigger	Accessing a remote file system using AFM while root squash is enabled
Workaround	None

5.0.5.1

AFM, AFM DR

IJ25388

A configuration variable "sharedTmpDir" is to instruct mmapplypolicy where to create temporary files shared among nodes during a policy scan. The internal configuration variable table missed containing this variable, which caused an error during GPFS daemon initialization but it does not result in any functional problem. (show details)

Symptom	Error message
Environment	All
Trigger	Restarting the GPFS daemon after configuring "sharedTmpDir"
Workaround	None

5.0.5.1

Policy, ILM

IJ25592

The timeout option provided by the mmkeyserv command allows a system admin to fine tune the communication with the key server by providing a timeout value. This timeout value does not apply to the communication with the key server occurring during the execution of the mmkeyserv command, creating the potential of the command returning timeout errors. (show details)

Symptom	Unexpected Results/Behavior
Environment	All
Trigger	Network congestion or resource starvation
Workaround	Ensure that the host where the mmkeyserv command is invoked has good connectivity to the key server.

5.0.5.1

Admin commands

IJ25600

The GPFS kernel module's cryptographic functionality used block chaining ciphers that were previously deprecated. Newer versions of the Linux distro removed the deprecated ciphers and, consequently, the block chaining ciphers are not available any longer. On those distros, the GPFS kernel module is updated to use symmetric key ciphers. (show details)

Symptom	IO error
Environment	Linux
Trigger	Use of asynchronous direct I/O with kernels that do not provide block chain ciphers. You can look at /proc/crypto and see if there are any cbc(aes) ciphers of blkcipher type. If so, that will be used. If cbc(aes) is of type skcipher, that will be used. On zLinux, both bkcipher and skcipher alg types are available for cbc(aes); however, the blkcipher alg will be used as it is hardware accelerated.
Workaround	None

5.0.5.1

Encryption

IJ25601

If a file is migrated to a cloud using TCT, accessing the file in snapshot will not show any contents. (show details)

Symptom	No data seen on read
Environment	Linux
Trigger	Accessing a TCT migrated file thru one of the snapshots
Workaround	None

5.0.5.1

TCT

IJ25613

In the unlikely scenario that the GPFS configuration file (mmfs.cfg) becomes corrupted, the mmfsd daemon may be affected. (show details)

Symptom	Unexpected Results/Behavior
Environment	All
Trigger	A corrupted mmfs.cfg file
Workaround	None

5.0.5.1

Core GPFS

IJ25617

Even though the tenant contains no keys, it cannot be deleted when there are other clients registered to it. On the same cluster, the client should have already deregistered. So, it is likely that the registered client is from another cluster. (show details)

Symptom	Unexpected Results/Behavior
Environment	All
Trigger	Deleting the tenant when there are other clients registered to it
Workaround	Deregister all registered key clients from the key server before deleting the tenant.

5.0.5.1

Admin commands, Encryption

IJ24558

When a fileset is in the deleteRequired state, a blank character is missing between the "latest:" string and the snapshot name if it is too long, thus leading to parsing issues on the snapshot name from the output of mmlsfileset. (show details)

Symptom	Error message
Environment	All
Trigger	A fileset is in deleteRequired state and a global snapshot contains this fileset with a long snapshot name
Workaround	None

5.0.5.1

mmlsfileset command

IJ25620

AFM doesn't allow tuning the AFM tunables per node in the cluster. All of them seem to be only for the whole cluster level. A few of them such as afmHardMemThreshold and afmMaxParallelRecoveries need to be tuned at each gateway node. (show details)

Symptom	Unexpected Results/Behavior
Environment	Linux, AIX
Trigger	Try to set afmHardMemThreshold and afmMaxParallelRecoveries using mmchnode command with -N option.
Workaround	None

5.0.5.1

AFM

IJ25624

When a GPFS command is being blocked by another command, an informational message will be displayed to remind the user of the blocked command that the command will be resumed after the running conflicted command completes. This is not true for some long running commands, like mmlssnapshot and mmdelsnapshot. (show details)

Symptom	Unclear error message
Environment	All
Trigger	Any two conflicted GPFS commands, such as mmlssnapshot and mmdelsnasphot
Workaround	None

5.0.5.1

Conflicting GPFS commands

IJ25369

Prefetch enhancements in 5.0.2 introduced a minor internal checksuch that if the list file is present in an NFS mount commonacross the application and gateway nodes, AFM skips copying this list file from app to the gateway node and uses the list fileas is on the gateway node. But sometimes the same path and file name can exist andhave 2 entirely unrelated files. (show details)

Symptom	Unexpected Results/Behavior
Environment	Linux gateway nodes, Linux and AIX application nodes
Trigger	Have same named list file on the gateway node and the application node. These files should have a different list of files. Try running prefetch with this list file from the application node.
Workaround	Rename the list file on the application node to have a unique name such that it doesn't match any similar file on the gateway node. Or, remove or rename similar named list files on the gateway node so that AFM prefetch doesn't detect the same named list file from the application node that is present on the gateway node.

5.0.5.1

AFM

IJ25652

Cannot commit some log records successfully. But it might have updated the VtrackMap. Due to the resign, update to the memory metadata may not have happened. In such a scenario, if another thread is trying to do a VtrackMap flush operation, it successfully writes the metadata block, then the metadata version updated by the VtrackMap entry and metadata block will be the same, although the log will contain the latest version of the record. (show details)

Symptom	Error message
Environment	Linux
Trigger	A RG resign during a track write operation
Workaround	None

5.0.5.1

GNR

IJ25656

AFM builds up temporary files in the /var/ directory for recovery procedures of the AFM fileset. These files are not deleted until the next recovery on the same fileset. (show details)

Symptom	Unexpected Results/Behavior
Environment	Linux
Trigger	Failure of the AFM normal queue, and recovery getting triggered on next operation on the fileset
Workaround	User might have to remove the /var/mmfs/afm/-/recovery files manually, making sure that recovery has completed flushing the queue.

5.0.5.1

AFM

IJ25578

mmces event list command does not accept all options described in the man page. (show details)

Symptom	Code does not provide all functionality described in the man page.
Environment	Linux
Trigger	Execute the command: mmces events list HDFS
Workaround	Use mmhealth commands for the requested information.

5.0.5.1

CES

IJ25657

AFM is sending lookups on non existent or newly created files to the old system even though readdir was performed on the directory. This is causing too many lookups being sent to old system leading to performance degradtion. (show details)

Symptom	Performance degradation
Environment	All
Trigger	AFM migration
Workaround	None

5.0.5.1

AFM

IJ25547

GPFS daemon crashes and the file system gets unmounted. The GPFS daemon crashes because the daemon code hit an assert which indicates the disk address is not expected. (show details)

Symptom	GPFS daemon crashes with logAssertFailed: !"oldDiskAddrFound.compAddr(*oldDiskAddrP)", and the file system gets unmounted.
Environment	All
Trigger	Failed writes and seeks within a file
Workaround	None

5.0.5.1

Core GPFS

IJ25660

Applications that have lots of GPFS system calls may fail with a SIGSEGV. (show details)

Symptom	Application encountered SIGSEGV on first GPFS system call; backtrace shows a crash before entering the kernel
Environment	All (Primarily on x86_64 and secondarily on Z Linux)
Trigger	Extremely high concurrency with threads immediately calling gpfs_stat() (or any GPFS system call)
Workaround	Paced or metered thread startup

5.0.5.1

GPFS system call library

IJ25661

RPC message sending thread hung, such as the following:

Waiting 34581.7807 sec since 10:04:23, monitored, thread 13813 EEWatchDogThread: on ThCond 0x3FFCDC00E418 (MsgRecordCondvar), reason 'RPC wait'

(show details)

Symptom	Hang/Deadlock/Unresponsiveness/Long Waiters
Environment	All
Trigger	Network is not good which leads to reconnect.
Workaround	None

5.0.5.1

Core GPFS

IJ25663

A CES-IP was declared in /etc/hosts with just the IPand without a host name. This causes hangs in process. (show details)

Symptom	The mmces address add command hangs. Other internal CES IP processing could also hang.
Environment	Linux (protocol nodes)
Trigger	/etc/hosts is not populated with the 3 expected parameters.
Workaround	Fix the declarations in /etc/hosts.

5.0.5.1

CES

IJ25664

In a rare case, mmap(2)/munmap(2) system call may block file system quiesce and cause quiesce timeout.

(show details)

Symptom	Hang/Deadlock/Unresponsiveness/Long Waiters
Environment	Linux
Trigger	Commands like mmdelfileset, mmcrsnapshot, mmdelsnapshot start file system quiesce. At the same time an application calls mmap(2)/munmap(2).
Workaround	None

5.0.5.1

Filesets, Snapshots

IJ25472

tsbuhelper checkprotection might not work correctly if the filename contains two or more spaces. (show details)

Symptom	Spaces being removed from the input file names
Environment	Linux
Trigger	Using tsbuhelper checkprotection when file name contains more than two spaces
Workaround	None

5.0.5.1

tsbuhelper checkprotection subcommand

IJ25685

On systems that are booted in FIPS, the ssh client produces extra messages on stdout. The message "FIPS mode initialized" causes GPFS command to fail. GPFS requires that the shell command produces no extraneous messags.

(show details)

Symptom	Error output/message
Environment	Linux
Trigger	Command failure on systems that booted in FIPS mode
Workaround	Disable FIPS at boot or write a wrapper script that will remove the extraneous message from ssh. The wrapper script can be added to the GPFS remote shell command.

5.0.5.1

Admin commands

IJ25686

If files are already cached and not any file is queued for prefetch then it returns an error. (show details)

Symptom	NA
Environment	All
Trigger	NA
Workaround	NA

5.0.5.1

AFM

IJ25579

Cygwin version 3.1.5 released on June 1, 2020, has changed its implementation of symlinks. Cygwin symlinks are now Windows reparse points instead of the older-style system file with header. Due to this change, GPFS on Windows fails to interpret the new Cygwin symlinks. This results in errors during the GPFS daemon startup, specifically in its attempt to load the authorized public key.

(show details)

Symptom	Abend/Crash
Environment	Windows
Trigger	Upgrade to Cygwin version 3.1.5 (or later).
Workaround	Revert to older level of Cygwin (version 3.1.4 or earlier).

5.0.5.1

Windows

IJ25687

When a snapshot gets deleted, some of the blocks of the snapshot are copied to the previous snapshot to maintain data and metadata consistency. If due to some reason when the deletion of the snapshot is interrupted, there can be a scenario where the blocks are copied to the previous snapshot but the block disk addresses are not yet removed from the snapshot being deleted, the status of such snapshots change to DeleteRequired. This condition is expected for DeletedRequired snapshots and will be handled properly when the deletion of the snapshot is retried. But if before deleting the DeleteRequired snapshot, if an offline fsck is run on the file system then fsck will falsely report such blocks as duplicate address between the DeleteRequired snapshot and its previous snapshots.

(show details)

Symptom	Incorrect report of corruptions by fsck
Environment	All
Trigger	This issue can occur when the delete of a snapshot has been interrupted due to some reason causing the snapshot to now show status of DeleteRequired and then offline fsck is run on the file system before deleting the DeleteRequired snapshot.
Workaround	Complete deletion of all the DeleteRequired status snapshots before running offline fsck.

5.0.5.1

FSCK

IJ25689

Fsck reports a bad DA in a to-be-deleted inode even though such inodes will be cleaned up during normal GPFS operations. (show details)

Symptom	Error output/message
Environment	All
Trigger	Corrupt disk address in a to-be-deleted inode
Workaround	None needed; it is not harmful to allow fsck to repair the corruption.

5.0.5.1

FSCK

IJ25692

Running an administrative command within a few seconds after running a read-only offline fsck can lead to an assert. (show details)

Symptom	Error Abend/Crash
Environment	All
Trigger	Administrative command run right after offline fsck
Workaround	None

5.0.5.1

FSCK

IJ25698

Offline fsck reports false positive replica mismatch if the NSD goes down midway. (show details)

Symptom	Unexpected Results/Behavior
Environment	All
Trigger	NSD going down while offline fsck is running
Workaround	None

5.0.5.1

FSCK

IJ25791

Due to a delayed file close in the VFS layer and a context mismatch, closing the file after the replication does not wait for the file system quiesce causing the remote log assert.

(show details)

Symptom	Assert/Crash
Environment	Linux
Trigger	AFM replication using NSD backend with file system quiesce
Workaround	None

5.0.5.1

AFM, AFM DR

IJ25792

AFM does not replicate the file the correct amount of times when the time is set using the gpfs_set_times_path and gpfs_set_times API.

(show details)

Symptom	Unexpected behaviour
Environment	Linux
Trigger	Using non-POSIX API to set file time on AFM fileset
Workaround	None

5.0.5.1

AFM, AFM DR

IJ25856

Offline fsck requires a certain amount of pagepool memory for it to run with a single inode scan pass. If the needed amount of pagepool memory is not available, it will display a warning message before starting the fsck scan indicating the number of inode scan passes it will take with the current available pagepool memory. It also shows the amount of pagepool memory it would need to run a complete single inode scan pass. For example, following is the message displayed by fsck if there is insufficient pagepool memory available for fsck to run with a single inode scan pass:

---------------- Available pagepool memory will require 3 inode scan passes by mmfsck. To scan inodes in a single pass, total pagepool memory of 11767119872 bytes is needed. The currently available total memory f or use by mmfsck is 8604614656 bytes. Continue fsck with multiple inode scan passes? n ---------------- Now the problem is that in some cases it will display an incorrect value of pagepool memory needed. Another side effect of this problem is that in some cases fsck might not show the above message and instead shows the below incorrect message: --------------- There is not enough free memory available for use by mmfsck in . Continue fsck with multiple inode scan passes? n

(show details)

Symptom	Incorrect value in message
Environment	All
Trigger	This issue is most likely to trigger when offline fsck is run on a large file system but where the nodes do not have small pagepool memory.
Workaround	While there is no specific workaround here, you can either choose to continue running fsck with multiple inode scan passes or else try to increase the pagepool with some random amount and keep incrementing as much as possible till it makes fsck run with a single inode scan pass.

5.0.5.1

FSCK

IJ25665

mmap may expand the file size to a page boundary if the file is a sparse file and the file size is not a multiple of page size. (show details)

Symptom	Unexpected Results/Behavior
Environment	All
Trigger	At the mmap page fault time GPFS will allocate a disk block if it's not allocated yet. A mmap read/write beyond file size will cause SIGBUS. But if the file size is not a multiple of page size, the page covers eof is accessible, and a mmap write to this page should keep current file size.
Workaround	None

5.0.5.1

Core GPFS