| IJ56487 |
Medium Importance |
Changes to the perfmon configuration are not applied to nodes that were down at the time the change was made.
| Symptom |
The perfmon configuration is not updated. |
| Environment |
Linux Only |
| Trigger |
The perfmon configuration is changed while one or more nodes are down. |
| Workaround |
Reissue the mmperfmon command to update the configuration once all nodes are up, or run mmcommon run invokePerfmonctl update on the perfmon nodes that were down. |
|
6.0.0.1 |
Perfmon |
| IJ56488 |
Suggested |
A hang can occur when three operations hit the same file at once:
a process touches a shared, writable mmap mapping and faults a page,
another thread or process performs mremap (which needs the mmap write semaphore),
and a concurrent write()/pwrite() targets the same region.
Under certain timing, the page-fault path must fetch a file lock from the daemon, while the writer is also fetching a conflicting lock. The result is a lock/semaphore cycle between the page-fault handler, the writer, and mremap, and I/O to that file can stall indefinitely.
| Symptom |
Threads hang in file operations; GPFS traces show the mmap page-fault path waiting on a fetched lock, a writer stuck on the mmap semaphore after initiating a daemon fetch, and mremap waiting for the semaphore upgrade. No progress until GPFS services are restarted.
Fix description (high level):
Extend the existing mmap uXfer (“borrowed lock”) fast-path into the daemon fetch path. When the kernel’s lock attempt requires a fetch, the daemon can—under safe conditions—temporarily “borrow” a read lock for the page-fault request and signal the kernel to proceed, breaking the cycle while preserving correctness. (Normal lock/token ownership is finalized once the fetch completes; error paths are handled so the kernel falls back safely if borrowing isn't possible.)
|
| Environment |
ALL OS environments |
| Trigger |
File is mmap'd MAP_SHARED|PROT_WRITE (or read-only with faults against the same region) while a concurrent write()/pwrite() targets the same range.
An mremap occurs concurrently, contending on the mmap semaphore.
Lock acquisition in the kernel returns E_NEED_FETCH and both the page-fault path and the writer rely on the daemon to fetch/upgrade the inode lock; specific timing can create a cyclic wait.
|
| Workaround |
None practical. (Avoiding concurrent mmap access and mremap/writes to the same region prevents the issue but is often not feasible.) |
|
6.0.0.1 |
All Scale Users |
| IJ56619 |
Critical |
With AIO, the thread that submits an I/O request is not the same as the thread that completes it. There is a race condition in which an AIO request that completes quickly is still accessed by the submitting thread. This results in either a kernel KFENCE warning or a node crash.
| Symptom |
Abend/Crash |
| Environment |
ALL Linux OS environments |
| Trigger |
Run AIO in a way that the requests are completed very quickly. This is workload dependent and might be hard to recreate. |
| Workaround |
There is no workaround; the fix is required to avoid this problem. |
|
6.0.0.1 |
All Scale Users |
|
Suggested |
When applications simultaneously use a writable, shared memory map (mmap) and perform regular write()/pwrite() operations to the same file that is subject to snapshot Copy-on-Write (COW), the file system can hit a three-way deadlock. The cycle involves:
• a page-faulting mmap reader that triggers COW into a previous snapshot,
• a concurrent VMA change (e.g., mremap/munmap) that requires the kernel's mmap write semaphore,
• and a regular write path that holds the inode write lock and then page-faults on its user buffer (which also needs the mmap semaphore).
Once formed, the cycle blocks progress on the affected file and can ultimately lead to automatic deadlock breakup (filesystem panic/unmount) depending on configuration.
| Symptom |
• Application threads or system threads hang on file I/O to the affected file.
• Trace/logs show CopyDataOnWriteHandlerThread waiting on inode rf, a writer holding wa and blocked in a page fault, and a VMA operation holding or waiting for the mmap write semaphore.
• With deadlock breakup enabled, Scale may log multi-phase “deadlock breakup” and unmount/panic the impacted filesystem. |
| Environment |
All supported OS environments. |
| Trigger |
This issue affects customers that:
• Use writable, shared mmap on files that may require snapshot COW, and
• Perform regular write()/pwrite() to the same file, and
• Occasionally execute VMA-altering operations such as mremap/munmap on the mapping.
A deadlock can occur when:
• An mmap page fault (“PF reader”) triggers CopyDataOnWrite for a prior snapshot and needs the inode rf lock.
• A concurrent writer holds the inode wa lock, then page-faults on its user buffer and must acquire the mmap semaphore.
• A concurrent mremap/munmap seeks the mmap semaphore as writer, blocking page-fault progress.
This forms a cycle (PF reader ↔ writer ↔ mremap) that stalls I/O on the file. |
| Workaround |
• Avoid concurrent VMA changes (mremap/munmap) while a file is actively accessed via writable shared mmap and regular writes on a snapshot-eligible file.
• Where feasible, separate write bursts from mmap page-fault activity on the same file, or map readers MAP_PRIVATE if application semantics allow.
(These are operational mitigations only; they do not fully prevent the issue.) |
|
6.0.0.1 |
GPFS/Scale — mmap, snapshot Copy-on-Write, locking. |
| IJ56620 |
Critical |
When mmfsck detects a hole in a reserved file, it fills the hole by allocating a new disk address and adding that address to the file’s indirect block. It also updates its internal block allocation bitmap to mark the new block as in-use.
However, the internal block allocation bitmap is distributed across the scanning nodes. If the newly allocated block falls outside the region of the bitmap owned by the node that performed the allocation, the node may skip updating the bitmap. As a result, the block remains unmarked in the bitmap, and mmfsck later falsely identifies the block as lost. In repair mode, it then incorrectly marks the block as free. Later, when the file system is in use, it may reallocate this block to another file, resulting in duplicate block corruption.
| Symptom |
Operation failure due to FS corruption and SGPanic |
| Environment |
ALL Operating System environments |
| Trigger |
This issue can happen when mmfsck detects and repairs holes in reserved files. |
| Workaround |
Run mmfsck in repair mode (-y) again after the first repair run. |
|
6.0.0.1 |
FSCK |
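The bitmap ownership problem described for IJ56620 can be illustrated with a toy model. This is a minimal sketch, not mmfsck code: the `ScanNode` class, the two marking functions, and the block numbers are all hypothetical names chosen for illustration. It assumes each scanning node owns one contiguous region of the internal bitmap and that a block counts as "lost" when the on-disk allocation map says it is allocated but no region of the internal bitmap marks it in use.

```python
from dataclasses import dataclass, field


@dataclass
class ScanNode:
    """Hypothetical scanning node that owns bitmap blocks in [start, end)."""
    start: int
    end: int
    in_use: set = field(default_factory=set)

    def owns(self, block: int) -> bool:
        return self.start <= block < self.end


def mark_local_only(nodes, allocating_node, block):
    """Flawed behavior: the update is skipped if the allocator does not own the block."""
    if allocating_node.owns(block):
        allocating_node.in_use.add(block)


def mark_via_owner(nodes, allocating_node, block):
    """Corrected behavior (assumed): route the update to the node that owns the block."""
    for node in nodes:
        if node.owns(block):
            node.in_use.add(block)
            return
    raise ValueError(f"block {block} is outside every bitmap region")


def lost_blocks(nodes, allocated_on_disk):
    """Blocks allocated on disk that no region of the internal bitmap marks in use."""
    marked = set()
    for node in nodes:
        marked |= node.in_use
    return sorted(set(allocated_on_disk) - marked)


def simulate(mark):
    """Node 0 fills a hole with block 150, which lies in node 1's region."""
    nodes = [ScanNode(0, 100), ScanNode(100, 200)]
    new_block = 150
    mark(nodes, nodes[0], new_block)
    return lost_blocks(nodes, [new_block])
```

In this model, `simulate(mark_local_only)` reports block 150 as lost even though a reserved file now references it, which is the condition that leads a repair run to free the block. `simulate(mark_via_owner)` reports no lost blocks.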
| IJ56253 |
High Importance
|
If a filesystem has quotas enabled and a file is unlinked (its last directory entry removed) before a chown is performed, the chown call will fail with ENOENT, even though the file descriptor remains open and valid.
| Symptom |
Unexpected Results/Behavior |
| Environment |
All Operating System environments |
| Trigger |
- Enable quota (file system or fileset level)
- Create a file, unlink it while the file descriptor is still valid.
- Set ownership for this file descriptor.
- Close the file descriptor. |
| Workaround |
Disable quotas |
|
6.0.0.1 |
Quotas |
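The trigger steps for IJ56253 can be sketched as a small check. This is an illustrative script under stated assumptions, not an official reproducer: the function name `chown_after_unlink` is hypothetical, and the directory passed in is assumed to be on a quota-enabled file system when testing for the defect. It re-asserts the caller's own UID and GID so that no extra privileges are needed.

```python
import os
import tempfile


def chown_after_unlink(directory: str) -> int:
    """Create a file, unlink it, then fchown the still-open descriptor.

    Returns 0 on success, or the errno value if fchown fails.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        # Remove the last directory entry; the descriptor stays open and valid.
        os.unlink(path)
        try:
            # Set ownership through the descriptor, not the (now missing) path.
            os.fchown(fd, os.getuid(), os.getgid())
        except OSError as exc:
            return exc.errno
        return 0
    finally:
        os.close(fd)
```

On an unaffected file system the function returns 0. On an affected system the description above suggests it would return `errno.ENOENT`.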
| IJ56680 |
Suggested |
mmbackup checks directory size as one of the triggers for selecting objects to send to the IBM Storage Protect server. Because directory size is recalculated during restore, a directory does not need to be backed up again when only its size differs. Therefore, mmbackup no longer compares size during backup candidate selection when the object is a directory.
| Symptom |
mmbackup may select unchanged directories as backup candidates |
| Environment |
All operating systems that support mmbackup |
| Trigger |
Run a live file system backup and then run a snapshot backup. |
| Workaround |
None |
|
6.0.0.1 |
mmbackup |
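The candidate-selection change for IJ56680 can be sketched as follows. This is a simplified illustration, not mmbackup's actual logic: the `Entry` type and `needs_backup` function are hypothetical, and modification time stands in for whatever other triggers mmbackup evaluates, which the description does not list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical per-object metadata used for candidate selection."""
    is_dir: bool
    size: int
    mtime: int  # assumed stand-in for the other change triggers


def needs_backup(current: Entry, backed_up: Entry) -> bool:
    """Decide whether an object should be selected as a backup candidate."""
    if current.mtime != backed_up.mtime:
        return True
    if current.is_dir:
        # Directory size is recalculated on restore, so a size-only
        # difference does not make a directory a candidate.
        return False
    return current.size != backed_up.size
```

With this rule, a directory whose size differs between a live file system scan and a snapshot scan is not selected again, while a regular file with a size difference still is.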
| IJ56142 |
High Importance
|
With workloads that heavily look up or traverse symlinks, contention can occur inside GPFS. Every symlink lookup request from an application results in the symlink target being queried from the file system, which can cause contention on internal locks.
| Symptom |
Performance Impact/Degradation |
| Environment |
ALL Linux OS environments |
| Trigger |
The problem is caused by heavily concurrent lookups of the same symlink by many threads. |
| Workaround |
There is no workaround. |
|
6.0.0.1 |
All Scale Users |
| IJ52020 |
High Importance
|
Background sync could be blocked while reducing an allocation region, which could cause other operations, such as creating or deleting a snapshot, to be blocked.
| Symptom |
Performance Impact/Degradation |
| Environment |
ALL Operating System environments |
| Trigger |
Running applications on a client that require new disk space to be allocated. |
| Workaround |
None |
|
6.0.0.1 |
All Scale Users |
| IJ56690 |
High Importance
|
There is a small timing window in which the assert can be triggered.
| Symptom |
Abend/Crash |
| Environment |
All platforms |
| Trigger |
A DIO workload on a file system at version 6.0.0.0 or later. |
| Workaround |
Disable the assert |
|
6.0.0.1 |
UStore |
| IJ56736 |
Suggested |
Starting in 5.2.3.0, gpfs.base required the OpenSSL libraries, specifically for mmfsd. Although the binary required the libraries because some symbols were defined, they were unused. This dependency was introduced with the release of the IBM Storage Scale native REST API feature. All communication from scaleadmd to mmfsd is done over a local Unix domain socket, and SSL is not in use.
| Symptom |
Installation pulls in a package that is required as a dependency but is unused. |
| Environment |
Linux Only |
| Trigger |
Install gpfs.base |
| Workaround |
None |
|
6.0.0.1 |
Linux Only |
| IJ56734 |
High Importance
|
When reading from snapshot files, applications may encounter unexpected non-zero data in blocks that were never written to in the original (root) filesystem. These blocks were part of a pre-allocated file but remained uninitialized, and therefore should logically contain zeros. The error occurs because the snapshot exposes raw, uninitialized disk contents—garbage data—at these locations. This issue is specific to snapshots and does not occur when reading from the root filesystem, where such blocks are correctly interpreted as zero.
| Symptom |
Unexpected Results/Behavior |
| Environment |
ALL Operating System environments |
| Trigger |
If both the snapshot and the root filesystem contain a block that was pre-allocated but never written to, reading from the snapshot may return uninitialized data ("garbage") instead of zeroes. |
| Workaround |
None |
|
6.0.0.1 |
Snapshots |
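One way to check for the IJ56734 symptom is to scan a snapshot copy of a file that is known to be pre-allocated but never written and confirm that it reads back as zeros. The helper below is a minimal sketch; the function name `first_nonzero_offset` is hypothetical, and the caller is assumed to know which file and which snapshot path to examine.

```python
def first_nonzero_offset(path: str, chunk_size: int = 1 << 20):
    """Return the offset of the first non-zero byte in the file, or None."""
    offset = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                # Reached end of file without finding non-zero data.
                return None
            stripped = chunk.lstrip(b"\x00")
            if stripped:
                # Leading zeros removed tell us where the data starts.
                return offset + len(chunk) - len(stripped)
            offset += len(chunk)
```

A result of `None` for the snapshot copy matches the expected behavior. A numeric offset for a file that was never written would point at the first byte of unexpected data.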
| IJ56765 |
Suggested |
When an object is created in, copied out of with the '-p' flag, or moved into a GPFS file system on AIX, the “extended entries” flag is always turned on. This incorrectly shows an ACL as always having extended entries, regardless of its contents. As a result, 'ls -e' displays a “+” representing the existence of extended entries once the object is moved out of the GPFS file system, and 'aclget' always displays that extended entries are enabled.
| Symptom |
Unexpected Results/Behavior |
| Environment |
AIX only |
| Trigger |
Create an object in a GPFS file system, move an object into a GPFS file system, or copy an object out of a GPFS file system with the '-p' flag. The problem can be observed by running 'aclget' on that object, or moving the object out of the GPFS file system and running 'ls -e' |
| Workaround |
Use 'aclget' to correctly identify existence of extended entries, ignoring “+” in 'ls -e' |
|
6.0.0.1 |
All Scale Users |
| IJ56564 |
Suggested |
The node ID in the component listing can show a blank value. All node IDs are expected to show a non-zero integer value.
| Symptom |
A missing node ID when displaying component information is the only symptom. |
| Environment |
Linux Only |
| Trigger |
Issuing a discover component command via the GUI is the only known way to induce this error. |
| Workaround |
The mmchcomp command-line utility can be used to set a blank node ID to a specified value. |
|
6.0.0.1 |
GUI, ESS/GNR |
| IJ56781 |
Suggested |
Setting stat-poll-interval and stat-slot-time to zero does not restore the automatic adjustment of QoS statistics.
| Symptom |
The mmqos command behavior is not consistent with what is documented in the man page. |
| Environment |
All |
| Trigger |
Setting stat-poll-interval and stat-slot-time to zero does not restore the automatic adjustment of QoS statistics. |
| Workaround |
Use a null string instead of a zero value. |
|
6.0.0.1 |
QoS |
| IJ56797 |
Suggested |
File system creation fails when the specified file system version is at a PTF level. For example, the command "scalectl filesystem create -n fs0 -d disk1 --version 5.2.3.4" fails with the following error: "filesystem creation failed: rpc error: code = InvalidArgument desc = specified file system version is outside the supported range 5.2.3.0-5.2.3.0 for the native REST API"
| Symptom |
Error output/message |
| Environment |
All Linux |
| Trigger |
Creating a file system with a PTF version as an argument. |
| Workaround |
Specify "--version 5.2.3.0" instead of a PTF version or the default. |
|
6.0.0.1 |
Native Rest API |
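The error message for IJ56797 suggests why a PTF-level version is rejected: when all four components are compared against a range of 5.2.3.0-5.2.3.0, 5.2.3.4 falls outside it. The sketch below illustrates that comparison and one possible relaxation that ignores the PTF component. Both functions are hypothetical, and the relaxed check is an assumption made for illustration; the description does not say how the actual fix validates versions.

```python
def parse_version(text: str) -> tuple:
    """Parse a four-part version string such as '5.2.3.4' into a tuple of ints."""
    parts = tuple(int(p) for p in text.split("."))
    if len(parts) != 4:
        raise ValueError(f"expected four version components, got {text!r}")
    return parts


def in_range_exact(version: str, low: str, high: str) -> bool:
    """Compare all four components, including the PTF level."""
    return parse_version(low) <= parse_version(version) <= parse_version(high)


def in_range_ignoring_ptf(version: str, low: str, high: str) -> bool:
    """Assumed relaxation: compare only the first three components."""
    v, lo, hi = (parse_version(x)[:3] for x in (version, low, high))
    return lo <= v <= hi
```

With these definitions, `in_range_exact("5.2.3.4", "5.2.3.0", "5.2.3.0")` is false, matching the reported error, while `in_range_ignoring_ptf` accepts the same input and still rejects a different release such as 5.2.4.0.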
| IJ56798 |
High Importance
|
Outband download fails when an export map is used as the target and the gateway nodes' IP addresses are used in the mapping.
| Symptom |
Unexpected Results |
| Environment |
Linux Only |
| Trigger |
Create a mapping with the gateway's IP address and use this export map as the target for a fileset. Outband download on such a fileset fails. |
| Workaround |
Use the gateway hostname instead of the IP address in the mapping. |
|
6.0.0.1 |
AFM |
| IJ55722 |
High Importance
|
mmaddpdisk --replace fails with error 905 because of stale block device information in the PDMaster object.
| Symptom |
Error output/message |
| Environment |
Linux Only |
| Trigger |
When a device is in the replace state, wipe the drive and then run mmaddpdisk --replace. |
| Workaround |
Fail over the root LG to a different node and then run the command. This is not a proper solution. |
|
6.0.0.1 |
GNR |
| IJ56679 |
High Importance
|
mmfsd hits signal 11 on readSGDesc
| Symptom |
Abend/Crash |
| Environment |
All platforms |
| Trigger |
After the file system manager is restarted to break up a deadlock during mmadddisk, some nodes might hit signal 11. The file system manager processes a more recent SG descriptor read from the disk before the data structures associated with the new SG descriptor are populated. |
| Workaround |
There is no workaround; Scale crashes and restarts on its own. |
|
6.0.0.1 |
All Scale Users |