IJ55192 |
Critical |
In rare cases, an encrypted file system may panic and get unmounted as a result of a directory inode being ....
Symptom |
Cluster/File System Outage |
Environment |
ALL Operating System environments |
Trigger |
Users creating many files in directories of encrypted file systems from many nodes in the cluster may trigger a special code path that mishandles access to such directories when a node tries to become the metanode for those directories' inodes. |
Workaround |
Disable the stat cache by setting maxStatCache=0 |
|
5.2.3.3 |
Encryption |
IJ54956 |
High Importance
|
When a file with an access control list is accessed from a remote cluster, missing node information in the communication context can lead to a crash.
Symptom |
Abend/Crash |
Environment |
ALL Linux OS environments |
Trigger |
The file accessed from a remote cluster has an access control list. |
Workaround |
Add information to the communication context about the nodes that grant access to and access the files. |
|
5.2.3.3 |
Remote cluster mount/UID remapping |
IJ54792 |
High Importance
|
Unable to add a new disk with thin provisioning when running mmadddisk. The command fails with an error like "Disk 'PRD_ABITEST14_01' mismatch, it doesn't support 'UNMAP' to reclaim space.".
Symptom |
The command fails with an error message like "Disk 'PRD_ABITEST14_08' mismatch, it is not allowed to have both thin and non-thin disks in the system pool.". |
Environment |
Linux Only |
Trigger |
Run mmadddisk with a stanza file that contains a list of NSDs with thinDiskType={scsi | auto}. If the sysfs 'rotational' attribute for a disk (typically located under the queue directory, for example /sys/devices/virtual/block/dm-1/queue/rotational) is 0, indicating an SSD, a different thinDiskType (for example, nvme) is returned, which is a 'mismatch'. |
Workaround |
None |
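The stanza file named in the trigger above has roughly this shape; this is a sketch, and the NSD name, device, server names, and pool are hypothetical values, not taken from the failing system:

```text
%nsd:
  nsd=nsd_thin01
  device=/dev/dm-1
  servers=nodeA,nodeB
  usage=dataAndMetadata
  failureGroup=1
  pool=system
  thinDiskType=auto
```

With thinDiskType=auto, the daemon probes the device to classify it, which is where the rotational=0 case described above leads to the mismatch.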
|
5.2.3.3 |
thin-provisioning. |
IJ54254 |
Critical |
Lookup on a directory could get stuck in an endless loop if a directory block is corrupted.
Symptom |
Hang/Deadlock/Unresponsiveness/Long Waiters |
Environment |
ALL Operating System environments |
Trigger |
Lookup on a directory with corrupted block |
Workaround |
Run offline mmfsck on the file system to repair any directory corruption. |
|
5.2.3.3 |
All Scale Users |
IJ55321 |
Critical |
When direct I/O is performed on an AFM uncached file, the DIO path skips the AFM caching path needed to fetch the data when DIO is desired. This causes the data to appear corrupted (all zeros).
Symptom |
Data Corruption |
Environment |
All OS Environments |
Trigger |
Direct IO Read on an AFM uncached file. |
Workaround |
mmchconfig dioDisable=1 -i |
|
5.2.3.3 |
AFM |
IJ55350 |
High Importance
|
File system level migration, or any file system with multiple filesets, needs checkDirty and checkUncached to run at the file system level for complete checks. An enhancement is also needed to document -s, similar to -g, for all AFM policy invocations.
Symptom |
Unexpected Behavior |
Environment |
All OS Environments |
Trigger |
Running mmafmctl checkDirty and checkUncached commands at the Filesystem level where AFM filesets are present. |
Workaround |
Running at individual fileset level only. |
|
5.2.3.3 |
AFM |
IJ54084 |
Suggested |
In a file system configured with the (default) "relatime" setting, if some nodes only read files (but do not write to them) while other nodes stat those files, stat() will not return an updated value for atime. This affects applications that rely on an updated atime to determine whether files have been accessed recently.
Symptom |
Unexpected Results/Behavior |
Environment |
ALL Operating System environments |
Trigger |
The file system is created with -S set to its default value ("relatime"). Applications read the content of files but seldom write to them. Applications that perform 'stat' on the files run on nodes other than the nodes where the reads take place. |
Workaround |
Set the (undocumented) forceAttributeRefresh configuration parameter, which will force nodes to retrieve updated stat info. For example: mmchconfig forceAttributeRefresh=60 -i |
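What affected applications rely on can be sketched as follows: reading a file and then checking that stat() reports an atime at least as new as before. This is illustrative only; on the affected configuration it is the remote nodes that fail to see the refreshed value:

```python
import os
import tempfile

# Create a file; the write bumps mtime, so under relatime the next
# read is allowed to update atime.
fd, path = tempfile.mkstemp()
os.write(fd, b"data")
os.close(fd)

before = os.stat(path).st_atime
with open(path, "rb") as f:
    f.read()                     # access the file's content
after = os.stat(path).st_atime   # atime should not move backwards
os.remove(path)
```

An application using this pattern on a node other than the reading node is the case affected by the problem above.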
|
5.2.3.3 |
ALL Operating System environments |
IJ55304 |
High Importance
|
An assert can sometimes happen due to a token reference count leak.
Symptom |
Abend/Crash |
Environment |
ALL Operating System environments |
Trigger |
When token transfer goes through certain code path |
Workaround |
Disabling the assert may be acceptable in most cases. |
|
5.2.3.3 |
All Scale Users |
IJ55373 |
High Importance
|
On a large file system, tsapolicy may not free all queue elements processed during the directory scan, which could result in OOM. During the directory scan, tsapolicy allocates queue elements to keep each directory entry for scanning and assigns a unique correlation number to each queue element. The correlation number is used as a watermark for the server process to free queue elements that have been processed when a client completes its assignment. But it is an int32 and can overflow on a large file system. The number needs to be reset to 1 when it would overflow int32.
Symptom |
Component Level Outage |
Environment |
all platforms that support mmapplypolicy |
Trigger |
run mmapplypolicy on a large file system |
Workaround |
none |
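The overflow described above comes down to a wrap-around counter. A minimal sketch, assuming a hypothetical next_correlation helper (the real logic is internal to tsapolicy):

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit counter can hold

def next_correlation(current: int) -> int:
    """Advance the correlation number, wrapping back to 1 instead of
    overflowing a signed 32-bit integer (sketch of the fix above)."""
    return 1 if current >= INT32_MAX else current + 1

print(next_correlation(41))         # 42
print(next_correlation(INT32_MAX))  # 1 (wrap instead of overflow)
```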
|
5.2.3.3 |
mmapplypolicy |
IJ55376 |
High Importance
|
A check is called too often, which can be problematic when many disks are checked, leading to waiters that affect I/O performance.
Check /var/adm/ras/mmsysmonitor.log for
[I] Timeout RunCmd Command /usr/lpp/mmfs/bin/mmremote getLocalNsdData -X timed out after 42 sec. Sending SIGTERM and /var/adm/ras/mmfs.log for "waiters"
Symptom |
Error output/message Slow IO |
Environment |
ALL Operating System environments |
Trigger |
A check is called too often, which can be problematic when many disks are checked |
Workaround |
As a quick fix, the check can be disabled using mmchconfig mmhealth-disk-check_nsd=False --force. After updating, set the parameter back to True to re-enable the check |
|
5.2.3.3 |
System Health |
IJ55379 |
High Importance
|
The timeout test result is not consistent on the AMD EPYC Turin processor. If the test passes, the workaround for GSKit hangs will not be applied, which causes problems later.
Symptom |
Installation and admin commands hang. |
Environment |
Linux OS environments |
Trigger |
This problem affects AMD EPYC-Turin. |
Workaround |
Manually apply the workaround |
|
5.2.3.3 |
Admin Commands, gskit |
IJ55381 |
Suggested |
mmfs.log: logAssertFailed: maxExpellableQuorumNodes>=0
Symptom |
Abend/Crash |
Environment |
All |
Trigger |
Cluster manager trying to process an expel request after quorum has been lost. |
Workaround |
None |
|
5.2.3.3 |
GPFS Core |
IJ55406 |
Suggested |
The 'mmkeyserv tenant delete' command fails to remove the tenant definition from the Storage Scale cluster when the tenant no longer exists on the key server.
Symptom |
Command failure |
Environment |
AIX and Linux |
Trigger |
The tenant was removed from the key server prior to invoking the 'mmkeyserv tenant delete' command. This occurs on newer versions of GKLM. |
Workaround |
Reissue the command with --force option. |
|
5.2.3.3 |
Admin Commands Encryption |
IJ55407 |
High Importance
|
An I/O operation from NFS Ganesha may not always carry the client IP address information. With FAL and NFS Ganesha enabled, if a previous NFS I/O operation for a file system had client IP address information associated with it, that IP can potentially be used in audit events for an NFS I/O operation on another file system, resulting in a mismatch of NFS client IPs in events of different file systems.
Symptom |
Unexpected Results/Behavior |
Environment |
Linux |
Trigger |
- Enable File Audit Logging for at least two file systems
- Have NFS Ganesha running
- Have at least two NFS clients that mount the exports of the file systems to the same CES node.
- Generate IOs from the clients to the exports.
- A small set of events in the audit logs of each file system would contain incorrect/mismatched NFS client IPs. |
Workaround |
None |
|
5.2.3.3 |
File Audit Logging, NFS |
IJ55408 |
High Importance
|
During AFM migration, files deleted on the target are not removed from the local cache if the parent is dirty. The "mmafmctl checkUncached" command reports these files as uncached.
Symptom |
Unexpected results. |
Environment |
All OS Environments |
Trigger |
AFM migration with deleted files/dirs at the target |
Workaround |
None |
|
5.2.3.3 |
AFM |
IJ55647 |
Critical |
Files from the AFM cache may be incorrectly deleted or moved to the .ptrash directory when using the afmFastCreate option. A file may be incorrectly deleted from the cache when a newly created file is renamed.
Symptom |
Unexpected Results |
Environment |
All OS Environments |
Trigger |
Using afmFastCreate option with AFM caching. |
Workaround |
Disable afmFastCreate option |
|
5.2.3.3 |
AFM |
IJ55648 |
Critical |
A deadlock may occur in the AFM environment when afmFastLookup is disabled, due to a lock ordering issue. This can lead to cluster-wide hangs.
Symptom |
Deadlock |
Environment |
All OS Environments |
Trigger |
AFM caching under high workload |
Workaround |
Enable afmFastLookup option |
|
5.2.3.3 |
AFM |
IJ55409 |
High Importance
|
Parallel read considers only unique remote-site-mapped gateway nodes for spawning READ_SPLIT messages, except for the GPFS backend. The same should be done for the object backend, because all gateway nodes are mapped to the same remote target.
Symptom |
Unexpected behavior |
Environment |
Linux Only |
Trigger |
Read on large object on the AFM COS backend with a mapping target |
Workaround |
None |
|
5.2.3.3 |
AFM |
IJ55188 |
Suggested |
If the utimensat() system call (which is used by the "touch" command) is issued on a file shortly after a previous invocation on the same file, it may not take effect and can fail to update the file's modification time ("mtime").
Symptom |
Unexpected Results/Behavior |
Environment |
ALL Linux OS environments |
Trigger |
Issuing the utimensat() system call on a given file multiple times in quick succession (< 1 second). |
Workaround |
Whenever feasible, wait at least 1 second between subsequent invocations of utimensat() on the same file. |
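The workaround can be sketched as follows. This is illustrative only (on an unaffected local file system the issue does not reproduce); it shows timestamp updates spaced at least one second apart so each one takes effect:

```python
import os
import tempfile
import time

fd, path = tempfile.mkstemp()
os.close(fd)

os.utime(path, (0, 100))   # first update: atime=0, mtime=100
time.sleep(1.1)            # wait >= 1 second, per the workaround
os.utime(path, (0, 200))   # second update now takes effect

final_mtime = os.stat(path).st_mtime
os.remove(path)
```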
|
5.2.3.3 |
All Scale Users |
IJ55601 |
Medium Importance |
A previous network issue can lead to a log/trace entry like
totalReceived == scatteredP->scattered_total_len || (totalReceived == 0 && scatteredIndex == scatteredP->scattered_count) followed by an assert causing the node to stop.
Symptom |
Assert/Crash |
Environment |
Linux Only |
Trigger |
A network failure triggering some state counters/variables in an undefined state |
Workaround |
None |
|
5.2.3.3 |
Scale |
IJ55167 |
High Importance
|
IBM has identified a potential security leak or data access loss issue for files created from SMB clients. The issue may appear when SMB clients create files in folders that use ACL inheritance to change ACLs (additional access for groups, reduced access for a user's primary group) from the default access mask.
Symptom |
incorrect ACL written |
Environment |
Linux Only |
Trigger |
File creation via SMB protocol in folders with ACL inheritance |
Workaround |
None |
|
5.2.3.2 |
CES SMB |
IJ55170 |
High Importance
|
This issue often shows up when running git clone into an NFS-mounted directory.
Below is an example of the error that may occur:
$ git clone https://github.com/jupp0r/prometheus-cpp
Cloning into 'prometheus-cpp'...
remote: Enumerating objects: 5577, done.
remote: Counting objects: 100% (1562/1562), done.
remote: Compressing objects: 100% (373/373), done.
remote: Total 5577 (delta 1287), reused 1189 (delta 1189), pack-reused 4015 (from 2)
Receiving objects: 100% (5577/5577), 1.32 MiB | 7.94 MiB/s, done.
fatal: could not open '/mnt/nfs4/prometheus-cpp/.git/objects/pack/tmp_pack_fADmRg' for reading: Permission denied
fatal: fetch-pack: invalid index-pack output
Symptom |
Permission denied error |
Environment |
Linux Only |
Trigger |
Permission denied error encountered during git clone |
Workaround |
None |
|
5.2.3.2 |
NFS |
IJ55184 |
High Importance
|
Scale 5.2.3 PTF1 and PTF2 contain a code change leading to possible slower read performance.
The problem exists in 5.2.3 PTF1 and the fix is in 5.2.3 PTF2
Symptom |
Slower performance than expected. |
Environment |
ALL Linux OS environments and "Windows/x86_64" |
Trigger |
Any regular read workload can incur additional overhead. This has been specifically observed with the ior hard read benchmark, but could affect any read workload. |
Workaround |
Do not use Scale 5.2.3 PTF1 or PTF2. |
|
5.2.3.2 |
All Scale Users |
IJ54628 |
High Importance
|
Not able to read an uncached file during resync when the AFM queue is in queueOnly state.
Symptom |
Uncached file read failure during Resync |
Environment |
Linux Only |
Trigger |
Read on uncached file while AFM resync is queueing ops on gateway node. |
Workaround |
None |
|
5.2.3.1 |
AFM |
IJ53214 |
High Importance
|
With FAL and NFS Ganesha enabled, running workloads against a path in an NFS export for long periods of time could result in NFS client IPs not being logged in the audit log.
Symptom |
Unexpected Results/Behavior |
Environment |
Linux |
Trigger |
- With FAL and NFS Ganesha enabled, run workloads against a path under the NFS mount point for long periods of time |
Workaround |
- Restart NFS Ganesha if NFS client ips are not being logged |
|
5.2.3.1 |
File Audit Logging, NFS |
IJ54629 |
High Importance
|
mmrestorefs recreates all files and directories that were deleted after the snapshot was taken. If a deleted file is a special file, mmrestorefs uses the mknod() system call to create it. But mknod() cannot create a socket file on AIX. Hence, if socket files were deleted after the snapshot was taken, mmrestorefs on AIX will fail while re-creating the socket file.
Symptom |
Component Level Outage |
Environment |
AIX only |
Trigger |
run mmrestorefs when a socket file was deleted after the snapshot was taken. |
Workaround |
none |
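The distinction above can be sketched as follows: a socket special file is normally created by binding a UNIX-domain socket rather than by mknod(), which is the call that fails for sockets on AIX:

```python
import os
import socket
import stat
import tempfile

# bind() on an AF_UNIX socket creates the socket special file on disk;
# this is the file type mmrestorefs cannot re-create via mknod() on AIX.
path = os.path.join(tempfile.mkdtemp(), "sock")
s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
s.bind(path)
s.close()

mode = os.stat(path).st_mode   # S_ISSOCK(mode) is true for this file
```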
|
5.2.3.1 |
mmrestorefs |
IJ54802 |
High Importance
|
If the mmrestripefs command is issued while the mmreclaimspace command is running, an assert can occur.
Symptom |
Abort |
Environment |
Linux / AIX |
Trigger |
On a file system with thin provisioning (or space reclamation alone) enabled, run mmrestripefs while the mmreclaimspace command is running for space reclamation |
Workaround |
None |
|
5.2.3.1 |
space-reclamation |
IJ54804 |
High Importance
|
Weighted RGCM Log group rebalance issue.
Symptom |
Abend |
Environment |
Linux Only |
Trigger |
Slightly different log group weights on the same server might fail to balance the heavily weighted log groups. |
Workaround |
None |
|
5.2.3.1 |
ESS/GNR |
IJ54783 |
High Importance
|
When trying to install Storage Scale on Windows with latest Cygwin version (3.6.1), the installation can fail due to security issues.
Symptom |
Upgrade/Install failure. |
Environment |
Windows/x86_64 only |
Trigger |
Upgrading Cygwin to version 3.6.1 before trying to install Storage Scale on Windows |
Workaround |
Downgrade Cygwin to version 3.6.0 or below before attempting to install Storage Scale on Windows |
|
5.2.3.1 |
Install/Upgrade |
IJ53557 |
High Importance
|
GPFS asserted due to unexpected hold count on events exporter object during destructor.
Symptom |
Assert |
Environment |
All platforms |
Trigger |
A race condition between EventsExporterReceiverThread and EventsExporterListenThread and an error path where the destructor is called |
Workaround |
None |
|
5.2.3.1 |
All Scale Users |
IJ54868 |
Suggested |
The '(' character in the undefined default value needs to be escaped. Otherwise, propagation of the config to other nodes throws a syntax error.
Symptom |
Unexpected Behavior |
Environment |
All OS Environments |
Trigger |
Tune the afmRecoveryDir back to its default value. |
Workaround |
None |
|
5.2.3.1 |
AFM |
IJ54878 |
High Importance
|
If the dependent fileset is created as a non-root user and linked, then the uid/gid are not replicated for the dependent fileset to the remote site.
Symptom |
Unexpected Behavior |
Environment |
Linux Only |
Trigger |
Create and Link dependent fileset inside DR primary fileset as a non-root user. |
Workaround |
None |
|
5.2.3.1 |
AFM |
IJ54968 |
High Importance
|
Opening a new file with O_RDWR|O_CREAT fails with EINVAL.
Symptom |
file creation returns an error of EINVAL |
Environment |
Linux Only |
Trigger |
Unknown |
Workaround |
None |
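For reference, the failing operation corresponds to a create-and-open call like the following sketch (the path is hypothetical; on an affected system the os.open call would raise EINVAL instead of succeeding):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "newfile")

# The affected operation: create the file and open it for read/write.
fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
os.write(fd, b"ok")
os.close(fd)

with open(path, "rb") as f:
    data = f.read()
```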
|
5.2.3.1 |
Scale Core |
IJ54967 |
High Importance
|
crash during cxiStrcpy in setSecurityXattr
Symptom |
Crash |
Environment |
Linux Only |
Trigger |
file creation with selinux enabled. |
Workaround |
None |
|
5.2.3.1 |
Scale core |
IJ54966 |
High Importance
|
Kernel Crash with selinux enabled
Symptom |
Crash |
Environment |
Linux Only |
Trigger |
file creation with selinux enabled. |
Workaround |
None |
|
5.2.3.1 |
Scale core |
IJ54965 |
High Importance
|
NFSV4 ACLs are not replicated with AFM fileset level options afmSyncNFSV4ACL and afmNFSV4
Symptom |
Unexpected results |
Environment |
Linux Only |
Trigger |
Using options afmSyncNFSV4ACL and afmNFSV4 to replicate NFSv4 ACLs. |
Workaround |
None |
|
5.2.3.1 |
AFM |
IJ54963 |
High Importance
|
Symlinks are appended with a null character, which causes the pwd -P command to fail to resolve the real path.
Symptom |
Unexpected results |
Environment |
Linux Only |
Trigger |
AFM caching with symlinks |
Workaround |
None |
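The symptom can be checked with a sketch like this: a correctly created symlink carries no trailing null byte in its stored target, and path resolution (as pwd -P performs) succeeds:

```python
import os
import tempfile

d = tempfile.mkdtemp()
target = os.path.join(d, "real")
link = os.path.join(d, "link")
os.mkdir(target)
os.symlink(target, link)

raw = os.readlink(link)             # the stored link target string
resolved = os.path.realpath(link)   # pwd -P style resolution
```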
|
5.2.3.1 |
AFM |
IJ54962 |
High Importance
|
Snapshots are not listed under .snapshots directory when the AFM is enabled on the file system
Symptom |
Unexpected results |
Environment |
All OS environments |
Trigger |
Listing snapshots when AFM is enabled on the file system |
Workaround |
None |
|
5.2.3.1 |
AFM |
IJ54975 |
Suggested |
"mmhealth cluster show" my report an additional GUI pod after upgrade or rebalancing.
Symptom |
Unexpected Results/Behavior |
Environment |
Open Shift (CNSA) |
Trigger |
CNSA Upgrade or other rebalancing action of GUI pods. |
Workaround |
Moving the cluster manager node (mmchmgr) will ensure a resync of the data. "mmhealth node show -a --resend" will do the same |
|
5.2.3.1 |
System Health |
IJ54976 |
High Importance
|
Nodes accessing the AFM fileset crash when an attempt is made to disable the fileset online with the "mmchfileset -p afmTarget=disable-online" command.
Symptom |
Crash |
Environment |
Linux Only |
Trigger |
AFM fileset disable-online |
Workaround |
None |
|
5.2.3.1 |
AFM |
IJ54983 |
High Importance
|
File Audit Logging uses an internal data structure to keep track of NFS client ip addresses for NFS IOs coming from Ganesha. The CES nodes can crash during garbage collection of this structure due to a race condition caused by a use-after-free error.
Symptom |
Abend/Crash |
Environment |
Linux |
Trigger |
- File audit logging is enabled on a file system with NFS Ganesha running. - Large amount of IOs running to NFS exports. |
Workaround |
- Disable File Audit Logging, or - Avoid NFS IOs when FAL is enabled |
|
5.2.3.1 |
File Audit Logging, NFS |
IJ54984 |
High Importance
|
hit Assert exp(getChildId().isValid()) during read operation if mmafmtransfer restarted
Symptom |
Crash |
Environment |
Linux Only |
Trigger |
While read is in queue kill mmafmtransfer daemon |
Workaround |
None |
|
5.2.3.1 |
AFM |
IJ54985 |
High Importance
|
When mmchdisk start encounters a corrupted inode that fails inode validation, the command does not produce the interesting-inode list showing the bad inode number. Because of this, the user cannot learn the affected inode number and has to rely on long-running traces to get this information.
Symptom |
pit.interestingInodes file not generated/populated |
Environment |
ALL Operating System environments |
Trigger |
When mmchdisk start encounters a corrupted inode |
Workaround |
Capture long-running traces and provide them to support to get this information |
|
5.2.3.1 |
PIT |
IJ54986 |
High Importance
|
When a file is accessed through mmap while the same file is accessed, or other operations are performed, on other nodes, there is a small chance of a race condition leading to a log assert.
Symptom |
Abend/Crash |
Environment |
ALL Linux OS environments |
Trigger |
Access parts of a mmaped file initially on one node, while having concurrent access or concurrent operations on the same file on other nodes |
Workaround |
Avoid the concurrent operations on other nodes while the file is accessed on one node |
|
5.2.3.1 |
All Scale Users |
IJ54593 |
High Importance
|
During token minimization, a deadlock can occur on a client node. With token minimization, a client node is first asked to give up any tokens that are only for cached files. Without the fix, calling this codepath for files that have been deleted, could result in a deadlock.
Symptom |
Hang/Deadlock/Unresponsiveness/Long Waiters |
Environment |
ALL Linux OS environments |
Trigger |
Have many files cached on a client node. Delete files. Trigger a token server change, which then uses token minimization. |
Workaround |
Disable token minimization to avoid the problem: mmchconfig tokenXferMinimization=no. Or restart GPFS on the client node, to get out of the deadlock. |
|
5.2.3.1 |
All Scale Users |
IJ54987 |
High Importance
|
mmrestoreconfig restores file system configuration information, which includes fileset information. When recreating AFM filesets, mmrestoreconfig tries to restore afmShowHomeSnapshot attributes, but AFM does not allow setting the afmShowHomeSnapshot attribute for an IW cache mode fileset. Hence mmrestoreconfig will fail if there is an IW cache mode fileset.
Symptom |
Component Level Outage |
Environment |
all platforms that support mmrestoreconfig |
Trigger |
run mmrestoreconfig for a file system that contains IW cache mode fileset |
Workaround |
none |
|
5.2.3.1 |
mmrestoreconfig |
IJ54988 |
Critical |
This APAR minimizes the severity of the issue experienced during erroneous processing of a DMAPI recall. It does not correct the underlying symptom, but it reduces the impact for customers who experience this issue. The APAR also provides additional diagnostics in trace as well as on the Linux kernel console.
Symptom |
Customer who experienced the LogAssert (noted in the APAR title) will now receive a soft I/O error when trying to recall the file with a DMAPI 3rd party. |
Environment |
RHEL8 (x86_64, Power) and RHEL9 (x86_64, Power, Z) |
Trigger |
The initial problem was unable to be recreated in the lab. |
Workaround |
None |
|
5.2.3.1 |
DMAPI |
IJ54969 |
High Importance
|
kernel panic: general protection fault / ovl_dentry_revalidate_common / mmfsd, or running lsof /proc on a node crashes the node
Symptom |
Crash |
Environment |
Linux Only |
Trigger |
Running lsof /proc on a node crashes the node. |
Workaround |
None |
|
5.2.3.1 |
Scale core |
IJ54979 |
High Importance
|
With afmFastCreate enabled, if the Create that tries to push the initial chunk of data fails to complete and gets requeued, the requeued Create replays all of the data when it retries. Later, a couple of Write messages starting from the offset where the Create initially went in flight are also replayed, totaling almost twice the file size in replicated data.
Symptom |
Unexpected Behaviour |
Environment |
All Linux OS Environments (AFM Gateway nodes) |
Trigger |
afmFastCreate replication failing initially because of lock or network error and later replication being tried again. |
Workaround |
Set a higher value of afmAsyncDelay to push replication as far as the file is being written. |
|
5.2.3.1 |
AFM |
IJ54655 |
High Importance
|
By default, clusters created with version 5.2.0 or later have the numaMemoryInterleave value set to yes. This should start Storage Scale daemon with interleave memory policy, but it does not.
Symptom |
Performance Impact/Degradation, Unexpected Results/Behavior |
Environment |
ALL Linux OS environments |
Trigger |
This issue affects customers running Storage Scale in Linux NUMA environment and the Storage Scale clusters created with version 5.2.0 or later. |
Workaround |
Explicitly set numaMemoryInterleave=yes using mmchconfig command. # mmchconfig numaMemoryInterleave=yes |
|
5.2.3.1 |
All Scale Users |
IJ55083 |
HIPER |
mmap data on Windows nodes may not be correctly written to disk on Windows nodes running Scale 5.1.9 PTF10.
Symptom |
Data corruption |
Environment |
Windows/x86_64 only |
Trigger |
Write data from mmap applications on Windows. The data may not be written correctly to disk. |
Workaround |
There is no workaround. The recommendation is to not run 5.1.9 PTF 10 on Windows nodes without this fix. |
|
5.2.3.1 |
All Scale Users |
IJ55093 |
Critical |
Unexpected GPFS daemon assert could happen when file system has DMAPI enabled for use with DMAPI application
Symptom |
Abend/Crash |
Environment |
ALL Operating System environments |
Trigger |
File delete on DMAPI enabled file system trigger destroy event |
Workaround |
Disable DMAPI on the file system |
|
5.2.3.1 |
DMAPI/HSM/TSM |
IJ55094 |
Suggested |
When updating a resource via scalectl with the --url option, the update mask is not set, meaning the field might not get updated, or might result in validations being skipped
Symptom |
Unexpected Results/Behavior |
Environment |
Linux Only |
Trigger |
run scalectl <resource> update --url <host>:<port> |
Workaround |
use scalectl without the --url option, or run a REST API request with the appropriate update mask |
|
5.2.3.1 |
Native Rest API |
IJ55095 |
High Importance
|
The mmafmctl getList subcommand deletes all .* files/directories in the current working directory because of a variable initialisation issue in the mmafmctl script.
Symptom |
Unexpected Behavior |
Environment |
All OS Environments |
Trigger |
Running the mmafmctl getList subcommand from an important work directory like /root where important OS related files might exist. |
Workaround |
cd to an empty directory in /tmp and run the mmafmctl getList subcommand from there. |
|
5.2.3.1 |
AFM |
IJ55119 |
High Importance
|
If an accessing cluster has been authorized to access a list of filesets, updating resources on the owning cluster to remove one fileset is not effective: the original list of filesets can still be accessed. An accessing cluster may also be unable to access remote resources after a resource update that removes and then re-adds the resources.
Symptom |
Unexpected Results/Behavior |
Environment |
Linux |
Trigger |
Authorize access to fileset resources on the owning cluster via scalectl cluster remote authorize. Then remove a fileset via scalectl cluster remote update. Perform a series of authorize, remote mount, unauthorize, remote mount actions on file system resources. |
Workaround |
Use mmauth to update resources |
|
5.2.3.1 |
Native REST API |
IJ55120 |
Suggested |
If a remote file system definition is added using scalectl filesystem remote add, the Automount value may not be correct when viewing this definition with mmremotefs.
Symptom |
Unexpected Results/Behavior |
Environment |
Linux |
Trigger |
Add a remote file system definition via scalectl filesystem remote add. Display this file system definition via mmremotefs show. The value in the Automount column in the output of mmremotefs show shows 'mount = false' instead of 'no'. |
Workaround |
None |
|
5.2.3.1 |
Native REST API |
IJ55121 |
High Importance
|
In a resource definition file, it is invalid if fileset resources are specified with no matching file system resource, if the root fileset is not specified when there are fileset resources to authorize, or if the file system's disposition does not match its root fileset's disposition. In these cases, scalectl cluster remote authorize may grant some resources instead of returning an error.
Symptom |
Unexpected Results/Behavior |
Environment |
Linux |
Trigger |
No matching file system resource for the fileset resources, or no root fileset specified in the fileset resources, or mismatched dispositions between a file system and its root fileset. |
Workaround |
Ensure the resource definition file grants the correct resources |
|
5.2.3.1 |
Native REST API |
IJ55141 |
High Importance
|
If replica compare is done on block 0 of a snapshot inode0 file while the same block is being updated, a false positive replica mismatch can happen.
Symptom |
Replica mismatch is reported for block 0 of snapshot inode0 file |
Environment |
All platforms |
Trigger |
Doing replica compare and updating snapshot inode0 file at the same time |
Workaround |
None |
|
5.2.3.1 |
core GPFS |
IJ53815 |
High Importance
|
During or after upgrade of manager nodes to 5.2.0.0+, deadlock can occur.
Symptom |
Cluster/File System Outage |
Environment |
All Operating System environments |
Trigger |
Manager node(s) are on version 5.2.0.0+. Client nodes are running a release prior to 5.1.9.0, or more than one file system is under token migration at the same time. |
Workaround |
None |
|
5.2.3.0 |
All Scale Users |
IJ53828 |
High Importance
|
If the customer executes the systemops command, it will allow any command to be executed, as there is no specific command validation in place.
Symptom |
None |
Environment |
Linux Only |
Trigger |
No such conditions |
Workaround |
None |
|
5.2.3.0 |
No Such restriction |
IJ54043 |
Medium Importance |
When a file system maintenance command (mmrestripefs) or a disk maintenance command (mm{ch,del,rpl}disk) runs, the 'thin inode' is deallocated and the emergency space is deleted. This is unexpected behavior and could be problematic if the file system reaches an out-of-space (OOS) condition.
Symptom |
After one of the commands run, the 'thin inode' in the internal dump will reset to -1 indicating that the 'thin inode' is deallocated and 'nBlocks' will be 0 indicating the emergency space is deleted.
[root@c145f11san04b sju]# mmfsadm dump stripe | egrep "State of|thin inode"
State of StripeGroup "test" at 0x18042A6A5B0, uid 7491A8C0:67D9B824, local id 1:
0: name 'system' Valid nDisks 32 nInUse 32 id 0 poolFlags 2 thin inode 41 nBlocks 519
1: name 'data' Valid nDisks 16 nInUse 16 id 65537 poolFlags 2 thin inode 42 nBlocks 526
2: name 'flash' Valid nDisks 8 nInUse 8 id 65538 poolFlags 2 thin inode 43 nBlocks 526
[root@c145f11san04b sju]# mmrestripefs test -R -N nc1
...
[root@c145f11san04b sju]# mmfsadm dump stripe | egrep "State of|thin inode"
State of StripeGroup "test" at 0x18042A6A5B0, uid 7491A8C0:67D9B824, local id 1:
0: name 'system' Valid nDisks 32 nInUse 32 id 0 poolFlags 2 thin inode -1 nBlocks 0
1: name 'data' Valid nDisks 16 nInUse 16 id 65537 poolFlags 2 thin inode -1 nBlocks 0
2: name 'flash' Valid nDisks 8 nInUse 8 id 65538 poolFlags 2 thin inode -1 nBlocks 0 |
Environment |
Linux/AIX |
Trigger |
Run one of the command - mmrestripefs or mm{ch,del,rpl}disk |
Workaround |
None |
|
5.2.3.0 |
thin-provisioning |
IJ54044 |
High Importance
|
Because of a limitation in the current implementation of reserved inode pool management, the 'thin inode' can erroneously be shared with the policy file while it is still assigned to the emergency space, on file systems with SSS6K and FCM4 drives. This triggers an assert because of the corruption caused by sharing the same inode. It could also cause file system metadata corruption in its own right, with its own consequences.
(show details)
Symptom |
After the mmchpolicy command runs, the internal dump shows the policy file inode equal to a pool's 'thin inode', indicating that the inode is erroneously shared.
[root@c145f11san04b sju]# mmfsadm dump stripe | egrep "State of|thin inode"
State of StripeGroup "test" at 0x18042A6A5B0, uid 7491A8C0:67D9B824, local id 1:
0: name 'system' Valid nDisks 32 nInUse 32 id 0 poolFlags 2 thin inode 41 nBlocks 519
1: name 'data' Valid nDisks 16 nInUse 16 id 65537 poolFlags 2 thin inode 42 nBlocks 526
2: name 'flash' Valid nDisks 8 nInUse 8 id 65538 poolFlags 2 thin inode 43 nBlocks 526
[root@c145f11san04b sju]# mmchpolicy test /home/sju/policy-default
...
[root@c145f11san04b sju]# mmchmgr test c145f11san04a
[root@c145f11san04b sju]# mmchpolicy test /home/sju/policy-default
[root@c145f11san04b sju]# mmfsadm dump stripe | grep "policy file inode"
policy file inode: 41 |
Environment |
Linux/AIX |
Trigger |
Run the 'mmchpolicy' command multiple times. |
Workaround |
None |
|
5.2.3.0 |
thin-provisioning |
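The shared-inode condition shown in the symptom output can be spotted by comparing the two dump fields. The sketch below is a hypothetical check (function name and sample text are illustrative): it flags the case where the policy file inode equals a pool's 'thin inode'.

```python
import re

def shared_thin_inode(dump_text):
    """True if the policy file inode matches any pool's 'thin inode'."""
    thin = set()
    policy = None
    for line in dump_text.splitlines():
        m = re.search(r"thin inode (-?\d+)", line)
        if m:
            thin.add(int(m.group(1)))
        m = re.search(r"policy file inode:\s*(\d+)", line)
        if m:
            policy = int(m.group(1))
    return policy is not None and policy in thin

sample = """\
0: name 'system' Valid nDisks 32 nInUse 32 id 0 poolFlags 2 thin inode 41 nBlocks 519
policy file inode: 41
"""
print(shared_thin_inode(sample))  # True
```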
IJ54045 |
Medium Importance |
To help control the issues (refer to D.341360, D.343470, and D.343471) with file systems created from SSS6K and FCM4 drives, a new option, 'thininode', is added to the tsdbfs command. This option is used to reset the 'thin inode'.
(show details)
Symptom |
With the command, the 'thin inode' is reset to the value given on the tsdbfs command line.
[root@c145f11san04b sju]# mmfsadm dump stripe | egrep "State of|thin inode"
State of StripeGroup "test" at 0x18042A6A5B0, uid 7491A8C0:67D9B824, local id 1:
0: name 'system' Valid nDisks 32 nInUse 32 id 0 poolFlags 2 thin inode 41 nBlocks 519
1: name 'data' Valid nDisks 16 nInUse 16 id 65537 poolFlags 2 thin inode 42 nBlocks 526
2: name 'flash' Valid nDisks 8 nInUse 8 id 65538 poolFlags 2 thin inode 43 nBlocks 526
[root@c145f11san04b sju]# tsdbfs test patch desc thininode 0 -1
[root@c145f11san04b sju]# tsdbfs test patch desc thininode 1 -1
[root@c145f11san04b sju]# tsdbfs test patch desc thininode 2 -1
[root@c145f11san04b sju]# mmfsadm dump stripe | egrep "State of|thin inode"
State of StripeGroup "test" at 0x18042A6A5B0, uid 7491A8C0:67D9B824, local id 1:
0: name 'system' Valid nDisks 32 nInUse 32 id 0 poolFlags 2 thin inode -1 nBlocks 519
1: name 'data' Valid nDisks 16 nInUse 16 id 65537 poolFlags 2 thin inode -1 nBlocks 526
2: name 'flash' Valid nDisks 8 nInUse 8 id 65538 poolFlags 2 thin inode -1 nBlocks 526 |
Environment |
Linux/AIX |
Trigger |
Refer to D.343470 and D.343471 for the issues/symptoms and how to trigger them. |
Workaround |
None |
|
5.2.3.0 |
thin-provisioning |
IJ54079 |
High Importance
|
An application using the SMB server may invoke the gpfs_stat_x() call (available in libgpfs.so) to retrieve stat information for a file. This call implements "statlite" semantics, meaning that the size information is not guaranteed to be the latest. Other applications that invoke standard stat()/fstat() calls do expect the size information to be up to date. However, due to a problem in the logic, after gpfs_stat_x() is invoked, the information is cached inside the kernel, and the cache is not purged even when other nodes change the file size (for example, by appending data to it). As a result, stat() invoked on the node may still return an out-of-date file size as other nodes write to the file.
(show details)
Symptom |
Unexpected Results/Behavior |
Environment |
ALL Operating System environments |
Trigger |
SMB applications invoking gpfs_stat_x() cause stale file size information to be returned by stat()/fstat() invoked by other applications. |
Workaround |
None |
|
5.2.3.0 |
All Scale Users |
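The caching defect described above can be modeled abstractly. This is a toy model, not GPFS code: a "statlite" call populates a size cache, the cache is never invalidated when another node extends the file, and a later full stat() then returns the stale size.

```python
class NodeAttrCache:
    """Toy model of a per-node attribute cache with the described bug."""

    def __init__(self):
        self.cached_size = None  # size cached by a prior statlite call

    def stat_lite(self, fs):
        # statlite semantics: size need not be current; cache whatever we see
        self.cached_size = fs["size"]
        return self.cached_size

    def stat_full(self, fs):
        # BUG modeled: full stat() trusts the stale cache left by stat_lite()
        if self.cached_size is not None:
            return self.cached_size
        return fs["size"]

fs = {"size": 100}
node = NodeAttrCache()
node.stat_lite(fs)         # SMB app calls gpfs_stat_x(): caches size 100
fs["size"] = 200           # another node appends data to the file
print(node.stat_full(fs))  # stat() returns the stale 100, not 200
```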
IJ54328 |
Critical |
Incorrect snapshot data—either stale or uninitialized—may be read while the mmchdisk start command is being executed on file systems with replication enabled.
(show details)
Symptom |
Data corruption, snapshot data read may not be as expected. |
Environment |
All platforms |
Trigger |
The issue may happen if some data replicas are stale or uninitialized, and snapshot data is accessed while running the mmchdisk start command to repair the bad replicas. |
Workaround |
Avoid accessing snapshot data while running the mmchdisk start command. |
|
5.2.3.0 |
GPFS core |