1627   The cluster has insufficient redundancy in its controller connectivity.

Explanation

The cluster has detected that it does not have sufficient redundancy in its connections to the disk controllers. This means that another failure in the SAN could result in loss of access to the application data. The cluster SAN environment should have redundant connections to every disk controller. This redundancy allows for continued operation when there is a failure in one of the SAN components.

To provide recommended redundancy, a cluster should be configured so that:

  • each node can access each disk controller through two or more different initiator ports on the node.
  • each node can access each disk controller through two or more different controller target ports. Note: Some disk controllers only provide a single target port.
  • each node can access each disk controller target port through at least one initiator port on the node.
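
The three rules above can be sketched as a simple check over a path map. The data model used here (node -> initiator port -> set of reachable controller target ports) is a hypothetical illustration, not a cluster API:

```python
def check_redundancy(paths, controller_targets):
    """paths: {node: {initiator_wwpn: set of reachable target WWPNs}}
    controller_targets: every target WWPN the controller provides.
    Returns a list of rule violations; an empty list means the
    recommended redundancy is in place."""
    problems = []
    for node, initiators in paths.items():
        # Rule 1: the controller must be reachable through two or
        # more different initiator ports on the node.
        connected = [i for i, targets in initiators.items() if targets]
        if len(connected) < 2:
            problems.append("%s: controller reachable through only "
                            "%d initiator port(s)" % (node, len(connected)))
        # Rule 2: two or more different controller target ports must
        # be reachable (single-target-port controllers are exempt).
        reached = set().union(*initiators.values()) if initiators else set()
        if len(controller_targets) > 1 and len(reached) < 2:
            problems.append("%s: only %d controller target port(s) "
                            "reachable" % (node, len(reached)))
        # Rule 3: every target port must be reachable through at
        # least one initiator port on the node.
        for target in sorted(controller_targets - reached):
            problems.append("%s: target port %s not reachable" % (node, target))
    return problems
```

For example, a node with two initiator ports that each reach both controller target ports satisfies all three rules and produces no violations.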

If there are no higher-priority errors being reported, this error usually indicates a problem with the SAN design, the SAN zoning, or the disk controller.

If there are unfixed higher-priority errors that relate to the SAN or to disk controllers, those errors should be fixed before resolving this error because they might indicate the reason for the lack of redundancy. Error codes that must be fixed first are:

  • 1210 Local FC port excluded
  • 1230 Login has been excluded

Note: This error can be reported if the required action, to rescan the Fibre Channel network for new MDisks, has not been performed after a deliberate reconfiguration of a disk controller or after SAN rezoning.

The 1627 error code is reported for a number of different error IDs. The error ID indicates the area where there is a lack of redundancy. The data reported in an event log entry indicates where the condition was found.

The meaning of the error IDs is shown below. For each error ID the most likely reason for the condition is given. If the problem is not found in the suggested areas, check the configuration and state of all of the SAN components (switches, controllers, disks, cables and cluster) to determine where there is a single point of failure.

010040 A disk controller is only accessible from a single node port.

  • A node has detected that it has a connection to the disk controller through exactly one initiator port, even though more than one initiator port is operational.
  • The error data indicates the device WWNN and the WWPN of the connected port.
  • A zoning issue or a Fibre Channel connection hardware fault might cause this condition.

010041 A disk controller is only accessible from a single port on the controller.

  • A node has detected that it is connected to exactly one target port on a disk controller, even though more than one target port connection is expected.
  • The error data indicates the WWPN of the disk controller port that is connected.
  • A zoning issue or a Fibre Channel connection hardware fault might cause this condition.

010042 Only a single port on a disk controller is accessible from every node in the cluster.

  • Only a single port on the disk controller is accessible from every node, even though multiple ports on the controller could be connected.
  • The error data indicates the WWPN of the disk controller port that is connected.
  • A zoning issue or a Fibre Channel connection hardware fault might cause this condition.

010043 A disk controller is accessible through only half, or less, of the previously configured controller ports.

  • Although there might still be multiple ports that are accessible on the disk controller, a hardware component of the controller might have failed or one of the SAN fabrics has failed such that the operational system configuration has been reduced to a single point of failure.
  • The error data indicates a port on the disk controller that is still connected, and also lists controller ports that are expected but that are not connected.
  • A disk controller issue, switch hardware issue, zoning issue or cable fault might cause this condition.
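
The "half, or less" threshold for error ID 010043 amounts to the following comparison of port counts; the WWPNs shown are made-up placeholders:

```python
def controller_degraded(configured_ports, accessible_ports):
    """True when a disk controller is accessible through only half,
    or fewer, of its previously configured target ports (the 010043
    condition). Both arguments are sets of target-port WWPNs."""
    missing = configured_ports - accessible_ports
    return bool(missing) and 2 * len(accessible_ports) <= len(configured_ports)

# Illustrative placeholders: four ports were configured, two respond.
configured = {"5005076801400001", "5005076801400002",
              "5005076801400003", "5005076801400004"}
accessible = {"5005076801400001", "5005076801400002"}
# controller_degraded(configured, accessible) is True:
# two accessible ports is exactly half of the four configured.
```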

010044 A disk controller is not accessible from a node.

  • A node has detected that it has no access to a disk controller. The controller is still accessible from the partner node in the I/O group, so its data is still accessible to the host applications.
  • The error data indicates the WWPN of the missing disk controller.
  • A zoning issue or a cabling error might cause this condition.

010117 A disk controller is not accessible from a node that is allowed to access the device by site policy.

  • The disk controller is not accessible from a node that site policy allows to access it. If the disk controller has multiple WWNNs, it may still be accessible to the node through one of its other WWNNs.
  • The error data indicates the WWNN of the inaccessible disk controller.
  • A zoning issue or a Fibre Channel connection hardware fault might cause this condition.

User response

  1. Check the error ID and data for a more detailed description of the error.
  2. Determine if there has been an intentional change to the SAN zoning or to a disk controller configuration that reduces the cluster's access to the indicated disk controller. If either action has occurred, continue with step 8.
  3. Use the GUI or the CLI command lsfabric to ensure that all disk controller WWPNs are reported as expected.
  4. Ensure that all disk controller WWPNs are zoned appropriately for use by the cluster.
  5. Check for any unfixed errors on the disk controllers.
  6. Ensure that all of the Fibre Channel cables are connected to the correct ports at each end.
  7. Check for failures in the Fibre Channel cables and connectors.
  8. When you have resolved the issues, use the GUI or the CLI command detectmdisk to rescan the Fibre Channel network for changes to the MDisks. Note: Do not attempt to detect MDisks unless you are sure that all problems have been fixed. Detecting MDisks prematurely might mask an issue.
  9. Mark the error that you have just repaired as fixed. The cluster revalidates the redundancy and reports another error if redundancy is still insufficient.
  10. Go to MAP 5700: Repair verification.
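
Step 3 above can be sketched as a comparison of the WWPNs seen on the fabric against the set that is expected. The column layout below is a simplified stand-in for delimited lsfabric output, which contains additional fields; the WWPNs are placeholders:

```python
def missing_wwpns(fabric_listing, expected_wwpns):
    """Return the expected controller WWPNs that do not appear with an
    active login in the listing. Assumes simplified comma-delimited
    lines of the form 'node_name,local_wwpn,remote_wwpn,state'; real
    lsfabric output has more columns."""
    seen = set()
    for line in fabric_listing.strip().splitlines()[1:]:  # skip header
        node_name, local_wwpn, remote_wwpn, state = line.split(",")
        if state == "active":
            seen.add(remote_wwpn)
    return expected_wwpns - seen

listing = """node_name,local_wwpn,remote_wwpn,state
node1,500507680110A001,5005076801401234,active
node1,500507680110A002,5005076801405678,inactive"""
# missing_wwpns(listing, {"5005076801401234", "5005076801405678"})
# returns {"5005076801405678"}: that port has no active login.
```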

Possible Cause-FRUs or other:

  • None