When the data center enters a failure scenario, consider
overriding quorum so that container server events are not ignored.
You can use the xscmd utility to query quorum status and to
run quorum tasks, such as overriding quorum.
About this task
Override quorum in a data center failure scenario only. When
you override quorum, any surviving catalog server instance can be
used. All survivors are notified when one is told to override quorum.
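The notify-all behavior described above, where telling any one surviving catalog server to override quorum notifies every survivor, can be sketched as follows. This is an illustrative Python sketch with hypothetical class and attribute names, not the product's implementation:

```python
class CatalogServer:
    """Illustrative stand-in for a surviving catalog server instance."""

    def __init__(self, name, survivors):
        self.name = name
        self.survivors = survivors  # every catalog server that survived the failure
        self.quorum_overridden = False

    def override_quorum(self):
        # Telling any one survivor to override quorum notifies all survivors.
        for server in self.survivors:
            server.quorum_overridden = True


survivors = []
a = CatalogServer("catalogA", survivors)
b = CatalogServer("catalogB", survivors)
survivors.extend([a, b])

a.override_quorum()
print([s.quorum_overridden for s in survivors])  # → [True, True]
```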
Procedure
- Query quorum status with the xscmd utility.
xscmd -c showQuorumStatus -cep cathost:2809
Use this option to display the quorum status of a catalog service instance.
You can optionally use the -to or --timeout option on your command
to reduce the timeout value, which avoids waiting for operating
system or other network timeouts during a network brown out or system
loss. The default timeout value is 30 seconds.
One of the following outcomes is displayed:
- Quorum is disabled: The catalog servers
are running in quorum-disabled mode, which is a development
or single data center mode. Do not use quorum-disabled mode for multiple
data center configurations.
- Quorum is enabled and the catalog server has quorum:
Quorum is enabled and the system is working normally.
- Quorum is enabled but the catalog server is waiting
for quorum: Quorum is enabled and quorum has been
lost.
- Quorum is enabled and the quorum is overridden:
Quorum is enabled and quorum has been overridden.
- Quorum status is outlawed: A
brown out split the catalog service into two partitions,
A and B, and catalog server A overrode quorum. When the network partition
resolves, the server in the B partition is outlawed and requires
a JVM restart. This status also occurs if the catalog JVM in B restarts during
the brown out and the brown out then clears.
- Override quorum with the xscmd utility.
xscmd -c overrideQuorum -cep cathost:2809
Running this command forces the surviving catalog servers
to re-establish quorum.
- Diagnose quorum with the xscmd utility.
- Display a list of the core groups:
Use the -c listCoreGroups option to display a list of all the core
groups for the catalog server.
xscmd -c listCoreGroups -cep cathost:2809
- Teardown servers:
Use the -c teardown option to remove a server manually from the data grid.
Removing a server from the data grid is usually not necessary: servers
are automatically removed when they are detected as failed. The command
is provided for use under the guidance of IBM® support.
See Stopping servers gracefully with the xscmd utility for more information about
using this command.
xscmd -c teardown server1,server2,server3 -cep cathost:2809 -g Grid
- Display the route table:
Use the -c routetable option to display the current route table by
simulating a new client connection to the data grid. The command also
validates the route table by confirming that all container servers
recognize their role in the route table, such as which type of shard
they host for which partition.
xscmd -c routetable -cep cathost:2809 -g myGrid
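The kind of check the route table validation performs can be illustrated with a minimal sketch. The dict-based table, server names, and rules below are hypothetical stand-ins, not the product's internal format:

```python
# Hypothetical route table: partition -> shard role -> container server.
route_table = {
    0: {"primary": "container1", "replica": "container2"},
    1: {"primary": "container2", "replica": "container3"},
}


def validate_route_table(table):
    """Check that every partition has a primary shard and that no server
    holds both the primary and a replica of the same partition."""
    problems = []
    for partition, shards in table.items():
        if "primary" not in shards:
            problems.append(f"partition {partition} has no primary")
        elif shards.get("replica") == shards["primary"]:
            problems.append(
                f"partition {partition}: replica on same server as primary")
    return problems


print(validate_route_table(route_table))  # → []
```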
- Check the map sizes:
Use the -c showMapSizes option to verify that key distribution is
uniform over the shards in the data grid. If some container servers
have more keys than others, the hash function on the key objects
likely has a poor distribution.
xscmd -c showMapSizes -cep cathost:2809 -g myGrid -ms myMapSet
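To see why a poor hash function skews map sizes, the sketch below maps keys to partitions in the usual way, hash modulo partition count, and compares a well-spread hash with a degenerate one. This is an illustrative Python sketch, not product code, and the helper name is hypothetical:

```python
from collections import Counter


def keys_per_partition(keys, partitions, hash_fn=hash):
    """Count how many keys land in each partition when the partition
    is chosen as hash(key) % partitions."""
    counts = Counter(hash_fn(k) % partitions for k in keys)
    return [counts.get(p, 0) for p in range(partitions)]


keys = range(1000)

# A well-distributed hash spreads keys evenly across the 4 partitions.
print(keys_per_partition(keys, 4))  # → [250, 250, 250, 250]

# A poor hash (every key hashes to a multiple of 4) piles all keys onto one
# partition; showMapSizes would show one container holding far more keys.
print(keys_per_partition(keys, 4, hash_fn=lambda k: k * 4))  # → [1000, 0, 0, 0]
```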
- Set trace strings:
Use the -c setTraceSpec option to set the trace settings for all JVMs
that match the filter specified for the xscmd command. This command
changes only the trace settings, and only until another command
changes them or the modified JVMs fail or stop.
xscmd -c setTraceSpec -spec ObjectGrid*=event=enabled -cep cathost:1099 -g myGrid -hf host1
This string enables trace for all JVMs on the server with the specified host name, in this case host1.
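A trace specification of the form component=type=state can be matched against component names with a glob-style pattern. The helpers below are an illustrative sketch, assuming that three-part spec format; they are not the xscmd implementation:

```python
import fnmatch


def parse_trace_spec(spec):
    """Split a spec such as 'ObjectGrid*=event=enabled' into its parts."""
    pattern, trace_type, state = spec.split("=")
    return pattern, trace_type, state


def spec_applies(spec, component):
    """Return True if the spec enables trace for the named component."""
    pattern, _, state = parse_trace_spec(spec)
    return state == "enabled" and fnmatch.fnmatch(component, pattern)


print(spec_applies("ObjectGrid*=event=enabled", "ObjectGridPlacement"))  # → True
print(spec_applies("ObjectGrid*=event=enabled", "Catalog"))              # → False
```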
- Display unassigned shards:
Use the -c showPlacement -sf U option to display the list of shards
that cannot be placed on the data grid. Shards cannot be placed when
the placement service has a constraint that prevents placement.
For example, if you start JVMs on a single physical server while in
production mode, only primary shards can be placed. Replicas
are not assigned until JVMs start on a second physical server, because
the placement service places replicas only on JVMs with different IP
addresses than the JVMs that host the primary shards. Having no JVMs
in a zone can also cause shards to be unassigned.
xscmd -c showPlacement -sf U -cep cathost:2809 -g myGrid
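The replica-placement constraint described above, where replicas go only to JVMs whose IP address differs from the host of the primary shard, can be sketched as follows. The function and the JVM-to-IP mapping are hypothetical illustrations, not the placement service's actual logic:

```python
def replica_candidates(primary_ip, jvms):
    """Return the JVMs eligible to host a replica: any JVM whose IP
    address differs from the IP hosting the primary shard."""
    return [jvm for jvm, ip in jvms.items() if ip != primary_ip]


# All JVMs on one physical server: no replica can be placed,
# so replica shards show up as unassigned in showPlacement -sf U.
jvms_one_host = {"jvm1": "10.0.0.1", "jvm2": "10.0.0.1"}
print(replica_candidates("10.0.0.1", jvms_one_host))  # → []

# Starting a JVM on a second physical server makes it an eligible replica host.
jvms_two_hosts = {"jvm1": "10.0.0.1", "jvm2": "10.0.0.2"}
print(replica_candidates("10.0.0.1", jvms_two_hosts))  # → ['jvm2']
```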