PQ81567: NodeAgent crashes after repeated start/stop of clusters and/or servers
Downloadable files
Abstract
This is caused by a timing window (race condition) in the administrative
code that handles process management and monitoring.
Download Description
The problem is a timing window that opens when servers are started by a
node agent in a multi-node configuration. If the servers are stopped
within the first twenty minutes after they are started, the node agent
may crash because memory is deallocated while it is still in use. The
window is fairly narrow, but it can be hit.
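The fix's actual code is not described here. As a hypothetical illustration of the general bug class (a monitoring path reading memory that a stop path has already released), the sketch below shows one common guard: reference counting, so the memory is freed only after the last user lets go. `ProcessHandle`, `monitor_once`, and `stop_server` are invented names, and because Python's garbage collector prevents a real use-after-free, setting `buffer` to `None` merely simulates freeing native memory.

```python
import threading
from typing import Optional


class ProcessHandle:
    """Hypothetical per-server monitoring record (not WebSphere's real type)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._lock = threading.Lock()
        self._refs = 1                 # reference held by the start path
        self.buffer = bytearray(64)    # stands in for native memory
        self.freed = False

    def acquire(self) -> bool:
        """Take a reference; fails if the record has already been freed."""
        with self._lock:
            if self.freed:
                return False
            self._refs += 1
            return True

    def release(self) -> None:
        """Drop a reference; the last release frees the memory."""
        with self._lock:
            self._refs -= 1
            if self._refs == 0:
                self.buffer = None     # simulated deallocation
                self.freed = True


def monitor_once(handle: ProcessHandle) -> Optional[int]:
    """One pass of a hypothetical monitor thread."""
    if not handle.acquire():
        return None                    # server already stopped; skip safely
    try:
        return len(handle.buffer)      # our reference keeps buffer alive
    finally:
        handle.release()


def stop_server(handle: ProcessHandle) -> None:
    """Hypothetical stop path: drops the owner's reference only."""
    handle.release()
```

Without the reference count, a stop that frees the record while `monitor_once` is between its check and its read is exactly the kind of narrow window described above.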
A common way to reproduce the problem is to create a cluster of servers,
start it, and stop the cluster as soon as all servers are reported as up.
Repeat this cycle and you will eventually hit the problem. It may occur
within the first few restarts, or it may take days of repeated start/stop
cycles within the twenty-minute window.
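The reproduction procedure above can be automated with a small stress loop. This is a minimal sketch, assuming you supply your own callables for starting the cluster, checking that all members are up, stopping it, and checking that the node agent is still running; `stress_start_stop` and all of its parameters are hypothetical and do not correspond to any WebSphere API.

```python
import time
from typing import Callable, Optional


def stress_start_stop(start: Callable[[], None],
                      all_up: Callable[[], bool],
                      stop: Callable[[], None],
                      agent_alive: Callable[[], bool],
                      max_cycles: int = 1000,
                      poll_interval: float = 1.0,
                      startup_timeout: float = 600.0) -> Optional[int]:
    """Repeat start -> wait for all up -> stop; return the failing cycle.

    Returns the 1-based cycle number after which the node agent was no
    longer alive, or None if every cycle completed without a crash.
    """
    for cycle in range(1, max_cycles + 1):
        start()
        deadline = time.monotonic() + startup_timeout
        while not all_up():
            if time.monotonic() > deadline:
                raise TimeoutError(f"cluster not up in cycle {cycle}")
            time.sleep(poll_interval)
        # Stop right away, well inside the twenty-minute window.
        stop()
        if not agent_alive():
            return cycle
    return None
```

Keeping the cluster operations as injected callables lets the same loop drive whatever scripting you already use, and lets the loop itself be exercised with stand-in functions before pointing it at a real cell.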
Prerequisites
Please download the UpdateInstaller below to install this fix.