The monitoring policy settings of an application server define how the node agent monitors it. The main properties are:
- Ping Interval
  How often the node agent pings the application server process to ensure that it is still running and able to handle requests.
- Ping Timeout
  How long the node agent waits for a response from the application server before it considers the server "unreachable".
- Automatic Restart
  Whether the node agent restarts the application server if the application server process fails.
- Node Restart State
  Whether the node agent also starts the application server when the node agent itself starts up.
Monitoring of the application server begins when the node agent starts. The "Node Restart State" property defines whether the node agent spawns the application server process during its own startup. If this property is set to RUNNING, the node agent attempts to start the application server and the following message appears:
ADMN1001I: An attempt is made to launch server1 on node WAS_60_Node01.
When the node agent spawns the application server process, that process is treated as a child of the node agent, which is the parent process. When the node agent does not start the application server (for example, when the server is started with startServer server1), the node agent "adopts" the application server as its child. The node agent then waits up to the number of seconds set in the "Ping Timeout" property for the application server process to report that its startup was successful. This report comes in the form of a discovery message:
ADMD0023I: The system discovered process (name: server1, type: ManagedProcess, pid: 2556)
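If you want to confirm discovery from the node agent's logs, a small parser for the ADMD0023I text can help. The following is a minimal Python sketch, assuming the message wording matches the sample above exactly; `DiscoveredProcess` and `parse_discovery` are hypothetical names, not part of any WebSphere API.

```python
import re
from typing import NamedTuple, Optional

class DiscoveredProcess(NamedTuple):
    name: str
    process_type: str
    pid: int

# Field layout taken from the single ADMD0023I sample above (an assumption).
_ADMD0023I = re.compile(
    r"ADMD0023I: The system discovered process "
    r"\(name: (?P<name>[^,]+), type: (?P<ptype>[^,]+), pid: (?P<pid>\d+)\)"
)

def parse_discovery(line: str) -> Optional[DiscoveredProcess]:
    """Return the discovered process, or None if the line is not an ADMD0023I message."""
    m = _ADMD0023I.search(line)
    if m is None:
        return None
    return DiscoveredProcess(m["name"], m["ptype"], int(m["pid"]))

msg = "ADMD0023I: The system discovered process (name: server1, type: ManagedProcess, pid: 2556)"
print(parse_discovery(msg))
# DiscoveredProcess(name='server1', process_type='ManagedProcess', pid=2556)
```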
If the node agent does not discover the application server process within the time set in the "Ping Timeout" property, it assumes that the application server startup failed. If the "Automatic Restart" property is set to true, the node agent then spawns a new application server process. It repeats this procedure up to the number of times specified in the "Maximum startup attempts" property.
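The startup sequence described above can be modeled as a simple retry loop. This Python sketch is an illustration only, not the node agent's implementation: `start_with_retries`, `launch`, and `discovered_within` are hypothetical, it assumes the initial launch counts as the first of the "Maximum startup attempts", and it ignores the case where the node agent adopts a server it did not launch.

```python
from typing import Callable

def start_with_retries(
    launch: Callable[[], None],
    discovered_within: Callable[[int], bool],
    ping_timeout: int,
    max_startup_attempts: int,
    auto_restart: bool,
) -> int:
    """Return the attempt on which the server was discovered, or 0 if it never was.

    Assumes the initial launch counts as attempt 1 of max_startup_attempts.
    """
    for attempt in range(1, max_startup_attempts + 1):
        launch()  # spawn the application server process
        if discovered_within(ping_timeout):  # discovery message seen within Ping Timeout
            return attempt
        if not auto_restart:
            break  # Automatic Restart is off: do not spawn a replacement
    return 0

# Simulated run: discovery fails twice, then succeeds on the third launch.
outcomes = iter([False, False, True])
launches = []
result = start_with_retries(lambda: launches.append("server1"),
                            lambda timeout: next(outcomes),
                            ping_timeout=180, max_startup_attempts=3, auto_restart=True)
print(result, len(launches))  # 3 3
```

Injecting `launch` and `discovered_within` keeps the loop testable without a real server.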
Once the node agent discovers the application server process, it updates its routing table and begins to monitor the child process. At the interval defined by the "Ping Interval" property, the node agent checks that the application server process is still up and able to respond.
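The steady-state check can be pictured as a loop that waits one Ping Interval and then pings with the Ping Timeout as the deadline. In this hypothetical Python sketch, `monitor`, `ping`, and `max_checks` are illustrative assumptions (the real node agent keeps monitoring indefinitely), and `sleep` is injected so the loop can be exercised without waiting; pass `time.sleep` for real delays.

```python
from typing import Callable

def monitor(
    ping: Callable[[int], bool],
    ping_interval: int,
    ping_timeout: int,
    sleep: Callable[[float], None],
    max_checks: int,
) -> int:
    """Ping the server every ping_interval seconds.

    Returns the number of the first check that got no answer within
    ping_timeout, or 0 if all max_checks pings succeeded.
    """
    for check in range(1, max_checks + 1):
        sleep(ping_interval)        # wait one Ping Interval
        if not ping(ping_timeout):  # no reply within Ping Timeout
            return check            # the caller would now treat the server as unreachable
    return 0

replies = iter([True, True, False])
slept = []
print(monitor(lambda timeout: next(replies), 60, 180, slept.append, max_checks=10))  # 3
print(slept)  # [60, 60, 60]
```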
The node agent communicates with the application server through JMX, over either SOAP or RMI. The transport is determined by the "Preferred Connector" configuration property of each application server. If the Preferred Connector is SOAP (the default), the communication goes through the server's SOAP_CONNECTOR_ADDRESS port (default 8880). If the Preferred Connector is RMI, the communication goes through the server's BOOTSTRAP_ADDRESS port (default 2809).
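The connector choice maps each server to one named end point. The Python sketch below only restates that mapping with the defaults given above; `jmx_endpoint` and the `configured` dictionary are hypothetical, standing in for the port values actually defined for each server, which may differ from the defaults.

```python
from enum import Enum
from typing import Dict, Optional, Tuple

class Connector(Enum):
    SOAP = "SOAP"
    RMI = "RMI"

# End point name and default port for each Preferred Connector, as listed above.
_ENDPOINTS: Dict[Connector, Tuple[str, int]] = {
    Connector.SOAP: ("SOAP_CONNECTOR_ADDRESS", 8880),
    Connector.RMI: ("BOOTSTRAP_ADDRESS", 2809),
}

def jmx_endpoint(preferred: Connector = Connector.SOAP,
                 configured: Optional[Dict[str, int]] = None) -> Tuple[str, int]:
    """Return (end point name, port), preferring a configured port over the default."""
    name, default_port = _ENDPOINTS[preferred]
    port = (configured or {}).get(name, default_port)
    return name, port

print(jmx_endpoint())               # ('SOAP_CONNECTOR_ADDRESS', 8880)
print(jmx_endpoint(Connector.RMI))  # ('BOOTSTRAP_ADDRESS', 2809)
# Hypothetical server configured with a non-default SOAP port:
print(jmx_endpoint(Connector.SOAP, {"SOAP_CONNECTOR_ADDRESS": 8881}))  # ('SOAP_CONNECTOR_ADDRESS', 8881)
```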
The monitoring policy configuration settings are stored in each application server's server.xml file. For example, the path for server1 would be:
install_root/config/cells/MyCell/nodes/MyNode/servers/server1/server.xml
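The path follows the cells/<cell>/nodes/<node>/servers/<server> layout that also appears in the mbeanIdentifier values in the monitoring trace below. A minimal Python sketch for building it, where `server_xml_path` is a hypothetical helper and `install_root` stands for whatever directory holds your config directory:

```python
from pathlib import PurePosixPath

def server_xml_path(root: str, cell: str, node: str, server: str) -> PurePosixPath:
    """Return root/config/cells/<cell>/nodes/<node>/servers/<server>/server.xml."""
    return (PurePosixPath(root) / "config" / "cells" / cell
            / "nodes" / node / "servers" / server / "server.xml")

print(server_xml_path("install_root", "MyCell", "MyNode", "server1"))
# install_root/config/cells/MyCell/nodes/MyNode/servers/server1/server.xml
```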
The monitoring policy configuration settings appear as follows:
<monitoringPolicy xmi:id="MonitoringPolicy_1141403412562" maximumStartupAttempts="3"
    pingInterval="60" pingTimeout="180" autoRestart="true"
    nodeRestartState="STOPPED"/>
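To check these values without reading the file by eye, the monitoringPolicy attributes can be read with Python's standard xml.etree.ElementTree module. This is a sketch under assumptions: `MonitoringPolicy` and `read_monitoring_policy` are hypothetical names, it expects all five attributes shown above to be present, and the `<fragment>` wrapper in the sample is only a stand-in for the real enclosing elements of server.xml.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class MonitoringPolicy:
    maximum_startup_attempts: int
    ping_interval: int       # seconds
    ping_timeout: int        # seconds
    auto_restart: bool
    node_restart_state: str  # e.g. "RUNNING" or "STOPPED"

def read_monitoring_policy(xml_text: str) -> MonitoringPolicy:
    root = ET.fromstring(xml_text)
    # Accept either the element itself or any document that contains it.
    el = root if root.tag == "monitoringPolicy" else root.find(".//monitoringPolicy")
    if el is None:
        raise ValueError("no <monitoringPolicy> element found")
    return MonitoringPolicy(
        maximum_startup_attempts=int(el.attrib["maximumStartupAttempts"]),
        ping_interval=int(el.attrib["pingInterval"]),
        ping_timeout=int(el.attrib["pingTimeout"]),
        auto_restart=el.attrib["autoRestart"] == "true",
        node_restart_state=el.attrib["nodeRestartState"],
    )

# The xmi prefix must be declared for the fragment to parse on its own.
sample = """<fragment xmlns:xmi="http://www.omg.org/XMI">
  <monitoringPolicy xmi:id="MonitoringPolicy_1141403412562" maximumStartupAttempts="3"
      pingInterval="60" pingTimeout="180" autoRestart="true" nodeRestartState="STOPPED"/>
</fragment>"""
print(read_monitoring_policy(sample))
```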
Example Startup Trace:
In the trace excerpts below, each excerpt is introduced by a short description, and notes about the output are marked with "<--".
Nodeagent Trace: The node agent reads the monitoring policy settings for the application server.
[3/17/06 13:06:02:109 EST] 0000000a NodeAgent 3 process monitor policy:
    server1
    PingInterval: 60
    PingTimeout: 180
    MaximumStartupAttempts: 3
    NodeRestartState: 1
    PreviousState: -1
    AutoRestart: true
[3/17/06 13:06:02:109 EST] 0000000a NodeAgent 3 pName = server1 currentPid = null
[3/17/06 13:06:02:109 EST] 0000000a NodeAgent 3 isRestartingAllServers = false
[3/17/06 13:06:02:109 EST] 0000000a NodeAgent 3 restartState = RUNNING
[3/17/06 13:06:02:109 EST] 0000000a NodeAgent 3 launching server1
<-- The nodeagent launches the server1 process since restartState = RUNNING
[3/17/06 13:06:02:109 EST] 0000000a NodeAgent > launchProcess Entry
    server1
Nodeagent Trace: The application server is launched with process ID
2556.
[3/17/06 13:06:05:547 EST] 0000000a NodeAgent 3 Launched process pid: 2556
[3/17/06 13:06:05:547 EST] 0000000a NodeAgent 1 addLaunchedChild
    serverName=server1
    pid=2556
[3/17/06 13:06:05:547 EST] 0000000a NodeAgent > saveNodeState Entry
[3/17/06 13:06:05:547 EST] 0000000a NodeAgent 3 restartServers = false monitorFile = C:\WebSphere60/profiles/AppSrv01/logs/nodeagent/monitor.state
[3/17/06 13:06:05:562 EST] 0000000a NodeAgent 3 launchedChildren 2556
[3/17/06 13:06:05:594 EST] 0000000a NodeAgent < saveNodeState Exit
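The trace lines above share a common layout: a bracketed timestamp, a thread ID, a component name, a one-character event type (">" for method entry and "<" for method exit in these samples), and the message, with parameters or return values on the following lines. A hypothetical Python sketch that groups such lines into entries, assuming that layout holds (it was inferred from these samples only) and skipping the "<--" notes and "..." elisions:

```python
import re
from typing import List, NamedTuple, Tuple

class TraceEntry(NamedTuple):
    timestamp: str
    thread_id: str
    component: str
    event_type: str  # ">" = method entry, "<" = method exit, other codes as logged
    message: str

# Header layout inferred from the samples: [timestamp] threadId component eventType message
_HEADER = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+(?P<thread_id>[0-9a-fA-F]+)\s+"
    r"(?P<component>\S+)\s+(?P<event_type>\S)(?:\s+(?P<message>.*))?$"
)

def parse_trace(lines: List[str]) -> List[Tuple[TraceEntry, List[str]]]:
    """Group trace lines into (header, continuation lines) pairs."""
    entries: List[Tuple[TraceEntry, List[str]]] = []
    for raw in lines:
        line = raw.strip()
        m = _HEADER.match(line)
        if m:
            fields = m.groupdict()
            fields["message"] = fields["message"] or ""
            entries.append((TraceEntry(**fields), []))
        elif entries and line and not line.startswith("<--") and line != "...":
            entries[-1][1].append(line)  # parameter or return value of the last entry
    return entries

sample = """[3/17/06 13:06:02:109 EST] 0000000a NodeAgent 3 restartState = RUNNING
[3/17/06 13:06:02:109 EST] 0000000a NodeAgent > launchProcess Entry
    server1""".splitlines()
for entry, extra in parse_trace(sample):
    print(entry.component, entry.event_type, entry.message, extra)
```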
Example Monitoring Trace:
Nodeagent Trace: The node agent pings the application server with a queryNames JMX call to see whether it is up and running.
[3/17/06 13:07:35:391 EST] 00000059 PidWaiter > contact Entry
    2556
<-- PidWaiter contact Entry starts the monitoring process. Look for the final result when PidWaiter contact returns.
[3/17/06 13:07:35:391 EST] 00000059 SecurityHelpe 3 Getting server subject.
[3/17/06 13:07:35:391 EST] 00000059 AdminServiceI > queryNames Entry
    WebSphere:type=Server,process=server1,*
...
[3/17/06 13:07:35:859 EST] 00000059 AdminServiceI > getAttribute Entry
    WebSphere:name=server1,process=server1,platform=proxy,node=WAS_60_Node01,j2eeType=J2EEServer,version=6.0.2.7,type=Server,mbeanIdentifier=cells/WAS_60_Cell01/nodes/WAS_60_Node01/servers/server1/server.xml#Server_1141403412266,cell=WAS_60_Cell01,processType=ManagedProcess
<-- Sends a JMX call to the application server to check if server1 is still running
AppServer Trace: The application server receives the node agent's ping and responds if it is running.
[3/17/06 13:07:35:922 EST] 0000002b AdminServiceI > getAttribute Entry
    WebSphere:name=server1,process=server1,platform=proxy,node=WAS_60_Node01,j2eeType=J2EEServer,version=6.0.2.7,type=Server,mbeanIdentifier=cells/WAS_60_Cell01/nodes/WAS_60_Node01/servers/server1/server.xml#Server_1141403412266,cell=WAS_60_Cell01,processType=ManagedProcess
<-- The application server receives the call
...
[3/17/06 13:07:35:922 EST] 0000002b AdminServiceI < getAttribute Exit
    STARTED
[3/17/06 13:07:35:922 EST] 0000002b AdminServiceD < getAttribute Exit
[3/17/06 13:07:35:922 EST] 0000002b SOAPConnector 3 return object type = class java.lang.String; value = STARTED
<-- The application server sends back the status
Nodeagent Trace: The node agent receives server1's response.
[3/17/06 13:07:35:938 EST] 00000059 AdminServiceI < getAttribute Exit
    STARTED
[3/17/06 13:07:35:938 EST] 00000059 PidWaiter < contact Exit
    true
<-- The PidWaiter contact method returns true since the server is started.
[3/17/06 13:07:35:938 EST] 00000059 PidWaiter 3 Pid 2556: For server1, bContact = true isProcessStopping = false alarmSyncObject = 0
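The final PidWaiter line summarizes the outcome of each ping. A small, hypothetical Python sketch that tracks the most recent bContact value per server from such lines, assuming the wording shown above (`last_contact_results` is an illustrative name, and reading bContact as "the ping got an answer" is inferred from the note in this trace):

```python
import re
from typing import Dict, Iterable, Tuple

# Matches the PidWaiter summary line shown above, e.g.
# "Pid 2556: For server1, bContact = true isProcessStopping = false ..."
_CONTACT = re.compile(
    r"Pid (?P<pid>\d+): For (?P<server>[^,\s]+), bContact = (?P<contact>true|false)"
)

def last_contact_results(lines: Iterable[str]) -> Dict[str, Tuple[int, bool]]:
    """Map each server name to (pid, bContact) from its most recent PidWaiter line."""
    results: Dict[str, Tuple[int, bool]] = {}
    for line in lines:
        m = _CONTACT.search(line)
        if m:
            results[m["server"]] = (int(m["pid"]), m["contact"] == "true")
    return results

trace = [
    "[3/17/06 13:07:35:938 EST] 00000059 PidWaiter 3 Pid 2556: For server1, "
    "bContact = true isProcessStopping = false alarmSyncObject = 0",
]
print(last_contact_results(trace))  # {'server1': (2556, True)}
```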
Example trace showing that the server1 process was killed and restarted:
Nodeagent Trace: The node agent detects that server1 has been killed and restarts it.
[3/17/06 13:08:18:094 EST] 00000072 RoutingTable > RemoveChildThread.run Entry
[3/17/06 13:08:18:094 EST] 00000072 RoutingTable 3 RoutingListner.parentRemoved:
    com.ibm.ws.management.event.ProcessListener
[3/17/06 13:08:18:094 EST] 00000072 ProcessListen > childRemoved Entry
    {cell=WAS_60_Cell01, version=6.0.2.7, pid=2556, name=server1, node=WAS_60_Node01, role=ManagedProcess}
...
[3/17/06 13:08:22:359 EST] 00000072 NodeAgentStat < childRemoved Exit
...
[3/17/06 13:08:22:344 EST] 00000059 PidWaiter 3 Pid 2556: Process is being relaunched
[3/17/06 13:08:22:344 EST] 00000059 PidWaiter > reLaunchProcess Entry
    2556
[3/17/06 13:08:22:344 EST] 00000059 PidWaiter A ADML0064I: Restarting an unreachable server "server1".
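When investigating repeated restarts, counting ADML0064I messages per server in the node agent's log or trace output is a quick first check. A minimal Python sketch, assuming the message text matches the sample above; `count_restarts` is a hypothetical helper:

```python
import re
from collections import Counter
from typing import Iterable

# Message logged when the node agent relaunches a server, as in the trace above.
_RESTART = re.compile(r'ADML0064I: Restarting an unreachable server "(?P<server>[^"]+)"')

def count_restarts(lines: Iterable[str]) -> Counter:
    """Count ADML0064I restart messages per server name."""
    counts: Counter = Counter()
    for line in lines:
        m = _RESTART.search(line)
        if m:
            counts[m["server"]] += 1
    return counts

log = [
    '[3/17/06 13:08:22:344 EST] 00000059 PidWaiter A ADML0064I: Restarting an unreachable server "server1".',
    "[3/17/06 13:08:22:344 EST] 00000059 PidWaiter 3 Pid 2556: Process is being relaunched",
]
print(count_restarts(log))  # Counter({'server1': 1})
```

A server whose count keeps climbing is being relaunched repeatedly, which usually points to a problem in the server itself rather than in the monitoring policy.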