Monitoring agents and integration servers

To maximize application throughput, monitor the agents to ensure that they can process all pending tasks within an acceptable time frame. If an agent cannot process its tasks fast enough, pending jobs accumulate and create a bottleneck in the system.
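As a rough illustration of how a backlog builds up, the following sketch models the pending-task count over successive intervals. The function name, the parameters, and the rates are hypothetical and are not part of the product; the sketch only assumes that tasks arrive at a steady rate and that the agent can process a fixed number of them per interval.

```python
def backlog_over_time(arrivals_per_interval, capacity_per_interval,
                      intervals, initial_backlog=0):
    """Return the pending-task count at the end of each interval.

    Hypothetical model: in every interval new tasks arrive, and the agent
    processes at most `capacity_per_interval` of the tasks that are waiting.
    """
    backlog = initial_backlog
    history = []
    for _ in range(intervals):
        backlog += arrivals_per_interval
        # The agent cannot process more tasks than are actually pending.
        backlog -= min(backlog, capacity_per_interval)
        history.append(backlog)
    return history
```

When the capacity is below the arrival rate, the count grows by the difference in every interval; when the capacity matches or exceeds the arrival rate, the count stays at zero.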

Health monitor agent

The health monitor agent provides the monitoring abilities and statistics that are described in the following sections.

The YFS_SNAPSHOT table stores the statistical details of pending tasks of transactions collected by the agent servers. The parameter CollectPendingJobs in time-triggered agents controls whether records are inserted in the table. The health monitor deletes the records from this table after the default purge interval of 30 days.

The heartbeat records in the YFS_HEARTBEAT table are also purged by the health monitor agent with a default purge interval of 30 days.

The health monitor runs a purge once every 24 hours that removes the snapshot and heartbeat records that are older than 30 days. To change the purge interval from the 30-day default, set the following property in the <INSTALL_DIR>/properties/customer_overrides.properties file:

yantra.hm.purge.interval=<value>
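The effect of the purge interval can be sketched as a simple cutoff calculation. This is not the health monitor's actual implementation: the function name and the record layout are hypothetical, and the sketch assumes that the interval is expressed in days, as the 30-day default suggests.

```python
from datetime import datetime, timedelta


def records_to_purge(records, now, purge_interval_days=30):
    """Return the records that are older than the purge interval.

    `records` is a list of (record_id, created_at) tuples. This layout is
    an assumption for illustration, not the schema of the real tables.
    """
    cutoff = now - timedelta(days=purge_interval_days)
    # A record qualifies for purging only if it is strictly older than the cutoff.
    return [record for record in records if record[1] < cutoff]
```

For example, with the current time set to 31 March and the default interval, a record created on 15 February is selected for purging and a record created on 20 March is kept.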

Server heartbeat

System Management tracks the status of the agent and integration servers by recording the server "heartbeat" while the server is running. If the server goes down, the heartbeat is no longer recorded. If a server with the same name is brought back up, the heartbeat resumes. For more information about purging heartbeat records, see Health monitor agent.

Alert when agent or integration server terminates unexpectedly

You can configure a service to run whenever an agent or integration server goes down unexpectedly. This service can perform many tasks, including sending an e-mail message to a system administrator or creating an alert in a system administrator's inbox. For more information about the data available for the service, see Data published for health monitor alerts.

Shutting down an agent or integration server through the System Management console (or pressing Ctrl+C on the command line window) does not generate an alert.
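The distinction between an unexpected termination and a deliberate shutdown can be sketched as follows. The documentation does not describe how the product detects a terminated server, so everything here is an assumption for illustration: the function name, the timeout parameter, and the idea of comparing the last heartbeat against a timeout are all hypothetical.

```python
from datetime import datetime, timedelta


def should_alert(last_heartbeat, now, heartbeat_timeout, shut_down_cleanly):
    """Decide whether a missing heartbeat warrants an alert.

    Hypothetical rule: alert only when the heartbeat is stale and the
    server was not stopped deliberately (console shutdown or Ctrl+C).
    """
    if shut_down_cleanly:
        return False
    return now - last_heartbeat > heartbeat_timeout
```

Under these assumptions, a server whose heartbeat is ten minutes old with a five-minute timeout triggers an alert, unless it was shut down deliberately.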

Agent pending tasks

The number of pending tasks of every agent is recorded during every persist interval, unless the CollectPendingJobs criteria parameter for the agent is set to N in the Agent Criteria Details.

Alert when the pending tasks threshold is exceeded

You can configure a service to run if the number of pending tasks for an agent goes above a threshold limit. This service can perform many tasks, including sending an e-mail message or creating an alert for a system administrator. For more information about the data available for the service, see Data published for health monitor alerts.
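The threshold check itself is a simple comparison, sketched below. The function name, the dictionary inputs, and the agent names in the example are hypothetical; the sketch assumes that an alert applies only when the pending count is strictly greater than the threshold, and that agents without a configured threshold are ignored.

```python
def agents_over_threshold(pending_by_agent, thresholds):
    """Return the names of agents whose pending tasks exceed their threshold.

    Both arguments are dictionaries keyed by agent name (an assumed
    representation). Agents with no configured threshold are skipped.
    """
    return sorted(
        agent
        for agent, pending in pending_by_agent.items()
        if agent in thresholds and pending > thresholds[agent]
    )
```

For example, with pending counts of 1500 and 200 for two agents and a threshold of 1000 for each, only the first agent is reported.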

Other agent statistics

System Management records the processing rate for each agent during each persist interval.

Additionally, some of the most important agents record statistics that are specific to that agent. For example, the schedule order agent records the number of orders scheduled and the number of orders backordered during each persist interval.

Integration server statistics

System Management records the processing rate as well as the minimum, maximum, and average response times for integration servers for each persist interval.
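The statistics named above can be derived from the response times observed in one persist interval, as in the following sketch. The function name, the units, and the result layout are hypothetical and do not reflect how System Management stores these values.

```python
def summarize_interval(response_times_ms, interval_seconds):
    """Summarize one persist interval of integration server activity.

    Returns the processing rate (messages per second) and the minimum,
    maximum, and average response times in milliseconds, or None when
    nothing was processed in the interval.
    """
    if not response_times_ms:
        return None
    count = len(response_times_ms)
    return {
        "rate_per_second": count / interval_seconds,
        "min_ms": min(response_times_ms),
        "max_ms": max(response_times_ms),
        "avg_ms": sum(response_times_ms) / count,
    }
```

For example, three messages with response times of 10, 20, and 30 milliseconds in a 60-second interval give a minimum of 10, a maximum of 30, and an average of 20 milliseconds.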

You cannot set a threshold or configure a service to run for any of the statistics collected for integration servers.