If you are running a custom system service as an EGO service, you can specify a script to gracefully shut down service instances. You can also configure a timeout during which the system waits for the target instance to exit; if the instance is still running when the timeout expires, the system kills it. These features are implemented by the JobController and ControlWaitPeriod parameters.
When the service controller (egosc) shuts down a service, it gives the service a grace period defined by <ego:ControlWaitPeriod>. The job control command is started on the same host, and with the same initial environment, as the container to be terminated. If the instance container is still alive after the grace period has passed, SIGKILL is sent to terminate it.
The JobController parameter specifies the script that the user prepares to perform cleanup. If the job controller fails, EGO forcibly terminates the service instance.
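The shutdown sequence described above can be modeled with Python's subprocess module. This is a minimal POSIX sketch, not egosc or PEM code: it assumes the instance is a local child process and the job controller is an ordinary command that inherits the caller's environment. It also assumes that a controller that cannot be started counts as "does not exist or failed" and that the instance is then killed at once. The function name stop_instance and the returned reason strings are hypothetical; the strings only echo the reasons egosc logs.

```python
import subprocess
import sys
from typing import List


def stop_instance(instance: subprocess.Popen, controller_cmd: List[str], grace_seconds: float) -> str:
    """Model of a graceful shutdown: run a job controller, then SIGKILL on timeout.

    Hypothetical sketch, not egosc/PEM code. Returns an illustrative reason string.
    """
    try:
        # Assumption: the controller inherits the caller's host and environment.
        controller = subprocess.Popen(controller_cmd)
    except OSError:
        # Assumption: a controller that cannot be started counts as
        # "does not exist or failed", so the instance is killed at once.
        instance.kill()
        instance.wait()
        return "Terminated by SIGKILL, job controller does not exist or failed"

    try:
        # Grace period: give the controller time to shut the instance down.
        instance.wait(timeout=grace_seconds)
        # Assumption: the controller exits on its own once cleanup is done.
        controller.wait()
        return "Terminated by job controller"
    except subprocess.TimeoutExpired:
        # Grace period expired: SIGKILL the instance and the controller itself.
        instance.kill()
        instance.wait()
        if controller.poll() is None:
            controller.kill()
        controller.wait()
        return "Terminated by SIGKILL after grace period"


if __name__ == "__main__":
    # Demo (POSIX): the "job controller" sends SIGTERM to a sleeping instance.
    inst = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    ctl = [sys.executable, "-c", f"import os, signal; os.kill({inst.pid}, signal.SIGTERM)"]
    print(stop_instance(inst, ctl, grace_seconds=5))
```

The timeout is applied to the instance, not to the controller, because the grace period is about how long the instance may keep running; Popen.kill sends SIGKILL on POSIX systems.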
The ControlWaitPeriod parameter defines the grace period. If the instance container is still alive after the grace period has passed, PEM sends SIGKILL to terminate the container; the job controller itself is also killed at that point. ControlWaitPeriod uses the format PTXHYMZS, meaning X hours, Y minutes, and Z seconds. For example, PT10M0S means 10 minutes and PT60S means 60 seconds. The valid range is 0 to 1 hour; if the setting is outside this range, egosc does not load the service.
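The format and range rules above can be checked before a definition is loaded. The following Python sketch is a hypothetical validator written for illustration, not code from egosc: the function name parse_control_wait_period is made up, the sketch accepts only the uppercase designators H, M, and S shown in the examples, and it treats each of the three parts as optional.

```python
import re

# Hypothetical validator for the PTXHYMZS format; not taken from egosc.
# Each of the H, M and S parts is optional, but at least one must be present.
_PERIOD = re.compile(r"PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?")

MAX_SECONDS = 3600  # documented upper bound: 1 hour


def parse_control_wait_period(value: str) -> int:
    """Return the grace period in seconds, or raise ValueError if it is out of range or malformed."""
    match = _PERIOD.fullmatch(value)
    if match is None or value == "PT":
        raise ValueError(f"malformed ControlWaitPeriod: {value!r}")
    # Missing parts (None) count as zero.
    hours, minutes, seconds = (int(part or 0) for part in match.groups())
    total = hours * 3600 + minutes * 60 + seconds
    if total > MAX_SECONDS:
        raise ValueError(f"ControlWaitPeriod {value!r} is longer than 1 hour")
    return total


print(parse_control_wait_period("PT10M0S"))  # 600
print(parse_control_wait_period("PT60S"))    # 60
```

A value such as PT1H0M1S (one second over the limit) raises ValueError, which corresponds to the case where egosc refuses to load the service.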
You can add the ControlWaitPeriod and JobController parameters to the Service Definition file through the Platform Management Console. Follow the steps in Update a service to add and configure the parameters.
Check that your service is stopped within the grace period instead of being killed immediately. Note that the grace period might be delayed by 5 seconds or more.
You can also check the egosc log under $EGO_ESRVDIR/esc/log. You should see a message similar to:
2009-04-01 09:18:46.000 CST WARN [13769] do_containerStateChange(): on host <bjg270-01>, the container <3> belongs to instance <1> of service <plc> terminated, reason <Terminated by job controller>, status <1>
This message means that the job controller finished its cleanup and terminated the service successfully.
If the job controller does not exist or fails, the instance is terminated with SIGKILL instead. Check the egosc log under $EGO_ESRVDIR/esc/log. You should see:
2009-04-01 09:17:17.000 CST WARN [13769] do_containerStateChange(): on host <bjg270-01>, the container <9> belongs to instance <1> of service <test> terminated, reason <Terminated by SIGKILL, job controller does not exist or failed>, status <0>
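When checking many log entries, the termination messages can be classified with a short Python sketch. It assumes only that the message text matches the two do_containerStateChange() examples shown in this section; the function name classify_termination and the outcome labels "graceful" and "killed" are hypothetical.

```python
import re
from typing import Optional, Tuple

# Matches the do_containerStateChange() termination messages in the egosc log.
_TERMINATED = re.compile(
    r"on host <(?P<host>[^>]*)>, the container <(?P<container>\d+)> "
    r"belongs to instance <(?P<instance>\d+)> of service <(?P<service>[^>]*)> "
    r"terminated, reason <(?P<reason>[^>]*)>, status <(?P<status>-?\d+)>"
)


def classify_termination(line: str) -> Optional[Tuple[str, str]]:
    """Return (service, outcome) for a termination message, or None for other lines."""
    match = _TERMINATED.search(line)
    if match is None:
        return None
    reason = match.group("reason")
    if reason == "Terminated by job controller":
        outcome = "graceful"  # the job controller completed its cleanup
    elif reason.startswith("Terminated by SIGKILL"):
        outcome = "killed"    # the instance was forcibly terminated
    else:
        outcome = "other"
    return match.group("service"), outcome
```

To scan a whole log file, open it and pass each line to classify_termination; iterating over a file object yields its lines.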
If only ControlWaitPeriod was added to the Service Definition file, without JobController, egosc refuses to load the service.
Check the egosc log for the following messages:
2009-04-01 11:49:31.000 CST ERROR [8946] validContainerSpec(): Conflict parameters, controlWaitPeriod is defined but JobController is not defined, refused
2009-04-01 11:49:31.000 CST ERROR [8946] loadServiceDefinition(): parse section ServiceDefinition failed
2009-04-01 11:49:31.000 CST ERROR [8946] loadServiceDefinition():parse service definition file /opt/ego/eservice/esc/conf/services/test.xml failed
2009-04-01 11:49:31.000 CST ERROR [8946] loadServices(): failed to load service definition from </opt/ego/eservice/esc/conf/services/test.xml>
If ControlWaitPeriod in the Service Definition file is less than 0 or greater than 1 hour, egosc refuses to load the service.
Check the egosc log for the following messages:
2009-04-01 12:25:30.000 CST ERROR [10321] validContainerSpec(): Invalid controlWaitPeriod, refused
2009-04-01 12:25:30.000 CST ERROR [10321] loadServiceDefinition(): parse section ServiceDefinition failed
2009-04-01 12:25:30.000 CST ERROR [10321] loadServiceDefinition():parse service definition file /opt/ego/eservice/esc/conf/services/test.xml failed
2009-04-01 12:25:30.000 CST ERROR [10321] loadServices(): failed to load service definition from </opt/ego/eservice/esc/conf/services/test.xml>
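Both refusal cases log a validContainerSpec() reason followed by a loadServices() failure that names the Service Definition file. The following Python sketch pairs the two. It is a hypothetical helper that relies only on the message text in the examples above, and the function name refused_services is illustrative.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Reason text logged by validContainerSpec() when a definition is refused.
_SPEC = re.compile(r"validContainerSpec\(\): (?P<reason>.*), refused")
# Service Definition file that loadServices() then failed to load.
_LOAD = re.compile(r"loadServices\(\): failed to load service definition from <(?P<path>[^>]*)>")


def refused_services(lines: Iterable[str]) -> List[Tuple[str, Optional[str]]]:
    """Pair each refused definition file with the validContainerSpec() reason logged before it."""
    results = []
    reason = None
    for line in lines:
        spec = _SPEC.search(line)
        if spec:
            reason = spec.group("reason")
            continue
        load = _LOAD.search(line)
        if load:
            results.append((load.group("path"), reason))
            reason = None  # each reason applies to one failed load only
    return results
```

Passing an open log file works as well as a list, because iterating over a file object yields its lines.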
If ControlWaitPeriod in the Service Definition file is configured as 0, that is, PT0H0M0S, PT0M0S, or PT0S, EGO sets ControlWaitPeriod to 2 minutes, and egosc removes ControlWaitPeriod from the Service Definition file.
If only JobController was added to the Service Definition file, without ControlWaitPeriod, ControlWaitPeriod defaults to 2 minutes.
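The rules for combining JobController and ControlWaitPeriod can be summarized in one function. This Python sketch is hypothetical and not taken from egosc. It takes ControlWaitPeriod already converted to seconds, or None when the parameter is absent. It returns None when neither parameter is set, because this section does not describe that case.

```python
from typing import Optional

DEFAULT_GRACE_SECONDS = 120  # 2 minutes
MAX_GRACE_SECONDS = 3600     # 1 hour


def effective_grace_period(has_job_controller: bool, wait_seconds: Optional[int]) -> Optional[int]:
    """Apply the JobController/ControlWaitPeriod rules (hypothetical summary).

    Returns the grace period in seconds, None when neither parameter is set,
    or raises ValueError where egosc would refuse the Service Definition file.
    """
    if not has_job_controller:
        if wait_seconds is not None:
            # Logged as "Conflict parameters ... refused".
            raise ValueError("ControlWaitPeriod is defined but JobController is not")
        return None  # no graceful shutdown configured
    if wait_seconds is None:
        return DEFAULT_GRACE_SECONDS  # JobController only: default applies
    if wait_seconds < 0 or wait_seconds > MAX_GRACE_SECONDS:
        # Logged as "Invalid controlWaitPeriod, refused".
        raise ValueError("ControlWaitPeriod must be between 0 and 1 hour")
    if wait_seconds == 0:
        return DEFAULT_GRACE_SECONDS  # 0 is replaced by 2 minutes
    return wait_seconds
```

Keeping the parsing of the PTXHYMZS string out of this function lets the same rules apply however the value was read.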