Why does session recovery fail after the SSM dies?

If there is a large amount of persisted workload data (that is, recoverable sessions and tasks), the SSM may take longer than the default timeout to recover it. In this case, the SD terminates the SSM process while it is still busy with recovery and starts a new one. The new SSM must recover the same workload data and most likely cannot finish within the timeout either.
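The restart loop described above can be illustrated with a small model. This is only a sketch, not product code: the function name and the max_attempts cap are hypothetical, and it assumes that every restarted SSM repeats the full recovery from scratch, so each attempt needs the same amount of time.

```python
def attempts_until_registered(recovery_seconds, startup_timeout, max_attempts=5):
    """Return the attempt number on which the SSM registers, or None.

    Simplifying assumption: each restart repeats the whole recovery,
    so every attempt needs the same recovery_seconds.
    """
    for attempt in range(1, max_attempts + 1):
        if recovery_seconds <= startup_timeout:
            # Recovery finished in time, so the SSM registers with the SD.
            return attempt
        # Otherwise the SD terminates the still-recovering SSM and starts a new one.
    return None


# With a 60-second timeout, a 180-second recovery never completes.
stuck = attempts_until_registered(recovery_seconds=180, startup_timeout=60)

# With a longer timeout, the first attempt succeeds.
recovered = attempts_until_registered(recovery_seconds=180, startup_timeout=240)
```

In this model, retrying never helps when recovery takes longer than the timeout, which is why the timeout itself has to change.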

The SSM "startUpTimeout" attribute in the application profile is set to 60 seconds by default: the SD terminates and restarts the SSM process if a newly started SSM does not register with the SD within the configured timeout. It may be necessary to configure a longer timeout so that the SSM can finish recovery without being killed. The timeout should still be short enough that the SD can retry starting the SSM promptly if the failure was caused by an exception, such as an environment error.
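As a rough sketch of raising the timeout programmatically, the following uses Python's standard xml.etree.ElementTree module on a made-up profile fragment. Only the "startUpTimeout" attribute name comes from the text above. The surrounding element layout, the helper name, and the value 300 are assumptions for illustration. A real application profile may use XML namespaces and a different structure, so the lookup would need adjusting.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: only the startUpTimeout attribute is taken from the text.
PROFILE_FRAGMENT = '<Profile><SSM startUpTimeout="60"/></Profile>'


def set_startup_timeout(profile_xml, seconds):
    """Return the profile XML with startUpTimeout set on the first SSM element."""
    root = ET.fromstring(profile_xml)
    ssm = root.find(".//SSM")  # assumes an un-namespaced element named SSM
    if ssm is None:
        raise ValueError("no SSM element found in profile")
    ssm.set("startUpTimeout", str(seconds))
    return ET.tostring(root, encoding="unicode")


# 300 is an example value only; choose it from the recovery time you observe.
updated = set_startup_timeout(PROFILE_FRAGMENT, 300)
```

The right value depends on how long recovery takes in your environment. A value far above that delays the restart when the SSM fails for another reason.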