Problem(Abstract)
A NullPointerException is written to trace.log, but all
processes complete successfully:
java.lang.NullPointerException at
com.ibm.bpe.framework.ProcessMDB.processEngineMessage(ProcessMDB.java:675)
Resolving the problem
Exception in trace.log
java.lang.NullPointerException at
com.ibm.bpe.framework.ProcessMDB.processEngineMessage(ProcessMDB.java:675)
(...Stack trace continues)
Quiesce/Resume algorithm
Process Choreographer uses JMS messages to navigate through
multi-transacted processes. Occasionally one of these messages cannot be
processed, that is, its transaction has to be rolled back.
Process Choreographer implements a Quiesce/Resume algorithm which
ensures that messages are copied to a hold queue if and only if they
are poisoned.
In short, the algorithm works as follows:
- A message is retried 3 times.
- If the message could not be processed in 3 attempts, it is
removed from the BPEIntQueue and copied to the BPERetQueue. The latter
queue works as a buffer. If the buffer contains more than a configurable
number of messages, the system switches to quiesce mode (the
infrastructure appears to be failing, because several messages in a row
could not be processed).
- In quiesce mode and in normal processing mode, the
next message on the queue is processed. If that works, i.e. the
transaction can be committed, all messages in the retention queue are
copied back to the BPEIntQueue for a retry. (If the system was in quiesce
mode, the successful processing of the last message signals that the
infrastructure is working again. The system switches back to processing
mode.)
- Messages that have passed through the retention queue a
configurable number of times are assumed to be poisoned and are copied
into the BPEHoldQueue.
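The steps above can be modelled in a few lines of Java. This is only a sketch of the described behaviour, not Process Choreographer code: the class name, the in-memory queues, and the parameter names retentionLimit and maxRetentionPasses are assumptions made for illustration, and messages are identified simply by their string value.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Toy model of the quiesce/resume steps; not Process Choreographer code.
class QuiesceResumeSketch {
    static final int MAX_RETRIES = 3; // from the text: a message is retried 3 times

    final int retentionLimit;     // assumed name: buffer size that triggers quiesce mode
    final int maxRetentionPasses; // assumed name: passes allowed before a message is held

    final Deque<String> intQueue = new ArrayDeque<>();  // stands in for BPEIntQueue
    final Deque<String> retQueue = new ArrayDeque<>();  // stands in for BPERetQueue
    final Deque<String> holdQueue = new ArrayDeque<>(); // stands in for BPEHoldQueue
    final Map<String, Integer> retentionPasses = new HashMap<>();
    boolean quiesced = false;

    QuiesceResumeSketch(int retentionLimit, int maxRetentionPasses) {
        this.retentionLimit = retentionLimit;
        this.maxRetentionPasses = maxRetentionPasses;
    }

    // Processes the next message; the processor returns false for a rollback.
    void processNext(Predicate<String> processor) {
        String msg = intQueue.poll();
        if (msg == null) {
            return; // nothing waiting on the internal queue
        }
        boolean committed = false;
        for (int attempt = 1; attempt <= MAX_RETRIES && !committed; attempt++) {
            committed = processor.test(msg);
        }
        if (committed) {
            quiesced = false; // a commit signals that the infrastructure works
            while (!retQueue.isEmpty()) {
                intQueue.add(retQueue.poll()); // retained messages get a retry
            }
            return;
        }
        int passes = retentionPasses.merge(msg, 1, Integer::sum);
        if (passes > maxRetentionPasses) {
            holdQueue.add(msg); // assumed to be poisoned
            return;
        }
        retQueue.add(msg);
        if (retQueue.size() > retentionLimit) {
            quiesced = true; // too many failures in the buffer
        }
    }
}
```

In this model a message that always fails moves between the internal and retention queues until it exceeds maxRetentionPasses, and only then reaches the hold queue. A message that failed because of a temporary outage is retried after the next successful commit.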
Reason for the Exception
If messages are left in the retention queue because they could not
be processed earlier, and the instance data of the process they
belong to has been deleted (which should not happen), the following
scenario results:
- An interruptible process is started.
- Messages are put into the BPEIntQueue.
- If one of these messages is processed successfully, the
message from the retention queue is copied back into the BPEIntQueue and
is therefore processed later.
- The NullPointerException is caused by the fact that the
workflow engine needs a context for the message. If this context does not
exist for any reason, the message cannot be processed. Note that this is
a truly exceptional situation: messages left in the system without a
matching context in the database should not occur.
Note that the exception is not caused by the current test case,
which will complete successfully. The message that causes the exception
belongs to a process that has already been deleted from the database.
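The mechanism can be illustrated with a minimal sketch. It is not the actual ProcessMDB code: the class, the map that stands in for the instance data in the database, and the method name are hypothetical. It only shows how a lookup that returns no context for a deleted process ends in a NullPointerException once the context is used.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; not the actual ProcessMDB implementation.
class MissingContextSketch {
    // Stands in for the process instance data kept in the database.
    static final Map<String, String> contexts = new HashMap<>();

    // Uses the context of the process that the message belongs to.
    static int handleMessage(String processId) {
        String context = contexts.get(processId); // null if the instance was deleted
        return context.length();                  // throws NullPointerException on null
    }
}
```

A message for an existing process is handled normally, while a leftover message for a deleted process fails every time, which is why it eventually counts as poisoned.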
Proposed solution
If there are only a few poisoned messages in the system, do nothing. The
quiesce/resume algorithm will filter them out soon.
(This can be seen in trace.log, where
the retry counter is incremented:
Retry Count was set to 1 on EngineMessage
Retry Count was set to 2 on EngineMessage.
Retry Count was set to 3 on EngineMessage.
And the message is copied to the hold queue.)
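To check how far the retry counter has advanced, the trace lines quoted above can be scanned with a small helper. This is a sketch under assumptions: the class and method names are made up for illustration, and it relies only on the message text shown above, since the rest of the trace.log line format is not known here.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; matches only the message text quoted in this document.
class RetryCountScan {
    private static final Pattern RETRY =
        Pattern.compile("Retry Count was set to (\\d+) on EngineMessage");

    // Returns the highest retry count found, or 0 if no line matches.
    static int maxRetryCount(List<String> lines) {
        int max = 0;
        for (String line : lines) {
            Matcher m = RETRY.matcher(line);
            if (m.find()) {
                max = Math.max(max, Integer.parseInt(m.group(1)));
            }
        }
        return max;
    }
}
```

The lines could be read with java.nio.file.Files.readAllLines on the path of trace.log in your installation. A result of 3 suggests that the message has used up its retries.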
In test environments, it may be better to simply empty all queues and
rerun the test case that "caused" the exception. It will then run
without any problems.
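The queues can be emptied using the following MQSC commands, shown here
for the retention queue. The first command displays the current queue
depth, the second clears the queue:
dis qlocal('WQ_BPERetQueue') CURDEPTH
clear qlocal('WQ_BPERetQueue')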