A use case provides a context for a recovery scenario. In the use case, a business has an application that receives a request to create a new Account.
The solution consists of multiple modules, as recommended by module best practices.
The first module mediates the request and delegates work to an Account Creation process. In the example below we have implemented the solution as separate modules where the request is passed between the mediation module (AccountRouting) and the processing module (AccountCreation) via an SCA import/export. See the following screen capture for an illustration of the two modules.
From the assembly diagram shown in Figure 1, you can begin to see where in the flow failures might occur. Any of the invocation points in the assembly diagram can propagate or involve a transaction. There are a few areas in the flow where data will collect as a result of application or system failures.
In general, transaction boundaries are created and managed by the interaction (synchronous and asynchronous) between components and import/export bindings and their associated qualifiers. Business data accumulates in specific recovery locations most often due to transaction failure, deadlock or rollback.
Transaction capabilities within WebSphere® Application Server help WebSphere ESB enlist transactions with service providers. These enlisted interactions are particularly important to understand with respect to import and export bindings. Understanding how imports and exports are used within your specific business cases is important in determining where events in need of recovery accumulate.
An error handling strategy should define the interaction patterns, the transactions used, and import and export usage before the application is developed. The solution architect should identify the preferred practices and guidelines, which are then followed as the application is created. For example, the architect needs to understand when to use synchronous versus asynchronous calls, when to use BPEL fault handling, and so forth. The architect also needs to know whether all services can participate in transactions and, for those services that cannot participate, how to handle compensation if problems are encountered.
Additionally, the application shown in the assembly diagram in Figure 1 leverages connectivity groups and module development best practices. By leveraging this pattern we now have the ability to stop the inbound flow of new events by stopping the AccountRouting module.
The following sections address the location of business data in the case of failure and recovery.
In our business case, we leverage a BPEL process for the AccountCreation process.
Short running processes are known as microflows.
Knowing the answers to these questions will impact your recovery strategy for invocations 7 and 8 shown in the assembly diagram, as highlighted in the screen capture below:
Stateful components, such as Long Running BPEL processes and Business State Machines, involve many database transactions where process activity changes and state changes are committed to the database. The work progresses by updating the database and placing a message on an internal queue that describes what is to be done next.
If there are problems processing messages that are internal to the Business Flow Manager, these messages are moved to a Retention Queue. The system attempts to continue to process messages. If a subsequent message is successfully processed, the messages on the Retention Queue are resubmitted for processing. If the same message is placed on the Retention Queue five times, it is then placed on the Hold Queue.
Additional information about viewing the number of messages and replaying messages can be found in Replaying Messages from the Retention Queue / Hold Queue.
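The retention and hold queue behavior described above can be illustrated with a small simulation. This is only a toy model, not Business Flow Manager code: the class name, the in-memory queues, and the `Predicate` handler are all hypothetical, and it assumes that the fifth failure of the same message sends it to the hold queue rather than back to the retention queue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Toy model of retention/hold queue handling; all names here are hypothetical. */
class RetentionQueueSketch {
    // From the text: a message that lands on the retention queue five times goes to the hold queue.
    static final int RETENTION_LIMIT = 5;

    final Deque<String> inputQueue = new ArrayDeque<>();
    final Deque<String> retentionQueue = new ArrayDeque<>();
    final List<String> holdQueue = new ArrayList<>();
    final List<String> processed = new ArrayList<>();

    private final Map<String, Integer> failureCounts = new HashMap<>();
    private final Predicate<String> handler; // returns true when processing succeeds

    RetentionQueueSketch(Predicate<String> handler) {
        this.handler = handler;
    }

    /** Processes every message currently on the input queue, including resubmitted ones. */
    void drain() {
        while (!inputQueue.isEmpty()) {
            String message = inputQueue.poll();
            if (handler.test(message)) {
                processed.add(message);
                // A successful message triggers resubmission of everything retained so far.
                while (!retentionQueue.isEmpty()) {
                    inputQueue.add(retentionQueue.poll());
                }
            } else {
                int failures = failureCounts.merge(message, 1, Integer::sum);
                if (failures >= RETENTION_LIMIT) {
                    holdQueue.add(message);      // assumption: the fifth failure moves it to hold
                } else {
                    retentionQueue.add(message); // keep it for a later resubmission
                }
            }
        }
    }
}
```

In this model, a message that always fails stays on the retention queue until other messages succeed and trigger resubmission. After four such resubmissions it has failed five times and ends up on the hold queue, where it waits for an administrator.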
The Failed event manager (FEM) is used to replay events or service invocation requests that are made asynchronously between most component types.
Failed events are created if the AccountRouting component makes an asynchronous call to the SCA Import binding AccountCreationSCAImport and a ServiceRuntimeException is returned.
It is important to note that failed events are not generated in most cases where BPEL is the client in the service interaction. This means that the invocation for 7 and 8 (as shown in Figure 2) will not typically result in a failed event. BPEL provides fault handlers and other ways to model for failure. For this reason, if there is a ServiceRuntimeException (SRE) failure calling "JDBCOutboundInterface", the SRE is returned to the BPEL for processing. The error handling strategy for the project should define how runtime exceptions are consistently handled in BPEL.
Note, however, that failed events are created for asynchronous response messages destined for the BPEL client if these messages cannot be delivered to the process instance due to an infrastructure failure.
The following diagram illustrates how the Failed event manager component works. Descriptions of the processing associated with each numbered step are provided following the diagram.
The retry limit default value is 5 - one original and 4 retries. You can change the default value in the administrative console. For example, given an SCA module M, you could navigate to
and change the value in the Maximum failed deliveries field.
When are "failed events" created?
As stated, failed events are created neither for synchronous invocations nor, typically, for two-way business process interactions.
Failed events are generally created when clients use an asynchronous invocation pattern and a ServiceRuntimeException is thrown by the service provider.
If everything is done synchronously and in the same transaction, data is not collected anywhere. Instead, it is all rolled back to the client that made the call. Wherever a commit occurs, data collects. If the calls are all synchronous but there are multiple commits, then these commits become an issue.
In general, you should use asynchronous processing calls or long-running BPEL if multiple transactions are needed. Each asynchronous call is therefore a point where data can collect. Long-running BPEL processes are also a collection point.
Invocation Pattern | Failed Event Created Y/N? | Notes |
---|---|---|
Synchronous | No | Failed events are not created for service business exceptions or when using a synchronous pattern. |
Asynchronous - One Way | No | By definition, one-way invocations cannot declare faults, so it is impossible to throw a ServiceBusinessException. |
Asynchronous - Deferred Response | No | Failed events are not created for service business exceptions. |
Asynchronous - Callback | No | Failed events are not created for service business exceptions. |
Invocation Pattern | Failed Event Created Y/N? | Notes |
---|---|---|
Synchronous | No | Failed events are not created for service runtime exceptions or when using a synchronous pattern. |
Asynchronous - One Way | Yes | |
Asynchronous - Deferred Response | Yes | |
Asynchronous - Callback | Yes | |
BPEL - Two Way | No | Failed events are not created when the source component is a business process. Note: For an asynchronous call, if the response cannot be returned to BPEL, then a failed event is created. |
BPEL - One Way | Yes | |
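The two tables above can be read as a single lookup from invocation pattern and exception type to a yes/no answer. The following is a minimal sketch of that lookup; the class, enum, and method names are hypothetical and are not part of any WebSphere API. The business exception table does not list the BPEL patterns, so returning false for them is an assumption based on the statement that failed events are not created for service business exceptions.

```java
/** Encodes the failed-event tables as a lookup; illustrative only, names are hypothetical. */
class FailedEventRules {
    enum Pattern {
        SYNCHRONOUS,
        ASYNC_ONE_WAY,
        ASYNC_DEFERRED_RESPONSE,
        ASYNC_CALLBACK,
        BPEL_TWO_WAY,
        BPEL_ONE_WAY
    }

    enum Failure {
        SERVICE_BUSINESS_EXCEPTION,
        SERVICE_RUNTIME_EXCEPTION
    }

    /**
     * Returns true when the tables say a failed event is created.
     * Not modeled: the special case where an asynchronous response cannot be
     * delivered back to a BPEL client, which does create a failed event.
     */
    static boolean createsFailedEvent(Pattern pattern, Failure failure) {
        if (failure == Failure.SERVICE_BUSINESS_EXCEPTION) {
            return false; // business exceptions never produce failed events
        }
        switch (pattern) {
            case ASYNC_ONE_WAY:
            case ASYNC_DEFERRED_RESPONSE:
            case ASYNC_CALLBACK:
            case BPEL_ONE_WAY:
                return true;
            default:
                return false; // synchronous calls and two-way BPEL interactions
        }
    }
}
```

A lookup like this can be handy in design reviews, for example to list every invocation in an assembly diagram whose failures will surface in the Failed event manager rather than in a BPEL fault handler.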
Additional information about viewing and resubmitting failed events can be found in section Resubmitting Failed Events.
SCA Module Destination
Refer again to our business case.
These destinations are created when the module is deployed to an application server or a cluster.
Messages rarely accumulate in these destinations. An accumulation of messages in these locations is a strong indication that there may be a performance problem or an application defect, and it should be investigated immediately. It is important to monitor the depth of the module destinations (with your chosen IT monitoring solution), because a backup of messages could lead to a system outage or a prolonged recycle time.
We call these "SCA Module" destinations because the generated name is the module name with the added prefix "sca/". These destinations are pivotal in the functioning of SCA asynchronous invocations (brokering requests and responses). A varying number of additional destinations are generated on the SCA.SYSTEM bus during application installation, but this discussion focuses on the "SCA Module" destination.
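As a rough illustration of the depth monitoring recommended above, the sketch below flags module destinations whose depth exceeds a threshold. It assumes the depths have already been collected by your monitoring solution and handed over as a map. The class and method names are hypothetical, and no WebSphere API is used or implied.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Flags destinations whose message depth exceeds a threshold; names are hypothetical. */
class DestinationDepthCheck {
    /**
     * @param depths    destination name mapped to current message depth, as reported
     *                  by whatever monitoring tool is in use (assumed input)
     * @param threshold depth above which a destination is flagged
     * @return flagged destination names in alphabetical order
     */
    static List<String> overThreshold(Map<String, Long> depths, long threshold) {
        List<String> flagged = new ArrayList<>();
        // TreeMap gives a stable, sorted iteration order for reporting.
        for (Map.Entry<String, Long> entry : new TreeMap<>(depths).entrySet()) {
            if (entry.getValue() > threshold) {
                flagged.add(entry.getKey());
            }
        }
        return flagged;
    }
}
```

Because any accumulation on a module destination is treated as a warning sign, a threshold at or near zero is a reasonable starting point for destinations such as "sca/AccountRouting". The right value for a given system is a judgment call.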
System Integration Bus Retry
Referring to our business case, there are a number of SI Bus destinations created by SCA to support asynchronous communication.
As we have learned, one of these destinations is called "sca/AccountRouting". You can adjust the number of retries that occur after a ServiceRuntimeException in an asynchronous service invocation by changing the value of the "Maximum Failed Deliveries" property in the administrative console. However, you may not set the value to less than 2 in modules with a BPEL process, because the second delivery is required to return ServiceRuntimeExceptions to the BPEL process for handling.
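The effect of the "Maximum Failed Deliveries" value can be sketched as a simple delivery loop. This is a simplified model rather than the service integration bus implementation: the class, the `Outcome` type, and the `Predicate` target are hypothetical. The default of 5 and the minimum of 2 for modules with a BPEL process come from the text. The lower bound of 1 for other modules is an assumption.

```java
import java.util.function.Predicate;

/** Simplified model of redelivery up to a maximum number of failed deliveries. */
class DeliveryRetrySketch {
    // From the text: one original delivery plus four retries.
    static final int DEFAULT_MAX_FAILED_DELIVERIES = 5;

    /** Result of a delivery run: whether it succeeded and how many attempts were made. */
    static final class Outcome {
        final boolean delivered;
        final int attempts;

        Outcome(boolean delivered, int attempts) {
            this.delivered = delivered;
            this.attempts = attempts;
        }
    }

    /** Rejects settings the text rules out (and values below 1, which is an assumption). */
    static void validate(int maxFailedDeliveries, boolean moduleHasBpelProcess) {
        if (maxFailedDeliveries < 1) {
            throw new IllegalArgumentException("at least one delivery attempt is required");
        }
        if (moduleHasBpelProcess && maxFailedDeliveries < 2) {
            throw new IllegalArgumentException(
                "modules with a BPEL process need a value of 2 or more");
        }
    }

    /** Attempts delivery until the target accepts the message or the limit is reached. */
    static Outcome deliver(String message, int maxFailedDeliveries, Predicate<String> target) {
        for (int attempt = 1; attempt <= maxFailedDeliveries; attempt++) {
            if (target.test(message)) {
                return new Outcome(true, attempt);
            }
        }
        // Limit exhausted: in the real system the message would now be routed onward,
        // for example to an exception destination.
        return new Outcome(false, maxFailedDeliveries);
    }
}
```

With the default value, a target that always fails is invoked five times before the message is given up on, which matches the "one original and 4 retries" description.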
System Exception Destinations
The failed event manager is one place where we can look to administer failures. When dealing with Imports and Exports that are JMS or EIS based, we have to consider another important location.
Destinations on the SCA.Application bus are configured to route failed messages to the SIB system exception destination for that bus. Thus, if a JMS export picks up a message from the SCA.Application bus and runs into a rollback situation, the failed message is routed to the SIB system exception destination instead of the WBI recovery exception destination. This scenario differs from the failed event discussion above in that a failure to deserialize a message on the SCA.Application bus does not result in a failed event. There is a system exception destination on every bus within the solution. These destinations must be monitored and administered much like the "dead letter queue" common to MQ infrastructures.
Consider the following scenario.
This type of failure can occur when trying to receive requests from the AccountRoutingJMSExport (1). This export is a JMS export, so events can accumulate on the system exception destination on the SCA.Application bus. Use your chosen IT monitoring solution to observe the depth of this destination.
Failed Event Manager and SIB Destinations
Node name: WESBNode
Server name: server1
Recovery exception destination: WBI.FailedEvent.WESBNode.server1
In general, all the destinations created on the SCA.System bus will be configured to route failed messages to the recovery exception destination.
When a system failure occurs, in addition to capturing the failed message in this exception destination, the WebSphere ESB recovery feature also generates a failed event that represents the system error and stores it into the Recovery database as described in the Failed Event Manager section of this document.
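The destination names mentioned in this document follow simple patterns that can be captured in two helper methods. This is a sketch only: the recovery exception destination pattern is inferred from the single example given above (node WESBNode, server server1), so it may not hold for other topologies such as clusters. The class and method names are hypothetical.

```java
/** Builds destination names from the patterns shown in the text; illustrative only. */
class DestinationNames {
    /** Module destination: the module name with the "sca/" prefix. */
    static String moduleDestination(String moduleName) {
        return "sca/" + moduleName;
    }

    /**
     * Recovery exception destination for a stand-alone server.
     * Assumption: the pattern WBI.FailedEvent.<node>.<server> is inferred
     * from one example and is not a documented rule.
     */
    static String recoveryExceptionDestination(String nodeName, String serverName) {
        return "WBI.FailedEvent." + nodeName + "." + serverName;
    }
}
```

For the business case in this document, `moduleDestination("AccountRouting")` yields `sca/AccountRouting`, and `recoveryExceptionDestination("WESBNode", "server1")` yields `WBI.FailedEvent.WESBNode.server1`. These are the names to look for when browsing destinations or configuring monitoring.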
In summary, WebSphere ESB provides administrative capabilities above and beyond the underlying WebSphere Application Server platform. Take the time to understand and use these capabilities, and follow the guidance provided in the Planning error prevention section of Planning error prevention and recovery.
Administrative Capability | Bundled With WebSphere ESB Y/N? | Summary |
---|---|---|
Failed Event Manager | Yes | Read/Edit/Delete Access. This is the central place to administer Service Runtime Exceptions and other forms of infrastructure failures. |
Service Integration Bus Browser | Yes | Read/Delete. Use the Service Integration Bus Browser on the administrative console for browsing and performing day-to-day operational tasks on service integration buses. |