Fix (APAR): PK65439
Status: Fix
Release: 6.0.2.31
Operating System: AIX, HP-UX, Linux, Solaris, Windows
Supersedes Fixes:
CMVC Defect: 510343
Byte size of APAR: 403308
Date: 2008-11-19

Abstract: When messages are transferred between messaging engines within a service integration bus, or over an inter-bus link between buses, messages are queued within one messaging engine while they await transmission to the remote messaging engine.

Description/symptom of problem:
PK65439 resolves the following problem:

ERROR DESCRIPTION:
Insufficient information is written to the JVM logs of an application server when issues occur on a connection between service integration bus messaging engines. Messages can build up within a messaging engine for transfer to a remote messaging engine (within the same bus or connected over an inter-bus link) without any entries being written to the JVM logs. For inter-bus links, the transmission stream where these messages are stored while awaiting transfer does not have an administrative interface, so there is no mechanism to query the number of messages that have built up.

LOCAL FIX:

PROBLEM SUMMARY:
USERS AFFECTED: Users of the default messaging provider for WebSphere Application Server V6.0 or V6.1 with multiple members of a service integration bus, or with multiple buses connected via inter-bus links.

PROBLEM DESCRIPTION: When messages are transferred between messaging engines within a service integration bus, or over an inter-bus link between buses, messages are queued within one messaging engine while they await transmission to the remote messaging engine. For connections between messaging engines in a bus, these messages are queued on "remote queue points", which can be viewed in the administrative console. In WebSphere Application Server V6.0 and V6.1, messages awaiting transmission over inter-bus links cannot be viewed in the administrative console.
If an issue occurs that prevents messages from flowing between messaging engines, the information currently available makes it difficult to detect and resolve the problem. This APAR adds information to the JVM logs of an application server hosting a messaging engine to help detect and resolve problems of this type. Some of the new messages are produced by default once the APAR is applied, because they signify events that are unexpected under normal operation. Others can be enabled using tuning parameters, either to provide additional information in a system where a problem is under investigation, or to provide additional monitoring of critical connections.

RECOMMENDATION: Users who have experienced problems with a connection between messaging engines (including inter-bus links), or who require detailed monitoring or performance tuning of a critical connection, are recommended to set the following tuning parameters on the messaging engines on both sides of the connection:

sib.processor.logAllMessageDepthIntervals=5000
sib.processor.logDepthThresholdEvents=links
sib.processor.blockedCommittingMessageInterval=60000
sib.processor.logUnresolvedGapsInTransmissionStreams=5000

With these options enabled, the number of messages written to the JVM logs of the application server hosting the messaging engine will increase. You can use this information directly by referring to the details of each message in the PROBLEM CONCLUSION section, and it will also give IBM Service additional information if you experience an issue with a messaging engine connection.

The following areas have been identified where problem diagnosis for a messaging engine connection is difficult.

1) Detecting build-up of messages awaiting transfer: A key indicator of a problem on a connection between messaging engines is the build-up of messages on the transmitting end of the connection.
The earlier this can be detected, the more likely it is that a problem can be resolved before any limits are reached that would cause producers to fail to send messages. The information currently available makes detecting this condition difficult.

2) Detecting a high message threshold has been reached: Once the number of messages queued for transmission over a link has reached the high message threshold, producers will fail to send messages to the destination, with a SIMPLimitExceededException in the stack of the exception. Unless the application logs these exceptions, there is no information in the JVM logs of the application server to show that the limit has been reached.

3) Identifying indoubt transactions blocking transmission: An indoubt transaction is one between the prepare and commit phases of a two-phase commit. Although this state is usually very short lived, there are circumstances where manual intervention is required to resolve a transaction that has entered it. When the send of a message between messaging engines is involved in an indoubt transaction, messages stop flowing over the connection until the transaction is resolved. This allows the bus to maintain the order of delivery over the connection. It is currently difficult to identify the transaction that requires manual intervention.

4) Identifying a gap in the ordered stream of messages: Both sides of a messaging-engine-to-messaging-engine connection maintain state related to the connection, in order to ensure exactly-once delivery of messages to their target destination in the order they were sent. This includes assigning a unique sequence number to each message that flows over the link. A messaging engine receiving messages over a connection cannot complete delivery of a message with a particular sequence number until all previous messages have been delivered.
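As an illustration of the ordering constraint just described, the following minimal Python sketch models a receiving messaging engine that holds back out-of-order arrivals until every earlier sequence number has been delivered. It is not the service integration bus implementation: the class and method names (InOrderReceiver, receive, gap_start) are hypothetical, sequence numbers are assumed to start at 1, and re-delivery requests are not modelled. It shows why one missing message stalls everything behind it, and what "the gap starts at sequence id" refers to.

```python
# Hypothetical model of an ordered receiving stream; not the bus implementation.


class InOrderReceiver:
    def __init__(self, first_seq=1):
        self.next_seq = first_seq  # lowest sequence number not yet delivered
        self.pending = {}          # arrivals waiting for earlier messages, by sequence number
        self.delivered = []        # payloads handed on, in sequence order

    def receive(self, seq, payload):
        """Accept one message; deliver it and any buffered successors if possible."""
        if seq < self.next_seq or seq in self.pending:
            return  # already delivered or already buffered: drop the duplicate
        self.pending[seq] = payload
        # A message can only be delivered once all earlier sequence numbers have been.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1

    def gap_start(self):
        """Return the first missing sequence number if later messages are stalled, else None."""
        return self.next_seq if self.pending else None


receiver = InOrderReceiver()
for seq, payload in [(1, "a"), (3, "c"), (4, "d")]:
    receiver.receive(seq, payload)
print(receiver.delivered, receiver.gap_start())  # ['a'] 2
receiver.receive(2, "b")
print(receiver.delivered, receiver.gap_start())  # ['a', 'b', 'c', 'd'] None
```

In this model a duplicate (a sequence number already delivered or already buffered) is simply discarded, which is one way to keep delivery exactly-once when the sender transmits a message more than once.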
The protocol used by the bus takes account of circumstances that can cause messages to arrive over the network connection out of order, or where network issues or server restarts prevent individual messages in the stream from being delivered. Any issue with this logic could theoretically prevent the connection from continuing to deliver new messages, and there could be insufficient information in the logs of the application servers to detect the issue.

5) Tuning the efficiency of connections: Under certain circumstances, the protocol used to transfer messages between messaging engines can become inefficient. It is currently difficult to detect this inefficiency and hence to tune the connection to improve performance.

PROBLEM CONCLUSION:
This APAR introduces the new messages and tuning parameters described below. For details of setting tuning properties, see the "Setting tuning properties of a messaging engine" topic in the information center:
http://publib.boulder.ibm.com/infocenter/wasinfo/v6r1/topic/com.ibm.websphere.pmc.nd.doc/tasks/tjk0120_.html

1) Detecting build-up of messages awaiting transfer:
Property: sib.processor.logAllMessageDepthIntervals
Allowed values: Any positive integer
Default: 0 (disabled)

When this property is set to a positive integer, one of the messages below is logged for any destination or transmission stream when its depth increases to a multiple of the interval, and again when the depth drops below the next lowest multiple of the interval.

For destinations hosted on the messaging engine:
CWSIP0787I: Destination {0} on messaging engine {1} has reached a depth of {2} messages.
Explanation: The message point for the destination has reached the stated message depth.
User response: No action to be taken.

For destinations hosted on another messaging engine in the bus:
CWSIP0788I: {0} messages queued on messaging engine {1} for transmission to destination {2} on messaging engine {3}.
Explanation: The remote message point for the destination has reached the stated message depth.
User response: No action to be taken.

For destinations in another bus connected via an inter-bus link:
CWSIP0789I: {0} messages queued on messaging engine {1} for transmission to foreign bus {2} on link {3}.
Explanation: The link to the foreign bus has reached the stated message depth.
User response: No action to be taken.

It is also possible to customize this behavior for an individual destination or foreign bus, using a tuning parameter with the following name:
sib.processor.logMessageDepthIntervals.DEST_OR_FOREIGNBUS
where DEST_OR_FOREIGNBUS is the name of a destination, or the name of an individual foreign bus.

2) Detecting a high message threshold has been reached:
Property: sib.processor.logDepthThresholdEvents
Allowed values: 'links', 'on', 'off'
Default: 'links'

When a destination reaches the high message threshold (which prevents further messages from being sent via this destination), a message may be logged, depending on the setting of this tuning parameter:
* A value of 'off' prevents the message from being logged for any destination.
* A value of 'links' (the default) causes a message to be logged when the transmission stream of an inter-bus link or WMQ link reaches the high message threshold.
* A value of 'on' causes a message to be logged if any destination or transmission stream reaches the high message threshold.
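The depth-logging rules for sib.processor.logAllMessageDepthIntervals (item 1) and sib.processor.logDepthThresholdEvents (item 2) can be modelled as two small functions. This is a hedged sketch, not product code: the function names are hypothetical, the depth is assumed to be observed after every change and to start at 0, and "the next lowest multiple of the interval" is read as the multiple at or just below the current depth, so an entry is produced each time the depth crosses a multiple in either direction.

```python
# Hypothetical model of the two depth-logging rules; not product code.

VALID_THRESHOLD_SETTINGS = {"links", "on", "off"}


def depth_interval_events(depths, interval):
    """Return ("reached" | "dropped", depth) entries for successive depth samples."""
    if interval <= 0:  # 0 is the documented default and disables logging
        return []
    events = []
    band = 0  # number of whole multiples of `interval` in the last observed depth
    for depth in depths:
        new_band = depth // interval
        if new_band > band:
            events.append(("reached", depth))  # rose to a new multiple
        elif new_band < band:
            events.append(("dropped", depth))  # fell below the multiple it had reached
        band = new_band
    return events


def log_threshold_event(setting, is_link_stream):
    """Return True if reaching the high message threshold should be logged.

    is_link_stream is True for the transmission stream of an inter-bus or WMQ link.
    """
    if setting not in VALID_THRESHOLD_SETTINGS:
        raise ValueError(f"unexpected logDepthThresholdEvents value: {setting!r}")
    if setting == "off":
        return False
    if setting == "links":
        return is_link_stream
    return True  # "on": any destination or transmission stream
```

Under these assumptions, with an interval of 5000, a depth that moves 4999, 5000, 5001, 4999, 5000 produces three entries: reached 5000, dropped to 4999, and reached 5000 again. A destination hovering around a multiple can therefore produce repeated entries, which is worth bearing in mind when choosing the interval.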
The message logged is one of the following existing messages: CWSIP0553W, CWSIP0555W, CWSIP0557W, CWSIP0559W.

3) Identifying indoubt transactions blocking transmission:
Property: sib.processor.blockedCommittingMessageInterval
Allowed values: Any positive integer (milliseconds)
Default: 300000 (5 minutes)

If a transaction blocks transmission of messages for longer than the number of milliseconds specified in the tuning parameter, the following message is logged:
CWSIP0785W: A message held on messaging engine {1} for transmission to a remote destination or foreign bus {0} has remained in committing state for {2} seconds under transaction {3}. Further messages may not flow until this transaction completes.
Explanation: Any messages for the remote destination or foreign bus sent after the blocked message will not be transmitted until the transaction is resolved.
User response: Resolve the transaction identified as blocking the message.

If a transaction previously blocking transmission is later resolved (without restarting the messaging engine), the following message is also logged:
CWSIP0786I: Messages being sent to the remote destination or foreign bus {0} from messaging engine {1} are no longer blocked by transaction {2}.
Explanation: The previously reported unresolved transaction has now been resolved and message transmission has resumed.
User response: No action to be taken.

For more information on identifying and resolving indoubt transactions, review the "Resolving indoubt transactions" topic in the information center:
http://publib.boulder.ibm.com/infocenter/wasinfo/v6r1/topic/com.ibm.websphere.pmc.nd.doc/tasks/tjm0165_.html

4) Identifying a gap in the ordered stream of messages:
Property: sib.processor.logUnresolvedGapsInTransmissionStreams
Allowed values: Any positive integer (milliseconds)
Default: 0 (disabled)

With this parameter set, a messaging engine receiving messages over a connection writes a log entry if a gap in the ordered stream of messages is detected and is not resolved within the specified interval. As short-lived gaps are expected under normal operation, it is recommended that the parameter be set to a value greater than 5000 ms, to avoid excessive output being written to the logs. If a long-lived gap is detected and is subsequently resolved, a second log entry is written to show when the gap was resolved.

If the messaging engine that is the source of the messages is currently unavailable (or the inter-bus link is not running), unresolved gaps are expected until the sending messaging engine or the inter-bus link is restarted.

When a long-lived gap is detected on an inter-bus link, the following log entry is written:
CWSIP0790W: Messaging engine {3} has detected a gap in the stream of messages received from bus {1} on link {2}. Requests made to fill this gap have yet to be satisfied. The gap starts at sequence id {0}.
Explanation: The messaging engine receiving messages over the inter-bus link has detected a gap in the sequence ids of messages received. Requests for re-delivery of one or more messages have been made to the sending messaging engine, but the gap has not yet been filled. If the inter-bus link is not currently running, the gap will not be resolved until the inter-bus link is restarted.
User response: Restart the link.

The log message written when the long-lived gap is resolved is:
CWSIP0791I: The gap starting at sequence id {0} in the message stream from bus {1} on link {2} has been resolved on messaging engine {3}.
Explanation: A previously reported gap in the sequence of messages has now been resolved; subsequent messages will now be processed.
User response: No action to be taken.

When a long-lived gap is detected on a connection between messaging engines in a bus, the following log entry is written:
CWSIP0792W: Messaging engine {3} has detected a gap in the stream of messages received from messaging engine {2} for destination {1}. Requests made to fill this gap have yet to be satisfied. The gap starts at sequence id {0}.
Explanation: The messaging engine receiving messages for the destination has detected a gap in the sequence ids of messages received. Requests for re-delivery of one or more messages have been made to the sending messaging engine, but the gap has not yet been filled. If the messaging engines are unable to communicate, the gap will not be resolved until communication between them is restored.
User response: Restart the sending messaging engine if it is stopped, and ensure that the connection between the two messaging engines is active.

The log message written when a long-lived gap is resolved is:
CWSIP0793I: The gap starting at sequence id {0} in the message stream for destination {1} from messaging engine {2} has been resolved on messaging engine {3}.
Explanation: A previously reported gap in the sequence of messages has now been resolved; subsequent messages will now be processed.
User response: No action to be taken.

5) Tuning the efficiency of connections:
Property: sib.processor.repeatedMessagePercentage
Allowed values: Integer between 0 and 100
Default: 0 (disabled)

Property: sib.processor.repeatedMessageInterval
Allowed values: Positive integer (number of messages)
Default: 2000

When the sib.processor.repeatedMessagePercentage property is set to a percentage value, a messaging engine writes a log entry if the combination of load on the link, buffer sizes and network speed has caused more than that percentage of messages to be sent over the network multiple times.
The percentage is measured over the sample size defined in the sib.processor.repeatedMessageInterval property. A log entry can be written at most once every 5 minutes for a particular messaging-engine-to-messaging-engine connection.

A high percentage of repeated messages can occur after restarting a messaging engine or inter-bus link when there are many messages queued for transmission to or from that messaging engine. This is a result of the temporary high load placed on the connection while the backlog of messages is transmitted. Once the backlog has been cleared, the connection returns to its steady state and the percentage of repeated messages should return to a lower level.

For inter-bus links the message is:
CWSIP0794W: {0} percent of the messages received by messaging engine {3} from bus {1} over inter-bus link {2} have repeatedly been transmitted over the inter-bus link.
Explanation: A high percentage of messages sent from the foreign bus have previously been received by this messaging engine. Messages will not be duplicated, but the performance of the messaging system may be reduced. This message can occur when a sudden burst of messages is sent over the link; repeated occurrences of this message may indicate that the throughput of the link is being exceeded on a sustained basis.
User response: Investigate whether the rate of production of messages for transfer over this link is too high.

For connections between messaging engines within a bus the message is:
CWSIP0795W: {0} percent of the messages received by messaging engine {2} for destination {3} transmitted from messaging engine {1} have repeatedly been transmitted to the messaging engine.
Explanation: A high percentage of messages sent from the remote messaging engine have previously been received by this messaging engine. Messages will not be duplicated, but the performance of the messaging system may be reduced.
This message can occur when a sudden burst of messages is transmitted to a messaging engine. Repeated occurrences of this message may indicate that the maximum message throughput is being exceeded on a sustained basis.
User response: Investigate whether the rate of production of messages to this messaging engine is too high.

If a suspected inefficiency is detected using these tuning parameters, the following additional tuning parameters introduced in this APAR can be used to change the behavior of the link to increase efficiency:

Property: com.ibm.ws.sib.jfapchannel.RL_DISPATCHER_MAXQUEUESIZE_ME
Allowed values: Positive integer (bytes)
Default: Dynamically calculated value
This parameter determines how many messages and control flows the sending side of the connection can place into an in-memory buffer for transmission over the network. Users experiencing inefficiency are recommended to test with a small buffer size; for example, start with a value of 96 and then tune and test iteratively.

Property: sib.processor.transmissionStreamGapEagerness
Allowed values: Positive integer (milliseconds)
Default: 200
This parameter determines how long a receiving messaging engine waits before it re-requests a message that is expected (because of its sequence number) but has not yet been received. In a high-throughput or low-network-speed environment, increasing this value can reduce the likelihood of a message being re-requested while it is already queued for transfer over the network connection. Be aware that increasing this value can also increase the time taken to automatically resolve a gap that occurs because of a network error or a restart of one end of the link.

The fix for this APAR is currently targeted for inclusion in fixpack 6.0.2.33 and 6.0.2.21.
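The sampling rule behind sib.processor.repeatedMessagePercentage and sib.processor.repeatedMessageInterval (item 5 above) can be sketched as a counter on the receiving messaging engine. This is a minimal sketch under stated assumptions, not product code: the class name RepeatedMessageMonitor and its methods are hypothetical, the receiver is assumed to already know whether each arriving message is a repeat, the sample is assumed to restart after every repeatedMessageInterval messages, and time is supplied in seconds.

```python
# Hypothetical model of the repeated-message check; not product code.


class RepeatedMessageMonitor:
    MIN_SECONDS_BETWEEN_LOGS = 300  # at most one entry every 5 minutes per connection

    def __init__(self, percentage, interval=2000):
        self.percentage = percentage  # repeatedMessagePercentage; 0 disables the check
        self.interval = interval      # repeatedMessageInterval: sample size in messages
        self.received = 0
        self.repeated = 0
        self.last_log = None          # time of the last log entry, in seconds

    def on_message(self, was_repeat, now):
        """Count one received message; return the percentage to log, or None."""
        if self.percentage <= 0:
            return None
        self.received += 1
        if was_repeat:
            self.repeated += 1
        if self.received < self.interval:
            return None  # sample not complete yet
        exceeded = self.repeated * 100 > self.percentage * self.received
        percent = self.repeated * 100 // self.received
        self.received = self.repeated = 0  # start the next sample
        if not exceeded:
            return None
        if self.last_log is not None and now - self.last_log < self.MIN_SECONDS_BETWEEN_LOGS:
            return None  # suppressed by the 5-minute limit
        self.last_log = now
        return percent


monitor = RepeatedMessageMonitor(percentage=10, interval=10)
for t in (0, 60, 400):  # three samples of 10 messages, 3 of them repeats
    results = [monitor.on_message(i < 3, now=t) for i in range(10)]
    print(t, results[-1])
```

The example prints 30 for the samples completed at t=0 and t=400, and None at t=60, because that sample falls inside the 5-minute window after the previous entry. The comparison uses integer arithmetic so that "more than that percentage" is not affected by rounding.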
Please refer to the Recommended Updates page for delivery information:
http://www.ibm.com/support/docview.wss?rs=180&uid=swg27004980

Directions to apply fix:

Fix applies to Editions:
Release 6.0
__ Application Server (Express or BASE)
__ Network Deployment (ND)
__ WebSphere Business Integration Server Foundation (WBISF)
__ Edge Components
__ Developer
__ Extended Deployment (XD)

Install Fix to:
Method:
__ Application Server Nodes
__ Deployment Manager Nodes
__ Both

NOTE: The user must:
* Have administrative rights in Windows, or be the actual root user in a UNIX environment.
* Be logged in with the same authority level when unpacking a fix, fix pack or refresh pack.
* Be at V6.1.0.13 or newer of the Update Installer. Certain iFixes may require a newer version of the Update Installer; the Update Installer will inform you during the installation process if a newer version is required. You can check the level of the Update Installer in the file /updateinstaller/version.txt.

The Update Installer can be downloaded from the following link:
http://www.ibm.com/support/docview.wss?rs=180&uid=swg21205991
For detailed instructions on extracting the Update Installer, see the following Technote:
http://www.ibm.com/support/docview.wss?rs=180&uid=swg21205400

Note that there are two different methods for delivering iFixes, depending on the contents. The fix may be delivered either as a single file with a .pak extension (such as 6.1.0.11-WS-WAS-IFPK12345.pak) or as a single file with a .zip extension (such as 6.1.0.11-WS-WAS-IFPK12345.zip) that contains one or more files with a .pak extension.

1) If your iFix is delivered as a single file with a .pak extension, copy the .pak file directly to the maintenance directory. If your iFix is delivered as a single file with a .zip extension, unzip the file into the maintenance directory.
2) Shut down WebSphere. Manually execute setupCmdLine.bat in Windows, or . ./setupCmdLine.sh in UNIX, from the WebSphere instance that maintenance is being applied to.
3) Launch the Update Installer and click Next on the Welcome page.
4) Enter the directory path of the installation location of the WebSphere product you want to update, and click Next.
5) Select the "Install maintenance package" operation and click Next.
6) Enter the directory path of the maintenance directory containing the maintenance packages (.pak files), and click Next.
7) The Available Maintenance Package to Install page should list all maintenance packages (.pak files) that the Update Installer finds in the directory path provided in the previous step. The Update Installer selects the correct maintenance packages based on your system configuration and will not allow an invalid combination to be installed. Keep the Update Installer recommendations, click Next, and continue with the installation of the maintenance package.
8) Note that if a Feature Pack is installed or uninstalled in the future, a different set of iFixes will be needed. At that time, use the Update Installer again, with the maintenance directory location where these maintenance packages are stored, to determine the required interim fixes for the new WebSphere and Feature Pack(s) combination.
9) The maintenance packages can have one of a set of names, and these names help determine which maintenance package you need to install. The APAR name (PKxxxxx) appears as part of the filename. Between the APAR number and the .pak extension, 0 to 2 characters are added. The table below indicates the usage of each maintenance package with respect to which Feature Packs, if any, are installed.
| .pak File Names           | No Feature Packs | EJB3 Only | WebServices Only | Both |
| 6.1.0.x-WS-WAS-IFPK12345  |        X         |     X     |        X         |  X   |
| 6.1.0.x-WS-WAS-IFPK12345C |        X         |           |                  |      |
| 6.1.0.x-WS-WAS-IFPK12345C

Directions to remove fix:

NOTE:
* The user must have administrative rights in Windows, or be the actual root user in a UNIX environment.
* FIXES MUST BE REMOVED IN THE REVERSE ORDER TO THAT IN WHICH THEY WERE APPLIED.
* DO NOT REMOVE A FIX UNLESS ALL FIXES APPLIED AFTER IT HAVE FIRST BEEN REMOVED.
* YOU MAY REAPPLY ANY REMOVED FIX.

Example: If your system has fix1, fix2 and fix3 applied in that order and fix2 is to be removed, fix3 must be removed first, then fix2 removed, and then fix3 reapplied.

1) Shut down WebSphere. Manually execute setupCmdLine.bat in Windows, or . ./setupCmdLine.sh in UNIX, from the WebSphere instance that the uninstall is being run against.
2) Start the Update Installer.
3) Enter the installation location of the WebSphere product from which you want to remove the fix.
4) Select the "Uninstall maintenance package" operation.
5) Enter the file name of the maintenance package to uninstall (PKxxxxx.pak).
6) Uninstall the maintenance package.
7) Restart WebSphere.

Directions to re-apply fix:
1) Shut down WebSphere.
2) Follow the Fix instructions to apply the fix.
3) Restart WebSphere.

Additional Information: