activemq-dev mailing list archives

From: "Hiram Chirino (JIRA)"
Subject: [jira] Resolved: (AMQ-443) ReliableTransport / KeepAlive algorithm does not work properly.
Date: Thu, 15 Jun 2006 03:47:51 GMT
Hiram Chirino resolved AMQ-443:

    Fix Version: 4.0
     Resolution: Fixed

Version 4.0 implements a more robust keepalive solution.  KeepAlive packets are sent only when
the transport has been idle.  Also, the transport is not considered idle while it is performing
a blocking operation.
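
As a rough illustration of the rule described above, here is a minimal sketch in Java. It is not the ActiveMQ 4.0 code: the class IdleKeepAliveSketch and all of its method names are hypothetical, and the caller passes in the current time so the logic can be exercised without a real clock. It only shows the two conditions from the resolution note: a keepalive is due only after an idle period, and a transport with a blocking operation in flight is never treated as idle.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of idle-based keepalive; not ActiveMQ's actual classes. */
class IdleKeepAliveSketch {
    private final long idleTimeoutMillis;
    private final AtomicLong lastActivityMillis;
    // Number of blocking operations currently in flight on this transport.
    private final AtomicInteger blockingOps = new AtomicInteger();

    IdleKeepAliveSketch(long idleTimeoutMillis, long nowMillis) {
        this.idleTimeoutMillis = idleTimeoutMillis;
        this.lastActivityMillis = new AtomicLong(nowMillis);
    }

    /** Record that a packet was read or written at the given time. */
    void onActivity(long nowMillis) {
        lastActivityMillis.set(nowMillis);
    }

    /** Mark the start of a blocking call (for example a synchronous send). */
    void beginBlockingOperation() {
        blockingOps.incrementAndGet();
    }

    /** Mark the end of a blocking call; finishing it counts as activity. */
    void endBlockingOperation(long nowMillis) {
        blockingOps.decrementAndGet();
        lastActivityMillis.set(nowMillis);
    }

    /** A keepalive is due only if nothing is blocking and the idle period has elapsed. */
    boolean shouldSendKeepAlive(long nowMillis) {
        if (blockingOps.get() > 0) {
            return false; // a blocked transport is not considered idle
        }
        return nowMillis - lastActivityMillis.get() >= idleTimeoutMillis;
    }
}
```

In this sketch a timer thread would call shouldSendKeepAlive periodically, and the transport would wrap each blocking call in beginBlockingOperation / endBlockingOperation. A counter is used instead of a flag so that overlapping blocking calls are handled.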

> ReliableTransport / KeepAlive algorithm does not work properly.
> ---------------------------------------------------------------
>          Key: AMQ-443
>          URL:
>      Project: ActiveMQ
>         Type: Bug

>   Components: Transport, Broker
>     Versions: 3.2, 3.2.1
>  Environment: Solaris 8 / 10.  JDK 1.5
>     Reporter: Kevin Yaussy
>      Fix For: 4.0
>  Attachments:,
> The current implementation of will sometimes force disconnections
on well-behaved connections.  The problem may arise if there is a connection which goes away,
and the KeepAlive send to that channel blocks while attempting to reconnect.  If this reconnection
takes a while, then other channels that were responding fine may get their connections broken.
 This happens due to the following code in
> 		if ((channel.getLastReceiptTimestamp() + channel.getKeepAliveTimeout() * 2) < System.currentTimeMillis())
> or
> 		} else if ((channel.getLastReceiptTimestamp() + channel.getKeepAliveTimeout()) < System.currentTimeMillis()) {
> The fact that the receipt timestamp is checked against System.currentTimeMillis() causes
the code to break otherwise good connections.  If a KeepAlive send (in examineChannel) for
a broken channel takes longer than some good channel's KeepAliveTimeout, then the good connection
gets broken.
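
The failure mode described above can be reproduced with a small, deterministic simulation. This is only a sketch under stated assumptions: the Channel class, its fields, and both examine methods are hypothetical stand-ins, not the ActiveMQ 3.x code, and wall-clock time is replaced by a simulated counter. The first method re-reads the clock for every channel, as the quoted checks do with System.currentTimeMillis(); the second samples the time once per pass, which is one possible way to avoid the false timeout within a single pass and is not necessarily the change attached to this issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical simulation of the timeout check; names are illustrative only. */
class KeepAliveSkewDemo {
    static final class Channel {
        final String name;
        final long keepAliveTimeout;
        final long sendCostMillis;      // simulated time a keepalive send blocks
        final long lastReceiptTimestamp;

        Channel(String name, long keepAliveTimeout, long sendCostMillis, long lastReceiptTimestamp) {
            this.name = name;
            this.keepAliveTimeout = keepAliveTimeout;
            this.sendCostMillis = sendCostMillis;
            this.lastReceiptTimestamp = lastReceiptTimestamp;
        }
    }

    /** Compares each channel against the clock as it stands when that channel is examined. */
    static List<String> examineWithLiveClock(List<Channel> channels, long startMillis) {
        long clock = startMillis;
        List<String> dropped = new ArrayList<>();
        for (Channel c : channels) {
            if (c.lastReceiptTimestamp + c.keepAliveTimeout * 2 < clock) {
                dropped.add(c.name);
            } else {
                clock += c.sendCostMillis; // a blocking send advances the clock for later channels
            }
        }
        return dropped;
    }

    /** Compares every channel against one timestamp taken at the start of the pass. */
    static List<String> examineWithSnapshot(List<Channel> channels, long startMillis) {
        List<String> dropped = new ArrayList<>();
        for (Channel c : channels) {
            if (c.lastReceiptTimestamp + c.keepAliveTimeout * 2 < startMillis) {
                dropped.add(c.name);
            }
            // Sends may still block here, but they no longer skew later checks in this pass.
        }
        return dropped;
    }

    public static void main(String[] args) {
        List<Channel> channels = Arrays.asList(
                new Channel("broken", 1000, 5000, 0), // send blocks for 5 s while reconnecting
                new Channel("good", 1000, 0, 0));     // healthy channel examined afterwards
        System.out.println(examineWithLiveClock(channels, 500)); // prints [good]
        System.out.println(examineWithSnapshot(channels, 500));  // prints []
    }
}
```

With the live clock, the healthy channel is dropped only because the send to the broken channel blocked for longer than twice the healthy channel's timeout. The snapshot variant does not help across passes if keepalives to healthy channels are delayed for that long, so it should be read as an illustration of the skew rather than a complete remedy.
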
> This can, in turn, cause some pretty bad behavior in the Broker.  While testing and diagnosing
this problem, I could get some brokers in a network of brokers stuck.  The sequence of events
during recovery, which gets interrupted due to closing the connections, would sometimes lead
to the broker hanging waiting for a receipt, such as during an addConsumer (which eventually
calls syncSendWithReceipt).
> I have redone the logic in (which required a small change to ReliableTransportChannel
as well).  This now seems to work.
> I'm a bit concerned about the blocking calls, though.  This may be a different issue
/ bug.  I thought it looked like there was a mechanism to cancel outstanding receipt waiters
- but every once in a while that mechanism would not get called.  This results in the broker
basically getting stuck, and it never really recovers.

