activemq-dev mailing list archives

From "Kevin Yaussy (JIRA)" <j...@apache.org>
Subject [jira] Commented: (AMQ-443) ReliableTransport / KeepAlive algorithm does not work properly.
Date Thu, 15 Jun 2006 12:43:51 GMT
    [ https://issues.apache.org/activemq/browse/AMQ-443?page=comments#action_36397 ] 

Kevin Yaussy commented on AMQ-443:
----------------------------------

Yes - and so far the 4.0 approach is working very well in this respect.

> ReliableTransport / KeepAlive algorithm does not work properly.
> ---------------------------------------------------------------
>
>          Key: AMQ-443
>          URL: https://issues.apache.org/activemq/browse/AMQ-443
>      Project: ActiveMQ
>         Type: Bug

>   Components: Transport, Broker
>     Versions: 3.2, 3.2.1
>  Environment: Solaris 8 / 10.  JDK 1.5
>     Reporter: Kevin Yaussy
>      Fix For: 4.0
>  Attachments: KeepAliveDaemon.java, ReliableTransportChannel.java
>
>
> The current implementation of KeepAliveDaemon.java will sometimes force disconnections
> on well-behaved connections.  The problem may arise if a connection goes away
> and the KeepAlive send to that channel blocks while attempting to reconnect.  If this
> reconnection takes a while, then other channels that were responding fine may get their
> connections broken.  This happens due to the following code in KeepAliveDaemon.java:
> 		if ((channel.getLastReceiptTimestamp() + channel.getKeepAliveTimeout() * 2) < System.currentTimeMillis()) {
> or
> 		} else if ((channel.getLastReceiptTimestamp() + channel.getKeepAliveTimeout()) < System.currentTimeMillis()) {
> The fact that the receipt timestamp is checked against System.currentTimeMillis() causes
> the code to break otherwise good connections.  If a KeepAlive send (in examineChannel) for
> a broken channel takes longer than some good channel's KeepAliveTimeout, then the good
> connection gets broken.
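
To make the failure mode concrete, here is a minimal sketch, assuming a simplified per-channel state. It is not the attached patch or the 4.0 implementation: the names KeepAliveSketch, ChannelState, wallClockCheck and probeBasedCheck are hypothetical. wallClockCheck mirrors the shape of the comparison quoted above; probeBasedCheck is one possible alternative that only disconnects when a keep-alive was actually sent and then went unanswered for a full timeout.

```java
/** Illustrative only: hypothetical names, not ActiveMQ classes. */
final class KeepAliveSketch {
    enum Action { NONE, SEND_KEEPALIVE, DISCONNECT }

    /** Hypothetical per-channel bookkeeping. */
    static final class ChannelState {
        final long keepAliveTimeoutMs;
        long lastReceiptMs;             // last time anything arrived from the peer
        long lastKeepAliveSentMs = -1;  // -1 until a keep-alive has been sent

        ChannelState(long keepAliveTimeoutMs, long lastReceiptMs) {
            this.keepAliveTimeoutMs = keepAliveTimeoutMs;
            this.lastReceiptMs = lastReceiptMs;
        }
    }

    /** Same shape as the quoted check: idle time is judged against the wall clock alone. */
    static Action wallClockCheck(ChannelState c, long nowMs) {
        if (c.lastReceiptMs + c.keepAliveTimeoutMs * 2 < nowMs) {
            return Action.DISCONNECT;
        } else if (c.lastReceiptMs + c.keepAliveTimeoutMs < nowMs) {
            return Action.SEND_KEEPALIVE;
        }
        return Action.NONE;
    }

    /** Alternative: disconnect only if a keep-alive was sent and nothing came back in time. */
    static Action probeBasedCheck(ChannelState c, long nowMs) {
        boolean probeOutstanding = c.lastKeepAliveSentMs > c.lastReceiptMs;
        if (probeOutstanding) {
            return (nowMs - c.lastKeepAliveSentMs > c.keepAliveTimeoutMs)
                    ? Action.DISCONNECT : Action.NONE;
        }
        if (nowMs - c.lastReceiptMs > c.keepAliveTimeoutMs) {
            c.lastKeepAliveSentMs = nowMs; // record that a probe is now in flight
            return Action.SEND_KEEPALIVE;
        }
        return Action.NONE;
    }

    public static void main(String[] args) {
        // A healthy but idle channel, examined late because the daemon thread
        // was blocked for 5 seconds on some other channel.
        ChannelState good = new ChannelState(1000, 0);
        System.out.println("wall clock:  " + wallClockCheck(good, 5000));
        System.out.println("probe based: " + probeBasedCheck(good, 5000));
    }
}
```

The point of the second check is that a stall in the daemon thread delays the keep-alive rather than being counted against the channel. The clock value is passed in as a parameter so the behaviour can be exercised without real delays.
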
> This can, in turn, cause some pretty bad behavior in the Broker.  While testing and diagnosing
> this problem, I could get some brokers in a network of brokers stuck.  The sequence of events
> during recovery, which gets interrupted when the connections are closed, would sometimes lead
> to the broker hanging while waiting for a receipt, such as during an addConsumer (which
> eventually calls syncSendWithReceipt).
> I have redone the logic in KeepAliveDaemon.java (which required a small change to
> ReliableTransportChannel as well).  This now seems to work.
> I'm a bit concerned about the blocking calls, though.  This may be a different issue
> / bug.  I thought it looked like there was a mechanism to cancel outstanding receipt waiters
> - but every once in a while that mechanism would not get called.  The result is that the
> broker basically gets stuck and never really recovers.
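
As an illustration of the receipt-waiter concern, the following is a rough sketch of one way to make sure blocked callers are always released when a channel closes. It is an assumption-laden example, not the ActiveMQ mechanism: ReceiptWaiters and its methods are hypothetical, and it uses CompletableFuture, which is newer than the JDK 1.5 named in the report.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

/** Illustrative registry of callers waiting for a receipt; hypothetical, not an ActiveMQ class. */
final class ReceiptWaiters {
    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    /** Called before a synchronous send; the caller then waits on the returned future. */
    CompletableFuture<String> register(String correlationId) {
        CompletableFuture<String> waiter = new CompletableFuture<>();
        pending.put(correlationId, waiter);
        return waiter;
    }

    /** Called when the matching receipt arrives from the peer. */
    void onReceipt(String correlationId, String receipt) {
        CompletableFuture<String> waiter = pending.remove(correlationId);
        if (waiter != null) {
            waiter.complete(receipt);
        }
    }

    /** Called on every close path so that no caller is left waiting on a dead channel. */
    void failAll(Exception cause) {
        for (String id : pending.keySet()) {
            CompletableFuture<String> waiter = pending.remove(id);
            if (waiter != null) {
                waiter.completeExceptionally(cause);
            }
        }
    }

    int pendingCount() {
        return pending.size();
    }

    /** Bounded wait: even if failAll is never invoked, the caller gives up after timeoutMs. */
    static String await(CompletableFuture<String> waiter, long timeoutMs) throws Exception {
        return waiter.get(timeoutMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) {
        ReceiptWaiters waiters = new ReceiptWaiters();
        CompletableFuture<String> w = waiters.register("addConsumer-1");
        waiters.failAll(new IOException("channel closed"));
        System.out.println("released exceptionally: " + w.isCompletedExceptionally());
    }
}
```

Two safeguards are combined here: failAll releases every waiter when the connection is torn down, and the timeout in await acts as a backstop for the case described above, where the cancel path is occasionally not reached.
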

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   https://issues.apache.org/activemq/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

