activemq-dev mailing list archives

From "Gary Tully (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AMQ-3575) Failover transport race condition causes intermittent incomplete bridge connections
Date Wed, 02 Nov 2011 23:27:32 GMT

    [ https://issues.apache.org/jira/browse/AMQ-3575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142676#comment-13142676 ]

Gary Tully commented on AMQ-3575:
---------------------------------

Can you validate against trunk? With
{code} Assert.assertTrue("Unexpected state: BrokerInfo command was processed", brokerInfoProcessed);{code}
the test works on trunk, which, if I understand you correctly, validates that this is fixed.
Is that correct?

In general, using static:failover: has proven to be problematic. Failover hides transport
errors, but a network bridge is designed to recover from such errors by recreating the bridge,
so failover should be configured not to reconnect.
see: https://issues.apache.org/jira/browse/AMQ-3542
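
As a rough illustration of "configured not to reconnect", the sketch below builds a network connector URI that wraps failover in static discovery and sets the failover option maxReconnectAttempts=0. The class and method names are hypothetical, and the host names are the ones from the issue description. Note the assumption about versions: in ActiveMQ 5.6 and later, maxReconnectAttempts=0 disables reconnection, while earlier releases treat 0 as "retry forever", so check the failover transport reference for the release in use.

```java
import java.util.Arrays;
import java.util.List;

public class BridgeUriSketch {

    // Hypothetical helper: composes a static:(failover:(...)) URI string.
    // maxReconnectAttempts=0 is assumed to mean "do not reconnect" (5.6+ semantics).
    static String staticFailoverUri(List<String> brokers) {
        return "static:(failover:(" + String.join(",", brokers)
                + ")?randomize=false&maxReconnectAttempts=0)";
    }

    public static void main(String[] args) {
        List<String> pair = Arrays.asList("tcp://master:61616", "tcp://slave:61616");
        System.out.println(staticFailoverUri(pair));
    }
}
```

With failover giving up after the first transport error, the error reaches the network connector, which can then recreate the bridge itself.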
                
> Failover transport race condition causes intermittent incomplete bridge connections
> -----------------------------------------------------------------------------------
>
>                 Key: AMQ-3575
>                 URL: https://issues.apache.org/jira/browse/AMQ-3575
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Transport
>    Affects Versions: 5.5.0
>         Environment: CentOS 5.5 and Mac OSX10
>            Reporter: Aaron Phillips
>             Fix For: 5.6.0
>
>         Attachments: FailoverNetworkConnectionRaceConditionTest.java
>
>
> There is a race condition in FailoverTransport.java that sometimes prevents network bridge
> connections from starting. This is a serious issue, as it was preventing us from setting up
> failover connections between brokers. I would have asked for it to be marked critical if it
> weren't for a workaround. The workaround I have found is as follows:
>
> Turn on the ActiveMQ thread pooling option to avoid the failover bridge connection race
> condition. Set the following property to false in your start script, like so. Somehow this
> got me around the problem of the wrong thread sometimes winning:
>
> -Dorg.apache.activemq.UseDedicatedTaskRunner=false
>
> I've attached a unit test to be dropped into
> activemq-core/src/test/java/org/apache/activemq/transport/failover.
> The unit test shows that when a delay is introduced in the setting of the TransportListener,
> the BrokerInfo command required to complete the bridge connection will never be processed.
> There are two unit tests in this class and both are designed to pass. The test called
> "testTcpThreadWinsPreventsCompletionOfBridge" passes by asserting that it *did not* receive
> the BrokerInfo command. By setting the delay value, you can control whether the Main thread
> wins (in which case all is well) or the TCP thread wins (in which case the network bridge is
> hung and fails to start).
>
> Note: this issue only affects network bridge connections that are set up with the failover
> transport, such as a broker that connects to a Master-Slave pair, e.g.
> failover://(tcp://master:61616,tcp://slave:61616)?randomize=false
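
The race described above can be sketched in isolation. This is a simplified model, not the FailoverTransport code or the attached test: ToyTransport, its methods, and the class name are hypothetical, and the assumption is that a command arriving before a listener is registered is simply lost. Latches replace the delay value so that each ordering is forced deterministically.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

public class ListenerRaceSketch {

    /** Hypothetical stand-in for a transport: drops commands that arrive before a listener is set. */
    static final class ToyTransport {
        private final AtomicReference<Consumer<String>> listener = new AtomicReference<>();

        void setListener(Consumer<String> l) {
            listener.set(l);
        }

        void deliver(String command) {
            Consumer<String> l = listener.get();
            if (l != null) {
                l.accept(command);
            }
            // No listener yet: the command is silently lost in this model.
        }
    }

    /** Returns true if the simulated BrokerInfo command reached the listener. */
    static boolean brokerInfoProcessed(boolean mainThreadWins) throws InterruptedException {
        ToyTransport transport = new ToyTransport();
        AtomicBoolean processed = new AtomicBoolean(false);
        CountDownLatch listenerReady = new CountDownLatch(1);
        CountDownLatch delivered = new CountDownLatch(1);

        Thread tcp = new Thread(() -> {
            try {
                if (mainThreadWins) {
                    listenerReady.await(); // wait until the listener is registered
                }
                transport.deliver("BrokerInfo");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                delivered.countDown();
            }
        }, "tcp-thread");
        tcp.start();

        if (!mainThreadWins) {
            delivered.await(); // let the TCP thread deliver before the listener exists
        }
        transport.setListener(cmd -> processed.set("BrokerInfo".equals(cmd)));
        listenerReady.countDown();
        tcp.join();
        return processed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("main thread wins: " + brokerInfoProcessed(true));
        System.out.println("TCP thread wins:  " + brokerInfoProcessed(false));
    }
}
```

When the main thread wins, the listener is in place before delivery and the command is processed. When the TCP thread wins, the command arrives first and is never seen, which corresponds to the hung bridge in the report.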

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
