activemq-dev mailing list archives

From "Martin Serrano (Created) (JIRA)" <j...@apache.org>
Subject [jira] [Created] (AMQ-3719) Non failing IOException causes FailoverTransport to hang until real failure occurs
Date Wed, 15 Feb 2012 20:13:00 GMT
Non failing IOException causes FailoverTransport to hang until real failure occurs
----------------------------------------------------------------------------------

                 Key: AMQ-3719
                 URL: https://issues.apache.org/jira/browse/AMQ-3719
             Project: ActiveMQ
          Issue Type: Bug
          Components: Transport
         Environment: Intel(R) Core(TM) i5 CPU M 540 @2.53GHz, 8 GB, 64-bit
            Reporter: Martin Serrano
            Priority: Critical
             Fix For: 5.6.0


I have only encountered this failure when the broker is experiencing heavy load and a new
connection attempt is made.

* The FailoverTransport tracks commands that have been issued so that it can restore the state
upon a failure/reconnect event.
* If an IOException occurs when sending a tracked command, the oneway() method returns, assuming
that the IOException is indicative of a transport failure and will result in a failure/reconnect
event.
* Some IOExceptions (such as WireFormatNegotiation timeouts) are not always indicative of transport
failure, however.  In that case, since no subsequent failure/reconnect event occurs, the command
will never be resent.  If this is a synchronous command (like that generated by starting a
connection), the calling thread will hang.
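The sequence described in the list above can be reproduced in a small standalone model. This is only a sketch: TrackedSendSketch, Wire, reconnect() and isDelivered() are hypothetical names invented for illustration and are not the ActiveMQ FailoverTransport source. It assumes a tracked command is replayed only when a reconnect happens.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model for illustration only; not the ActiveMQ FailoverTransport code.
public class TrackedSendSketch {
    /** Minimal stand-in for the underlying transport (assumed interface). */
    interface Wire {
        void send(String command) throws IOException;
    }

    private final List<String> tracked = new ArrayList<>();
    private final List<String> delivered = new ArrayList<>();
    private Wire wire;

    TrackedSendSketch(Wire wire) {
        this.wire = wire;
    }

    /** Models the reported behavior: an IOException on a tracked command is swallowed. */
    void oneway(String command) {
        tracked.add(command);
        try {
            wire.send(command);
            delivered.add(command);
        } catch (IOException e) {
            // Returns quietly, assuming a failure/reconnect event will replay the command.
            // If no such event follows, the command is never resent.
        }
    }

    /** Models a failure/reconnect event: tracked but undelivered commands are replayed. */
    void reconnect(Wire fresh) {
        wire = fresh;
        for (String command : tracked) {
            if (delivered.contains(command)) {
                continue;
            }
            try {
                wire.send(command);
                delivered.add(command);
            } catch (IOException e) {
                return; // still broken; a later reconnect would have to try again
            }
        }
    }

    boolean isDelivered(String command) {
        return delivered.contains(command);
    }

    public static void main(String[] args) {
        // First wire fails the way a negotiation timeout would, without a transport failure.
        TrackedSendSketch transport = new TrackedSendSketch(c -> {
            throw new IOException("negotiation timeout");
        });
        transport.oneway("ConnectionInfo");
        System.out.println("delivered before reconnect: " + transport.isDelivered("ConnectionInfo"));
        transport.reconnect(c -> { });
        System.out.println("delivered after reconnect: " + transport.isDelivered("ConnectionInfo"));
    }
}
```

In this model a caller waiting synchronously for a response to "ConnectionInfo" would block for as long as no reconnect occurs, which matches the hang described above.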

Incidentally, my reading of the code is that only non-tracked commands can generate the IOException
that triggers the handleTransportFailure call.  Is that what we really want?

My belief is that an IOException should always trigger handleTransportFailure,
regardless of origin.
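A minimal sketch of that proposal, under stated assumptions: FailOnAnyIOExceptionSketch, Wire, the connector supplier and the counters are hypothetical names, and the failure handler here reconnects and replays synchronously for brevity. It is not the attached fix and may differ from how the real transport schedules its reconnect.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical model of the proposal; not the actual patch for AMQ-3719.
public class FailOnAnyIOExceptionSketch {
    /** Minimal stand-in for the underlying transport (assumed interface). */
    interface Wire {
        void send(String command) throws IOException;
    }

    private final Supplier<Wire> connector; // assumed factory that yields a fresh connection
    private final List<String> pending = new ArrayList<>();
    private Wire wire;
    private int failuresHandled = 0;

    FailOnAnyIOExceptionSketch(Supplier<Wire> connector) {
        this.connector = connector;
        this.wire = connector.get();
    }

    /** Every IOException goes to the failure handler, whether or not the command is tracked. */
    void oneway(String command) {
        pending.add(command); // track before sending so a failure can replay it
        try {
            wire.send(command);
            pending.remove(command);
        } catch (IOException e) {
            handleTransportFailure(e);
        }
    }

    /** Simplified handler: reconnect once and replay whatever is still pending. */
    void handleTransportFailure(IOException cause) {
        failuresHandled++;
        wire = connector.get();
        for (String command : new ArrayList<>(pending)) {
            try {
                wire.send(command);
                pending.remove(command);
            } catch (IOException e) {
                return; // a real implementation would schedule another attempt
            }
        }
    }

    int pendingCount() {
        return pending.size();
    }

    int getFailuresHandled() {
        return failuresHandled;
    }

    public static void main(String[] args) {
        int[] connects = {0};
        FailOnAnyIOExceptionSketch transport = new FailOnAnyIOExceptionSketch(() -> {
            if (connects[0]++ == 0) {
                // First connection times out during negotiation.
                return c -> { throw new IOException("negotiation timeout"); };
            }
            return c -> System.out.println("sent " + c);
        });
        transport.oneway("ConnectionInfo");
        System.out.println("pending after failure handling: " + transport.pendingCount());
    }
}
```

The design point is only that the catch block no longer returns silently: the exception itself produces the failure/reconnect event that the tracked command was waiting for.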

I will attach a unit test and fix shortly.  The test will often fail (i.e. hang) without the
fix, but not always, since I use the wireFormat.maxInactivityDurationInitalDelay=1 option to
trigger the behavior.  If the system runs fast enough, the timeout sometimes does not occur.
I am not sure exactly how such a test should be written, or whether the test environment has
controls to prevent a hanging test (in case of regression) from hanging a build.
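One possible way to keep a regression from hanging a build is to run the blocking call on a worker thread and bound the wait. The helper below is a sketch using only java.util.concurrent. HangGuardSketch and completesWithin are hypothetical names, and the never-released latch merely stands in for a connection start that hangs. JUnit 4's @Test(timeout = ...) attribute is another option for bounding a test.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper for illustration; not part of the ActiveMQ test suite.
public class HangGuardSketch {
    /** Returns true if the action finished within the limit, false if it was still blocked. */
    static boolean completesWithin(Runnable action, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread worker = new Thread(r, "hang-guard");
            worker.setDaemon(true); // a stuck worker must not keep the JVM alive
            return worker;
        });
        Future<?> future = executor.submit(action);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the blocked worker
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            // The action itself threw; surface that as a test error, not a hang.
            throw new IllegalStateException(e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Stand-in for a call that hangs, such as a connection start that never returns.
        CountDownLatch neverReleased = new CountDownLatch(1);
        boolean finished = completesWithin(() -> {
            try {
                neverReleased.await();
            } catch (InterruptedException ignored) {
                // cancelled by the guard
            }
        }, 200);
        System.out.println("finished=" + finished);
    }
}
```

A test would then assert that the guarded call returns true, so a regression shows up as a failed assertion after the timeout instead of a stalled build.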

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
