Date: Wed, 15 Feb 2012 21:09:00 +0000 (UTC)
From: "Martin Serrano (Updated) (JIRA)"
To: dev@activemq.apache.org
Message-ID: <1723915712.42597.1329340140788.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1806785847.42343.1329336780106.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Updated] (AMQ-3719) Tracked command IOException causes FailoverTransport to hang until failure occurs for untracked command
Mailing-List: contact dev-help@activemq.apache.org; run by ezmlm
Reply-To: dev@activemq.apache.org

[
https://issues.apache.org/jira/browse/AMQ-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martin Serrano updated AMQ-3719:
--------------------------------

    Summary: Tracked command IOException causes FailoverTransport to hang until failure occurs for untracked command  (was: Non failing IOException causes FailoverTransport to hang until real failure occurs)

> Tracked command IOException causes FailoverTransport to hang until failure occurs for untracked command
> -------------------------------------------------------------------------------------------------------
>
>                 Key: AMQ-3719
>                 URL: https://issues.apache.org/jira/browse/AMQ-3719
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Transport
>         Environment: Intel(R) Core(TM) i5 CPU M 540 @2.53GHz
> 8 GB, 64-bit
>            Reporter: Martin Serrano
>            Priority: Critical
>             Fix For: 5.6.0
>
>         Attachments: amq-3719.patch
>
>
> I have only encountered this failure when the broker is experiencing heavy load and a new connection attempt is made.
> * The FailoverTransport tracks commands that have been issued so that it can restore the state upon a failure/reconnect event.
> * If an IOException occurs when sending a tracked command, the oneway() method returns, assuming that the IOException is indicative of a transport failure and will result in a failure/reconnect event.
> * Some IOExceptions (like WireFormatNegotiation timeouts) are not always indicative of transport failure, however. In this case, since no subsequent failure/reconnect event occurs, the command will never be resent. If this is a synchronous command (like that generated by starting a connection), the calling thread will hang.
> Incidentally, my reading of the code is that only non-tracked commands can generate the IOException that triggers the handleTransportFailure method. Is that what we really want?
> My belief is that IOExceptions should always result in the triggering of handleTransportFailure, regardless of origin.
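The failure mode described in the bullets above can be modelled in a few lines. This is a simplified sketch under stated assumptions, not the ActiveMQ source: ToyFailoverTransport, FlakyWire and the alwaysHandleFailure flag are hypothetical names introduced for illustration, and only the names oneway() and handleTransportFailure come from the report. The simulated wire fails exactly once, standing in for a negotiation timeout that is not a real transport failure.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Toy model only; every class here is hypothetical and is NOT the ActiveMQ code.
class FailoverSketch {

    /** Stand-in for the underlying transport. */
    interface Wire {
        void send(String command) throws IOException;
    }

    /** Fails the first send (simulated timeout), then delivers normally. */
    static class FlakyWire implements Wire {
        final List<String> delivered = new ArrayList<>();
        private boolean failNext = true;

        @Override
        public void send(String command) throws IOException {
            if (failNext) {
                failNext = false;
                throw new IOException("negotiation timeout (simulated)");
            }
            delivered.add(command);
        }
    }

    static class ToyFailoverTransport {
        private final Wire wire;
        private final boolean alwaysHandleFailure; // true models the proposed behaviour
        private final List<String> tracked = new ArrayList<>();

        ToyFailoverTransport(Wire wire, boolean alwaysHandleFailure) {
            this.wire = wire;
            this.alwaysHandleFailure = alwaysHandleFailure;
        }

        void oneway(String command, boolean isTracked) {
            if (isTracked) {
                tracked.add(command); // remembered so it can be replayed after a reconnect
            }
            try {
                wire.send(command);
            } catch (IOException e) {
                if (isTracked && !alwaysHandleFailure) {
                    // Behaviour described in the report: return and rely on a
                    // failure/reconnect event that may never arrive.
                    return;
                }
                handleTransportFailure(e);
            }
        }

        void handleTransportFailure(IOException cause) {
            // "Reconnect" by replaying every tracked command on the wire.
            for (String command : tracked) {
                try {
                    wire.send(command);
                } catch (IOException e) {
                    throw new IllegalStateException("replay failed in toy model", e);
                }
            }
        }
    }

    /** Sends one tracked command over a flaky wire and reports what was delivered. */
    static List<String> deliveredAfterFailedSend(boolean alwaysHandleFailure) {
        FlakyWire wire = new FlakyWire();
        ToyFailoverTransport transport = new ToyFailoverTransport(wire, alwaysHandleFailure);
        transport.oneway("ConnectionInfo", true);
        return wire.delivered;
    }
}
```

In this model, deliveredAfterFailedSend(false) leaves the tracked command undelivered, which corresponds to the caller waiting forever for a response, while deliveredAfterFailedSend(true) replays it once the failure is handled.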
> I will attach a unit test and fix shortly. The test will often fail (i.e. hang) without the fix, but not always, since I use a wireFormat.maxInactivityDurationInitalDelay=1 option to trigger the behavior. If the system runs fast enough, it will sometimes not hit the timeout. I wasn't sure exactly how such a test should be written. The test will fail if the connection does not succeed within 60s.
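One way to turn a possible hang into a test failure, as the 60s limit above suggests, is to run the connection attempt on a separate thread and bound the wait. The following is a minimal sketch using only java.util.concurrent; it is not the attached test. BoundedConnectCheck and completesWithin are hypothetical names, and the Callable stands in for whatever code starts the connection.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper for illustration; not part of ActiveMQ or the attached patch.
class BoundedConnectCheck {

    /**
     * Returns true if the attempt finishes normally within the limit,
     * false if it times out, throws, or the waiting thread is interrupted.
     */
    static <T> boolean completesWithin(Callable<T> attempt, long timeout, TimeUnit unit) {
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread thread = new Thread(runnable, "connect-attempt");
            thread.setDaemon(true); // a hung attempt must not keep the JVM alive
            return thread;
        });
        Future<T> result = executor.submit(attempt);
        try {
            result.get(timeout, unit);
            return true;
        } catch (TimeoutException | ExecutionException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        } finally {
            result.cancel(true);   // interrupt the attempt if it is still running
            executor.shutdownNow();
        }
    }
}
```

A test would then assert that completesWithin(connectAttempt, 60, TimeUnit.SECONDS) is true, where connectAttempt is the caller's own code. Because the timing-based trigger does not fire on every run, a passing result without the fix would still be possible.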