Return-Path: X-Original-To: apmail-activemq-dev-archive@www.apache.org Delivered-To: apmail-activemq-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 0B0ACE326 for ; Wed, 6 Feb 2013 09:47:15 +0000 (UTC) Received: (qmail 53716 invoked by uid 500); 6 Feb 2013 09:47:14 -0000 Delivered-To: apmail-activemq-dev-archive@activemq.apache.org Received: (qmail 53621 invoked by uid 500); 6 Feb 2013 09:47:14 -0000 Mailing-List: contact dev-help@activemq.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@activemq.apache.org Delivered-To: mailing list dev@activemq.apache.org Received: (qmail 53321 invoked by uid 99); 6 Feb 2013 09:47:13 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 06 Feb 2013 09:47:13 +0000 Date: Wed, 6 Feb 2013 09:47:13 +0000 (UTC) From: "Rob Waite (JIRA)" To: dev@activemq.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Updated] (AMQNET-409) Failover fails under load. Client goes into reconnect loop. MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/AMQNET-409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rob Waite updated AMQNET-409: ----------------------------- Description: We were testing failover under load. We have a queue that I placed 500k messages in. I then have 5 consumers pulling them as fast as they can. The operations are noops essentially... so messages are consumed in our case at around 500 messages per second with our test servers. I would bring up the consumers and they would be consuming away. I would then "kill -9" AMQ and then observe the client. From the AMQ web admin... they would come back but you would see the 5 consumers go back and forth between 5 and 0 and comsumption went down from 500/s to about 3/s. Looking at AMQ logs... there was a series of errors like this " Async error occurred: javax.jms.JMSException: Unmatched acknowledge: MessageAck" and then there would be a couple variations of exceptions regarding the connection being lost (EOFException, broken pipe, etc). There would be an endless loop creating this pattern.. a message about an unmatched ack... a message about a socket dying... then reconnection from the client... on and on. I can provide the full stack traces but I have already found the line that seems to cause the client to reconnect, fail, disconnect ad infinitum. In our particular case... there is a line in the NMS.ActiveMQ source in Connection.cs line 950 (in Apache.NMS.ActiveMQ-1.5.6-src) that looks like: {code:title=Connection.cs|borderStyle=solid} Tracer.DebugFormat("Connection[{0}]: Async exception with no exception listener: " + error, this.ConnectionId); {code} The "error" variable was populated with: {noformat} "Apache.NMS.NMSConnectionException: Unmatched acknowledge: MessageAck {commandId = 1146, responseRequired = false, ackType = 2, consumerId = ID:kcccccc-54512-634956893773509011-1:10:3:1, firstMessageId = ID:qaaaaa-54879-634956774946428300-1:0:1:2:60645, lastMessageId = ID:qaaaaaa-ops01-54879-634956774946428300-1:0:1:2:60645, destination = queue://AAA.RwwTest.Failover, transactionId = null, messageCount = 1, poisonCause = null}; Could not find Message-ID ID:qaaaaaa-ops01-54879-634956774946428300-1:0:1:2:60645 in dispatched-list (start of ack)" {noformat} Notice that there are curly braces in there... when you try to concat this with the first parameter in the Tracer.DebugFormat call you get a System.FormatException with the message "Input string was not in a correct format." Changing the line to the following seemed to fix the particular case we were seeing.. but I imagine other use cases may cause the same: {code:title=Connection.cs|borderStyle=solid} Tracer.DebugFormat("Connection[{0}]: Async exception with no exception listener: {1}", this.ConnectionId, error); {code} It seems odd that String.Format style logging is used but a string is still concatenated. This seems to happen in a few places in the code. I would think that either the Tracer should handle this exception or trim illegal characters or DebugFormat should not use concatenation for the first param. I have not dug incredibly deeply into the source... but I imagine when an exception was thrown by the log call... it bubbled up and pissed something off. The connection seems to have been closed by the client... but it did not remove state from the previous attempt. When it reconnected, the same occurred and then there was the looping effect. was: We were testing failover under load. We have a queue that I placed 500k messages in. I then have 5 consumers pulling them as fast as they can. The operations are noops essentially... so messages are consumed in our case at around 500 messages per second with our test servers. I would bring up the consumers and they would be consuming away. I would then "kill -9" AMQ and then observe the client. From the AMQ web admin... they would come back but you would see the 5 consumers go back and forth between 5 and 0 and comsumption went down from 500/s to about 3/s. Looking at AMQ logs... there was a series of errors like this " Async error occurred: javax.jms.JMSException: Unmatched acknowledge: MessageAck" and then there would be a couple variations of exceptions regarding the connection being lost (EOFException, broken pipe, etc). There would be an endless loop creating this pattern.. a message about an unmatched ack... a message about a socket dying... then reconnection from the client... on and on. I can provide the full stack traces but I have already found the line that seems to cause the client to reconnect, fail, disconnect ad infinitum. In our particular case... there is a line in the NMS.ActiveMQ source in Connection.cs line 950 (in Apache.NMS.ActiveMQ-1.5.6-src) that looks like: {code:title=Connection.cs|borderStyle=solid} Tracer.DebugFormat("Connection[{0}]: Async exception with no exception listener: " + error, this.ConnectionId); {code} The "error" variable was populated with: {noformat} "Apache.NMS.NMSConnectionException: Unmatched acknowledge: MessageAck {commandId = 1146, responseRequired = false, ackType = 2, consumerId = ID:kcccccc-54512-634956893773509011-1:10:3:1, firstMessageId = ID:qaaaaa-54879-634956774946428300-1:0:1:2:60645, lastMessageId = ID:qaaaaaa-ops01-54879-634956774946428300-1:0:1:2:60645, destination = queue://AAA.RwwTest.Failover, transactionId = null, messageCount = 1, poisonCause = null}; Could not find Message-ID ID:qaaaaaa-ops01-54879-634956774946428300-1:0:1:2:60645 in dispatched-list (start of ack)" {noformat} Notice that there are curly braces in there... when you try to concat this with the first parameter in the Tracer.DebugFormat call you get a System.FormatException with the message "Input string was not in a correct format." Changing the line to the following seemed to fix the particular case we were seeing.. but I imagine other use cases may cause the same: {code:title=Connection.cs|borderStyle=solid} Tracer.DebugFormat("Connection[{0}]: Async exception with no exception listener: {1}", this.ConnectionId, error); {code} It seems odd that String.Format style logging is used but a string is still concatenated. This seems to happen many places in the code. I would think that either the Tracer should handle this exception or trim illegal characters or DebugFormat should not use concatenation for the first param. I have not dug incredibly deeply into the source... but I imagine when an exception was thrown by the log call... it bubbled up and pissed something off. The connection seems to have been closed by the client... but it did not remove state from the previous attempt. When it reconnected, the same occurred and then there was the looping effect. > Failover fails under load. Client goes into reconnect loop. > ----------------------------------------------------------- > > Key: AMQNET-409 > URL: https://issues.apache.org/jira/browse/AMQNET-409 > Project: ActiveMQ .Net > Issue Type: Bug > Components: ActiveMQ > Affects Versions: 1.5.6 > Environment: Should not matter. AMQ is on Linux, Client is on Windows with .NET 4 > Reporter: Rob Waite > Assignee: Jim Gomes > Attachments: Tracer.cs > > > We were testing failover under load. We have a queue that I placed 500k messages in. I then have 5 consumers pulling them as fast as they can. The operations are noops essentially... so messages are consumed in our case at around 500 messages per second with our test servers. > I would bring up the consumers and they would be consuming away. I would then "kill -9" AMQ and then observe the client. From the AMQ web admin... they would come back but you would see the 5 consumers go back and forth between 5 and 0 and comsumption went down from 500/s to about 3/s. > Looking at AMQ logs... there was a series of errors like this " Async error occurred: javax.jms.JMSException: Unmatched acknowledge: MessageAck" and then there would be a couple variations of exceptions regarding the connection being lost (EOFException, broken pipe, etc). There would be an endless loop creating this pattern.. a message about an unmatched ack... a message about a socket dying... then reconnection from the client... on and on. > I can provide the full stack traces but I have already found the line that seems to cause the client to reconnect, fail, disconnect ad infinitum. > In our particular case... there is a line in the NMS.ActiveMQ source in Connection.cs line 950 (in Apache.NMS.ActiveMQ-1.5.6-src) that looks like: > {code:title=Connection.cs|borderStyle=solid} > Tracer.DebugFormat("Connection[{0}]: Async exception with no exception listener: " + error, this.ConnectionId); > {code} > The "error" variable was populated with: > {noformat} > "Apache.NMS.NMSConnectionException: Unmatched acknowledge: MessageAck {commandId = 1146, responseRequired = false, ackType = 2, consumerId = ID:kcccccc-54512-634956893773509011-1:10:3:1, firstMessageId = ID:qaaaaa-54879-634956774946428300-1:0:1:2:60645, lastMessageId = ID:qaaaaaa-ops01-54879-634956774946428300-1:0:1:2:60645, destination = queue://AAA.RwwTest.Failover, transactionId = null, messageCount = 1, poisonCause = null}; Could not find Message-ID ID:qaaaaaa-ops01-54879-634956774946428300-1:0:1:2:60645 in dispatched-list (start of ack)" > {noformat} > Notice that there are curly braces in there... when you try to concat this with the first parameter in the Tracer.DebugFormat call you get a System.FormatException with the message "Input string was not in a correct format." > Changing the line to the following seemed to fix the particular case we were seeing.. but I imagine other use cases may cause the same: > {code:title=Connection.cs|borderStyle=solid} > Tracer.DebugFormat("Connection[{0}]: Async exception with no exception listener: {1}", this.ConnectionId, error); > {code} > It seems odd that String.Format style logging is used but a string is still concatenated. This seems to happen in a few places in the code. I would think that either the Tracer should handle this exception or trim illegal characters or DebugFormat should not use concatenation for the first param. > I have not dug incredibly deeply into the source... but I imagine when an exception was thrown by the log call... it bubbled up and pissed something off. The connection seems to have been closed by the client... but it did not remove state from the previous attempt. When it reconnected, the same occurred and then there was the looping effect. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira