From dev-return-23047-apmail-activemq-dev-archive=activemq.apache.org@activemq.apache.org Thu Nov 04 19:22:54 2010
Return-Path:
Delivered-To: apmail-activemq-dev-archive@www.apache.org
Received: (qmail 23728 invoked from network); 4 Nov 2010 19:22:54 -0000
Received: from unknown (HELO mail.apache.org) (140.211.11.3) by 140.211.11.9 with SMTP; 4 Nov 2010 19:22:54 -0000
Received: (qmail 5561 invoked by uid 500); 4 Nov 2010 19:23:26 -0000
Delivered-To: apmail-activemq-dev-archive@activemq.apache.org
Received: (qmail 5513 invoked by uid 500); 4 Nov 2010 19:23:25 -0000
Mailing-List: contact dev-help@activemq.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@activemq.apache.org
Delivered-To: mailing list dev@activemq.apache.org
Received: (qmail 5505 invoked by uid 99); 4 Nov 2010 19:23:25 -0000
Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 04 Nov 2010 19:23:25 +0000
X-ASF-Spam-Status: No, hits=-2000.0 required=10.0 tests=ALL_TRUSTED
X-Spam-Check-By: apache.org
Received: from [140.211.11.22] (HELO thor.apache.org) (140.211.11.22) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 04 Nov 2010 19:23:22 +0000
Received: from thor (localhost [127.0.0.1]) by thor.apache.org (8.13.8+Sun/8.13.8) with ESMTP id oA4JN0F0022927 for ; Thu, 4 Nov 2010 19:23:01 GMT
Message-ID: <29173460.7991288898580803.JavaMail.jira@thor>
Date: Thu, 4 Nov 2010 15:23:00 -0400 (EDT)
From: "Timothy Bish (JIRA)"
To: dev@activemq.apache.org
Subject: [jira] Commented: (AMQNET-289) Deadlock while sending a message after failover within a consumer
In-Reply-To: <28334789.51961286571100401.JavaMail.jira@thor>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: ae95407df07c98740808b2ef9da0087c
X-Virus-Checked: Checked by ClamAV on apache.org

[
https://issues.apache.org/activemq/browse/AMQNET-289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=63060#action_63060 ]

Timothy Bish commented on AMQNET-289:
-------------------------------------

Applied the suggested fix in trunk. @Daniel, if you happen to have a stack trace for those three threads, I'd love to see it; I would like to ensure there aren't any other points where this sort of thing can happen.

> Deadlock while sending a message after failover within a consumer
> -----------------------------------------------------------------
>
>              Key: AMQNET-289
>              URL: https://issues.apache.org/activemq/browse/AMQNET-289
>          Project: ActiveMQ .Net
>       Issue Type: Bug
>       Components: ActiveMQ
> Affects Versions: 1.4.1
>      Environment: Windows 7 64 bits
>         Reporter: Morgan Martinet
>         Assignee: Jim Gomes
>         Priority: Critical
>          Fix For: 1.5.0
>
>      Attachments: deadlock.jpg, SessionExecutor.cs
>
>
> Scenario:
> - I have one producer that sends a request (with a temporary queue specified in the Reply-to attribute) to a consumer, in a separate process.
> - both the producer and the consumer use the following connection string: failover:(tcp://localhost:61616)?timeout=3000
> - the consumer, when processing the request, waits 10 seconds then sends a response back, using the Reply-To attribute.
> - immediately after the message has been sent, while the consumer is waiting for 10 secs, I restart the ActiveMQ broker.
> - once the consumer wakes up and tries to send its reply, it will deadlock because of the failover.
> We have managed to identify the resources that deadlock:
> Thread1 - lock(reconnectMutex) (c:\Temp\Apache\NMS.ActiveMQ\1.4.1\src\main\csharp\Transport\Failover\FailoverTransport.cs: line 366)
> Thread1 - wait on lock(this.consumers.SyncRoot) (c:\Temp\Apache\NMS.ActiveMQ\1.4.1\src\main\csharp\Session.cs: line 830)
> Thread2 - lock(this.consumers.SyncRoot) (c:\Temp\Apache\NMS.ActiveMQ\1.4.1\src\main\csharp\SessionExecutor.cs: line 147)
> Thread2 - wait on lock(reconnectMutex) (c:\Temp\Apache\NMS.ActiveMQ\1.4.1\src\main\csharp\Transport\Failover\FailoverTransport.cs: line 531)
> Patch:
> I managed to find a simple fix for this, by moving the consumer dispatch out of the this.consumers.SyncRoot lock in SessionExecutor.cs:
> {{
>         public void Dispatch(MessageDispatch dispatch)
>         {
>             try
>             {
>                 MessageConsumer consumer = null;
>                 lock(this.consumers.SyncRoot)
>                 {
>                     if(this.consumers.Contains(dispatch.ConsumerId))
>                     {
>                         consumer = this.consumers[dispatch.ConsumerId] as MessageConsumer;
>                     }
>                     // Note that consumer.Dispatch(...) was moved below, outside of the lock.
>                 }
>                 // If the consumer is not available, just ignore the message.
>                 // Otherwise, dispatch the message to the consumer.
>                 if(consumer != null)
>                 {
>                     consumer.Dispatch(dispatch);
>                 }
>             }
>             catch(Exception ex)
>             {
>                 Tracer.DebugFormat("Caught Exception While Dispatching: {0}", ex.Message );
>             }
>         }
> }}
> Note that I ran the unit tests before my patch and I got 3 failures. Then I got the same failures with my patch. So, I hope it didn't break anything but I'll let you find the best solution...

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.