activemq-dev mailing list archives

From "Morgan Martinet (JIRA)" <jira+amq...@apache.org>
Subject [jira] Created: (AMQNET-289) Deadlock while sending a message after failover within a consumer
Date Fri, 08 Oct 2010 20:51:40 GMT
Deadlock while sending a message after failover within a consumer
-----------------------------------------------------------------

                 Key: AMQNET-289
                 URL: https://issues.apache.org/activemq/browse/AMQNET-289
             Project: ActiveMQ .Net
          Issue Type: Bug
          Components: ActiveMQ
    Affects Versions: 1.4.1
         Environment: Windows 7 64 bits
            Reporter: Morgan Martinet
            Assignee: Jim Gomes
            Priority: Critical


Scenario:
- I have one producer that sends a request (with a temporary queue specified in the Reply-To
attribute) to a consumer running in a separate process.
- Both the producer and the consumer use the following connection string: failover:(tcp://localhost:61616)?timeout=3000
- When processing the request, the consumer waits 10 seconds and then sends a response back,
using the Reply-To attribute.
- Immediately after the request has been sent, while the consumer is waiting for 10 seconds,
I restart the ActiveMQ broker.
- Once the consumer wakes up and tries to send its reply, it deadlocks because of
the failover.

We have managed to identify the resources that deadlock:
Thread1 - lock(reconnectMutex)    (c:\Temp\Apache\NMS.ActiveMQ\1.4.1\src\main\csharp\Transport\Failover\FailoverTransport.cs: line 366)
Thread1 - wait on lock(this.consumers.SyncRoot)    (c:\Temp\Apache\NMS.ActiveMQ\1.4.1\src\main\csharp\Session.cs: line 830)

Thread2 - lock(this.consumers.SyncRoot)    (c:\Temp\Apache\NMS.ActiveMQ\1.4.1\src\main\csharp\SessionExecutor.cs: line 147)
Thread2 - wait on lock(reconnectMutex)    (c:\Temp\Apache\NMS.ActiveMQ\1.4.1\src\main\csharp\Transport\Failover\FailoverTransport.cs: line 531)
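
The four lines above describe a classic lock-order inversion: Thread1 holds reconnectMutex and wants consumers.SyncRoot, while Thread2 holds consumers.SyncRoot and wants reconnectMutex. The following is a minimal stand-alone sketch of that pattern, not NMS code. The class LockOrderDemo, its helper TakeBoth, and the 200 ms timeout are hypothetical; only the two lock names are borrowed from the report. It uses Monitor.TryEnter with a timeout so that the inversion is reported instead of hanging the process.

```csharp
using System;
using System.Threading;

// Hypothetical model of the two locks named in the report (not NMS code).
public static class LockOrderDemo
{
    private static readonly object reconnectMutex = new object();
    private static readonly object consumersSyncRoot = new object();

    // Takes `first`, waits until the peer thread holds its own first lock,
    // then tries `second` with a timeout instead of blocking forever.
    // Returns true only if the second lock could be acquired.
    private static bool TakeBoth(object first, object second, Barrier barrier)
    {
        lock (first)
        {
            barrier.SignalAndWait(); // both threads now hold their first lock

            bool acquired = Monitor.TryEnter(second, TimeSpan.FromMilliseconds(200));
            if (acquired)
            {
                Monitor.Exit(second);
            }

            barrier.SignalAndWait(); // keep `first` held until the peer has tried too
            return acquired;
        }
    }

    public static bool[] Run()
    {
        var results = new bool[2];
        using (var barrier = new Barrier(2))
        {
            // Thread1: reconnectMutex, then consumers.SyncRoot.
            var t1 = new Thread(() => results[0] = TakeBoth(reconnectMutex, consumersSyncRoot, barrier));
            // Thread2: consumers.SyncRoot, then reconnectMutex (opposite order).
            var t2 = new Thread(() => results[1] = TakeBoth(consumersSyncRoot, reconnectMutex, barrier));
            t1.Start();
            t2.Start();
            t1.Join();
            t2.Join();
        }
        return results;
    }

    public static void Main()
    {
        bool[] r = Run();
        Console.WriteLine("Thread1 acquired consumers.SyncRoot: {0}", r[0]);
        Console.WriteLine("Thread2 acquired reconnectMutex: {0}", r[1]);
    }
}
```

In this model neither thread can obtain its second lock while the other still holds it, so both attempts time out. With plain lock statements in place of Monitor.TryEnter, as in the code paths listed above, both threads would block indefinitely.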

Patch:
I managed to find a simple fix for this by moving the call to consumer.Dispatch(...) out of the
lock(this.consumers.SyncRoot) block in SessionExecutor.cs, so that the lock is held only for the lookup:
{{
        public void Dispatch(MessageDispatch dispatch)
        {
            try
            {
                MessageConsumer consumer = null;

                lock(this.consumers.SyncRoot)
                {
                    if(this.consumers.Contains(dispatch.ConsumerId))
                    {
                        consumer = this.consumers[dispatch.ConsumerId] as MessageConsumer;
                    }
                    // Note that consumer.Dispatch(...) was moved below, outside of the lock.
                }
                // If the consumer is not available, just ignore the message.
                // Otherwise, dispatch the message to the consumer.
                if(consumer != null)
                {
                    consumer.Dispatch(dispatch);
                }
            }
            catch(Exception ex)
            {
                Tracer.DebugFormat("Caught Exception While Dispatching: {0}", ex.Message );
            }
        }
}}
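
To illustrate the idea behind the patch in isolation, here is a small self-contained sketch of the "look up under the lock, invoke outside the lock" pattern. It is not the NMS implementation: DispatcherSketch, Register, IsLockHeldByCurrentThread and DispatchDemo are hypothetical names, and Monitor.IsEntered assumes .NET 4.5 or later.

```csharp
using System;
using System.Collections;
using System.Threading;

// Hypothetical stand-in for a session executor (not the NMS class).
public class DispatcherSketch
{
    private readonly Hashtable handlers = new Hashtable();

    public void Register(string id, Action<string> handler)
    {
        lock (handlers.SyncRoot)
        {
            handlers[id] = handler;
        }
    }

    // Resolves the handler while holding the lock, then calls it after
    // the lock has been released. Returns false if no handler is registered.
    public bool Dispatch(string id, string payload)
    {
        Action<string> handler;
        lock (handlers.SyncRoot)
        {
            handler = handlers[id] as Action<string>;
        }

        if (handler == null)
        {
            return false; // unknown consumer: ignore the message
        }

        handler(payload); // runs without holding handlers.SyncRoot
        return true;
    }

    // Lets a handler check whether the calling thread holds the table lock.
    public bool IsLockHeldByCurrentThread()
    {
        return Monitor.IsEntered(handlers.SyncRoot);
    }
}

public static class DispatchDemo
{
    public static void Main()
    {
        var dispatcher = new DispatcherSketch();
        bool heldInsideHandler = true;

        dispatcher.Register("consumer-1",
            msg => heldInsideHandler = dispatcher.IsLockHeldByCurrentThread());

        bool delivered = dispatcher.Dispatch("consumer-1", "hello");
        Console.WriteLine("delivered={0}, lock held in handler={1}",
            delivered, heldInsideHandler);
    }
}
```

Because the handler runs after the lock is released, it can block on another lock (such as reconnectMutex in the report) without keeping other threads out of the consumer table. One trade-off of this pattern in general is that a handler may still be invoked just after it has been removed from the table, so handlers need to tolerate a late call.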

Note that I ran the unit tests before applying my patch and got 3 failures, then got the same
failures with the patch applied. So I hope it doesn't break anything, but I'll let you find the best solution...


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

