activemq-dev mailing list archives

From "Timothy Bish (JIRA)" <jira+amq...@apache.org>
Subject [jira] Commented: (AMQNET-289) Deadlock while sending a message after failover within a consumer
Date Sat, 09 Oct 2010 16:41:41 GMT

    [ https://issues.apache.org/activemq/browse/AMQNET-289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=62478#action_62478 ]

Timothy Bish commented on AMQNET-289:
-------------------------------------

It would be helpful if you could provide an NUnit test case that reproduces the issue, so
that the fix can be verified and we can ensure there are no regressions in the future.

> Deadlock while sending a message after failover within a consumer
> -----------------------------------------------------------------
>
>                 Key: AMQNET-289
>                 URL: https://issues.apache.org/activemq/browse/AMQNET-289
>             Project: ActiveMQ .Net
>          Issue Type: Bug
>          Components: ActiveMQ
>    Affects Versions: 1.4.1
>         Environment: Windows 7 64 bits
>            Reporter: Morgan Martinet
>            Assignee: Jim Gomes
>            Priority: Critical
>             Fix For: 1.5.0
>
>         Attachments: deadlock.jpg, SessionExecutor.cs
>
>
> Scenario:
> - I have one producer that sends a request (with a temporary queue specified in the
>   Reply-To attribute) to a consumer running in a separate process.
> - Both the producer and the consumer use the following connection string:
>   failover:(tcp://localhost:61616)?timeout=3000
> - When processing the request, the consumer waits 10 seconds and then sends a response
>   back, using the Reply-To attribute.
> - Immediately after the message has been sent, while the consumer is waiting for
>   10 seconds, I restart the ActiveMQ broker.
> - Once the consumer wakes up and tries to send its reply, it deadlocks because of the
>   failover.
> We have managed to identify the resources that deadlock:
> Thread1 - lock(reconnectMutex)    (c:\Temp\Apache\NMS.ActiveMQ\1.4.1\src\main\csharp\Transport\Failover\FailoverTransport.cs: line 366)
> Thread1 - wait on lock(this.consumers.SyncRoot)    (c:\Temp\Apache\NMS.ActiveMQ\1.4.1\src\main\csharp\Session.cs: line 830)
> Thread2 - lock(this.consumers.SyncRoot)   (c:\Temp\Apache\NMS.ActiveMQ\1.4.1\src\main\csharp\SessionExecutor.cs: line 147)
> Thread2 - wait on lock(reconnectMutex)    (c:\Temp\Apache\NMS.ActiveMQ\1.4.1\src\main\csharp\Transport\Failover\FailoverTransport.cs: line 531)
> Patch:
> I managed to find a simple fix for this, by moving the consumer dispatch out of the
> this.consumers.SyncRoot lock in SessionExecutor.cs:
> {{
>         public void Dispatch(MessageDispatch dispatch)
>         {
>             try
>             {
>                 MessageConsumer consumer = null;
>                 lock(this.consumers.SyncRoot)
>                 {
>                     if(this.consumers.Contains(dispatch.ConsumerId))
>                     {
>                         consumer = this.consumers[dispatch.ConsumerId] as MessageConsumer;
>                     }
>                     // Note that consumer.Dispatch(...) was moved below, outside of the lock.
>                 }
>                 // If the consumer is not available, just ignore the message.
>                 // Otherwise, dispatch the message to the consumer.
>                 if(consumer != null)
>                 {
>                     consumer.Dispatch(dispatch);
>                 }
>             }
>             catch(Exception ex)
>             {
>                 Tracer.DebugFormat("Caught Exception While Dispatching: {0}", ex.Message);
>             }
>         }
> }}
> Note that I ran the unit tests before my patch and I got 3 failures. Then I got the same
> failures with my patch. So, I hope it didn't break anything but I'll let you find the best
> solution...
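
The patch quoted above follows a general pattern: look up the target under the lock, release the lock, and only then invoke the callback. The following is a minimal, self-contained sketch of that pattern, not the NMS.ActiveMQ code. DispatcherSketch, Add and CallerHoldsLock are hypothetical names introduced for illustration, and a plain Dictionary stands in for the session's consumer table. It uses Monitor.IsEntered to confirm that the callback runs without the registry lock held.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-in for a dispatcher; not the NMS.ActiveMQ SessionExecutor.
public sealed class DispatcherSketch
{
    private readonly object syncRoot = new object();
    private readonly Dictionary<string, Action<string>> consumers =
        new Dictionary<string, Action<string>>();

    public void Add(string consumerId, Action<string> handler)
    {
        lock (syncRoot)
        {
            consumers[consumerId] = handler;
        }
    }

    // True if the calling thread currently holds the registry lock.
    public bool CallerHoldsLock()
    {
        return Monitor.IsEntered(syncRoot);
    }

    public void Dispatch(string consumerId, string message)
    {
        Action<string> handler;
        lock (syncRoot)
        {
            // Only the lookup happens under the lock.
            consumers.TryGetValue(consumerId, out handler);
        }

        // The callback runs after the lock is released, so it can block on
        // some other lock without keeping syncRoot held in the meantime.
        if (handler != null)
        {
            handler(message);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var dispatcher = new DispatcherSketch();
        bool heldDuringCallback = true;

        dispatcher.Add("consumer-1", msg =>
        {
            heldDuringCallback = dispatcher.CallerHoldsLock();
        });

        dispatcher.Dispatch("consumer-1", "request");
        dispatcher.Dispatch("unknown", "ignored"); // no handler registered: dropped

        Console.WriteLine("Lock held during callback: " + heldDuringCallback);
        // prints "Lock held during callback: False"
    }
}
```

One trade-off of this pattern is that a consumer removed between the lookup and the call can still receive that one message, so the handler has to tolerate being invoked shortly after removal.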

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

