activemq-dev mailing list archives

From "Gabriel Gerhardsson (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AMQ-4022) Receiving event while disconnecting -> AMQ deadlock -> "Already connected" error
Date Mon, 17 Sep 2012 13:58:07 GMT

     [ https://issues.apache.org/jira/browse/AMQ-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gabriel Gerhardsson updated AMQ-4022:
-------------------------------------

    Description: 
Executive summary
---
When an event is about to be delivered to a client at the same moment that the client sends
a DISCONNECT message, two threads in ActiveMQ can deadlock. The client then never gets its
DISCONNECT receipt, and any client that later connects with the same client ID gets an
"Already connected" error.
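
A minimal sketch of how two threads can deadlock in general, assuming the common lock-order inversion pattern. The report does not say which locks are involved in broker/TransportConnection.java, so the two locks and the thread names "disconnect" and "dispatch" below are hypothetical stand-ins, not ActiveMQ code. Timeouts are used so the sketch reports the deadlock instead of hanging.

```python
import threading


def run_inversion(timeout=0.5):
    """Run two threads that take two locks in opposite order.

    Returns a dict mapping thread name -> True if that thread failed to
    get its second lock within `timeout` seconds.
    """
    lock_a = threading.Lock()  # hypothetical lock, not an ActiveMQ name
    lock_b = threading.Lock()  # hypothetical lock, not an ActiveMQ name
    both_hold_first = threading.Barrier(2)
    both_tried_second = threading.Barrier(2)
    blocked = {}

    def worker(name, first, second):
        with first:
            # Force the bad interleaving: each thread holds its first lock
            # before either one asks for its second lock.
            both_hold_first.wait()
            got = second.acquire(timeout=timeout)
            blocked[name] = not got
            if got:
                second.release()
            # Keep holding the first lock until the other thread has also
            # finished its attempt, so the outcome does not depend on timing.
            both_tried_second.wait()

    threads = [
        threading.Thread(target=worker, args=("disconnect", lock_a, lock_b)),
        threading.Thread(target=worker, args=("dispatch", lock_b, lock_a)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return blocked


if __name__ == "__main__":
    print(run_inversion())
```

With real blocking acquires instead of timeouts, neither thread would ever proceed, which matches the symptom of a client waiting forever for its DISCONNECT receipt.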


Details
---
Here are steps to reproduce using STOMP. The issue may exist for other protocols as well,
since the deadlock happens in broker/TransportConnection.java. I could reproduce the error
easily with these steps, though it's entirely possible that some of them aren't strictly
required and merely provide the needed timing.

1) Connect to ActiveMQ using STOMP

2) Send a SUBSCRIBE message with destination=/topic/foo, without asking for a receipt. No
other options sent in message. 

3) Send a message to /topic/foo, without asking for a receipt. Note: It's important that the
message is auto-acked by the server.

4) Wait 10 ms for the message

5) Immediately send a DISCONNECT message, asking for a receipt.

5.1) About 20% of the time, neither the message nor the DISCONNECT receipt is ever received.

5.2) Run 'jstack <activemq pid>' and note that the JVM's output on stdout reports a
deadlock between two threads.
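
The steps above can be sketched as raw STOMP frames. This is not the attached amqtest.pl, only an illustration under assumptions: the broker listens for STOMP on localhost:61613 without requiring a login, the message body "hello" and the receipt id "disconnect-1" are made up, and the helper names (frame, repro_frames, run_once) are hypothetical. The subscription sends no ack header, so the default auto-ack mode applies.

```python
import socket
import time


def frame(command, headers=None, body=""):
    """Encode one STOMP frame: command, headers, blank line, body, NUL."""
    lines = [command]
    lines += [f"{key}:{value}" for key, value in (headers or {}).items()]
    return ("\n".join(lines) + "\n\n" + body + "\x00").encode("utf-8")


def repro_frames(destination="/topic/foo", receipt_id="disconnect-1"):
    """Frames for steps 1, 2, 3 and 5, in the order they are sent."""
    return [
        frame("CONNECT"),
        frame("SUBSCRIBE", {"destination": destination}),  # no receipt asked
        frame("SEND", {"destination": destination}, "hello"),  # no receipt asked
        frame("DISCONNECT", {"receipt": receipt_id}),
    ]


def run_once(host="localhost", port=61613, timeout=5.0):
    """Run the sequence once; return True if a RECEIPT arrives in time."""
    connect, subscribe, send, disconnect = repro_frames()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(connect)
        sock.recv(4096)  # expect CONNECTED; not validated in this sketch
        sock.sendall(subscribe)
        sock.sendall(send)
        time.sleep(0.010)  # step 4: wait 10 ms
        sock.sendall(disconnect)
        received = b""
        try:
            while b"RECEIPT" not in received:
                chunk = sock.recv(4096)
                if not chunk:
                    break  # broker closed the connection
                received += chunk
        except socket.timeout:
            pass  # no receipt within the timeout: the buggy case
        return b"RECEIPT" in received
```

Calling run_once() against a running broker returns False in the buggy case, where the receipt never arrives before the timeout.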


Test script
---
Apply the small patch to the Perl library Net/Stomp.pm.
It's a naive minimal patch that makes the library wait for a receipt for the DISCONNECT
message. It actually just waits for any frame to arrive without checking what it is, but
that's enough for this test.

Run amqtest.pl a few times. A non-buggy run completes almost immediately. A buggy run hangs
the script during disconnect.
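
Since a buggy run hangs rather than fails, repeated runs are easier to count with a timeout around each one. The following is a small sketch of such a wrapper, not part of the attachments. The function name count_hangs, the 10 second timeout and the run count of 100 are arbitrary assumptions, and the command to run is taken from the command line.

```python
import subprocess
import sys


def count_hangs(cmd, runs, timeout):
    """Run `cmd` `runs` times; count the runs that exceed `timeout` seconds."""
    hangs = 0
    for _ in range(runs):
        try:
            subprocess.run(
                cmd,
                timeout=timeout,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising, so a hung
            # script does not linger between iterations.
            hangs += 1
    return hangs


if __name__ == "__main__" and len(sys.argv) > 1:
    runs = 100
    hangs = count_hangs(sys.argv[1:], runs, timeout=10.0)
    print(f"{hangs} of {runs} runs hung")
```

Saved under a name of your choice, for example count_hangs.py, it could be invoked as 'python count_hangs.py perl amqtest.pl'. Note that a hung run leaves the broker deadlocked, so later runs using the same client ID may be affected.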


Workaround
---
-When sending the SUBSCRIBE with receipt requested, and waiting for the receipt before sending
the SEND, I was unable to reproduce the issue, even after running the script 100000 times.-
No, it turns out that the workaround suggested above does not completely eliminate the problem.
It probably just introduces a large enough delay to almost always avoid the race condition.
After 30000 additional test runs with the script the error occurred again.

  was:
Executive summary
---
A message is lost, the connection ends up in a strange state, and then when the client sends
DISCONNECT it gets no receipt.


Details
---
Steps to reproduce:

1) Using STOMP

2) Send a SUBSCRIBE message with destination=/topic/foo, without asking for a receipt. No
other options sent in message.

3) Send a message to /topic/foo, without asking for a receipt.

4) Wait 10ms for message

5) Go right ahead and send DISCONNECT message, asking for a receipt.

5.1) About 20% of the time, no DISCONNECT receipt is received, however long the client waits.
This seems to coincide with the case when the message doesn't arrive in #4.

5.2) It does look like ActiveMQ gets the DISCONNECT message tho since it unsubscribes the
client from /topic/foo at that timestamp.


The DISCONNECT receipt getting lost is clearly a bug. Waiting for the receipt is important
since I've seen cases where the client isn't unregistered properly in ActiveMQ if the client
just sends DISCONNECT and then closes the socket.


Test script
---
Apply the small patch to the perl library Net/Stomp.pm
It's a naive minimal patch to make it wait for a receipt for the DISCONNECT message. It will
actually just wait for any frame to arrive and won't check what it is, but it's enough for
this test.

Run amqtest.pl a few times. A non-buggy run completes almost immediately. A buggy run hangs
the script during disconnect.


Workaround
---
-When sending the SUBSCRIBE with receipt requested, and waiting for the receipt before sending
the SEND, I was unable to reproduce the issue, even after running the script 100000 times.-
No, it turns out that the workaround suggested above does not completely eliminate the problem.
It probably just introduces a large enough delay to almost always avoid the race condition.
After 30000 additional test runs with the script the error occurred again.

     Patch Info: Patch Available
        Summary: Receiving event while disconnecting -> AMQ deadlock -> "Already connected"
error  (was: SEND directly after SUBSCRIBE causes bad state -> no DISCONNECT receipt)

Updated title and description with clearer information, since this issue isn't really about
SUBSCRIBE but rather about DISCONNECT deadlocking with auto-acking of an event inside ActiveMQ.
                
> Receiving event while disconnecting -> AMQ deadlock -> "Already connected" error
> --------------------------------------------------------------------------------
>
>                 Key: AMQ-4022
>                 URL: https://issues.apache.org/jira/browse/AMQ-4022
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: stomp
>    Affects Versions: 5.5.1
>         Environment: SLES11-SP1 x86_64
>            Reporter: Gabriel Gerhardsson
>         Attachments: AMQ_4022_possible_fix.diff, amqtest.pl, Net_Stomp.pm.diff
>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
