activemq-dev mailing list archives

From "Mark Gellings (JIRA)" <j...@apache.org>
Subject [jira] Updated: (AMQ-2627) Failover causes duplicate messages
Date Fri, 26 Feb 2010 18:04:40 GMT

     [ https://issues.apache.org/activemq/browse/AMQ-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Gellings updated AMQ-2627:
-------------------------------

    Attachment: activemq.xml
                NativeNMSConsumerAndProducer.zip

Attached is a console app that uses NMS to replicate the problem, along with our activemq.xml.

The zip file is password protected; the password is "fridaytest".

We're using ActiveMQ v5.2 with JDBC master/slave on MSSQL 2008. The attached app is an NMS v1.2 RC4
consumer with a transacted session.

To replicate:

1)	Work through the console prompts and produce 50 messages.
2)	Restart the console app and start consuming those 50 messages.
3)	While the consumer is processing, restart the broker.
4)	The last message the consumer was processing is resent and is not marked as redelivered.
 (This is the idempotent message problem. For example, instead of $500 being deposited into your
account, $1000 is.)
5)	NMS then blows up, which seems like a different problem?

From what I understand, this shouldn't happen if you use a transacted session; however,
the attached console app demonstrates that it does.

Bottom line: I thought this was why the Camel idempotent consumer pattern [1] existed, which
can be leveraged by Java clients.


[1] http://fusesource.com/docs/router/1.6/eip/MsgEnd-Idempotent.html 
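
To make the pattern in [1] concrete, here is a minimal sketch in Python of the idea behind an idempotent consumer: remember the IDs of messages already processed and skip any repeat. This is not Camel's implementation. The class name, the message IDs and the deposit handler are all hypothetical, and the sketch assumes the broker keeps the same message ID when it resends a message.

```python
class IdempotentConsumer:
    """Wraps a handler so that each message ID is processed at most once."""

    def __init__(self, handler):
        self._handler = handler
        # Assumption: an in-memory set is enough for illustration. A real
        # deployment would need a durable store that survives restarts.
        self._seen = set()

    def on_message(self, message_id, body):
        """Return True if the message was processed, False if it was a duplicate."""
        if message_id in self._seen:
            return False
        self._handler(body)
        self._seen.add(message_id)
        return True


balance = {"amount": 0}


def deposit(amount):
    balance["amount"] += amount


consumer = IdempotentConsumer(deposit)
consumer.on_message("ID:1", 500)  # first delivery, processed
consumer.on_message("ID:1", 500)  # resent after failover, skipped
print(balance["amount"])  # 500
```

With the duplicate filtered out, the account from step 4 receives $500 once instead of $1000.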


Regards,

Mark


> Failover causes duplicate messages
> ----------------------------------
>
>                 Key: AMQ-2627
>                 URL: https://issues.apache.org/activemq/browse/AMQ-2627
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 5.3.0
>         Environment: Server: 2 RHEL 5.3 x86-64 machines. Kernel version 2.6.18-128.0.0.0.2.el5.
> Client: Same as above. Also tested with same results on Fedora Core 11
>            Reporter: Josh Carlson
>            Priority: Blocker
>         Attachments: activemq.xml, broken_failover.tar.bz2, NativeNMSConsumerAndProducer.zip
>
>
> When using a shared file system master/slave ActiveMQ configuration and client acknowledgements, we run into a problem when our clients fail over to a new server. The problem is that the new server does not appear to have any knowledge of pending messages that the old server had dispatched to clients. Consequently, all of these pending messages get dispatched a second time even though the clients had acknowledged them.
> Please confirm my suspicion that this is a server-side bug, and let me know if there are any suggestions for working around this issue. I have set the priority to 'Blocker' because it blocks our progress towards deploying an ActiveMQ solution to our infrastructure.
> If you look at the log file from the new broker you can see that the acks for those messages do not get matched:
>    2010-02-24 12:46:49,759 | WARN  | Async error occurred: javax.jms.JMSException: Unmatched acknowledege:
> I do not know whether this gets bubbled up to the client or not. If it does, it must be under the hood in activemq-cpp, because from the application layer I do not see any errors. In our in-house Perl Stomp client we wound up getting an ERROR frame, which the client did not know what to do with. This is where I initially ran into this problem. Today is my first day using CMS, in an attempt to verify whether the bug is independent of the client and to provide a reproducer using a client everyone should have ready access to.
> The attached tar file contains the following files for reproducing this problem.
> Contents:
>    README.txt                   - This file
>    activemq_1.xml               - ActiveMQ config for the server that was master at the time I started the consumer
>    activemq_2.xml               - ActiveMQ config for the broker which became the master after the original master failed
>    activemq_1.log               - Log file from the first server
>    activemq_2.log               - Log file from the second server
>    producers/SimpleProducer.cpp - Modified version of the program shipped in activemq-cpp-library-3.1.0 to
>                                   send only 2 messages and take two broker hosts on the command line.
>    consumers/SimpleConsumer.cpp - New file, but really just a modified version of SimpleAsyncConsumer shipped with
>                                   activemq-cpp-library-3.1.0. Modified as follows:
>                                      - Retrieves messages synchronously and in one thread (so we can see what is going on)
>                                      - Takes two command line options naming the broker hosts to use in the broker URI
>                                      - Uses client acknowledgements
>                                      - After retrieving a message, blocks waiting for standard input (so one has time to go kill the server)
>    Makefile.am                  - Modified version of the makefile to build the new SimpleConsumer program.
>
> Note that these files must be built from inside an activemq-cpp build tree. So the first step to reproduce this problem is to copy producers/SimpleProducer.cpp, consumers/SimpleConsumer.cpp and Makefile.am to your src/examples directory, then run a top-level configure and make. I ran this using activemq-cpp-library version 3.1.0.
>
> This reproducer expects that you have only 2 ActiveMQ brokers and that they are configured using a shared file system master/slave configuration. It also expects an OpenWire transport connector listening on port 61616 on those two machines. (Note: you'll see my ActiveMQ configs use the transport URI uri="tcp://q1masterhost:61616"; q1masterhost points to the ethernet 0 interface on each of the hosts.)
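
For readers unfamiliar with how a client names two brokers, the following Python sketch builds an ActiveMQ failover transport URI from a list of hosts and the OpenWire port mentioned above. The helper name is hypothetical, and the exact URI the attached reproducer constructs is not shown in this message, so treat the output as an assumption about its general shape rather than a copy of it.

```python
def failover_uri(hosts, port=61616):
    """Build a failover transport URI listing one tcp:// endpoint per broker host."""
    if not hosts:
        raise ValueError("at least one broker host is required")
    endpoints = ",".join(f"tcp://{host}:{port}" for host in hosts)
    return f"failover:({endpoints})"


# The host names are the ones used on the command line later in this report.
print(failover_uri(["mmq1", "mmq2"]))
# failover:(tcp://mmq1:61616,tcp://mmq2:61616)
```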
> Once you have those two brokers set up and running, go ahead and run the simple_producer code, passing the hostnames of your two brokers on the command line:
>         [jcarlson@rocky examples]$ ./simple_producer mmq1 mmq2
>         =====================================================
>         Starting the example:
>         -----------------------------------------------------
>         Sent message #1 from thread 139817389041504
>         Sent message #2 from thread 139817389041504
>         -----------------------------------------------------
>         Finished with the example.
>         =====================================================
> Now do the same for the simple_consumer:
>         [jcarlson@rocky examples]$ ./simple_consumer mmq1 mmq2
>         =====================================================
>         Starting the example:
>         -----------------------------------------------------
>         Message #1 Received: Hello world! from thread 139817389041504
>         Waiting for stdin to acknoledge
> The app has retrieved one message but has not ack'ed it yet. Now go identify
> which host has the master broker and kill the process. The master broker will
> be the one which is *not* printing 'Database [lockfile] is locked' messages.
> In my case the broker was on mmq1 so I did this in another terminal:
>         ssh -t mmq1 sudo pkill java
> Immediately I see this in the console where I started the consumer:
>   The Connection's Transport has been Interrupted.
> and then a few seconds later I see:
>   The Connection's Transport has been Restored.
> At this point I hit enter in the terminal so that the message I received on
> the other broker gets acknowledged and the consumer tries to get another message
>   Message #2 Received: Hello world! from thread 139817389041504
>   Waiting for stdin to acknoledge
> Ok at this point, since I have only put two messages on the queue I don't
> expect any more so when I hit enter and go back to get another message I
> expect it to just sit and wait for another message to come in. This is not
> what happens. A third message is retrieved:
>   Message #3 Received: Hello world! from thread 139817389041504
>   Waiting for stdin to acknoledge
> At this point when I hit enter again the app blocks and I kill it with
> Ctrl-C.
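
The sequence reported above (two messages sent, three received) can be reproduced with a toy model in Python. It encodes the reporter's suspicion only: persisted messages live in a store shared by both brokers, while the record of what has been dispatched lives in each broker's memory and is lost on failover. The class and all names are hypothetical, and this is not how ActiveMQ is actually implemented.

```python
class ToyBroker:
    """Toy model: the message store is shared, dispatch state is per process."""

    def __init__(self, shared_store):
        self.store = shared_store  # list of message IDs shared by master and slave
        self.dispatched = set()    # kept in memory only, so lost on failover

    def dispatch_next(self):
        """Hand out the first stored message this broker has not dispatched yet."""
        for msg_id in self.store:
            if msg_id not in self.dispatched:
                self.dispatched.add(msg_id)
                return msg_id
        return None

    def ack(self, msg_id):
        """Remove an acknowledged message; return False if the ack is unmatched."""
        if msg_id not in self.dispatched:
            return False  # this broker never dispatched it, message stays stored
        self.dispatched.discard(msg_id)
        self.store.remove(msg_id)
        return True


store = ["m1", "m2"]
master = ToyBroker(store)
deliveries = [master.dispatch_next()]  # m1 received, consumer waits on stdin

slave = ToyBroker(store)               # master killed, slave takes over the store
ack_matched = slave.ack("m1")          # the ack reaches a broker that never sent m1

msg = slave.dispatch_next()
while msg is not None:
    deliveries.append(msg)
    slave.ack(msg)
    msg = slave.dispatch_next()

print(ack_matched)  # False
print(deliveries)   # ['m1', 'm1', 'm2']
```

In this model the unmatched ack leaves m1 in the store, so the new master dispatches it again before m2, which gives the consumer three deliveries for two messages.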

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

