activemq-dev mailing list archives

From "ying (JIRA)" <>
Subject [jira] Commented: (AMQ-2627) Failover causes duplicate messages
Date Wed, 24 Feb 2010 22:24:40 GMT


ying commented on AMQ-2627:

I am having the same issue on the current trunk and these duplicates are actually stuck in
the new broker and will not get delivered to the consumer unless I restart the consumer. 

This bug is pretty much the same as the one I filed
a few days ago. I wonder whether it is related to

> Failover causes duplicate messages
> ----------------------------------
>                 Key: AMQ-2627
>                 URL:
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 5.3.0
>         Environment: Server: 2 RHEL 5.3 x86-64 machines. Kernel version 2.6.18-
> Client: Same as above. Also tested with same results on Fedora Core 11
>            Reporter: Josh Carlson
>            Priority: Blocker
>         Attachments: broken_failover.tar.bz2
> When using a shared file system master/slave ActiveMQ configuration and client
> acknowledgements, we run into a problem when our clients fail over to a new server.
> The problem is that the new server does not appear to have any knowledge of pending
> messages that the old server had dispatched to clients. Consequently, all of these
> pending messages get dispatched a second time even though the clients had
> acknowledged them.
> Please confirm my suspicion that this is a server-side bug, and let me know if there
> are any suggestions for working around this issue. I have put this at Priority
> 'Blocker' because it blocks our progress towards deploying an ActiveMQ solution to
> our infrastructure.

> If you look at the log file from the new broker you can see that the acks for those
> messages do not get matched:
>    2010-02-24 12:46:49,759 | WARN  | Async error occurred: javax.jms.JMSException: Unmatched
> I do not know whether this gets bubbled up to the client or not. If it does, it must
> be under the hood in activemq-cpp, because from the application layer I do not see
> any errors. In our in-house Perl Stomp client we wind up getting an ERROR frame,
> which the client did not know what to do with. This is where I initially ran into
> this problem. Today is my first day using CMS, in an attempt to verify that the bug
> is independent of the client and to provide a reproducer using a client everyone
> should have ready access to.
> The attached tar file will contain the following details for reproducing this problem.
> Contents:
>    README.txt                   - This File
>    activemq_1.xml               - ActiveMQ config for the server that was master at
>                                   the time I started the consumer
>    activemq_2.xml               - ActiveMQ config for the broker which became the
>                                   master after the original master failed
>    activemq_1.log               - Log file from the first server
>    activemq_2.log               - Log file from the second server
>    producers/SimpleProducer.cpp - Modified version of the program shipped in
>                                   activemq-cpp-library-3.1.0; sends only 2 messages
>                                   and expects two broker hosts on the command line.
>    consumers/SimpleConsumer.cpp - New file ... but really just a modified version of
>                                   SimpleAsyncConsumer shipped with
>                                   activemq-cpp-library-3.1.0. Modified as follows:
>                                      - Retrieves messages synchronously and in one
>                                        thread (so we can see what is going on)
>                                      - Takes two command line options to name the
>                                        broker hosts to use in the broker URI
>                                      - Uses client acknowledgements.
>                                      - After retrieving a message it blocks waiting
>                                        for standard input (so one has time to go
>                                        kill the server)
>                 - Modified version of the makefile to build the new SimpleConsumer
> Note that the build for these files requires that they be built from inside an
> activemq-cpp build tree. So the first step to reproduce this problem would be to copy
> producers/SimpleProducer.cpp consumers/SimpleConsumer.cpp and to your src/examples
> directory. Then run a top-level configure and make. I ran this using
> activemq-cpp-library version 3.1.0.
> This reproducer expects that you have only 2 ActiveMQ brokers and that they are
> configured using a shared file system master/slave configuration. It also expects an
> openwire transport connector listening on port 61616 on those two machines. (Note:
> you'll see my ActiveMQ configs using the transport uri:
> uri="tcp://q1masterhost:61616"; q1masterhost goes to the ethernet 0 interface on
> each of the hosts.)
> Once you have those two brokers set up and running, go ahead and run the
> simple_producer code, passing the hostnames of your two brokers on the command line:
>         [jcarlson@rocky examples]$ ./simple_producer mmq1 mmq2
>         =====================================================
>         Starting the example:
>         -----------------------------------------------------
>         Sent message #1 from thread 139817389041504
>         Sent message #2 from thread 139817389041504
>         -----------------------------------------------------
>         Finished with the example.
>         =====================================================
> Now do the same for the simple_consumer:
>         [jcarlson@rocky examples]$ ./simple_consumer mmq1 mmq2
>         =====================================================
>         Starting the example:
>         -----------------------------------------------------
>         Message #1 Received: Hello world! from thread 139817389041504
>         Waiting for stdin to acknoledge
> The app has retrieved one message but has not ack'ed it yet. Now go identify
> which host has the master broker and kill the process. The master broker will
> be the one which is *not* printing 'Database [lockfile] is locked' messages.
> In my case the broker was on mmq1 so I did this in another terminal:
>         ssh -t mmq1 sudo pkill java
> Immediately I see this in the console I started the consumer in:
>   The Connection's Transport has been Interrupted.
> and then a few seconds later I see:
>   The Connection's Transport has been Restored.
> At this point I hit enter in the terminal so that the message I received on
> the other broker gets acknowledged and the consumer tries to get another message:
>   Message #2 Received: Hello world! from thread 139817389041504
>   Waiting for stdin to acknoledge
> Ok at this point, since I have only put two messages on the queue I don't
> expect any more so when I hit enter and go back to get another message I
> expect it to just sit and wait for another message to come in. This is not
> what happens. A third message is retrieved:
>   Message #3 Received: Hello world! from thread 139817389041504
>   Waiting for stdin to acknoledge
> At this point when I hit enter again the app blocks and I kill it with Ctrl-C.
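
Since the description asks for workaround suggestions: one possible client-side
stopgap, until the broker-side cause is settled, is to have the consumer remember
recently seen message IDs and skip any message whose ID it has already processed.
The following is only a minimal sketch under stated assumptions. It does not use
activemq-cpp at all, and RecentMessageIds and uniqueInOrder are hypothetical helper
names. In a real CMS consumer the ID string would presumably come from
cms::Message::getCMSMessageID(), and a skipped duplicate would likely still need to
be acknowledged so the broker stops holding it.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper (not part of CMS or activemq-cpp): remembers the most
// recent `capacity` message IDs so a redelivered message can be recognised.
class RecentMessageIds {
public:
    explicit RecentMessageIds(std::size_t capacity) : capacity_(capacity) {}

    // Returns true the first time an ID is seen, false if it is still remembered.
    bool markSeen(const std::string& id) {
        if (seen_.count(id) != 0) {
            return false;  // duplicate within the remembered window
        }
        if (capacity_ == 0) {
            return true;   // nothing is remembered, so everything looks new
        }
        if (order_.size() == capacity_) {
            // Forget the oldest ID to keep memory use bounded.
            seen_.erase(order_.front());
            order_.pop_front();
        }
        order_.push_back(id);
        seen_.insert(id);
        return true;
    }

private:
    std::size_t capacity_;
    std::deque<std::string> order_;         // insertion order, oldest at the front
    std::unordered_set<std::string> seen_;  // fast membership test
};

// Illustration only: drops repeated IDs from a sequence of received IDs,
// keeping the first occurrence of each.
std::vector<std::string> uniqueInOrder(const std::vector<std::string>& ids,
                                       std::size_t capacity) {
    RecentMessageIds recent(capacity);
    std::vector<std::string> out;
    for (const std::string& id : ids) {
        if (recent.markSeen(id)) {
            out.push_back(id);
        }
    }
    return out;
}
```

The window is bounded on purpose: a duplicate that arrives after its ID has been
evicted will be processed again, so the capacity would need to cover at least the
number of messages that can be pending at the moment of failover. This only hides
the redelivery from the application; it does nothing about the unmatched acks
logged by the new broker.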

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
