Date: Mon, 15 Aug 2011 22:53:27 +0000 (UTC)
From: "Timothy Bish (JIRA)"
To: dev@activemq.apache.org
Reply-To: dev@activemq.apache.org
Message-ID: <293212430.39926.1313448807371.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Closed] (AMQ-2627) Failover causes duplicate messages

     [ https://issues.apache.org/jira/browse/AMQ-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timothy Bish closed AMQ-2627.
-----------------------------

    Resolution: Not A Problem

This is a result of the NMS and CMS clients not having internal message auditing, which can filter duplicates in this particular use case.

> Failover causes duplicate messages
> ----------------------------------
>
>                 Key: AMQ-2627
>                 URL: https://issues.apache.org/jira/browse/AMQ-2627
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 5.3.0
>         Environment: Server: 2 RHEL 5.3 x86-64 machines. Kernel version 2.6.18-128.0.0.0.2.el5.
> Client: Same as above. Also tested with same results on Fedora Core 11
>            Reporter: Josh Carlson
>         Attachments: NativeNMSConsumerAndProducer.zip, activemq.xml, broken_failover.tar.bz2
>
>
> When using a shared file system master/slave ActiveMQ configuration and client acknowledgements we run into a problem when
> our clients fail over to a new server. The problem is that the new server does not appear to have any knowledge of pending
> messages that the old server had dispatched to clients. Consequently all of these pending messages get dispatched a second
> time even though the clients had acknowledged them.
> Please confirm my suspicion that this is a server-side bug, and let me know if there are any suggestions for working around this issue. I have put this at Priority 'Blocker' because it blocks our progress towards deploying an ActiveMQ solution to our infrastructure.
> If you look at the log file from the new broker you can see that the acks for those messages do not get matched:
> 2010-02-24 12:46:49,759 | WARN | Async error occurred: javax.jms.JMSException: Unmatched acknowledege:
> I do not know whether this gets bubbled up to the client or not. If it does, it must be under the hood in activemq-cpp
> because from the application layer I do not see any errors. In our in-house Perl Stomp client we wind up getting an ERROR
> frame which it did not know what to do with. This is where I initially ran into this problem.
> Today is my first day using CMS to attempt to verify if the bug is independent of the client and to provide a reproducer using a client everyone
> should have ready access to.
> The attached tar file will contain the following details for reproducing this problem.
> Contents:
>   README.txt - This File
>   activemq_1.xml - ActiveMQ config for the server that was master at the time I started the consumer
>   activemq_2.xml - ActiveMQ config for the broker which became the master after the original master failed
>   activemq_1.log - Log file from the first server
>   activemq_2.log - Log for the second server
>   producers/SimpleProducer.cpp - Modified version of program shipped in activemq-cpp-library-3.1.0 to
>       send only 2 messages and provide two broker hosts on the command line.
>   consumers/SimpleConsumer.cpp - New file ... but really just a modified version of SimpleAsyncConsumer shipped with
>       activemq-cpp-library-3.1.0. Modified as follows:
>         - Retrieves messages synchronously and in one thread (so we can see what is going on)
>         - Takes two command line options to name broker hosts to use in broker URI
>         - Uses Client Acknowledgements.
>         - After retrieving a message it blocks waiting for standard input (so one has time to go kill the server)
>   Makefile.am - Modified version of the makefile to build the new SimpleConsumer program.
>
>
> Note that these files must be built from inside an activemq-cpp build tree. So the first step to reproduce this problem would be to copy producers/SimpleProducer.cpp, consumers/SimpleConsumer.cpp and Makefile.am to your src/examples directory. Then run a top-level configure and make. I ran this using activemq-cpp-library version 3.1.0.
>
> This reproducer expects that you only have 2 ActiveMQ brokers and that they be configured using a shared file system master/slave configuration. It also expects an openwire transport connector listening on port 61616 on those two machines.
> (Note: you'll see my activemq configs using the transport uri: uri="tcp://q1masterhost:61616", q1masterhost goes to the ethernet 0 interface on each of the hosts.)
> Once you have those two brokers set up and running, go ahead and run the simple_producer code passing the hostnames of your two brokers on the command line:
> [jcarlson@rocky examples]$ ./simple_producer mmq1 mmq2
> =====================================================
> Starting the example:
> -----------------------------------------------------
> Sent message #1 from thread 139817389041504
> Sent message #2 from thread 139817389041504
> -----------------------------------------------------
> Finished with the example.
> =====================================================
> Now do the same for the simple_consumer:
> [jcarlson@rocky examples]$ ./simple_consumer mmq1 mmq2
> =====================================================
> Starting the example:
> -----------------------------------------------------
> Message #1 Received: Hello world! from thread 139817389041504
> Waiting for stdin to acknoledge
> The app has retrieved one message but has not ack'ed it yet. Now go identify
> which host has the master broker and kill the process. The master broker will
> be the one which is *not* printing 'Database [lockfile] is locked' messages.
> In my case the broker was on mmq1 so I did this in another terminal:
> ssh -t mmq1 sudo pkill java
> Immediately I see this in the console I started the consumer in:
> The Connection's Transport has been Interrupted.
> and then a few seconds later I see:
> The Connection's Transport has been Restored.
> At this point I hit enter in the terminal so that the message I received from
> the other broker gets acknowledged and the consumer tries to get another message:
> Message #2 Received: Hello world! from thread 139817389041504
> Waiting for stdin to acknoledge
> OK, at this point, since I have only put two messages on the queue I don't
> expect any more, so when I hit enter and go back to get another message I
> expect it to just sit and wait for another message to come in. This is not
> what happens. A third message is retrieved:
> Message #3 Received: Hello world! from thread 139817389041504
> Waiting for stdin to acknoledge
> At this point when I hit enter again the app blocks and I kill it with Ctrl-C.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
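
The resolution above attributes the duplicates to the NMS and CMS clients lacking internal message auditing. As a rough illustration of that idea only, the sketch below shows how an application could filter redelivered messages itself by remembering recently seen message IDs. It is not the ActiveMQ, NMS, or CMS API: the names `MessageAudit`, `is_duplicate`, `consume`, and the `window` parameter are hypothetical, the message IDs in the usage note are made up, and it assumes a redelivered message carries the same message ID as its first delivery.

```python
from collections import OrderedDict


class MessageAudit:
    """Hypothetical helper: remembers recent message IDs and flags repeats."""

    def __init__(self, window=2048):
        if window < 1:
            raise ValueError("window must be at least 1")
        self._window = window
        # Keys are message IDs, ordered from least to most recently seen.
        self._seen = OrderedDict()

    def is_duplicate(self, message_id):
        """Return True if message_id was already seen inside the window."""
        if message_id in self._seen:
            self._seen.move_to_end(message_id)
            return True
        self._seen[message_id] = None
        if len(self._seen) > self._window:
            # Forget the least recently seen ID to bound memory use.
            self._seen.popitem(last=False)
        return False


def consume(messages, audit):
    """Return the bodies of first-time messages.

    `messages` is an iterable of (message_id, body) pairs standing in for
    whatever the real client hands to the application (an assumption).
    """
    delivered = []
    for message_id, body in messages:
        if audit.is_duplicate(message_id):
            # Redelivered (for example after failover): skip processing.
            continue
        delivered.append(body)
    return delivered
```

For example, feeding `consume` the pairs `("ID:a", "Hello world!")`, `("ID:b", "Hello world!")`, `("ID:a", "Hello world!")` with a fresh `MessageAudit()` yields two bodies rather than three. A bounded window keeps memory use fixed, at the cost of missing a duplicate that arrives after its ID has been evicted.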