From: "Mario Siegenthaler (JIRA)"
To: dev@activemq.apache.org
Reply-To: dev@activemq.apache.org
Date: Tue, 2 Sep 2008 11:41:52 -0700 (PDT)
Subject: [jira] Commented: (AMQ-1925) JDBC-Master/Slave Failover - Consumer stop after 1000 Messages
Message-ID: <27393094.1220380912618.JavaMail.jira@brutus>
In-Reply-To: <1284196296.1220377192611.JavaMail.jira@brutus>

    [ https://issues.apache.org/activemq/browse/AMQ-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=45333#action_45333 ]

Mario Siegenthaler commented on AMQ-1925:
-----------------------------------------

Well, the "just don't allow this value to become smaller than zero" approach does not work. Now I've got consumers whose PrefetchSubscription isn't full but who don't get more than X messages anyway. However, the change seems to increase X a notch...

> JDBC-Master/Slave Failover - Consumer stop after 1000 Messages
> --------------------------------------------------------------
>
>                 Key: AMQ-1925
>                 URL: https://issues.apache.org/activemq/browse/AMQ-1925
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 5.1.0
>            Reporter: Mario Siegenthaler
>         Attachments: heapdump-1220373534484.hprof, threaddump-1220371256910.tdump
>
>
> In a JDBC-Master/Slave environment with ActiveMQ 5.1.0 (+patches for 1710 and 1838) the failover for consumers works: the consumers resume getting messages after the failover, but then they suddenly stop after approx. 1000 messages (mostly 1000, one got to 1080). The consumers are using transacted sessions.
> The thread dump looks unsuspicious; everybody is waiting on the socket:
>    java.lang.Thread.State: RUNNABLE
> 	at java.net.SocketInputStream.socketRead0(Native Method)
> 	at java.net.SocketInputStream.read(SocketInputStream.java:129)
> 	at org.apache.activemq.transport.tcp.TcpBufferedInputStream.fill(TcpBufferedInputStream.java:50)
> 	at org.apache.activemq.transport.tcp.TcpBufferedInputStream.read(TcpBufferedInputStream.java:58)
> 	at java.io.DataInputStream.readInt(DataInputStream.java:370)
> 	at org.apache.activemq.openwire.OpenWireFormat.unmarshal(OpenWireFormat.java:269)
> 	at org.apache.activemq.transport.tcp.TcpTransport.readCommand(TcpTransport.java:203)
> 	at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:195)
> 	at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:183)
> 	at java.lang.Thread.run(Thread.java:619)
> A memory dump from the consumers shows that they've really run out of messages and are waiting for the broker to deliver new ones.
> I've attached both the thread dump and the heap dump to this issue (or rather: I'll do so :)
> The broker doesn't do anything (it also waits on the transport socket); the queue has a full page-in buffer (100 messages) but obviously fails to do anything with it. If I manually trigger a doDispatch of all pagedIn messages (via the debugger, just an attempt to revive the thing), it returns without doing anything, since all subscriptions are full (s.isFull). I investigated further and was confused to see the prefetchExtension field of the PrefetchSubscription having a value of -1000 (negative!). This explains why it was considered full:
>   dispatched.size() - prefetchExtension >= info.getPrefetchSize()
>   0 - (-1000) >= 1000
> Quite nasty: even though the dispatched size was zero, the client didn't receive any new messages.
> The only place this value can become negative is inside acknowledge, where it's decremented (prefetchExtension--); all other places do a Math.max(0, X).
> So here's my guess at what happened: the clients had a full (1000 messages) prefetch buffer when I killed my master. As soon as the slave was done starting, they reconnected and started processing the messages in the prefetch buffer and acknowledging them. This gradually decremented the counter into negative values, because the slave never got a chance to increment the prefetchExtension, since it didn't actually deliver those messages.
> Possible solutions:
> - clear the prefetch buffer on a failover
> - just don't allow this value to become smaller than zero (not sure if that covers all bases)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
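
For anyone who wants to replay the arithmetic from the report without a broker, here is a minimal sketch in Java. It is a simplified model, not the real PrefetchSubscription code: the class PrefetchModel, its constructor flag and the method acknowledgeUndispatched are hypothetical names made up for illustration. Only the fullness check, the decrement in acknowledge and the Math.max(0, X) clamp are taken from the description in the issue.

```java
// Simplified, hypothetical model of the accounting described in AMQ-1925.
// This is NOT the actual ActiveMQ PrefetchSubscription implementation.
class PrefetchModel {
    private final int prefetchSize;     // stands in for info.getPrefetchSize()
    private final boolean clampAtZero;  // second "possible solution" from the report
    private int dispatchedSize = 0;     // stands in for dispatched.size()
    private int prefetchExtension = 0;  // the field observed at -1000

    PrefetchModel(int prefetchSize, boolean clampAtZero) {
        this.prefetchSize = prefetchSize;
        this.clampAtZero = clampAtZero;
    }

    // Models an ack for a message this broker never dispatched itself,
    // e.g. one that sat in the client's prefetch buffer across a failover.
    void acknowledgeUndispatched() {
        prefetchExtension--;
        if (clampAtZero) {
            prefetchExtension = Math.max(0, prefetchExtension);
        }
    }

    // The fullness check quoted in the report.
    boolean isFull() {
        return dispatchedSize - prefetchExtension >= prefetchSize;
    }

    int getPrefetchExtension() {
        return prefetchExtension;
    }

    public static void main(String[] args) {
        PrefetchModel unclamped = new PrefetchModel(1000, false);
        PrefetchModel clamped = new PrefetchModel(1000, true);
        for (int i = 0; i < 1000; i++) {
            unclamped.acknowledgeUndispatched();
            clamped.acknowledgeUndispatched();
        }
        System.out.println("unclamped: extension=" + unclamped.getPrefetchExtension()
                + " full=" + unclamped.isFull());
        System.out.println("clamped:   extension=" + clamped.getPrefetchExtension()
                + " full=" + clamped.isFull());
    }
}
```

In this model the unclamped counter reaches -1000 after 1000 such acks, so the subscription reports itself as full with nothing dispatched. The clamped variant keeps the counter at zero, but as noted in the comment at the top of this message, clamping alone did not get the real consumers going again, so the model only covers the arithmetic and not the whole problem.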