Return-Path: Delivered-To: apmail-activemq-dev-archive@www.apache.org Received: (qmail 61430 invoked from network); 2 Nov 2007 22:11:04 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2) by minotaur.apache.org with SMTP; 2 Nov 2007 22:11:04 -0000 Received: (qmail 33835 invoked by uid 500); 2 Nov 2007 22:10:52 -0000 Delivered-To: apmail-activemq-dev-archive@activemq.apache.org Received: (qmail 33803 invoked by uid 500); 2 Nov 2007 22:10:52 -0000 Mailing-List: contact dev-help@activemq.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@activemq.apache.org Delivered-To: mailing list dev@activemq.apache.org Received: (qmail 33794 invoked by uid 99); 2 Nov 2007 22:10:52 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 02 Nov 2007 15:10:51 -0700 X-ASF-Spam-Status: No, hits=0.2 required=10.0 tests=SPF_PASS,WHOIS_MYPRIVREG X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of rajdavies@gmail.com designates 66.249.92.171 as permitted sender) Received: from [66.249.92.171] (HELO ug-out-1314.google.com) (66.249.92.171) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 02 Nov 2007 22:10:52 +0000 Received: by ug-out-1314.google.com with SMTP id k3so696767ugf for ; Fri, 02 Nov 2007 15:10:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=beta; h=domainkey-signature:received:received:mime-version:in-reply-to:references:content-type:message-id:content-transfer-encoding:from:subject:date:to:x-mailer; bh=y4IkyjYp78+EfcTXnZts74KROG9E4H28nEg7xLBAxRQ=; b=YY/MKthphyIe9e3meMc4ORuCbXPTOsogbXfagrfnxrhxYu37fCvsXxCxeZigrgtCYKihHA3Vz/1lce8iAbn9XO+xoa/cLLn0NOUkLACvABDivHGsbjlHlnvr6SnwDjgw/YmA93VGoEvkikOas2SI9NJqT54xV4GP4haJOMn1RXM= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=received:mime-version:in-reply-to:references:content-type:message-id:content-transfer-encoding:from:subject:date:to:x-mailer; b=lr//zbGYxA72AxeTXuC5Y6HqPPHVeuXtrdfCe3b2GF+1VUdpDVZqMXIImU/3wiNhllGxdSvlvbIjPLnrOffsOvGebnyRDBuvASLevjtluTNLe7yR420mM+vnxhlwH8kg9fS9FqqXUOFOxpI3oRFsL2dL7jMSTCsPl6jMZDc0bPY= Received: by 10.78.172.20 with SMTP id u20mr1759587hue.1194041430228; Fri, 02 Nov 2007 15:10:30 -0700 (PDT) Received: from ?192.168.1.67? ( [86.133.81.109]) by mx.google.com with ESMTPS id f4sm498185nfh.2007.11.02.15.10.27 (version=TLSv1/SSLv3 cipher=OTHER); Fri, 02 Nov 2007 15:10:28 -0700 (PDT) Mime-Version: 1.0 (Apple Message framework v752.3) In-Reply-To: <13543207.post@talk.nabble.com> References: <13538049.post@talk.nabble.com> <13541167.post@talk.nabble.com> <0DF4C7B0-E229-4C62-94BC-894E4EDC560C@gmail.com> <13543207.post@talk.nabble.com> Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed Message-Id: <6A2F7B1A-C740-4A30-A6E2-7B4CEA04F17D@gmail.com> Content-Transfer-Encoding: 7bit From: Rob Davies Subject: Re: IndexOutOfBoundsException in AMQMessageStore using 20071028.173739 Date: Fri, 2 Nov 2007 22:10:23 +0000 To: dev@activemq.apache.org X-Mailer: Apple Mail (2.752.3) X-Virus-Checked: Checked by ClamAV on apache.org I wonder if you could take the latest again? cheers, Rob http://rajdavies.blogspot.com/ On Nov 2, 2007, at 7:26 AM, Andrew Miller wrote: > > > Rob, > > Yeah, in general, I've been deleting the entire local store > directory in > between tests, since we're just testing it is not critical that the > messages > are persisted. (with the message queue down of course) > Interestingly, I > restarted the broker to try to fix this problem. It marked the > store as bad > and regenerated the index from the journal (about an hour+ with > ~1.5-2m > messages). > > The consumers started working again at that point, for a while. > After it > reached ~7200 dispatched messages, it really blew up. I got this > in the > wrapper.log: > > INFO | jvm 1 | 2007/11/01 20:30:09 | > java.lang.NullPointerException > INFO | jvm 1 | 2007/11/01 20:30:09 | at > org.apache.activemq.kaha.MessageIdMarshaller.writePayload > (MessageIdMarshaller.java:38) > INFO | jvm 1 | 2007/11/01 20:30:09 | at > org.apache.activemq.kaha.MessageIdMarshaller.writePayload > (MessageIdMarshaller.java:29) > INFO | jvm 1 | 2007/11/01 20:30:09 | at > org.apache.activemq.kaha.impl.index.hash.HashEntry.write > (HashEntry.java:93) > INFO | jvm 1 | 2007/11/01 20:30:09 | at > org.apache.activemq.kaha.impl.index.hash.HashPage.write > (HashPage.java:123) > INFO | jvm 1 | 2007/11/01 20:30:09 | at > org.apache.activemq.kaha.impl.index.hash.HashIndex.writeFullPage > (HashIndex.java:350) > INFO | jvm 1 | 2007/11/01 20:30:09 | at > org.apache.activemq.kaha.impl.index.hash.HashPageInfo.end > (HashPageInfo.java:98) > INFO | jvm 1 | 2007/11/01 20:30:09 | at > org.apache.activemq.kaha.impl.index.hash.HashBin.end(HashBin.java:299) > INFO | jvm 1 | 2007/11/01 20:30:09 | at > org.apache.activemq.kaha.impl.index.hash.HashBin.put(HashBin.java:153) > INFO | jvm 1 | 2007/11/01 20:30:09 | at > org.apache.activemq.kaha.impl.index.hash.HashIndex.store > (HashIndex.java:241) > INFO | jvm 1 | 2007/11/01 20:30:09 | at > org.apache.activemq.kaha.impl.container.MapContainerImpl.put > (MapContainerImpl.java:303) > INFO | jvm 1 | 2007/11/01 20:30:09 | at > org.apache.activemq.store.kahadaptor.KahaReferenceStore.addMessageRefe > rence(KahaReferenceStore.java:113) > INFO | jvm 1 | 2007/11/01 20:30:09 | at > org.apache.activemq.store.amq.AMQMessageStore$4.execute > (AMQMessageStore.java:341) > INFO | jvm 1 | 2007/11/01 20:30:09 | at > org.apache.activemq.util.TransactionTemplate.run > (TransactionTemplate.java:44) > INFO | jvm 1 | 2007/11/01 20:30:09 | at > org.apache.activemq.store.amq.AMQMessageStore.doAsyncWrite > (AMQMessageStore.java:330) > INFO | jvm 1 | 2007/11/01 20:30:09 | at > org.apache.activemq.store.amq.AMQMessageStore.asyncWrite > (AMQMessageStore.java:300) > INFO | jvm 1 | 2007/11/01 20:30:09 | at > org.apache.activemq.store.amq.AMQMessageStore$1.iterate > (AMQMessageStore.java:92) > INFO | jvm 1 | 2007/11/01 20:30:09 | at > org.apache.activemq.thread.DedicatedTaskRunner.runTask > (DedicatedTaskRunner.java:9 > INFO | jvm 1 | 2007/11/01 20:30:09 | at > org.apache.activemq.thread.DedicatedTaskRunner$1.run > (DedicatedTaskRunner.java:36) > > After that stack trace, I start getting a lot more > NullPointerExceptions in > the logs, however, they don't have a corresponding stack trace: > > INFO | jvm 1 | 2007/11/01 20:30:45 | WARN AMQMessageStore > - Message could not be added to long term store: null > INFO | jvm 1 | 2007/11/01 20:30:45 | > java.lang.NullPointerException > INFO | jvm 1 | 2007/11/01 20:30:45 | WARN AMQMessageStore > - Message could not be added to long term store: null > INFO | jvm 1 | 2007/11/01 20:30:45 | > java.lang.NullPointerException > > Then, a different kind of exception, with stack trace: > > INFO | jvm 1 | 2007/11/01 20:31:20 | ERROR QueueStorePrefetch > - Failed to fill batch > INFO | jvm 1 | 2007/11/01 20:31:20 | > java.lang.NullPointerException > INFO | jvm 1 | 2007/11/01 20:31:20 | ERROR IG > - Failed to page in more queue messages > INFO | jvm 1 | 2007/11/01 20:31:20 | java.lang.RuntimeException: > java.lang.NullPointerException > INFO | jvm 1 | 2007/11/01 20:31:20 | at > org.apache.activemq.broker.region.cursors.QueueStorePrefetch.hasNext > (QueueStorePrefetch.java:112) > INFO | jvm 1 | 2007/11/01 20:31:20 | at > org.apache.activemq.broker.region.cursors.StoreQueueCursor.hasNext > (StoreQueueCursor.java:129) > INFO | jvm 1 | 2007/11/01 20:31:20 | at > org.apache.activemq.broker.region.Queue.buildList(Queue.java:1022) > INFO | jvm 1 | 2007/11/01 20:31:20 | at > org.apache.activemq.broker.region.Queue.doPageIn(Queue.java:1005) > INFO | jvm 1 | 2007/11/01 20:31:20 | at > org.apache.activemq.broker.region.Queue.pageInMessages(Queue.java: > 1068) > INFO | jvm 1 | 2007/11/01 20:31:20 | at > org.apache.activemq.broker.region.Queue.iterate(Queue.java:941) > INFO | jvm 1 | 2007/11/01 20:31:20 | at > org.apache.activemq.thread.DedicatedTaskRunner.runTask > (DedicatedTaskRunner.java:98) > INFO | jvm 1 | 2007/11/01 20:31:20 | at > org.apache.activemq.thread.DedicatedTaskRunner$1.run > (DedicatedTaskRunner.java:36) > INFO | jvm 1 | 2007/11/01 20:31:20 | Caused by: > java.lang.NullPointerException > > A couple other stack traces that look a lot like that, and then > tons of: > > INFO | jvm 1 | 2007/11/01 20:31:20 | ERROR QueueStorePrefetch > - Failed to fill batch > INFO | jvm 1 | 2007/11/01 20:31:20 | > java.lang.NullPointerException > INFO | jvm 1 | 2007/11/01 20:31:20 | ERROR QueueStorePrefetch > - Failed to fill batch > INFO | jvm 1 | 2007/11/01 20:31:20 | > java.lang.NullPointerException > INFO | jvm 1 | 2007/11/01 20:31:20 | ERROR QueueStorePrefetch > - Failed to fill batch > > > (Sorry, I haven't searched JIRA/Google for this one yet so this may > be a > known issue) > > I reran most of the test with the journaledJDBC connector but it's not > nearly as fast as the local store so I'm hoping to get this > working. :-) > I've made the indexBinSize change you've recommended and I'll re- > run the > tests tomorrow. BTW thanks for putting together that amq-message- > store page > on the wiki. Thanks for the assistance! > > -Andy > > > > > > > > > > rajdavies wrote: >> >> Hi andy, >> >> did you start with a completely new AMQ Store data directory ? - You >> could delete the references store only - e.g. the Recovery section in >> http://activemq.apache.org/amq-message-store.html >> Also - as you are storing a few million messages - you might want to >> change the number of indexBinSize - the default number of has bins - >> parameter to something large - like 16000 (the default is 1024). >> >> I'l see if I can reproduce your latest problem with the >> RecoveryListenerAdaptor. >> >> cheers, >> >> Rob >> >> http://rajdavies.blogspot.com/ >> >> >> >> On Nov 2, 2007, at 2:16 AM, Andrew Miller wrote: >> >>> >>> >>> Thanks for the fix! We're running the test now... Parameters: >>> >>> SVN version: 591163 >>> Client jar: Updated to 5.0-20071028.173739 >>> >>> We haven't seen any of the original IndexOutOfBounds exception >>> messages, >>> even after about 1.2 million Enqueues. However, the consumers >>> appear to be >>> stuck and unable to consume. We had a RecoveryListenerAdapter >>> message >>> show up in the log, though I'm not sure this correlates with the >>> inability >>> of the consumers to consume these messages. >>> >>> 2007-11-01 18:10:42,645 [68.194.21:40095] ERROR >>> RecoveryListenerAdapter >>> - Message id ID:rlab-0007.xxxx.com-48174-119396578373 >>> 6-0:2:1:1:11 could not be recovered from the data store! >>> >>> I even wrote a little perl client that does nothing but subscribe >>> to this >>> queue, consumes one message, and exits. The script just blocks and >>> then >>> times out, even though there are many messages in the queue: >>> >>> rlab-0007:/opt/activemq/bin# ./query -QQueue=IG >>> ACTIVEMQ_HOME: /opt/activemq >>> ACTIVEMQ_BASE: /opt/activemq >>> Type = Queue >>> DispatchCount = 213 >>> Destination = IG >>> MaxEnqueueTime = 1371 >>> QueueSize = 926441 >>> Name = IG >>> DequeueCount = 213 >>> MemoryPercentageUsed = 0 >>> ConsumerCount = 19 >>> MemoryLimit = 30198988 >>> EnqueueCount = 1190424 >>> MinEnqueueTime = 1 >>> AverageEnqueueTime = 196.80751173708921 >>> >>> Here's the one-off perl script and the output: (I assume there's no >>> reason I >>> would be unable to consume a message sent via JMS in stomp/perl?) >>> (redacting script since my previous mail on this subject got >>> rejected for >>> being too spammy - suffice to say it simply subscribes to the >>> queue in >>> question and does a stomp->receive_frame()) >>> >>> amiller@mirror-01:/opt/amiller$ ./c1.pl >>> connected >>> could not receive message after 20 seconds at ./c1.pl line 17, >>> line >>> 4. >>> >>> We also had this problem with the 20071031 snapshot, but at that >>> time I >>> thought it was because we were using the 4.1.1 client jars. BTW >>> even though >>> we didn't see this issue in 20071002, we needed to upgrade to solve >>> AMQ-1479. >>> >>> Thanks again! >>> -Andy >>> >>> >>> >>> >>> rajdavies wrote: >>>> >>>> There's a potential fix for this I've just added to trunk - >>>> would it >>>> be possible to build the latest and test ? >>>> >>>> >>>> cheers, >>>> >>>> Rob >>>> >>>> http://rajdavies.blogspot.com/ >>>> >>>> >>>> >>>> On Nov 1, 2007, at 9:47 PM, Andrew Miller wrote: >>>> >>>>> >>>>> We've been using ActiveMQ in an infrastructure refresh project, >>>>> and >>>>> have run >>>>> into a nasty looking bug, it looks like we are dropping >>>>> messages. We have >>>>> a few fast producers that will produce approximately 4-5 million >>>>> persistent >>>>> messages over the course of about 2.5 hours, while many slower >>>>> consumers >>>>> will consume these messages over the course of a day or so. >>>>> >>>>> We are using Jencks/Spring for the producers and consumers, using >>>>> the >>>>> openwire protocol. For this test, we were using the AMQ 4.1.1 >>>>> client jars, >>>>> if that matters. (We will try updating to the new AMQ5 client >>>>> jars >>>>> today) >>>>> >>>>> About halfway through the production run, we started getting >>>>> IndexOutOfBounds exceptions in the log pretty frequently. >>>>> Later on >>>>> the >>>>> consumer tries to get these messages but appears to fail. >>>> >>>> (exception messages trimmed) >>>> >>>> >>> >>> -- >>> View this message in context: http://www.nabble.com/ >>> IndexOutOfBoundsException-in-AMQMessageStore-using-20071028.173739- >>> tf4734264s2354.html#a13541167 >>> Sent from the ActiveMQ - Dev mailing list archive at Nabble.com. >>> >> >> >> > > -- > View this message in context: http://www.nabble.com/ > IndexOutOfBoundsException-in-AMQMessageStore-using-20071028.173739- > tf4734264s2354.html#a13543207 > Sent from the ActiveMQ - Dev mailing list archive at Nabble.com. >