activemq-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrew Miller <z2amil...@gmail.com>
Subject Re: IndexOutOfBoundsException in AMQMessageStore using 20071028.173739
Date Fri, 02 Nov 2007 07:26:00 GMT


Rob,

Yeah, in general, I've been deleting the entire local store directory in
between tests, since we're just testing it is not critical that the messages
are persisted. (with the message queue down of course)  Interestingly, I
restarted the broker to try to fix this problem.  It marked the store as bad
and regenerated the index from the journal (about an hour+ with ~1.5-2m
messages).

The consumers started working again at that point, for a while.  After it
reached ~7200 dispatched messages, it really blew up.  I got this in the
wrapper.log:

INFO   | jvm 1    | 2007/11/01 20:30:09 | java.lang.NullPointerException
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.kaha.MessageIdMarshaller.writePayload(MessageIdMarshaller.java:38)
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.kaha.MessageIdMarshaller.writePayload(MessageIdMarshaller.java:29)
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.kaha.impl.index.hash.HashEntry.write(HashEntry.java:93)
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.kaha.impl.index.hash.HashPage.write(HashPage.java:123)
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.kaha.impl.index.hash.HashIndex.writeFullPage(HashIndex.java:350)
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.kaha.impl.index.hash.HashPageInfo.end(HashPageInfo.java:98)
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.kaha.impl.index.hash.HashBin.end(HashBin.java:299)
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.kaha.impl.index.hash.HashBin.put(HashBin.java:153)
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.kaha.impl.index.hash.HashIndex.store(HashIndex.java:241)
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.kaha.impl.container.MapContainerImpl.put(MapContainerImpl.java:303)
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.store.kahadaptor.KahaReferenceStore.addMessageReference(KahaReferenceStore.java:113)
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.store.amq.AMQMessageStore$4.execute(AMQMessageStore.java:341)
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.util.TransactionTemplate.run(TransactionTemplate.java:44)
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.store.amq.AMQMessageStore.doAsyncWrite(AMQMessageStore.java:330)
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.store.amq.AMQMessageStore.asyncWrite(AMQMessageStore.java:300)
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.store.amq.AMQMessageStore$1.iterate(AMQMessageStore.java:92)
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.thread.DedicatedTaskRunner.runTask(DedicatedTaskRunner.java:9
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.thread.DedicatedTaskRunner$1.run(DedicatedTaskRunner.java:36)

After that stack trace, I start getting a lot more NullPointerExceptions in
the logs, however, they don't have a corresponding stack trace:

INFO   | jvm 1    | 2007/11/01 20:30:45 | WARN  AMQMessageStore               
- Message could not be added to long term store: null
INFO   | jvm 1    | 2007/11/01 20:30:45 | java.lang.NullPointerException
INFO   | jvm 1    | 2007/11/01 20:30:45 | WARN  AMQMessageStore               
- Message could not be added to long term store: null
INFO   | jvm 1    | 2007/11/01 20:30:45 | java.lang.NullPointerException

Then, a different kind of exception, with stack trace:

INFO   | jvm 1    | 2007/11/01 20:31:20 | ERROR QueueStorePrefetch            
- Failed to fill batch
INFO   | jvm 1    | 2007/11/01 20:31:20 | java.lang.NullPointerException
INFO   | jvm 1    | 2007/11/01 20:31:20 | ERROR IG                            
- Failed to page in more queue messages 
INFO   | jvm 1    | 2007/11/01 20:31:20 | java.lang.RuntimeException:
java.lang.NullPointerException
INFO   | jvm 1    | 2007/11/01 20:31:20 |       at
org.apache.activemq.broker.region.cursors.QueueStorePrefetch.hasNext(QueueStorePrefetch.java:112)
INFO   | jvm 1    | 2007/11/01 20:31:20 |       at
org.apache.activemq.broker.region.cursors.StoreQueueCursor.hasNext(StoreQueueCursor.java:129)
INFO   | jvm 1    | 2007/11/01 20:31:20 |       at
org.apache.activemq.broker.region.Queue.buildList(Queue.java:1022)
INFO   | jvm 1    | 2007/11/01 20:31:20 |       at
org.apache.activemq.broker.region.Queue.doPageIn(Queue.java:1005)
INFO   | jvm 1    | 2007/11/01 20:31:20 |       at
org.apache.activemq.broker.region.Queue.pageInMessages(Queue.java:1068)
INFO   | jvm 1    | 2007/11/01 20:31:20 |       at
org.apache.activemq.broker.region.Queue.iterate(Queue.java:941)
INFO   | jvm 1    | 2007/11/01 20:31:20 |       at
org.apache.activemq.thread.DedicatedTaskRunner.runTask(DedicatedTaskRunner.java:98)
INFO   | jvm 1    | 2007/11/01 20:31:20 |       at
org.apache.activemq.thread.DedicatedTaskRunner$1.run(DedicatedTaskRunner.java:36)
INFO   | jvm 1    | 2007/11/01 20:31:20 | Caused by:
java.lang.NullPointerException

A couple other stack traces that look a lot like that, and then tons of:

INFO   | jvm 1    | 2007/11/01 20:31:20 | ERROR QueueStorePrefetch            
- Failed to fill batch
INFO   | jvm 1    | 2007/11/01 20:31:20 | java.lang.NullPointerException
INFO   | jvm 1    | 2007/11/01 20:31:20 | ERROR QueueStorePrefetch            
- Failed to fill batch
INFO   | jvm 1    | 2007/11/01 20:31:20 | java.lang.NullPointerException
INFO   | jvm 1    | 2007/11/01 20:31:20 | ERROR QueueStorePrefetch            
- Failed to fill batch


(Sorry, I haven't searched JIRA/Google for this one yet so this may be a
known issue)

I reran most of the test with the journaledJDBC connector but it's not
nearly as fast as the local store so I'm hoping to get this working.  :-)   
I've made the indexBinSize change you've recommended and I'll re-run the
tests tomorrow.  BTW thanks for putting together that amq-message-store page
on the wiki. Thanks for the assistance!

-Andy









rajdavies wrote:
> 
> Hi andy,
> 
> did you start with a completely new AMQ Store data directory ? - You  
> could delete the references store only - e.g. the Recovery section in  
> http://activemq.apache.org/amq-message-store.html
> Also - as you are storing a few million messages - you might want to  
> change the number of indexBinSize - the default number of has bins -   
> parameter to something large - like 16000 (the default is 1024).
> 
> I'l see if I can reproduce your latest problem with the  
> RecoveryListenerAdaptor.
> 
> cheers,
> 
> Rob
> 
> http://rajdavies.blogspot.com/
> 
> 
> 
> On Nov 2, 2007, at 2:16 AM, Andrew Miller wrote:
> 
>>
>>
>> Thanks for the fix!  We're running the test now...  Parameters:
>>
>> SVN version: 591163
>> Client jar: Updated to 5.0-20071028.173739
>>
>> We haven't seen any of the original IndexOutOfBounds exception  
>> messages,
>> even after about 1.2 million Enqueues.   However, the consumers  
>> appear to be
>> stuck and unable to consume.     We had a RecoveryListenerAdapter  
>> message
>> show up in the log, though I'm not sure this correlates with the  
>> inability
>> of the consumers to consume these messages.
>>
>> 2007-11-01 18:10:42,645 [68.194.21:40095] ERROR  
>> RecoveryListenerAdapter
>> - Message id ID:rlab-0007.xxxx.com-48174-119396578373
>> 6-0:2:1:1:11 could not be recovered from the data store!
>>
>> I even wrote a little perl client that does nothing but subscribe  
>> to this
>> queue, consumes one message, and exits.  The script just blocks and  
>> then
>> times out, even though there are many messages in the queue:
>>
>> rlab-0007:/opt/activemq/bin# ./query -QQueue=IG
>> ACTIVEMQ_HOME: /opt/activemq
>> ACTIVEMQ_BASE: /opt/activemq
>> Type = Queue
>> DispatchCount = 213
>> Destination = IG
>> MaxEnqueueTime = 1371
>> QueueSize = 926441
>> Name = IG
>> DequeueCount = 213
>> MemoryPercentageUsed = 0
>> ConsumerCount = 19
>> MemoryLimit = 30198988
>> EnqueueCount = 1190424
>> MinEnqueueTime = 1
>> AverageEnqueueTime = 196.80751173708921
>>
>> Here's the one-off perl script and the output: (I assume there's no  
>> reason I
>> would be unable to consume a message sent via JMS in stomp/perl?)
>> (redacting script since my previous mail on this subject got  
>> rejected for
>> being too spammy - suffice to say it simply subscribes to the queue in
>> question and does a stomp->receive_frame())
>>
>> amiller@mirror-01:/opt/amiller$ ./c1.pl
>> connected
>> could not receive message after 20 seconds at ./c1.pl line 17,  
>> <GEN0> line
>> 4.
>>
>> We also had this problem with the 20071031 snapshot, but at that  
>> time I
>> thought it was because we were using the 4.1.1 client jars.  BTW  
>> even though
>> we didn't see this issue in 20071002, we needed to upgrade to solve
>> AMQ-1479.
>>
>> Thanks again!
>> -Andy
>>
>>
>>
>>
>> rajdavies wrote:
>>>
>>> There's a potential fix for this I've just added to trunk - would it
>>> be possible to build the latest and test ?
>>>
>>>
>>> cheers,
>>>
>>> Rob
>>>
>>> http://rajdavies.blogspot.com/
>>>
>>>
>>>
>>> On Nov 1, 2007, at 9:47 PM, Andrew Miller wrote:
>>>
>>>>
>>>> We've been using ActiveMQ in an infrastructure refresh project, and
>>>> have run
>>>> into a nasty looking bug, it looks like we are dropping
>>>> messages.    We have
>>>> a few fast producers that will produce approximately 4-5 million
>>>> persistent
>>>> messages over the course of about 2.5 hours, while many slower
>>>> consumers
>>>> will consume these messages over the course of a day or so.
>>>>
>>>> We are using Jencks/Spring for the producers and consumers, using  
>>>> the
>>>> openwire protocol.  For this test, we were using the AMQ 4.1.1
>>>> client jars,
>>>> if that matters.  (We will try updating to the new AMQ5 client jars
>>>> today)
>>>>
>>>> About halfway through the production run, we started getting
>>>> IndexOutOfBounds exceptions in the log pretty frequently.  Later on
>>>> the
>>>> consumer tries to get these messages but appears to fail.
>>>
>>> (exception messages trimmed)
>>>
>>>
>>
>> -- 
>> View this message in context: http://www.nabble.com/ 
>> IndexOutOfBoundsException-in-AMQMessageStore-using-20071028.173739- 
>> tf4734264s2354.html#a13541167
>> Sent from the ActiveMQ - Dev mailing list archive at Nabble.com.
>>
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/IndexOutOfBoundsException-in-AMQMessageStore-using-20071028.173739-tf4734264s2354.html#a13543207
Sent from the ActiveMQ - Dev mailing list archive at Nabble.com.


Mime
View raw message