activemq-dev mailing list archives

From Rob Davies <rajdav...@gmail.com>
Subject Re: 5.0 (and later) Queue stops dispatching
Date Tue, 11 Mar 2008 19:43:53 GMT

On 11 Mar 2008, at 17:08, Pete Schwamb wrote:

> I've been seeing some freezes on Queue dispatching, particularly  
> with the latest 5.0 release and more recent snapshots.  I've tried  
> very hard to reliably reproduce it in a small test case, but it  
> seems very timing dependent.  I was able to reproduce at least one  
> variant of it fairly reliably.  I am using the default  
> AMQMessageStore setup.  Also, I'm using fuse-5.0.0.9, because the  
> SNAPSHOT builds are failing even more spectacularly for me at the  
> moment, though from what I've seen in SVN, I believe this is still  
> an issue in the trunk.
>
> There are two non-durable subscribers on the queue via stomp, and  
> they consume more slowly than the producer, which publishes in  
> bursts.  After the first burst of 30 - 50k messages, I stop the  
> producer and let the consumers catch up.  Then I publish another  
> burst of messages.  This is usually where the freeze happens.
>
> First, I usually get a message like the following:
>
> ERROR RecoveryListenerAdapter        - Message id  
> ID:sand-52497-1205185863002-2:1:2:1:42787 could not be recovered  
> from the data store - already dispatched
>
> Then the queue stops dispatching.
> Here's what I first saw in the debugger, after the "already  
> dispatched" message appears:
>
> a) on the Queue, messages.hasNext() returns false, so the doPageIn()  
> method never pages anything in.
> b) messages.hasNext() -> currentCursor.hasNext() -> fillBatch() ->  
> doFillBatch() -> this.store.recoverNextMessages(this.maxBatchSize,  
> this) ->
> this.store.recoverNextMessages(this.maxBatchSize, this)
> c) KahaReferenceStore.recoverNextMessages() gets null back from  
> messageContainer.getNext(entry), because entry.nextItem = -1
>
> However, the message store usually has many thousands of messages  
> still in it, as evidenced by the 'size' attribute on  
> DiskIndexLinkedList.  So this is the first hint that the LinkedList  
> is corrupt.  I started looking more closely at DiskIndexLinkedList,  
> and noticed the following incorrect (I think) behavior:
>
> In DiskIndexLinkedList.getNextEntry(IndexItem current), line 274 is  
> "result = last".  On some occasions result.nextItem is -1, and  
> last.nextItem != -1.  Shouldn't last.nextItem always be -1?  I'm  
> wondering if the opposite was intended: to update "last".
>
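Pete's first clue was that a walk along the next pointers stopped early while the list's `size` still reported thousands of entries. His question about `last.nextItem` is another invariant of the same kind. A consistency walk like the one below could flag both. It is a hypothetical sketch over a simplified model (a head offset, a map from each entry's offset to its persisted `nextItem`, and a cached tail), not code from ActiveMQ. `IndexConsistencyCheck`, `check` and `Item` are illustrative names only.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class IndexConsistencyCheck {

    // Hypothetical stand-in for an index entry: its offset and the offset
    // of the following entry (-1 marks the end of the list).
    record Item(long offset, long nextItem) {}

    /**
     * Walks the list from head via the persisted next pointers and returns
     * a description of the first inconsistency found, or null if none.
     */
    static String check(long head, Map<Long, Long> disk, Item cachedLast, int size) {
        Set<Long> seen = new HashSet<>();
        long current = head;
        long tail = -1;
        int count = 0;
        while (current != -1) {
            if (!seen.add(current)) {
                return "cycle at offset " + current;
            }
            Long next = disk.get(current);
            if (next == null) {
                return "dangling pointer to offset " + current;
            }
            tail = current;
            count++;
            current = next;
        }
        // The symptom in the report: traversal ends before size says it should.
        if (count != size) {
            return "walked " + count + " entries but size is " + size;
        }
        if (cachedLast != null) {
            if (cachedLast.offset() != tail) {
                return "cached last is offset " + cachedLast.offset() + " but tail is " + tail;
            }
            // The invariant Pete asks about: the tail has no next item.
            if (cachedLast.nextItem() != -1) {
                return "cached last has nextItem " + cachedLast.nextItem() + ", expected -1";
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Three entries linked 10 -> 20 -> 30.
        Map<Long, Long> disk = Map.of(10L, 20L, 20L, 30L, 30L, -1L);
        System.out.println(check(10, disk, new Item(30, -1), 3)); // prints null

        // Entry 20 has lost its next pointer, but size still says 3.
        Map<Long, Long> broken = Map.of(10L, 20L, 20L, -1L, 30L, -1L);
        System.out.println(check(10, broken, new Item(30, -1), 3)); // prints walked 2 entries but size is 3
    }
}
```

Calling a check like this from a test harness after each producer burst could help locate the first point where the persisted links and the cached tail diverge.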
> So I changed the following:
>
> Index: src/main/java/org/apache/activemq/kaha/impl/index/DiskIndexLinkedList.java
> ===================================================================
> --- src/main/java/org/apache/activemq/kaha/impl/index/DiskIndexLinkedList.java  (revision 635580)
> +++ src/main/java/org/apache/activemq/kaha/impl/index/DiskIndexLinkedList.java  (working copy)
> @@ -271,7 +271,7 @@
>              }
>              // essential last get's updated consistently
>              if (result != null && last != null && last.equals(result)) {
> -                       result = last;
> +                       last = result;
>              }
>              return result;
>      }
>
>
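Why the one-line change matters can be illustrated with a deliberately simplified, hypothetical model. It assumes that entries compare equal by offset, so a stale cached `last` can `equals()` a freshly read entry while carrying a different `nextItem`. `CachedTailDemo`, `read` and the `patched` flag are illustrative and not part of the Kaha API. The setup mirrors what Pete observed (the freshly read entry has `nextItem == -1` while the cached `last` does not); it is not a reproduction of the broker bug itself.

```java
import java.util.HashMap;
import java.util.Map;

public class CachedTailDemo {

    // Hypothetical stand-in for an index entry, identified by its offset.
    static final class Item {
        final long offset;
        final long nextItem; // -1 marks the end of the list

        Item(long offset, long nextItem) {
            this.offset = offset;
            this.nextItem = nextItem;
        }

        // Equality by offset only: a stale cached copy still "equals"
        // a freshly read entry even when their nextItem values differ.
        @Override
        public boolean equals(Object o) {
            return o instanceof Item && ((Item) o).offset == offset;
        }

        @Override
        public int hashCode() {
            return Long.hashCode(offset);
        }
    }

    final Map<Long, Long> disk = new HashMap<>(); // offset -> persisted nextItem
    Item last;                                    // cached in-memory tail

    // Every read builds a fresh Item from the persisted state.
    Item read(long offset) {
        return new Item(offset, disk.get(offset));
    }

    Item getNextEntry(Item current, boolean patched) {
        if (current.nextItem == -1) {
            return null;
        }
        Item result = read(current.nextItem);
        if (last != null && last.equals(result)) {
            if (patched) {
                last = result; // the fix: refresh the cache from the fresh read
            } else {
                result = last; // original: hand back the (possibly stale) cached copy
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (boolean patched : new boolean[] {false, true}) {
            CachedTailDemo list = new CachedTailDemo();
            list.disk.put(10L, 20L);
            list.disk.put(20L, -1L);
            // The cached tail disagrees with disk: it still carries a next pointer.
            list.last = new Item(20, 99);
            Item next = list.getNextEntry(list.read(10), patched);
            System.out.println("patched=" + patched + " -> nextItem=" + next.nextItem);
        }
        // prints:
        // patched=false -> nextItem=99
        // patched=true -> nextItem=-1
    }
}
```

With `patched == false` the caller receives the cached copy and would follow its stale pointer; with `patched == true` the freshly read entry wins and the cache is brought back in line with it.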
> And indeed, I no longer get the "already dispatched" message, and  
> queues continue dispatching after many cycles of the producer  
> flowing 10s of thousands of messages through.
>
> Hopefully this sheds some light on stability issues others may be  
> having.  I'm not sure I've fixed the problem 100%.  Is anyone else  
> seeing this?
>
> -Pete
>
>
Beautiful!!! Thx Pete - love it when other folks fix my bugs ;)


cheers,

Rob

http://open.iona.com/ -Enterprise Open Integration
http://rajdavies.blogspot.com/



