qpid-users mailing list archives

From Kim van der Riet <kim.vdr...@redhat.com>
Subject Re: Journal Full Error
Date Fri, 24 Jul 2009 19:13:57 GMT
On Fri, 2009-07-24 at 10:50 -0700, Sandy Pratt wrote:
> Hi,
> 
> I'm experimenting with the qpidd broker supplied with Fedora 10 (qpidd (qpidc) version
0.5 - the install should be up to date as of yesterday).
> Basically, I'm having a problem where I've been able to enqueue enough messages that
my journal is full and I can't start the broker because of this.  This is contrary to my understanding
of how persistent queues were supposed to work, in a number of ways:
> 
> 1) I thought producers were not allowed to commit messages past the 80% full point.
Correct. You should see an Enqueue capacity threshold error message such as:
2009-07-24 14:20:48 notice Journal "perftest0": Created
2009-07-24 14:23:31 warning Journal "perftest0": Enqueue capacity threshold exceeded on queue
"perftest0".
2009-07-24 14:23:31 error Unexpected exception: Enqueue capacity threshold exceeded on queue
"perftest0". (JournalImpl.cpp:577)
2009-07-24 14:23:31 error Connection 127.0.0.1:55390 closed by error: Enqueue capacity threshold
exceeded on queue "perftest0". (JournalImpl.cpp:577)(501)

which closes the connection to the producer.

The journal is a circular disk buffer. Both enqueues and dequeues are
performed by writing records at the head of the journal. Thus a full
journal is a fatal condition because you can neither enqueue nor dequeue
(to make space). The 80% policy limits enqueues only; dequeues are not
subject to this limit as these operations inherently free up journal
space.
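
To make that concrete, here is a minimal sketch of the policy (purely
illustrative Java; the names are made up and this is not the store code,
which is the C++ in JournalImpl.cpp):

// Illustrative only: hypothetical names, not the real store implementation.
// Space reclamation (which only happens once all records in the tail file
// have been consumed) is omitted for brevity.
class JournalPolicySketch {
    private final long capacity;   // total bytes in the circular journal
    private long used;             // bytes occupied by records written at the head

    JournalPolicySketch(long capacity) { this.capacity = capacity; }

    void writeEnqueueRecord(long size) {
        // Enqueues are refused once the journal is ~80% full...
        if (used + size > 0.8 * capacity)
            throw new IllegalStateException("Enqueue capacity threshold exceeded");
        used += size;
    }

    void writeDequeueRecord(long size) {
        // ...but dequeues are not limited, even though the dequeue record is
        // itself written at the head. If the journal ever becomes completely
        // full, neither type of record can be written: the fatal condition.
        if (used + size > capacity)
            throw new IllegalStateException("Journal full");
        used += size;
    }
}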

In practice, it may be possible to fill a journal by using tiny messages
(~10 bytes) to just below the enqueue threshold, then dequeueing all
these messages except the first. In this case the sheer number of
dequeues will fill the remaining approx. 20% of the journal while
keeping the first message in place as a blocker. One of the functional
tests uses this technique to test that a full journal occurs as
expected.
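
The following is a rough JMS-style sketch of that technique (hedged: this
is not the actual functional test, which lives in the store's C++ test
suite; the queue name, message count and connection handling here are made
up for illustration):

import javax.jms.*;

public class JournalFillSketch {
    // Count is illustrative; in practice you would stop just below the threshold.
    static final int SMALL_MESSAGE_COUNT = 100000;

    // Connection setup is provider-specific and omitted; pass in an open JMS connection.
    static void fillJournal(Connection connection) throws JMSException {
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Queue queue = session.createQueue("journal-fill-test");   // made-up queue name
        MessageProducer producer = session.createProducer(queue);

        // The first message carries a marker property so it can be skipped later.
        TextMessage first = session.createTextMessage("blocker");
        first.setBooleanProperty("blocker", true);
        producer.send(first, DeliveryMode.PERSISTENT,
                      Message.DEFAULT_PRIORITY, Message.DEFAULT_TIME_TO_LIVE);

        // Enqueue many tiny persistent messages, staying below the 80% threshold.
        for (int i = 0; i < SMALL_MESSAGE_COUNT; i++) {
            producer.send(session.createTextMessage("0123456789"),   // ~10 byte body
                          DeliveryMode.PERSISTENT, Message.DEFAULT_PRIORITY,
                          Message.DEFAULT_TIME_TO_LIVE);
        }

        // Dequeue everything except the blocker: each dequeue writes a record at
        // the journal head, so the remaining ~20% of the journal fills up while
        // the first message pins the tail.
        connection.start();
        MessageConsumer consumer = session.createConsumer(queue, "blocker IS NULL");
        while (consumer.receive(1000) != null) { /* drain */ }
    }
}

Against a default-sized journal the loop count would need tuning so the
producer stays just below the enqueue threshold.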

> 
> 2) I thought the flow-to-disk policy would result in excess messages being accepted into
a slower interim storage, not being rejected or causing the broker to crash.
This is incorrect. However, staged messages (i.e. messages that exceed the
broker's per-message size threshold) are stored outside the journal in
BDB.
> 
> 3) It appears as though I was able to crash the broker and cause the apparent loss of
acked messages simply by sending the broker too many?
I have not been able to reproduce your scenario. I attempted to create and
set up the broker using your parameters, but using a C++ client rather
than a Java one, and I always hit the enqueue threshold first (as
expected). I have even turned on a transactional block size of 1 (which
emulates JMS, IIRC), but in all cases I have not been able to reproduce
this.

I would be interested in the following if you are able to supply it:

1. A reproducer (i.e. a program that will cause this error) that I can
run, and/or

2. A copy of the journal files so I can take a look at the state of the
journals.

> 
> ...
> 
> Here are some details about what I was doing:
> 
> I created a queue with the following command:
> 
> qpid-config --durable add queue test3 --durable --limit-policy flow-to-disk
> 
> This created the following journal files:
> 
> [root@prattrs-fedora test3]# ll
> total 12356
> -rw-r----- 1 qpidd qpidd 1573376 2009-07-23 17:26 JournalData.0000.jdat
> -rw-r----- 1 qpidd qpidd 1573376 2009-07-23 17:27 JournalData.0001.jdat
> -rw-r----- 1 qpidd qpidd 1573376 2009-07-23 17:28 JournalData.0002.jdat
> -rw-r----- 1 qpidd qpidd 1573376 2009-07-23 17:28 JournalData.0003.jdat
> -rw-r----- 1 qpidd qpidd 1573376 2009-07-23 17:28 JournalData.0004.jdat
> -rw-r----- 1 qpidd qpidd 1573376 2009-07-23 17:29 JournalData.0005.jdat
> -rw-r----- 1 qpidd qpidd 1573376 2009-07-23 17:22 JournalData.0006.jdat
> -rw-r----- 1 qpidd qpidd 1573376 2009-07-23 17:22 JournalData.0007.jdat
> -rw-r----- 1 qpidd qpidd     866 2009-07-22 14:23 JournalData.jinf
> [root@prattrs-fedora test3]# du -h
> 13M    .
> [root@prattrs-fedora test3]#

This is a standard 12.5MiB journal.

> 
> I then did a test where I ran some producers in Java JMS for a while without any consumers
running:
> 
> TextMessage msg = sendSession.createTextMessage(testContext.getPrefix()
>         + RandomUtils.randomAlphaNumericString(4085));
> producer.send(msg, DeliveryMode.PERSISTENT, Message.DEFAULT_PRIORITY,
>               Message.DEFAULT_TIME_TO_LIVE);
> 
> Unsurprisingly, my journal quickly filled up and the broker showed this in logs:
> 
> Jul 23 17:29:32 localhost qpidd[2550]: 2009-07-23 17:29:32 error Failed to prepare: Journal
full on queue "test3". (JournalImpl.cpp:586)
> Jul 23 17:29:32 localhost qpidd[2550]: 2009-07-23 17:29:32 critical Journal "test3":
Journal full on queue "test3".
> Jul 23 17:29:32 localhost qpidd[2550]: 2009-07-23 17:29:32 error Error completing xid
yu\x00\x00\x00\x00\x00\x00\x97\xD6\xA9aP\x18IM\xB6\x91\xFA\xF1\xE6\x18}W: Journal full on
queue "test3". (JournalImpl.cpp:586)
> Jul 23 17:29:32 localhost qpidd[2550]: 2009-07-23 17:29:32 error Commit failed with exception:
Journal full on queue "test3". (JournalImpl.cpp:586)
> Jul 23 17:29:32 localhost qpidd[2550]: 2009-07-23 17:29:32 error Execution exception:
internal-error: Commit failed (qpid/broker/SemanticState.cpp:123)
> Jul 23 17:29:32 localhost qpidd[2550]: 2009-07-23 17:29:32 error Channel exception: not-attached:
Channel 0 is not attached (qpid/amqp_0_10/SessionHandler.cpp:40)
> Jul 23 17:29:33 localhost qpidd[2550]: 2009-07-23 17:29:33 critical Journal "test3":
Journal full on queue "test3".
> Jul 23 17:29:33 localhost qpidd[2550]: 2009-07-23 17:29:33 error Unexpected exception:
Journal full on queue "test3". (JournalImpl.cpp:586)
> Jul 23 17:29:33 localhost qpidd[2550]: 2009-07-23 17:29:33 error Connection 153.32.227.108:57837
closed by error: Journal full on queue "test3". (JournalImpl.cpp:586)(501)
> 
> I can sort of understand this, although it means flow-to-disk is not functioning for
my queue, I suppose.  However, it turns out that I can't restart the broker either:
> 
> Jul 23 17:40:00 localhost qpidd[6303]: 2009-07-23 17:40:00 notice Store module initialized;
dir=/var/lib/qpidd
> Jul 23 17:40:00 localhost qpidd[6303]: 2009-07-23 17:40:00 notice Journal "test1": Created
> Jul 23 17:40:00 localhost qpidd[6303]: 2009-07-23 17:40:00 notice Journal "test2": Created
> Jul 23 17:40:00 localhost qpidd[6303]: 2009-07-23 17:40:00 warning Journal "test2": Recovery
found 12 files (different from --num-jfiles value of 8).
> Jul 23 17:40:00 localhost qpidd[6303]: 2009-07-23 17:40:00 warning Journal "test2": Recovery
found file size = 12 (different from --jfile-size-pgs value of 24).
> Jul 23 17:40:00 localhost qpidd[6303]: 2009-07-23 17:40:00 notice Journal "test3": Created
> Jul 23 17:40:01 localhost qpidd[6303]: 2009-07-23 17:40:01 critical Daemon startup failed:
Queue test3: recoverQueues() failed: jexception 0x0205 jcntl::recover() threw JERR_JCNTL_RECOVERJFULL:
Journal data files full, cannot write. (MessageStoreImpl.cpp:785)
> Jul 23 17:40:01 localhost qpidd[6302]: 2009-07-23 17:40:01 critical Broker start-up failed:
Daemon startup failed: Queue test3: recoverQueues() failed: jexception 0x0205 jcntl::recover()
threw JERR_JCNTL_RECOVERJFULL: Journal data files full, cannot write. (MessageStoreImpl.cpp:785)
This is expected. It is not possible to restart a broker whose journal is
full; it will simply recover into the same full state, which is just as
broken as it was prior to the shutdown. At some future point I will add
auto-expansion to the journal; when that happens, it may be possible to
expand a recovered journal so that dequeues (and enqueues, if there is
enough space) are possible on recovery.

> 
> My client jars are qpid-client-M5.1.jar and qpid-common-M5.1.jar.
> 
> Thanks in advance to anyone who can help.   Questions are listed above.
> 
> Sandy
> 
Thanks for bringing this up; I am interested to see how the enqueue
capacity check is being evaded. It may be some odd sequence that occurs
through JMS, but I fail to see how.

Kim


---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:users-subscribe@qpid.apache.org

