qpid-users mailing list archives

From Rob Godfrey <rob.j.godf...@gmail.com>
Subject Re: org.apache.qpid.server.store.StoreException
Date Thu, 06 Oct 2016 22:36:51 GMT
We're planning a 6.0.5 release really soon, which should be a
drop-in replacement for 6.0.4 - this fix should be in it.

The code here hasn't changed for a long, *long* time so you'll have
been exposed to it for as long as you've been running Qpid... I think
you'd have to be *really* unlucky to hit the bug I identified if you
are not using TTL (even with TTL it's still a pretty unlucky thing to
hit).

-- Rob



On 7 October 2016 at 00:05, Ramayan Tiwari <ramayan.tiwari@gmail.com> wrote:
> Yeah, we have been using this version of the broker in production for more
> than a year. I am not aware of any change with respect to our production
> environment. We saw this issue only this past weekend, and in only one
> instance (all the brokers in that instance) out of more than 100 instances.
> Broker hardware and configs are the same across all the instances. I will keep
> digging.
>
> For 0.32, we have patched the broker before, so we will be able to apply this
> patch (if we go that route). We are currently performance testing 6.0.4 and,
> if that goes well, we will probably use the fix that you make in 6.0.x.
>
> Thanks
> Ramayan
>
> On Thu, Oct 6, 2016 at 2:46 PM, Rob Godfrey <rob.j.godfrey@gmail.com> wrote:
>
>> OK - TTL was the most likely way... there are other possibilities
>> (basically it just needs the message to go from being available to
>> dequeued at the same moment it is being evaluated to be sent to a consumer -
>> competing consumers *could* do it, though I would have guessed it to
>> be unlikely).  It's possible there's another bug, separate from the one I
>> identified, that you are running into, though nothing jumps out at me.
>> I believe you guys have been running 0.32 for a while... have you just
>> started seeing this issue - has something changed in your environment
>> or your messaging usage?
>>
>> We won't be doing a 0.32.x release, there's a *lot* of other stuff
>> that has been fixed in 6.0 which I would include in that before I'd
>> include this change... I think you may have already patched your 0.32
>> broker anyway, in which case you should be able to add the patch I put
>> on the JIRA.
>>
>> On 6 October 2016 at 23:33, Ramayan Tiwari <ramayan.tiwari@gmail.com> wrote:
>> > Hi Rob,
>> >
>> > Thanks so much for the prompt response and for creating a JIRA to track this.
>> >
>> > We don't set a TTL on our messages though -- i.e., we use 0 (unlimited). We
>> > do send them in non-persistent mode; not sure if that matters or could
>> > trigger the same bug?
>> >
>> > For the fix, would it be possible to fix it in 0.32 as well as in 6.0.x?
>> >
>> > CCing Helen from our team.
>> >
>> > Thanks,
>> >
>> > Ramayan
>> >
>> > On Thu, Oct 6, 2016 at 1:21 PM, Rob Godfrey <rob.j.godfrey@gmail.com> wrote:
>> >
>> >> I've added a patch file which should apply to 0.32 on the following
>> >> JIRA: https://issues.apache.org/jira/browse/QPID-7451
>> >>
>> >> Hope this helps,
>> >> Rob
>> >>
>> >> On 6 October 2016 at 22:04, Rob Godfrey <rob.j.godfrey@gmail.com> wrote:
>> >> > Hi Ramayan,
>> >> >
>> >> > this exception indicates that a message has been deleted from the
>> >> > store, but the in-memory queue still references it.  Unfortunately the
>> >> > exception doesn't really tell us anything about how the broker will
>> >> > have got into this state.
>> >> >
>> >> > Having looked at the code I have an idea about what may be happening -
>> >> > do your messages have a TTL set?  I *think* that the AMQP 0-10 message
>> >> > path may be vulnerable to a race condition if a message expires at
>> >> > precisely the same time that the message is picked up to be sent to a
>> >> > consumer (essentially the message has to be available to consume and
>> >> > then three lines of code later it must have been deleted).  The 0-9-1
>> >> > codepath is not vulnerable to this as it caches the size of the
>> >> > message (as you can see in the stack trace, the consumer is checking
>> >> > the size of the message to make sure that the consumer has enough
>> >> > credit to receive it).
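
The difference between the two code paths described above can be
illustrated with a minimal, single-threaded sketch. This is not the Qpid
source and not the QPID-7451 patch: every class and method name below
(Store, LazyMessage, CachedSizeMessage, deliverAfterExpiry) is
hypothetical, and the "race" is simulated deterministically by deleting
the message just before its size is requested.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MetadataRaceSketch {

    /** Hypothetical stand-in for a message store: message id -> size in bytes. */
    static final class Store {
        private final Map<Long, Long> sizes = new ConcurrentHashMap<>();

        void put(long id, long size) { sizes.put(id, size); }

        void delete(long id) { sizes.remove(id); }

        long sizeOf(long id) {
            Long size = sizes.get(id);
            if (size == null) {
                // Mirrors the wording of the exception in the report; the type is illustrative.
                throw new IllegalStateException("Metadata not found for message with id " + id);
            }
            return size;
        }
    }

    interface Message { long getSize(); }

    /** Asks the store for the size on every call, so it fails once the entry is gone. */
    static final class LazyMessage implements Message {
        private final Store store;
        private final long id;

        LazyMessage(Store store, long id) { this.store = store; this.id = id; }

        @Override public long getSize() { return store.sizeOf(id); }
    }

    /** Reads the size once at construction and never touches the store again. */
    static final class CachedSizeMessage implements Message {
        private final long size;

        CachedSizeMessage(Store store, long id) { this.size = store.sizeOf(id); }

        @Override public long getSize() { return size; }
    }

    /**
     * Simulates the unlucky interleaving: the message was available, it is then
     * removed from the store (e.g. by expiry), and only afterwards does the
     * credit check ask for its size.
     */
    static String deliverAfterExpiry(Store store, long id, Message message) {
        store.delete(id);
        try {
            return "size=" + message.getSize();
        } catch (IllegalStateException e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        store.put(1L, 128L);
        store.put(2L, 256L);
        Message lazy = new LazyMessage(store, 1L);
        Message cached = new CachedSizeMessage(store, 2L);
        System.out.println(deliverAfterExpiry(store, 1L, lazy));
        System.out.println(deliverAfterExpiry(store, 2L, cached));
    }
}
```

In the sketch, only the lazy variant depends on the store entry still
existing at the time of the credit check; the cached variant answers from
memory, which is the property attributed to the 0-9-1 path above.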
>> >> >
>> >> > We're unlikely to do a patch release for 0.32, but we will likely be
>> >> > putting out a new 6.0.x release soon, and soon after a 6.1 release.
>> >> > Would you be able to upgrade to one of these, or would you prefer me
>> >> > to send you a patch file that you could apply to the 0.32 source to
>> >> > test?
>> >> >
>> >> > -- Rob
>> >> >
>> >> > On 6 October 2016 at 21:27, Ramayan Tiwari <ramayan.tiwari@gmail.com> wrote:
>> >> >> Hi,
>> >> >>
>> >> >>
>> >> >> We ran into this StoreException in our production environment multiple
>> >> >> times on different brokers, which caused broker shutdown. We are running
>> >> >> the 0.32 Java broker with the 0.16 client. I see that this was reported and
>> >> >> fixed here:
>> >> >> https://issues.apache.org/jira/browse/QPID-4012
>> >> >>
>> >> >> This is still happening, and I don't have enough context to reproduce this
>> >> >> locally. Any help is appreciated!
>> >> >>
>> >> >> Thanks
>> >> >> Ramayan
>> >> >>
>> >> >> *Exception*
>> >> >>
>> >> >> Uncaught exception, shutting down.
>> >> >> org.apache.qpid.server.store.StoreException: Metadata not found for message with id 1762118451
>> >> >> at org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore.getMessageMetaData(AbstractBDBMessageStore.java:343)
>> >> >> at org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore$StoredBDBMessage.getMetaData(AbstractBDBMessageStore.java:1224)
>> >> >> at org.apache.qpid.server.protocol.v0_10.MessageTransferMessage.getMetaData(MessageTransferMessage.java:41)
>> >> >> at org.apache.qpid.server.protocol.v0_10.MessageTransferMessage.getSize(MessageTransferMessage.java:56)
>> >> >> at org.apache.qpid.server.protocol.v0_10.ConsumerTarget_0_10.allocateCredit(ConsumerTarget_0_10.java:494)
>> >> >> at org.apache.qpid.server.queue.QueueConsumerImpl.wouldSuspend(QueueConsumerImpl.java:278)
>> >> >> at org.apache.qpid.server.queue.AbstractQueue.attemptDelivery(AbstractQueue.java:2059)
>> >> >> at org.apache.qpid.server.queue.AbstractQueue.flushConsumer(AbstractQueue.java:1981)
>> >> >> at org.apache.qpid.server.queue.AbstractQueue.flushConsumer(AbstractQueue.java:1957)
>> >> >> at org.apache.qpid.server.queue.QueueConsumerImpl.flush(QueueConsumerImpl.java:318)
>> >> >> at org.apache.qpid.server.protocol.v0_10.ConsumerTarget_0_10.flush(ConsumerTarget_0_10.java:605)
>> >> >> at org.apache.qpid.server.protocol.v0_10.ServerSessionDelegate.messageFlush(ServerSessionDelegate.java:521)
>> >> >> at org.apache.qpid.server.protocol.v0_10.ServerSessionDelegate.messageFlush(ServerSessionDelegate.java:82)
>> >> >> at org.apache.qpid.transport.MessageFlush.dispatch(MessageFlush.java:87)
>> >> >> at org.apache.qpid.transport.SessionDelegate.command(SessionDelegate.java:55)
>> >> >> at org.apache.qpid.server.protocol.v0_10.ServerSessionDelegate.command(ServerSessionDelegate.java:99)
>> >> >> at org.apache.qpid.server.protocol.v0_10.ServerSessionDelegate.command(ServerSessionDelegate.java:82)
>> >> >> at org.apache.qpid.transport.Method.delegate(Method.java:159)
>> >> >> at org.apache.qpid.transport.Session.received(Session.java:596)
>> >> >> at org.apache.qpid.transport.Connection.dispatch(Connection.java:452)
>> >> >> at org.apache.qpid.transport.ConnectionDelegate.handle(ConnectionDelegate.java:64)
>> >> >> at org.apache.qpid.transport.ConnectionDelegate.handle(ConnectionDelegate.java:40)
>> >> >> at org.apache.qpid.transport.MethodDelegate.messageFlush(MethodDelegate.java:143)
>> >> >> at org.apache.qpid.transport.MessageFlush.dispatch(MessageFlush.java:87)
>> >> >> at org.apache.qpid.transport.ConnectionDelegate.command(ConnectionDelegate.java:54)
>> >> >> at org.apache.qpid.transport.ConnectionDelegate.command(ConnectionDelegate.java:40)
>> >> >> at org.apache.qpid.transport.Method.delegate(Method.java:159)
>> >> >> at org.apache.qpid.transport.Connection.received(Connection.java:405)
>> >> >> at org.apache.qpid.server.protocol.v0_10.ServerConnection.access$001(ServerConnection.java:64)
>> >> >> at org.apache.qpid.server.protocol.v0_10.ServerConnection$1.run(ServerConnection.java:316)
>> >> >> at org.apache.qpid.server.protocol.v0_10.ServerConnection$1.run(ServerConnection.java:312)
>> >> >> at java.security.AccessController.doPrivileged(Native Method)
>> >> >> at javax.security.auth.Subject.doAs(Subject.java:360)
>> >> >> at org.apache.qpid.server.protocol.v0_10.ServerConnection.received(ServerConnection.java:311)
>> >> >> at org.apache.qpid.server.protocol.v0_10.ServerConnection.received(ServerConnection.java:64)
>> >> >> at org.apache.qpid.transport.network.Assembler.emit(Assembler.java:97)
>> >> >> at org.apache.qpid.transport.network.Assembler.assemble(Assembler.java:198)
>> >> >> at org.apache.qpid.transport.network.Assembler.frame(Assembler.java:131)
>> >> >> at org.apache.qpid.transport.network.Frame.delegate(Frame.java:128)
>> >> >> at org.apache.qpid.transport.network.Assembler.received(Assembler.java:102)
>> >> >> at org.apache.qpid.transport.network.Assembler.received(Assembler.java:44)
>> >> >> at org.apache.qpid.transport.network.InputHandler.next(InputHandler.java:199)
>> >> >> at org.apache.qpid.transport.network.InputHandler.received(InputHandler.java:114)
>> >> >> at org.apache.qpid.server.protocol.v0_10.ProtocolEngine_0_10.received(ProtocolEngine_0_10.java:179)
>> >> >> at org.apache.qpid.server.protocol.v0_10.ProtocolEngine_0_10.received(ProtocolEngine_0_10.java:43)
>> >> >> at org.apache.qpid.server.protocol.MultiVersionProtocolEngine.received(MultiVersionProtocolEngine.java:153)
>> >> >> at org.apache.qpid.server.protocol.MultiVersionProtocolEngine.received(MultiVersionProtocolEngine.java:51)
>> >> >> at org.apache.qpid.transport.network.io.IoReceiver.run(IoReceiver.java:161)
>> >> >> at java.lang.Thread.run(Thread.java:745)
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
>> >> For additional commands, e-mail: users-help@qpid.apache.org
>> >>
>> >>
>>
