Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 7DB41200BA7 for ; Fri, 7 Oct 2016 00:37:05 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 7C4CF160AE0; Thu, 6 Oct 2016 22:37:05 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 628B1160ADB for ; Fri, 7 Oct 2016 00:37:04 +0200 (CEST) Received: (qmail 30116 invoked by uid 500); 6 Oct 2016 22:37:03 -0000 Mailing-List: contact users-help@qpid.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: users@qpid.apache.org Delivered-To: mailing list users@qpid.apache.org Received: (qmail 30099 invoked by uid 99); 6 Oct 2016 22:37:02 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd2-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 06 Oct 2016 22:37:02 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd2-us-west.apache.org (ASF Mail Server at spamd2-us-west.apache.org) with ESMTP id 20CFD1A066B for ; Thu, 6 Oct 2016 22:37:02 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd2-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 1.198 X-Spam-Level: * X-Spam-Status: No, score=1.198 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, KAM_ASCII_DIVIDERS=0.8, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, RCVD_IN_SORBS_SPAM=0.5, SPF_PASS=-0.001] autolearn=disabled Authentication-Results: spamd2-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd2-us-west.apache.org [10.40.0.9]) (amavisd-new, port 10024) with ESMTP id n9PXK-TAWZU2 for ; Thu, 6 Oct 2016 22:36:59 +0000 (UTC) Received: from mail-ua0-f171.google.com (mail-ua0-f171.google.com [209.85.217.171]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id 4DF875FAC4 for ; Thu, 6 Oct 2016 22:36:58 +0000 (UTC) Received: by mail-ua0-f171.google.com with SMTP id p102so31011531uap.0 for ; Thu, 06 Oct 2016 15:36:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=3pFj8YUhYN5A4DdK6kxVzIojg9jN8TLuzK/MakYaZps=; b=0MdNOG7wOFocQwoj+9rW1azCxNbmrQtHGfIQYT9fgKtKEgYyUdNwcpCT4tH0EniD2Y G1GXhnvfD0tYdCjSp5pX2MZRNEIPQXaLIQTwjqZDt0f8t8C8sj6Z28WXzLfxGmRBdyFC sRHvLJwdETdvi/r7n4b3Qqn+zH5RT9M89ys7oqU6WjczKtlVfEf/XjaB12GXmHhHGY61 eF2UAiP2c2LOgUSpxXdErp8d5WeU5OA7gUVHgI79H6+ylDYTUh9UmECqahRxmfiGhKFQ 5XLhSLkF463WyVdXNPPPnc14+f0gFeqKtaqJrB3WH23zu84llZeYoC7CLyOtnjpZPfXE 8c7Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=3pFj8YUhYN5A4DdK6kxVzIojg9jN8TLuzK/MakYaZps=; b=nDZjjlglz3joTRi9xBnyEuoUsAY4lFKn8b/60PTIR4+HTR/XNZyLixAvjH3EF6cGHK ftA3e5LfOfWJP1QB9jxffzSdAvizkVFBxN1/aAdGWhpvwOyDnqaMI24by3wWDs9XP3cw 04jX6JoMNkpPMvMhQGsDV++Q3iqodLvVwEB/NenszqCHAoIiHttLHxsNzWr8FPS4V6GG aGnfk3bCac2R0/g+eowNO+UBljuE5Xdn9yaSsAmtAs0+Wgh6c84g9Rn/Y13VkWNthkBM dXzK54A2mrWXXaX7BW5+ER49mN5/8Cfh3xY8Vdkr2V9yH3r8G0uofjij4HKd9pesMxxc rNZQ== X-Gm-Message-State: AA6/9RnQrdigefM1mA8tVKsQC7HVytOHbj+dJmcT1ENlQ6RSAZqtYKfRH0gPfj29AB2ogLuWMCU3rrC+KIC7Gw== X-Received: by 10.176.1.109 with SMTP id 100mr4080604uak.119.1475793411851; Thu, 06 Oct 2016 15:36:51 -0700 (PDT) MIME-Version: 1.0 Received: by 10.103.70.87 with HTTP; Thu, 6 Oct 2016 15:36:51 -0700 (PDT) In-Reply-To: References: From: Rob Godfrey Date: Fri, 7 Oct 2016 00:36:51 +0200 Message-ID: Subject: Re: org.apache.qpid.server.store.StoreException To: "users@qpid.apache.org" Cc: Helen Kwong Content-Type: text/plain; charset=UTF-8 archived-at: Thu, 06 Oct 2016 22:37:05 -0000 We're planning a 6.0.5 to be released really soon which should be a drop in replacement for 6.0.4 - this fix should be in it. The code here hasn't changed for a long, *long* time so you'll have been exposed to it for as long as you've been running Qpid... I think you'd have to be *really* unlucky to hit the bug I identified if you are not using TTL (even with TTL it's still a pretty unlucky thing to hit). -- Rob On 7 October 2016 at 00:05, Ramayan Tiwari wrote: > Yeah, we have been using this version of the broker in production for more > than a year. I am not aware of any change with respect to our production > environment. We only saw this issue past weekend in only one of the > instance (all the brokers in that instance) among more than 100 instances. > Broker hardware and configs are same across all the instances. I will keep > digging. > > For 0.32, we have patched the broker before so we will be able apply this > patch (if we go that route). We are currently performance testing 6.0.4 and > if that goes well, we will probably using the fix that you make in 6.0.x. > > Thanks > Ramayan > > On Thu, Oct 6, 2016 at 2:46 PM, Rob Godfrey wrote: > >> OK - TTL was the most likely way... there are other possibilities >> (basically it just needs the message to go from being available to >> dequeued simultaneous to it being evaluated to be sent to a consumer - >> competing consumers *could* do it, though I would have guessed it to >> be unlikely). It's possible there's another bug separate to the one I >> identified that you are running in to, though nothing jumps out at me. >> I believe you guys have been running 0.32 for a while... have you just >> started seeing this issue - has something changed in your environment >> or your messaging usage? >> >> We won't be doing a 0.32.x release, there's a *lot* of other stuff >> that has been fixed in 6.0 which I would include in that before I'd >> include this change... I think you may have already patched your 0.32 >> broker anyway, in which case you should be able to add the patch I put >> on the JIRA. >> >> On 6 October 2016 at 23:33, Ramayan Tiwari >> wrote: >> > Hi Rob, >> > >> > Thanks so much a prompt response and create JIRA to track this. >> > >> > We don't set a TTL on our messages though -- i.e., we use 0 (unlimited). >> We >> > do send it in non-persistent mode; not sure if that matters or could >> > trigger the same bug? >> > >> > For the fix, would it be possible to fix it in 0.32 as well as in 6.0.x ? >> > >> > CCing, Helen from our team. >> > >> > Thanks, >> > >> > Ramayan >> > >> > On Thu, Oct 6, 2016 at 1:21 PM, Rob Godfrey >> wrote: >> > >> >> I've added a patch file which should apply to 0.32 on the following >> >> JIRA: https://issues.apache.org/jira/browse/QPID-7451 >> >> >> >> Hope this helps, >> >> Rob >> >> >> >> On 6 October 2016 at 22:04, Rob Godfrey >> wrote: >> >> > Hi Ramayan, >> >> > >> >> > this is exception indicates that a message has been deleted from the >> >> > store, but the in-memory queue still references it. Unfortunately the >> >> > exception doesn't really tell us anything about how the broker will >> >> > have got to this state. >> >> > >> >> > Having looked at the code I have an idea about what may be happening - >> >> > do your messages have a TTL set? I *think* that the AMQP 0-10 message >> >> > path may be vulnerable to a race condition if a message expires at >> >> > precisely the same time that the message is picked up to be sent to a >> >> > consumer (essentially the message has to be available to consume and >> >> > then three lines of code later it must have been deleted). The 0-9-1 >> >> > codepath is not vulnerable to this as it caches the size of the >> >> > message (as you can see in the stack trace, the consumer is checking >> >> > the size of the message to make sure that the consumer has enough >> >> > credit to receive it). >> >> > >> >> > We're unlikely to do a patch release for 0.32, but we will likely be >> >> > putting out a new 6.0.x release soon, and soon after a 6.1 release. >> >> > Would you be able to upgrade to one of these, or would you prefer me >> >> > to send you a patch file that you could apply to the 0.32 source to >> >> > test? >> >> > >> >> > -- Rob >> >> > >> >> > On 6 October 2016 at 21:27, Ramayan Tiwari >> >> wrote: >> >> >> Hi, >> >> >> >> >> >> >> >> >> We are ran into this StoreException in our production environment >> >> multiple >> >> >> times on different brokers, which caused broker shutdown. We are >> running >> >> >> 0.32 Java broker with 0.16 client. I see that this was reported and >> >> fixed >> >> >> here: >> >> >> https://issues.apache.org/jira/browse/QPID-4012 >> >> >> >> >> >> This is still happening, I don't have enough context to reproduce >> this >> >> >> locally. Any help is appreciated! >> >> >> >> >> >> Thanks >> >> >> Ramayan >> >> >> >> >> >> *Exception* >> >> >> >> >> >> Uncaught exception, shutting down. >> >> >> org.apache.qpid.server.store.StoreException: Metadata not found for >> >> message >> >> >> with id 1762118451 >> >> >> at >> >> >> org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore. >> >> getMessageMetaData(AbstractBDBMessageStore.java:343) >> >> >> at >> >> >> org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore$ >> >> StoredBDBMessage.getMetaData(AbstractBDBMessageStore.java:1224) >> >> >> at >> >> >> org.apache.qpid.server.protocol.v0_10.MessageTransferMessage. >> >> getMetaData(MessageTransferMessage.java:41) >> >> >> at >> >> >> org.apache.qpid.server.protocol.v0_10.MessageTransferMessage. >> getSize( >> >> MessageTransferMessage.java:56) >> >> >> at >> >> >> org.apache.qpid.server.protocol.v0_10.ConsumerTarget_ >> >> 0_10.allocateCredit(ConsumerTarget_0_10.java:494) >> >> >> at >> >> >> org.apache.qpid.server.queue.QueueConsumerImpl.wouldSuspend( >> >> QueueConsumerImpl.java:278) >> >> >> at >> >> >> org.apache.qpid.server.queue.AbstractQueue.attemptDelivery( >> >> AbstractQueue.java:2059) >> >> >> at >> >> >> org.apache.qpid.server.queue.AbstractQueue.flushConsumer( >> >> AbstractQueue.java:1981) >> >> >> at >> >> >> org.apache.qpid.server.queue.AbstractQueue.flushConsumer( >> >> AbstractQueue.java:1957) >> >> >> at >> >> >> org.apache.qpid.server.queue.QueueConsumerImpl.flush( >> >> QueueConsumerImpl.java:318) >> >> >> at >> >> >> org.apache.qpid.server.protocol.v0_10.ConsumerTarget_ >> >> 0_10.flush(ConsumerTarget_0_10.java:605) >> >> >> at >> >> >> org.apache.qpid.server.protocol.v0_10.ServerSessionDelegate. >> >> messageFlush(ServerSessionDelegate.java:521) >> >> >> at >> >> >> org.apache.qpid.server.protocol.v0_10.ServerSessionDelegate. >> >> messageFlush(ServerSessionDelegate.java:82) >> >> >> at org.apache.qpid.transport.MessageFlush.dispatch( >> >> MessageFlush.java:87) >> >> >> at >> >> >> org.apache.qpid.transport.SessionDelegate.command( >> >> SessionDelegate.java:55) >> >> >> at >> >> >> org.apache.qpid.server.protocol.v0_10.ServerSessionDelegate.command( >> >> ServerSessionDelegate.java:99) >> >> >> at >> >> >> org.apache.qpid.server.protocol.v0_10.ServerSessionDelegate.command( >> >> ServerSessionDelegate.java:82) >> >> >> at org.apache.qpid.transport.Method.delegate(Method.java:159) >> >> >> at org.apache.qpid.transport.Session.received(Session.java:596) >> >> >> at org.apache.qpid.transport.Connection.dispatch( >> Connection.java:452) >> >> >> at >> >> >> org.apache.qpid.transport.ConnectionDelegate.handle( >> >> ConnectionDelegate.java:64) >> >> >> at >> >> >> org.apache.qpid.transport.ConnectionDelegate.handle( >> >> ConnectionDelegate.java:40) >> >> >> at >> >> >> org.apache.qpid.transport.MethodDelegate.messageFlush( >> >> MethodDelegate.java:143) >> >> >> at org.apache.qpid.transport.MessageFlush.dispatch( >> >> MessageFlush.java:87) >> >> >> at >> >> >> org.apache.qpid.transport.ConnectionDelegate.command( >> >> ConnectionDelegate.java:54) >> >> >> at >> >> >> org.apache.qpid.transport.ConnectionDelegate.command( >> >> ConnectionDelegate.java:40) >> >> >> at org.apache.qpid.transport.Method.delegate(Method.java:159) >> >> >> at org.apache.qpid.transport.Connection.received( >> Connection.java:405) >> >> >> at >> >> >> org.apache.qpid.server.protocol.v0_10.ServerConnection.access$001( >> >> ServerConnection.java:64) >> >> >> at >> >> >> org.apache.qpid.server.protocol.v0_10.ServerConnection$1.run( >> >> ServerConnection.java:316) >> >> >> at >> >> >> org.apache.qpid.server.protocol.v0_10.ServerConnection$1.run( >> >> ServerConnection.java:312) >> >> >> at java.security.AccessController.doPrivileged(Native Method) >> >> >> at javax.security.auth.Subject.doAs(Subject.java:360) >> >> >> at >> >> >> org.apache.qpid.server.protocol.v0_10.ServerConnection.received( >> >> ServerConnection.java:311) >> >> >> at >> >> >> org.apache.qpid.server.protocol.v0_10.ServerConnection.received( >> >> ServerConnection.java:64) >> >> >> at org.apache.qpid.transport.network.Assembler.emit( >> Assembler.java:97) >> >> >> at org.apache.qpid.transport.network.Assembler.assemble( >> >> Assembler.java:198) >> >> >> at org.apache.qpid.transport.network.Assembler.frame( >> >> Assembler.java:131) >> >> >> at org.apache.qpid.transport.network.Frame.delegate(Frame.java:128) >> >> >> at org.apache.qpid.transport.network.Assembler.received( >> >> Assembler.java:102) >> >> >> at org.apache.qpid.transport.network.Assembler.received( >> >> Assembler.java:44) >> >> >> at >> >> >> org.apache.qpid.transport.network.InputHandler.next( >> >> InputHandler.java:199) >> >> >> at >> >> >> org.apache.qpid.transport.network.InputHandler.received( >> >> InputHandler.java:114) >> >> >> at >> >> >> org.apache.qpid.server.protocol.v0_10.ProtocolEngine_ >> >> 0_10.received(ProtocolEngine_0_10.java:179) >> >> >> at >> >> >> org.apache.qpid.server.protocol.v0_10.ProtocolEngine_ >> >> 0_10.received(ProtocolEngine_0_10.java:43) >> >> >> at >> >> >> org.apache.qpid.server.protocol.MultiVersionProtocolEngine.received( >> >> MultiVersionProtocolEngine.java:153) >> >> >> at >> >> >> org.apache.qpid.server.protocol.MultiVersionProtocolEngine.received( >> >> MultiVersionProtocolEngine.java:51) >> >> >> at org.apache.qpid.transport.network.io.IoReceiver.run( >> >> IoReceiver.java:161) >> >> >> at java.lang.Thread.run(Thread.java:745) >> >> >> >> --------------------------------------------------------------------- >> >> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org >> >> For additional commands, e-mail: users-help@qpid.apache.org >> >> >> >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org >> For additional commands, e-mail: users-help@qpid.apache.org >> >> --------------------------------------------------------------------- To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org For additional commands, e-mail: users-help@qpid.apache.org