From: Fraser Adams <fraser.adams@blueyonder.co.uk>
Date: Fri, 23 Aug 2013 10:09:31 +0100
To: users@qpid.apache.org
Subject: Re: System stalling

Hi Jimmy, hope you are well!

As an experiment, one thing you could try is messing with the link "reliability". As you know, in the normal mode of operation the consumer client application has to periodically send acknowledgements, which ultimately get passed back to the broker. I'm no expert on this, but from my recollection, if you're in a position where circular queues are overflowing, you're continually trying to produce and consume, and you have a fair level of prefetch/capacity on the consumer, then the mechanism for handling acknowledgements on the broker is "sub-optimal" - I think it's a linear search or some such, and there are conditions where catching up with acknowledgements becomes a bit "N squared". Gordon would be able to explain this way better than me - that's assuming this hypothesis is even relevant :-)

Anyway, try putting a link: {reliability: unreliable} stanza in your consumer address string. As an example, one of mine looks like the following (the address string syntax isn't exactly trivial :-)):

string address =
    "test_consumer; {create: receiver, "
    "node: {x-declare: {auto-delete: True, exclusive: True, "
    "arguments: {'qpid.policy_type': ring, 'qpid.max_size': 100000000}}, "
    "x-bindings: [{exchange: 'amq.match', queue: 'test_consumer', key: 'test1', "
    "arguments: {x-match: all, data-format: test}}]}, "
    "link: {reliability: unreliable}}";
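In case it helps, here's a rough sketch (off the top of my head, so treat it as untested) of how an address like that can be wired into the C++ qpid::messaging API. The broker URL, queue name, capacity and the simplified address are just placeholders - substitute your own values:

#include <qpid/messaging/Connection.h>
#include <qpid/messaging/Session.h>
#include <qpid/messaging/Receiver.h>
#include <qpid/messaging/Message.h>
#include <qpid/messaging/Duration.h>

#include <iostream>
#include <string>

using namespace qpid::messaging;

int main() {
    Connection connection("localhost:5672"); // placeholder broker URL
    try {
        connection.open();
        Session session = connection.createSession();

        // Simplified placeholder address: a ring queue consumed over an
        // unreliable link, so the broker doesn't have to track per-message
        // acknowledgements for this subscription.
        std::string address =
            "my_queue; {create: receiver, "
            "node: {x-declare: {arguments: "
            "{'qpid.policy_type': ring, 'qpid.max_size': 100000000}}}, "
            "link: {reliability: unreliable}}";

        Receiver receiver = session.createReceiver(address);
        receiver.setCapacity(1000); // prefetch - these messages are lost if the consumer dies

        Message message;
        while (receiver.fetch(message, Duration::SECOND * 5)) {
            std::cout << message.getContent() << std::endl;
            session.acknowledge(); // harmless with an unreliable link
        }
        connection.close();
    } catch (const std::exception& error) {
        std::cerr << error.what() << std::endl;
        connection.close();
        return 1;
    }
    return 0;
}

The only bit that really matters for this experiment is the link: {reliability: unreliable} part at the end of the address string.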
Clearly your arguments would be different, but hopefully it'll give you a kick start.

The main downside of disabling link reliability is that if you have prefetch enabled and the consumer unexpectedly dies, all of the messages on the prefetch queue will be lost, whereas with reliable messaging the broker maintains references to all unacknowledged messages and so would resend them (I *think* that's how it works.....).

At the very least it's a fairly simple tweak to your consumer addresses that might rule out (or point to) acknowledgement shenanigans as the root of your problem. From my own experience I always end up blaming this first if I hit performance weirdness with ring queues :-)

HTH,
Frase

On 21/08/13 17:08, Jimmy Jones wrote:
>>>>> I've got a simple processing system using the 0.22 C++ broker, all
>>>>> on one box, where an external system posts messages to the default
>>>>> headers exchange, and an ingest process receives them using a ring
>>>>> queue, transforms them and outputs to a different headers exchange.
>>>>> Various other processes pick messages of interest off that exchange
>>>>> using ring queues. Recently, however, the system has been stalling -
>>>>> I'm still receiving lots of data from the other system, but the
>>>>> ingest process suddenly goes to <5% CPU usage and its queue fills up
>>>>> and messages start getting discarded from the ring, the follow-on
>>>>> processes go to practically 0% CPU and qpidd hovers around 95-120%
>>>>> CPU (normally it's ~75%), and the rest of the system pretty much goes
>>>>> idle (no swapping, there is free memory).
>>>>>
>>>>> I attached to the ingest process with gdb and it was stuck in send
>>>>> (waitForCapacity/waitForCompletionImpl) - I notice this can block.
>>>> Is there any queue bound to the second headers exchange, i.e. to the one
>>>> this ingest process is sending to, that is not a ring queue? (If you run
>>>> qpid-config queue -r, you get a quick listing of the queues and their
>>>> bindings.)
>>> I've run qpid-config queue, and all my queues have --limit-policy=ring, apart
>>> from a UUID one which I presume is qpid-config itself. Are there any other useful
>>> debugging things I can do?
>> What does qpid-stat -q show? Is it possible to test whether the broker
>> is still responsive, e.g. by sending and receiving messages through a
>> test queue/exchange? Are there any errors in the logs? Are any of the
>> queues durable (and messages persistent)?
> qpid-stat -q is all zeros in the msg & bytes columns, apart from the ingest queue,
> and another overflowing ring queue I have.
>
> I did run qpid-tool when the system was broken to dump some stats. msgTotalDequeues
> was slowly incrementing on the ingest queue, so I presume messages were still being
> delivered and the broker was responsive?
>
> The only logging I've got is syslog, and I just see a warning about unsent data,
> presumably when the ingest process receives a SIGALRM. I'm happy to switch on more
> logging, what would you recommend?
>
> None of my queues are durable, but I think incoming messages from the other system
> are marked as durable.
> The exchange that the ingest process sends to is durable,
> but I'm not setting any durable flags on outgoing messages (I presume the default
> is off).
>
>> Another thing might be a ptrace of the broker process. Maybe two or
>> three with a short delay between them.
> I'll try this next time it goes haywire.
>
>> For some reason it seems like the broker is not sending back
>> confirmation to the sender in the ingest process, causing that to block.
>> Ring queues shouldn't be subject to producer flow control, so we need to
>> figure out what other reason there could be for that.

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
For additional commands, e-mail: users-help@qpid.apache.org
---------------------------------------------------------------------