From: "Jimmy Jones"
To: users@qpid.apache.org
Subject: Re: System stalling
Date: Wed, 21 Aug 2013 16:16:16 +0200
Message-ID: <20130821141617.99380@gmx.com>

> > I've got a simple processing
> > system using the 0.22 C++ broker, all
> > on one box, where an external system posts messages to the default
> > headers exchange, and an ingest process receives them using a ring
> > queue, transforms them and outputs to a different headers exchange.
> > Various other processes pick messages of interest off that exchange
> > using ring queues. Recently however the system has been stalling -
> > I'm still receiving lots of data from the other system, but the
> > ingest process suddenly goes to <5% CPU usage and its queue fills up
> > and messages start getting discarded from the ring, the follow-on
> > processes go to practically 0% CPU and qpidd hovers around 95-120%
> > CPU (normally it's ~75%) and the rest of the system pretty much goes
> > idle (no swapping, there is free memory).
> >
> > I attached to the ingest process with gdb and it was stuck in send
> > (waitForCapacity/waitForCompletionImpl) - I notice this can block.
>
> Is there any queue bound to the second headers exchange, i.e. to the one
> this ingest process is sending to, that is not a ring queue? (If you run
> qpid-config queue -r, you get a quick listing of the queues and their
> bindings).

I've run qpid-config queue, and all my queues have --limit-policy=ring,
apart from a UUID one which I presume is qpid-config itself. Are there any
other useful debugging things I can do?

> If there was a queue to which messages were enqueued that started to
> apply producer flow control, then that would block your ingest process
> (and since the messages are still coming in, the broker would spend all
> its time just removing old ones to make space).

I'd expect the broker to use less CPU when discarding messages rather than
shipping them to consumers? But I'm saying that without much knowledge of
the code!

> > However given the rest of the system is idle when this problem occurs
> > I can't understand why this would happen.
> > I added a SIGALRM handler
> > around send with a timeout of 30s and the process did sometimes get
> > killed. Looking at qpid-tool it does seem to still be processing
> > messages, just extremely slowly. My other observation is from
> > netstat: the Send-Q of qpidd to the ingest process is 16363, and the
> > Recv-Q and Send-Q of the ingest process are both 0.
> >
> > Any ideas on what might be happening are very welcome!