Mailing-List: contact dev-help@qpid.apache.org; run by ezmlm
Message-ID: <4AF08F93.1030402@redhat.com>
Date: Tue, 03 Nov 2009 15:16:19 -0500
From: Carl Trieloff
Reply-To: cctrieloff@redhat.com
Organization: Red Hat
To: dev@qpid.apache.org
CC: "users@qpid.apache.org"
Subject: Re: An ill borker brings down the whole
 cluster
References: <4AF08431.2000105@redhat.com>
In-Reply-To: <4AF08431.2000105@redhat.com>

Alan Conway wrote:
> On 11/03/2009 06:13 AM, Shan Wang wrote:
>> Hi All,
>>
>> We have two qpid 0.5 brokers running in cluster mode on two different
>> boxes. The cluster works fine in normal cases, i.e., if broker1 is
>> shut down cleanly, broker2 keeps serving clients. But today we found
>> that one broker suddenly stopped responding to all connected clients
>> and admin tools. All producer and consumer clients were still
>> connected but failed to consume any messages from the queue. The
>> command line admin tool failed with a timeout error. The only error
>> message we found is in the log of broker 1, which said this:
>>
>> 2009-oct-31 10:17:49 error 172.27.34.201:9908(READY/error) channel
>> error 157487219 on 172.27.34.201:9908-389(local): transport-busy:
>> Channel 1 already attached to
>> guest@QPID.amq.failover676a76fa-5664-4e49-9bee-0538532fe261
>> (qpid/amqp_0_10/SessionHandler.cpp:150)
>> (unresolved: 172.27.34.201:9908 172.27.34.202:13287 )
>>
>> After restarting only broker 1, everything started to work again. So,
>> surprisingly, it seems that when one of the brokers in the cluster
>> suffered a problem, the whole cluster stalled, at least from the
>> consumer's point of view (I can't be sure whether the producer was
>> working during the down time; once things were back to normal, the
>> consumer did receive messages sent some time earlier). The consumer
>> program uses FailoverManager and AsyncSession, basically not far from
>> the failover example in the qpid developing doc. So can anyone please
>> tell me what the above error message means, and have similar
>> problems with the cluster been seen before?
>>
>
> There have been a number of cluster bugs fixed since 0.5, some of
> which had the symptom of a "transport-busy" exception. Can you try a
> trunk build and see if you have the same problems?

Or, which distro and version of qpid are you running?

Carl.

---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:dev-subscribe@qpid.apache.org