From: Alan Conway
Organization: Red Hat
Date: Tue, 03 Nov 2009 14:27:45 -0500
To: dev@qpid.apache.org
CC: "users@qpid.apache.org"
Subject: Re: An ill borker brings down the whole cluster
Message-ID: <4AF08431.2000105@redhat.com>

On 11/03/2009 06:13 AM, Shan Wang wrote:
> Hi All,
>
> We have two qpid 0.5 brokers running in cluster mode on two different boxes. The cluster works fine in normal cases, i.e., if broker1 is shut down cleanly, broker2 keeps serving clients. But today we found that one broker had suddenly stopped responding to all connected clients and admin tools. All producer and consumer clients were still connected but failed to consume any messages from the queue. The command line admin tool failed with a timeout error. The only error message we found is in the log of broker 1:
>
> 2009-oct-31 10:17:49 error 172.27.34.201:9908(READY/error) channel error 157487219 on 172.27.34.201:9908-389(local): transport-busy: Channel 1 already attached to guest@QPID.amq.failover676a76fa-5664-4e49-9bee-0538532fe261 (qpid/amqp_0_10/SessionHandler.cpp:150) (unresolved: 172.27.34.201:9908 172.27.34.202:13287 )
>
> After restarting only broker 1, everything started to work again. So, surprisingly, it seems that when one of the brokers in the cluster suffered a problem, the whole cluster stalled, at least from the consumer's point of view. (I can't be sure whether the producer was working during the downtime; once things were back to normal, the consumer did receive messages that had been sent some time earlier.) The consumer program uses FailoverManager and AsyncSession, and is basically not far from the failover example in the qpid developer documentation. Can anyone please tell me what the above error message means, and whether similar problems with the cluster have been seen before?
>

There have been a number of cluster bugs fixed since 0.5, some of which had the symptom of a "transport-busy" exception.
Can you try a trunk build and see if you have the same problems?