From: Shan Wang
To: "dev@qpid.apache.org", "cctrieloff@redhat.com"
CC: "users@qpid.apache.org"
Date: Tue, 3 Nov 2009 21:41:42 +0000
Subject: RE: An ill borker brings down the whole cluster
References: <4AF08431.2000105@redhat.com> <4AF08F93.1030402@redhat.com>
In-Reply-To: <4AF08F93.1030402@redhat.com>

On the client side we are still using 0.4. I'm not sure of the exact version, but it should be the last release before 0.5.

On the cluster side we are using 0.5.752581-26.el5.

Unfortunately I don't have an environment to build qpid myself, so I can't use the latest trunk.

-----Original Message-----
From: Carl Trieloff [mailto:cctrieloff@redhat.com]
Sent: 03 November 2009 20:16
To: dev@qpid.apache.org
Cc: users@qpid.apache.org
Subject: Re: An ill borker brings down the whole cluster

Alan Conway wrote:
> On 11/03/2009 06:13 AM, Shan Wang wrote:
>> Hi All,
>>
>> We have two qpid 0.5 brokers running in cluster mode on two different
>> boxes. The cluster works fine in normal cases, ie, if broker1 is
>> shutdown cleanly, broker2 will keep on serving clients. But today we
>> found one broker suddenly lost response to all connected clients and
>> admin tools. All producer and consumer clients are still connected
>> but failed to consume any messages from the queue. The command line
>> admin tool failed with a time out error. The only error message we
>> found is in the log of broker 1, which said this:
>>
>> 2009-oct-31 10:17:49 error 172.27.34.201:9908(READY/error) channel
>> error 157487219 on 172.27.34.201:9908-389(local): transport-busy:
>> Channel 1 already attached to guest@QPID.amq.failover676a76fa-5664-4e49-9bee-0538532fe261
>> (qpid/amqp_0_10/SessionHandler.cpp:150)
>> (unresolved: 172.27.34.201:9908 172.27.34.202:13287 )
>>
>> After only restarted broker 1, everything starts to work again.
>> So surprisingly it seems when one of the brokers in the cluster suffered
>> a problem, the whole cluster just stalled, at least from the
>> consumer's point of view ( I can't be sure if the producer was
>> working during the down time, after back to normal, consumer did
>> receive messages sent sometime ago ). Consumer program uses
>> FailoverManager and AsyncSession, basically not far from the failover
>> example in the qpid developing doc. So can anyone please tell me what
>> the above error message means and have we seen similar problems to
>> the cluster before?
>>
>
> There have been a number of cluster bugs fixed since 0.5, some of
> which had the symptom of a "transport-busy" exception. Can you try a
> trunk build and see if you have the same problems?

or what distro and version of qpid are you running?

Carl.

---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:dev-subscribe@qpid.apache.org