Message-ID: <4AF1A950.6050503@redhat.com>
Date: Wed, 04 Nov 2009 11:18:24 -0500
From: Carl Trieloff
Reply-To: cctrieloff@redhat.com
Organization: Red Hat
User-Agent: Thunderbird 2.0.0.23 (X11/20090825)
MIME-Version: 1.0
To: Shan Wang
CC: "dev@qpid.apache.org", "users@qpid.apache.org"
Subject: Re: An ill borker brings down the whole cluster
References: <4AF08431.2000105@redhat.com> <4AF08F93.1030402@redhat.com> <4AF19C62.6070801@redhat.com>
Content-Type: multipart/alternative; boundary="------------030302070800080802080508"

--------------030302070800080802080508
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

yes.

Shan Wang wrote:
> We need to run the clients on redhat4 machines. Does redhat provide prebuilt qpid client libs for redhat4? The only ones I have in hand are built for redhat5.
>
> -----Original Message-----
> From: Carl Trieloff [mailto:cctrieloff@redhat.com]
> Sent: 04 November 2009 15:23
> To: dev@qpid.apache.org
> Cc: users@qpid.apache.org
> Subject: Re: An ill borker brings down the whole cluster
>
>
> Are you able to use the matching client for the broker - just to rule
> that out? i.e. make sure we are not chasing something that is fixed, or
> version mismatch related.
>
> Carl.
>
> Shan Wang wrote:
>
>> On the client side we are still using 0.4. I'm not sure about the exact version; it should be the last version before 0.5.
>> On the cluster side we are using 0.5.752581-26.el5.
>>
>> Unfortunately I haven't got the environment to build qpid myself, so I can't use the latest trunk.
>>
>> -----Original Message-----
>> From: Carl Trieloff [mailto:cctrieloff@redhat.com]
>> Sent: 03 November 2009 20:16
>> To: dev@qpid.apache.org
>> Cc: users@qpid.apache.org
>> Subject: Re: An ill borker brings down the whole cluster
>>
>> Alan Conway wrote:
>>
>>> On 11/03/2009 06:13 AM, Shan Wang wrote:
>>>
>>>> Hi All,
>>>>
>>>> We have two qpid 0.5 brokers running in cluster mode on two different
>>>> boxes. The cluster works fine in normal cases, i.e. if broker1 is
>>>> shut down cleanly, broker2 will keep on serving clients.
>>>> But today we found that one broker suddenly stopped responding to
>>>> all connected clients and admin tools. All producer and consumer
>>>> clients were still connected but failed to consume any messages from
>>>> the queue. The command line admin tool failed with a timeout error.
>>>> The only error message we found is in the log of broker 1, which
>>>> said this:
>>>>
>>>> 2009-oct-31 10:17:49 error 172.27.34.201:9908(READY/error) channel
>>>> error 157487219 on 172.27.34.201:9908-389(local): transport-busy:
>>>> Channel 1 already attached to guest@QPID.amq.failover676a76fa-56
>>>> 64-4e49-9bee-0538532fe261 (qpid/amqp_0_10/SessionHandler.cpp:150)
>>>> (unresolved: 172.27.34.201:9908 172.27.34.202:13287 )
>>>>
>>>> After restarting only broker 1, everything started to work again. So
>>>> surprisingly it seems that when one of the brokers in the cluster
>>>> suffered a problem, the whole cluster just stalled, at least from
>>>> the consumer's point of view (I can't be sure whether the producer
>>>> was working during the downtime; once things were back to normal,
>>>> the consumer did receive messages sent some time ago). The consumer
>>>> program uses FailoverManager and AsyncSession, basically not far
>>>> from the failover example in the qpid developing doc. So can anyone
>>>> please tell me what the above error message means, and whether
>>>> similar problems with the cluster have been seen before?
>>>>
>>> There have been a number of cluster bugs fixed since 0.5, some of
>>> which had the symptom of a "transport-busy" exception. Can you try a
>>> trunk build and see if you have the same problems?
>>>
>> or what distro and version of qpid are you running?
>>
>> Carl.
>>
>> ---------------------------------------------------------------------
>> Apache Qpid - AMQP Messaging Implementation
>> Project: http://qpid.apache.org
>> Use/Interact: mailto:dev-subscribe@qpid.apache.org
>>
>>
>> The information contained in this email is strictly confidential and for the use of the addressee only, unless otherwise indicated. If you are not the intended recipient, please do not read, copy, use or disclose to others this message or any attachment. Please also notify the sender by replying to this email or by telephone (+44 (0)20 7896 0011) and then delete the email and any copies of it. Opinions, conclusions (etc.) that do not relate to the official business of this company shall be understood as neither given nor endorsed by it. IG Index Ltd is a company registered in England and Wales under number 01190902. VAT registration number 761 2978 07. Registered Office: Friars House, 157-168 Blackfriars Road, London SE1 8EZ. Authorised and regulated by the Financial Services Authority. FSA Register number 114059.
>>
>

--------------030302070800080802080508--