Mailing-List: contact users-help@qpid.apache.org; run by ezmlm
Precedence: bulk
Reply-To: users@qpid.apache.org
Received-SPF: neutral (athena.apache.org: local policy)
From: Shan Wang <Shan.Wang@igindex.co.uk>
To: "dev@qpid.apache.org" <dev@qpid.apache.org>,
        "cctrieloff@redhat.com"
	<cctrieloff@redhat.com>
CC: "users@qpid.apache.org" <users@qpid.apache.org>
Date: Tue, 3 Nov 2009 21:41:42 +0000
Subject: RE: An ill borker brings down the whole cluster
Thread-Topic: An ill borker brings down the whole cluster
Thread-Index: AcpcwqCAuoFnJMD1R0iQoSVSFY40NQACopYw
Message-ID: 
 <C190ADE085279E4AAA53D80AA839E373096ED28399@PRDEXC101.igi.ig.local>
References: 
 <C190ADE085279E4AAA53D80AA839E373096ED282B8@PRDEXC101.igi.ig.local>
 <4AF08431.2000105@redhat.com> <4AF08F93.1030402@redhat.com>
In-Reply-To: <4AF08F93.1030402@redhat.com>
Accept-Language: en-US, en-GB
Content-Language: en-US
acceptlanguage: en-US, en-GB
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0

On the client side we are still using 0.4. I'm not sure of the exact version, but it should be the last release before 0.5.
On the cluster side we are using 0.5.752581-26.el5.

Unfortunately I don't have an environment to build qpid myself, so I can't use the latest trunk.

-----Original Message-----
From: Carl Trieloff [mailto:cctrieloff@redhat.com]
Sent: 03 November 2009 20:16
To: dev@qpid.apache.org
Cc: users@qpid.apache.org
Subject: Re: An ill borker brings down the whole cluster

Alan Conway wrote:
> On 11/03/2009 06:13 AM, Shan Wang wrote:
>> Hi All,
>>
>> We have two qpid 0.5 brokers running in cluster mode on two different
>> boxes. The cluster works fine in normal cases, i.e., if broker1 is
>> shut down cleanly, broker2 keeps serving clients. But today we
>> found that one broker suddenly stopped responding to all connected
>> clients and admin tools. All producer and consumer clients were still
>> connected, but no messages could be consumed from the queue. The
>> command-line admin tool failed with a timeout error. The only error
>> message we found is in the log of broker 1, which said this:
>>
>> 2009-oct-31 10:17:49 error 172.27.34.201:9908(READY/error) channel
>> error 157487219 on 172.27.34.201:9908-389(local): transport-busy:
>> Channel 1 already attached to guest@QPID.amq.failover676a76fa-56
>> 64-4e49-9bee-0538532fe261 (qpid/amqp_0_10/SessionHandler.cpp:150)
>> (unresolved: 172.27.34.201:9908 172.27.34.202:13287 )
>>
>> After restarting only broker 1, everything started to work again. So,
>> surprisingly, it seems that when one of the brokers in the cluster
>> suffered a problem, the whole cluster stalled, at least from the
>> consumer's point of view (I can't be sure whether the producer was
>> working during the downtime; once things were back to normal, the
>> consumer did receive messages sent some time earlier). The consumer
>> program uses FailoverManager and AsyncSession, and is basically not far
>> from the failover example in the qpid development doc. So can anyone
>> please tell me what the above error message means, and whether similar
>> cluster problems have been seen before?
>>
>
> There have been a number of cluster bugs fixed since 0.5, some of
> which had the symptom of a "transport-busy" exception. Can you try a
> trunk build and see if you have the same problems?

or what distro and version of qpid are you running?

Carl.

---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:dev-subscribe@qpid.apache.org

