qpid-users mailing list archives

From Alan Conway <acon...@redhat.com>
Subject Re: An ill borker brings down the whole cluster
Date Tue, 03 Nov 2009 19:27:45 GMT
On 11/03/2009 06:13 AM, Shan Wang wrote:
> Hi All,
> We have two qpid 0.5 brokers running in cluster mode on two different boxes. The cluster
> works fine in normal cases, i.e., if broker 1 is shut down cleanly, broker 2 keeps serving
> clients. But today we found that one broker had suddenly stopped responding to all connected
> clients and admin tools. All producer and consumer clients were still connected but failed
> to consume any messages from the queue. The command-line admin tool failed with a timeout
> error. The only error message we found is in the log of broker 1:
> 2009-oct-31 10:17:49 error channel error 157487219 on transport-busy: Channel 1 already attached to guest@QPID.amq.failover676a76fa-5664-4e49-9bee-0538532fe261 (qpid/amqp_0_10/SessionHandler.cpp:150) (unresolved: )
> After restarting only broker 1, everything started to work again. So, surprisingly, it seems
> that when one of the brokers in the cluster suffered a problem, the whole cluster stalled,
> at least from the consumer's point of view. (I can't be sure whether the producer was working
> during the downtime; once things were back to normal, the consumer did receive messages sent
> some time earlier.) The consumer program uses FailoverManager and AsyncSession, and is
> basically not far from the failover example in the qpid development docs. Can anyone please
> tell me what the above error message means, and whether similar problems with the cluster
> have been seen before?

There have been a number of cluster bugs fixed since 0.5, some of which had the 
symptom of a "transport-busy" exception. Can you try a trunk build and see if 
you have the same problems?
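
When comparing a trunk build against 0.5, it may help to scan the broker logs for
the same symptom. The following is only a minimal sketch, assuming the log lines
look like the single one quoted above; the function name find_transport_busy and
the regular expression are illustrative and not part of Qpid.

```python
import re

# Pattern modelled on the one log line quoted in this thread. This is an
# assumption: other qpid versions may format their errors differently.
TRANSPORT_BUSY = re.compile(
    r"^(?P<date>\S+) (?P<time>\S+) error .*transport-busy: (?P<detail>.*)$"
)


def find_transport_busy(lines):
    """Return (date, time, detail) tuples for lines reporting transport-busy."""
    hits = []
    for line in lines:
        match = TRANSPORT_BUSY.match(line.strip())
        if match:
            hits.append(
                (match.group("date"), match.group("time"), match.group("detail"))
            )
    return hits
```

Passing an open log file (for example, find_transport_busy(open(path)), where path
is wherever your broker writes its log) returns one tuple per matching line, which
makes it easy to see whether the error still appears after the upgrade.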

