qpid-users mailing list archives

From Pavel Moravec <pmora...@redhat.com>
Subject Re: 0.14 cluster never survives more than an hour or so.
Date Thu, 12 Apr 2012 11:39:04 GMT
Hi Paul,
this usually happens as a consequence of cluster split-brain. Are you using CMAN (Cluster
Manager)?

(Technically, when split-brain occurs, two or more qpid brokers each think they are the
elder node (elder node = the "managing" node, usually the oldest member of the cluster).
There can be only one elder node in a cluster, because the elder periodically invokes the
periodicProcessing task cluster-wide, and only one instance of that task may run at a time.
When several elders are present, each of them invokes the task on every cluster member, so
the task would be executed more than once; the brokers shut down to prevent that.)
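
To picture why a second elder is fatal, here is a rough toy model in Python. It is only
a sketch under heavy assumptions, not the actual logic in qpid/cluster/ClusterTimer.cpp:
the names ToyMember and run_round are made up for illustration, and the real broker
protocol is more involved. Each elder is assumed to broadcast one "drop" event for the
periodic task per round, and every member applies every event it receives.

```python
TASK = "ManagementAgent::periodicProcessing"


class ToyMember:
    """Hypothetical stand-in for one broker's view of the cluster timer."""

    def __init__(self, name):
        self.name = name
        self.tasks = {TASK}   # the periodic task is scheduled exactly once
        self.running = True

    def deliver_drop(self, task):
        # Mirrors the wording of the reported error; the check itself is an assumption.
        if task not in self.tasks:
            self.running = False  # the broker gives up rather than run inconsistently
            raise RuntimeError(f"Cluster timer drop non-existent task {task}")
        self.tasks.remove(task)


def run_round(members, elder_count):
    """Deliver one drop event per elder to every member; return who shut down."""
    failed = []
    for _ in range(elder_count):
        for member in members:
            if not member.running:
                continue
            try:
                member.deliver_drop(TASK)
            except RuntimeError:
                failed.append(member.name)
    return failed


healthy = run_round([ToyMember(n) for n in ("a", "b", "c")], elder_count=1)
split = run_round([ToyMember(n) for n in ("a", "b", "c")], elder_count=2)
print(healthy)  # no member fails with a single elder
print(split)    # every member fails once a second elder sends its own event
```

With one elder each member sees exactly one event for the task. With two elders the
second event refers to a task that is already gone, which in this toy model takes down
all three members at once, much like the symptom described below.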

Kind regards,
Pavel Moravec


----- Original Message -----
> From: "Paul Colby" <paul@colby.id.au>
> To: users@qpid.apache.org
> Sent: Thursday, April 12, 2012 5:08:01 AM
> Subject: 0.14 cluster never survives more than an hour or so.
> 
> Hi guys,
> 
> I'm having an issue with my new 0.14 cluster, where the same configuration
> was fine with 0.12.
>
> The cluster starts up, and all brokers are happy.  Then, with no client
> activity at all, after some seemingly random amount of time (usually around
> 30 minutes to an hour) all brokers in the cluster (three, in this case)
> report the following error:
> 
> critical Error delivering frames: Cluster timer drop non-existent task ManagementAgent::periodicProcessing (qpid/cluster/ClusterTimer.cpp:128)
> 
> Then they all shut down, leaving their respective stores dirty :(
> 
> Any ideas what might be going wrong here?
> 
> Thanks,
> 
> pc
> ----
> http://colby.id.au
> 
