qpid-users mailing list archives

From Paul Colby <p...@colby.id.au>
Subject Re: 0.14 cluster never survives more than an hour or so.
Date Fri, 13 Apr 2012 09:02:14 GMT
Alas, the patch at https://issues.apache.org/jira/browse/QPID-3369 has not
fixed the issue.

Interestingly, though, it did move the error to a different line, with a
very similar message, e.g.:

Apr 13 17:04:17 gateway02 qpidd[32258]: 2012-04-13 17:04:17 critical Error
delivering frames: Cluster timer wakeup non-existent task
ManagementAgent::periodicProcessing (qpid/cluster/ClusterTimer.cpp:112)

So it has moved from ClusterTimer::deliverDrop
to ClusterTimer::deliverWakeup instead, but with the same net
effect.

pc
----
http://colby.id.au


On Fri, Apr 13, 2012 at 9:30 AM, Paul Colby <paul@colby.id.au> wrote:

> Thanks Pavel and Gordon, I really appreciate you guys getting back to me
> so quickly :)
>
> I'm not currently using cman.  I hadn't been using it on 0.12 either.  I
> suspect that split-brain is not the case, since the test cluster in
> question runs on virtual machines all within a single host, with *very*
> reliable virtual networking between them.  After reading your response, I
> did have a quick look at setting up cman to verify either way, but that's
> not proving to be quick and easy, so I'll come back to it shortly.
>
> The https://issues.apache.org/jira/browse/QPID-3369 issue does look
> interesting.  I'll apply the patch suggested there and see what difference
> it makes.
>
> Thanks again.  I'll let you know how it goes :)
>
> pc
> ----
> http://colby.id.au
>
>
>
> On Thu, Apr 12, 2012 at 9:39 PM, Pavel Moravec <pmoravec@redhat.com> wrote:
>
>> Hi Paul,
>> this usually happens as a consequence of cluster split-brain. Are you
>> using CMAN (Cluster Manager)?
>>
>> (Technically, when split-brain occurs, two (or more) qpid brokers think
>> they are the elder node (the elder node is the "managing" node, usually
>> the node that has been in the cluster longest). But there can be only one
>> elder node in a cluster, because the elder node periodically invokes the
>> periodicProcessing task cluster-wide, and only one instance of that task
>> may run at a time. When several elder nodes are present, each invokes the
>> task on every cluster member, causing extra instances of the task to be
>> executed; the brokers prevent this by shutting down.)
>>
>> Kind regards,
>> Pavel Moravec
>>
>>
>> ----- Original Message -----
>> > From: "Paul Colby" <paul@colby.id.au>
>> > To: users@qpid.apache.org
>> > Sent: Thursday, April 12, 2012 5:08:01 AM
>> > Subject: 0.14 cluster never survives more than an hour or so.
>> >
>> > Hi guys,
>> >
>> > I'm having an issue with my new 0.14 cluster, where the same
>> > configuration was fine with 0.12.
>> >
>> > The cluster starts up, and all brokers are happy.  Then, with no client
>> > activity at all, after a seemingly random amount of time (usually around
>> > 30 minutes to an hour) all brokers in the cluster (three, in this case)
>> > report the following error:
>> >
>> > critical Error delivering frames: Cluster timer drop non-existent task
>> > ManagementAgent::periodicProcessing (qpid/cluster/ClusterTimer.cpp:128)
>> >
>> > Then they all shut down, leaving their respective stores dirty :(
>> >
>> > Any ideas what might be going wrong here?
>> >
>> > Thanks,
>> >
>> > pc
>> > ----
>> > http://colby.id.au
>> >
>>
>
