qpid-users mailing list archives

From Alan Conway <acon...@redhat.com>
Subject Re: Cluster failing to resurrect durable static route
Date Thu, 06 Jan 2011 20:55:10 GMT
On 12/28/2010 07:00 PM, Mark Moseley wrote:
> Sorry in advance that this is long. I've tried to explain it as
> succinctly and thoroughly as possible.
> I've got a 2-node qpid test cluster at each of 2 datacenters, which
> are federated together with a single durable static route between
> each. Qpid is version 0.8. Corosync and openais are stock Squeeze
> (1.2.1-3 and 1.1.2-2, respectively). OS is Squeeze, 32-bit, on Dell
> PowerEdge 1950s, kernel 2.6.36. The static route is durable and is set
> up over SSL.
> This is quite possibly just a conceptual problem with how I'm setting
> this up, so if anyone has a 'right way' to do it, I'm all ears :)
> Just a prelim: Call them cluster A with nodes A1 and A2, and cluster B
> with nodes B1 and B2. The static route is defined as A1->B1 for an
> exchange on cluster B (call it exchangeB), and the other route is
> B1->A1 for an exchange on cluster A (call it exchangeA). After setting
> this up, things seem to work pretty well. I can send from any node in
> cluster A to exchangeB and it's received by any receiving node in
> cluster B. Running "qpid-config ... exchanges --bindings" on cluster A
> nodes shows the route to cluster B for exchangeB and vice versa. That
> seems to be good.
> The trouble I'm having regards failover. I'm finding that if I fail
> the cluster starting with the node that has the route on it:
> * Kill A1, kill A2, start A2, start A1  ->  The bindings on cluster B
> for exchangeA get set back up automatically
> Also, after I kill A1, the route seems to fail over correctly to A2,
> i.e. with A1 dead and A2 still alive, looking at qpid-route on B1 or
> B2 says:
> Exchange 'exchangeA' (direct)
>      bind [mytopic] =>  bridge_queue_1_f6d80145-67d2-4659-b26e-80c4da3ae85b
> If I stop the cluster in this order:
> * Kill A2, kill A1, start A1, start A2  ->  The bindings on cluster B
> for exchangeA don't get set up, i.e. on B1 or B2, qpid-route says:
> Exchange 'exchangeA' (direct)
> Am I doing something wrong or is this a known limitation? I'd expect
> that regardless of ordering, a durable route would come back up on its
> own, on either node. I'd also think that if it was a limitation, it'd
> happen in the other order, when A2 was the last node standing,
> considering the route was created for A1.

I think you have uncovered a bug. Can you create a JIRA for it and assign it to
me initially? Detailed instructions on how to reproduce it would be greatly
appreciated.
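
For the JIRA, the two orderings described above could be written out as a
numbered checklist. The following is only a sketch, not a tested reproduction:
the node and exchange names (A1, A2, B1, B2, exchangeA) and the reported
outcomes come from the report above, while the helper names
(failover_sequence, checklist, REPORTED) are made up for illustration. It does
not talk to a broker; it only prints the steps.

```python
# Sketch of the two failover orderings from the report, as a reproduction
# checklist. Helper names are hypothetical; nothing here contacts a broker.

# Outcome reported for each kill order: were the bindings for exchangeA
# restored on cluster B after the restart?
REPORTED = {
    ("A1", "A2"): True,   # route node killed first: bindings came back
    ("A2", "A1"): False,  # route node killed last: bindings stayed missing
}


def failover_sequence(kill_order):
    """Kill nodes in the given order, then restart them in reverse order."""
    steps = [("kill", node) for node in kill_order]
    steps += [("start", node) for node in reversed(kill_order)]
    return steps


def checklist(kill_order):
    """Render one scenario as numbered plain-text steps for a bug report."""
    lines = [
        f"{i}. {action} {node}"
        for i, (action, node) in enumerate(failover_sequence(kill_order), 1)
    ]
    restored = REPORTED[tuple(kill_order)]
    lines.append(
        f"{len(lines) + 1}. check bindings for exchangeA on B1 or B2 "
        f"(reported: {'restored' if restored else 'missing'})"
    )
    return lines


if __name__ == "__main__":
    for order in REPORTED:
        print("\n".join(checklist(order)))
        print()
```

The restart order is the reverse of the kill order in both scenarios, matching
the sequences in the report, so the last node standing is always the first one
started.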

> I had tried earlier to use source routes for my routing and they
> seemed to do better at coming back after failover but on the source
> clusters' side, the non-primary node (A2) would often blow up when
> cluster B was down and a node in cluster B came back online, always
> saying this in A2's qpid logs ( is A1, is A2):
> 2010-12-28 17:19:37 info ACL Allow id:walclust@QPID action:create
> ObjectType:link Name:
> 2010-12-28 17:19:37 info Connection is a federation link
> 2010-12-28 17:19:39 error Channel exception: not-attached: Channel 1
> is not attached (qpid/amqp_0_10/SessionHandler.cpp:39)
> 2010-12-28 17:19:39 critical cluster( READY/error) local
> error 3054 did not occur on member not-attached:
> Channel 1 is not)
> 2010-12-28 17:19:39 critical Error delivering frames: local error did
> not occur on all cluster members : not-attached: Channel 1 is not
> attached (qpid/a)
> 2010-12-28 17:19:39 notice cluster( LEFT/error) leaving
> cluster walclust
> 2010-12-28 17:19:39 notice Shut down

This also sounds like a bug. Can you create a separate JIRA for it? Please
assign it to me as well.

Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:users-subscribe@qpid.apache.org
