Mailing-List: contact users-help@qpid.apache.org; run by ezmlm
Precedence: bulk
Reply-To: users@qpid.apache.org
Received-SPF: pass (athena.apache.org: domain of rob.j.godfrey@gmail.com
 designates 74.125.82.170 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAEqX17OyzreHr9NeFvp=W-oyGDyyKaT_xhNNpWcoDDcFKtW1dw@mail.gmail.com>
References: 
 <CAEqX17M8Z_Z+J8gfkyuO-=a8+ZjwwWU8YAK3NOjftiJz7Pczbg@mail.gmail.com>
	<CACsaS95kUWSh+TGH6_fuAotQ55GA2zdaO-1si0vd+w8O+2RbRw@mail.gmail.com>
	<CAEqX17OQCyLWoCvYpgEbdfeum0UCmY7VJd-yViMMGXNFDnj7LQ@mail.gmail.com>
	<CACsaS96B9cKGax3MSZdVRMcKbjzAKK+qGrH-aN6f24+vz5h0rA@mail.gmail.com>
	<CAEqX17OyzreHr9NeFvp=W-oyGDyyKaT_xhNNpWcoDDcFKtW1dw@mail.gmail.com>
Date: Sat, 21 Jan 2012 21:15:30 +0100
Message-ID: 
 <CACsaS97KB5vhC1PwmuW3+SFcGurzRzX+v4rZj8DteM2uUF4sag@mail.gmail.com>
Subject: Re: Qpid Java Broker High Availability solution?
From: Rob Godfrey <rob.j.godfrey@gmail.com>
To: users@qpid.apache.org
Content-Type: multipart/alternative; boundary=0016e6d99c09788fa304b70f75d6

--0016e6d99c09788fa304b70f75d6
Content-Type: text/plain; charset=ISO-8859-1

So... don't get your hopes up too high... but I am going to look at
utilising BDB's HA capabilities to implement some sort of Active-Passive HA
solution... it looks like it shouldn't be *too* much work at first glance
(non master nodes block on startup waiting to be elected master, and then
configure themselves from the now-master BDB instance).

At best this is going to be a bit of a hobby project for me as it's not
something that is strictly necessary for my personal end users,

Cheers,
Rob

On 20 January 2012 17:34, Praveen M <lefthandmagic@gmail.com> wrote:

> Ah. okie, got it :) I was wondering if you were using some replication
> software that augments BDB that I wasn't aware of.
>
> A SAN explains your architecture. Thanks a lot for writing back :)
>
> On Fri, Jan 20, 2012 at 8:29 AM, Rob Godfrey <rob.j.godfrey@gmail.com
> >wrote:
>
> > On 20 January 2012 17:13, Praveen M <lefthandmagic@gmail.com> wrote:
> >
> > > Hi Rob,
> > >
> > > Thanks for writing. Please see inline.
> > >
> > > On Fri, Jan 20, 2012 at 1:35 AM, Rob Godfrey <rob.j.godfrey@gmail.com
> > > >wrote:
> > >
> > > > Hi Praveen,
> > > >
> > > > On 14 January 2012 02:47, Praveen M <lefthandmagic@gmail.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > >   Are there any java broker high availability/clustering solutions
> > that
> > > > > are currently present? I tried googling around and didn't find
> > anything
> > > > to
> > > > > my luck.
> > > > >
> > > > > Can you please suggest a HA strategy that you've used working with
> > the
> > > > Qpid
> > > > > Java Broker?
> > > > >
> > > > >
> > > > So where I work we have two separate strategies for "HA" and disaster
> > > > recovery.
> > > >
> > > > For HA we use synchronous replication of the BDB store, with external
> > > > software monitoring the availability of the primary broker machine.
>  If
> > > the
> > > > primary broker machine goes down, the external software starts up the
> > > > secondary broker machine, which points to the synchronously
> replicated
> > > > instance of the store... it can also handle reassignment of the IP
> > > address
> > > > / DNS name.
> > > >
> > >
> > > *Is there a reason that you use an external software to monitor the
> > > availability of the primary broker machine.?*
> > > *Shouldn't the connection failover model be sufficient enough for this?
> > Or
> > > does the failover model have any limitations? *
> > > *
> > >
> > >
> > The JMS clients failover automatically, the architectural design was not
> > driven by limits in the failover model... however the HA solution is not
> > focused solely on Qpid and aims to provide a service which is as seamless
> > as possible to end user applications
> >
> >
> > > *Also, you mention synchronous replication of BDB. Can you please
> write a
> > > bit about how you go about doing this? I think with syncCommit false,
> > sync
> > > replication could be something that could work for us too without
> > > really jeopardizing the enqueue latencies.*
> > >
> > >
> > >
> > The synchronous replication in our case is done at the "hardware" level.
> > The storage attached to the machines provides this replication.
> >
> >
> > > > For DR we take regular snapshots of the BDB store files and ship
> these
> > > > using an FTP-like mechanism to a DR site.  Clearly with this solution
> > you
> > > > run the risk of loss as you only have a snapshot from a known point
> in
> > > > time, not from the very moment the system went down.
> > > >
> > > > *Ah yes, this runs the risk of losing messages. Did you not consider
> a
> > > synchronous replication in this case too?*
> > >
> >
> > DR sites are necessarily far enough away from primary sites to make
> > synchronous replication (at least at the storage level) impractical.
> >
> >
> > > *Or is it because of the distance of the DR site that could contribute
> to
> > > high latency round trips. Just curious.*
> > >
> > >
> > Exactly.
> >
> > In general the message broker forms only one part of an application, in a
> > DR scenario many different components with their own stores will have to
> be
> > restarted.  At this point the application design needs to be able to
> > recover - most importantly applications need to tolerate duplicates cause
> > by replaying from a point earlier in time than the point at which failure
> > occurred.
> >
> >
> > > In our model our transaction store which contains a copy of the message
> > > will be DR'ed.
> > >
> > >
> > > > > I found a Message Federation design proposal document, but I'm
> > guessing
> > > > > it's not implemented yet (Please correct me if I'm wrong).
> > > > >
> > > > >
> > > > There is an alpha/beta implementation of Message Federation in the
> Java
> > > > Broker, which follows the same design as that in the C++ broker and
> > uses
> > > > the same toolset to create routes.  This code is broken in the most
> > > recent
> > > > releases of the Java Broker, but should work "better" from trunk...
> > > however
> > > > I'm not going to give any guarantees on it's suitability for a
> > production
> > > > system right now (I hope to be doing some serious testing/fixing over
> > the
> > > > next couple of months).
> > > >
> > > >
> > > > > I plan to spin off two brokers on two different machines and use a
> > > > failover
> > > > > connection model to route messages to one if the other goes down.
> > This
> > > > > works well for message enqueues.
> > > > > But still, I'd run the risk of not being able to process the
> messages
> > > in
> > > > > the broker that just went down (until it's back up). It will be
> nice
> > to
> > > > > know if someone had solved a similar problem by other
> > > > > strategies/solutions available with the broker.
> > > > >
> > > > > Also, has someone tried replicating the database used for
> > > > > the persistent store to solve this problem (BDB/Derby ?)
> > > > >
> > > > >
> > > > As above, we use replication, but managed by hardware/external
> > software.
> > > > I've not yet tried using BDB's own HA solutions to provide
> replication.
> > > >
> > > > *well. Is the replication  too driven by an external software. I'm
> > > curious on how you go about doing a synchronous*
> > > *replication with BDB (as this is the route that we might want to
> take).
> > > Any tips here will be useful. *
> > > *
> > > *
> > >
> >
> > As above the replication I describe is at the storage level. Essentially
> > we're talking about facilities offered by certain Storage Area Network
> > products :-)
> >
> >
> > > *If you are allowed to talk about the hardware/external software piece
> > I'd
> > > love to hear more about your HA*
> > > *architecture. (I do understand sometimes NDAs might stop you. If so,
> > it's
> > > okie).*
> > >
> > >
> > >
> > We use a standard commercial High Availability Cluster software for this
> > purpose, I'm not really at liberty to say which of these products we use
> -
> > but I imagine that all are equally functional in this area.
> >
> > Cheers,
> > Rob
> >
>
>
>
> --
> -Praveen
>

--0016e6d99c09788fa304b70f75d6--