qpid-users mailing list archives

From Rob Godfrey <rob.j.godf...@gmail.com>
Subject Re: Qpid Java Broker High Availability solution?
Date Fri, 20 Jan 2012 16:29:18 GMT
On 20 January 2012 17:13, Praveen M <lefthandmagic@gmail.com> wrote:

> Hi Rob,
>
> Thanks for writing. Please see inline.
>
> On Fri, Jan 20, 2012 at 1:35 AM, Rob Godfrey <rob.j.godfrey@gmail.com
> >wrote:
>
> > Hi Praveen,
> >
> > On 14 January 2012 02:47, Praveen M <lefthandmagic@gmail.com> wrote:
> >
> > > Hi,
> > >
> > >   Are there any java broker high availability/clustering solutions that
> > > are currently present? I tried googling around and didn't find anything
> > to
> > > my luck.
> > >
> > > Can you please suggest a HA strategy that you've used working with the
> > Qpid
> > > Java Broker?
> > >
> > >
> > So where I work we have two separate strategies for "HA" and disaster
> > recovery.
> >
> > For HA we use synchronous replication of the BDB store, with external
> > software monitoring the availability of the primary broker machine.  If
> the
> > primary broker machine goes down, the external software starts up the
> > secondary broker machine, which points to the synchronously replicated
> > instance of the store... it can also handle reassignment of the IP
> address
> > / DNS name.
> >
>
> *Is there a reason that you use an external software to monitor the
> availability of the primary broker machine.?*
> *Shouldn't the connection failover model be sufficient enough for this? Or
> does the failover model have any limitations? *
>
>
The JMS clients fail over automatically; the architectural design was not
driven by limits in the failover model. However, the HA solution is not
focused solely on Qpid, and aims to provide a service which is as seamless
as possible to end-user applications.
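
For reference, client-side failover in the Qpid Java client is normally driven
by the connection URL. The sketch below only assembles such a URL as a string.
The host names, credentials, client id and virtual host are made-up
placeholders, and the `brokerlist` / `failover='roundrobin'` options are
written as I recall them from the client's connection URL format, so please
check them against the documentation for your client version.

```java
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical helper: builds a Qpid-style connection URL listing several brokers. */
public class FailoverUrl {

    /**
     * @param hostPorts broker addresses as "host:port" (placeholders, not real hosts)
     * @return a connection URL asking the client to cycle through the listed brokers
     */
    public static String build(String user, String pass, String clientId,
                               String virtualHost, List<String> hostPorts) {
        // Brokers inside brokerlist are separated by ';'
        StringJoiner brokers = new StringJoiner(";");
        for (String hostPort : hostPorts) {
            brokers.add("tcp://" + hostPort);
        }
        // Assumed option names: brokerlist, failover (see the note above)
        return "amqp://" + user + ":" + pass + "@" + clientId + "/" + virtualHost
                + "?brokerlist='" + brokers + "'&failover='roundrobin'";
    }
}
```

The resulting string would typically go into the `connectionfactory.<name>`
entry of the client's JNDI properties file. With the IP address / DNS name
reassignment described earlier, a single broker entry may be all the clients
need.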


> *Also, you mention synchronous replication of BDB. Can you please write a
> bit about how you go about doing this? I think with syncCommit false, sync
> replication could be something that could work for us too without
> really jeopardizing the enqueue latencies.*
>
>
>
The synchronous replication in our case is done at the "hardware" level.
The storage attached to the machines provides this replication.


> > For DR we take regular snapshots of the BDB store files and ship these
> > using an FTP-like mechanism to a DR site.  Clearly with this solution you
> > run the risk of loss as you only have a snapshot from a known point in
> > time, not from the very moment the system went down.
> >
> > *Ah yes, this runs the risk of losing messages. Did you not consider a
> synchronous replication in this case too?*
>

DR sites are necessarily far enough away from primary sites to make
synchronous replication (at least at the storage level) impractical.


> *Or is it because of the distance of the DR site that could contribute to
> high latency round trips. Just curious.*
>
>
Exactly.

In general the message broker forms only one part of an application; in a
DR scenario many different components with their own stores will have to be
restarted.  At this point the application design needs to be able to
recover. Most importantly, applications need to tolerate duplicates caused
by replaying from a point earlier in time than the point at which failure
occurred.
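
As a rough illustration of tolerating duplicates, here is a minimal sketch of
a consumer-side filter that skips any message whose ID it has already handled.
The class and method names are hypothetical. It assumes message IDs stay the
same when messages are replayed, and it keeps the set of seen IDs in memory
only; a real application would need to persist that set together with its own
state for it to survive a DR restart.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical helper: processes each message ID at most once. */
public class DedupingProcessor {
    private final Map<String, Boolean> seen;
    private final Consumer<String> handler;

    /**
     * @param capacity maximum number of message IDs remembered (oldest evicted first)
     * @param handler  application logic invoked for each non-duplicate message body
     */
    public DedupingProcessor(final int capacity, Consumer<String> handler) {
        this.handler = handler;
        // Insertion-ordered map that drops its oldest entry once over capacity
        this.seen = new LinkedHashMap<String, Boolean>(16, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Returns true if the message was handled, false if it was skipped as a duplicate. */
    public synchronized boolean process(String messageId, String body) {
        if (seen.containsKey(messageId)) {
            return false; // already handled, e.g. replayed from an older snapshot
        }
        handler.accept(body);
        // Record the ID only after the handler succeeded
        seen.put(messageId, Boolean.TRUE);
        return true;
    }
}
```

The capacity bound is a trade-off: it needs to cover at least as many messages
as could be replayed from the oldest snapshot you might restore.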


> In our model our transaction store which contains a copy of the message
> will be DR'ed.
>
>
> > > I found a Message Federation design proposal document, but I'm guessing
> > > it's not implemented yet (Please correct me if I'm wrong).
> > >
> > >
> > There is an alpha/beta implementation of Message Federation in the Java
> > Broker, which follows the same design as that in the C++ broker and uses
> > the same toolset to create routes.  This code is broken in the most
> recent
> > releases of the Java Broker, but should work "better" from trunk...
> however
> > I'm not going to give any guarantees on its suitability for a production
> > system right now (I hope to be doing some serious testing/fixing over the
> > next couple of months).
> >
> >
> > > I plan to spin off two brokers on two different machines and use a
> > failover
> > > connection model to route messages to one if the other goes down. This
> > > works well for message enqueues.
> > > But still, I'd run the risk of not being able to process the messages
> in
> > > the broker that just went down (until it's back up). It will be nice to
> > > know if someone had solved a similar problem by other
> > > strategies/solutions available with the broker.
> > >
> > > Also, has someone tried replicating the database used for
> > > the persistent store to solve this problem (BDB/Derby ?)
> > >
> > >
> > As above, we use replication, but managed by hardware/external software.
> > I've not yet tried using BDB's own HA solutions to provide replication.
> >
> > *well. Is the replication  too driven by an external software. I'm
> curious on how you go about doing a synchronous*
> *replication with BDB (as this is the route that we might want to take).
> Any tips here will be useful. *
>

As above, the replication I describe is at the storage level. Essentially
we're talking about facilities offered by certain Storage Area Network
products :-)


> *If you are allowed to talk about the hardware/external software piece I'd
> love to hear more about your HA*
> *architecture. (I do understand sometimes NDAs might stop you. If so, it's
> okie).*
>
>
>
We use standard commercial High Availability cluster software for this
purpose. I'm not really at liberty to say which of these products we use,
but I imagine that all are equally functional in this area.

Cheers,
Rob
