Subject: Re: Qpid Java Broker High Availability solution?
From: Rob Godfrey
To: users@qpid.apache.org
Date: Sat, 21 Jan 2012 21:15:30 +0100

So... don't get your hopes up too high... but I am going to look at utilising BDB's HA capabilities to implement some sort of Active-Passive HA solution... it looks like it shouldn't be *too* much work at first glance (non-master nodes block on startup waiting to be elected master, and then configure themselves from the now-master BDB instance). At best this is going to be a bit of a hobby project for me, as it's not something that is strictly necessary for my personal end users.

Cheers,
Rob

On 20 January 2012 17:34, Praveen M wrote:
> Ah. okie, got it :) I was wondering if you were using some replication software that augments BDB that I wasn't aware of.
>
> A SAN explains your architecture. Thanks a lot for writing back :)
>
> On Fri, Jan 20, 2012 at 8:29 AM, Rob Godfrey wrote:
>
> > On 20 January 2012 17:13, Praveen M wrote:
> >
> > > Hi Rob,
> > >
> > > Thanks for writing. Please see inline.
> > >
> > > On Fri, Jan 20, 2012 at 1:35 AM, Rob Godfrey wrote:
> > >
> > > > Hi Praveen,
> > > >
> > > > On 14 January 2012 02:47, Praveen M wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Are there any Java broker high availability/clustering solutions that are currently available? I tried googling around but had no luck finding anything.
> > > > >
> > > > > Can you please suggest an HA strategy that you've used with the Qpid Java Broker?
> > > >
> > > > So where I work we have two separate strategies for "HA" and disaster recovery.
> > > >
> > > > For HA we use synchronous replication of the BDB store, with external software monitoring the availability of the primary broker machine. If the primary broker machine goes down, the external software starts up the secondary broker machine, which points to the synchronously replicated instance of the store... it can also handle reassignment of the IP address / DNS name.
> > >
> > > *Is there a reason that you use external software to monitor the availability of the primary broker machine?*
> > > *Shouldn't the connection failover model be sufficient for this? Or does the failover model have any limitations?*
> >
> > The JMS clients fail over automatically; the architectural design was not driven by limits in the failover model... however, the HA solution is not focused solely on Qpid and aims to provide a service which is as seamless as possible to end-user applications.
> >
> > > *Also, you mention synchronous replication of BDB. Can you please write a bit about how you go about doing this? I think with syncCommit false, sync replication could be something that could work for us too without really jeopardizing the enqueue latencies.*
> >
> > The synchronous replication in our case is done at the "hardware" level. The storage attached to the machines provides this replication.
> >
> > > > For DR we take regular snapshots of the BDB store files and ship these using an FTP-like mechanism to a DR site. Clearly with this solution you run the risk of loss, as you only have a snapshot from a known point in time, not from the very moment the system went down.
> > >
> > > *Ah yes, this runs the risk of losing messages. Did you not consider synchronous replication in this case too?*
> >
> > DR sites are necessarily far enough away from primary sites to make synchronous replication (at least at the storage level) impractical.
> >
> > > *Or is it because the distance to the DR site would cause high-latency round trips? Just curious.*
> >
> > Exactly.
> >
> > In general the message broker forms only one part of an application; in a DR scenario many different components with their own stores will have to be restarted. At this point the application design needs to be able to recover: most importantly, applications need to tolerate duplicates caused by replaying from a point earlier in time than the point at which failure occurred.
> >
> > > In our model our transaction store, which contains a copy of the message, will be DR'ed.
> > >
> > > > > I found a Message Federation design proposal document, but I'm guessing it's not implemented yet (please correct me if I'm wrong).
> > > >
> > > > There is an alpha/beta implementation of Message Federation in the Java Broker, which follows the same design as that in the C++ broker and uses the same toolset to create routes. This code is broken in the most recent releases of the Java Broker, but should work "better" from trunk... however, I'm not going to give any guarantees on its suitability for a production system right now (I hope to be doing some serious testing/fixing over the next couple of months).
> > > >
> > > > > I plan to spin up two brokers on two different machines and use a failover connection model to route messages to one if the other goes down. This works well for message enqueues.
> > > > >
> > > > > But still, I'd run the risk of not being able to process the messages in the broker that just went down (until it's back up). It would be nice to know if someone has solved a similar problem with other strategies/solutions available for the broker.
> > > > >
> > > > > Also, has anyone tried replicating the database used for the persistent store to solve this problem (BDB/Derby)?
> > > >
> > > > As above, we use replication, but managed by hardware/external software. I've not yet tried using BDB's own HA solutions to provide replication.
> > >
> > > *Well, is the replication also driven by external software? I'm curious how you go about doing synchronous replication with BDB (as this is the route that we might want to take). Any tips here will be useful.*
> >
> > As above, the replication I describe is at the storage level. Essentially we're talking about facilities offered by certain Storage Area Network products :-)
> >
> > > *If you are allowed to talk about the hardware/external software piece, I'd love to hear more about your HA architecture. (I do understand sometimes NDAs might stop you. If so, it's okie.)*
> >
> > We use standard commercial High Availability Cluster software for this purpose. I'm not really at liberty to say which of these products we use, but I imagine that all are equally functional in this area.
> >
> > Cheers,
> > Rob
>
> --
> -Praveen
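The start-up behaviour Rob sketches for a BDB-backed active-passive pair (a passive node blocks until it is elected master, then configures itself from the store) can be modelled in plain Java roughly as follows. This is a hypothetical illustration, not Qpid or BDB JE code: `PassiveBrokerNode`, `electedMaster()` and `awaitActivation()` are invented names, and `electedMaster()` stands in for whatever election notification the replication layer actually delivers (BDB JE's HA support, for instance, reports master/replica transitions to a `StateChangeListener` registered on a `ReplicatedEnvironment`; check the JE documentation for the details).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical model of a passive broker node in an active-passive pair.
 * Not the BDB JE HA API: electedMaster() is a stand-in for the real
 * election notification.
 */
class PassiveBrokerNode {
    private final String name;
    // Released exactly once, when this node wins the election.
    private final CountDownLatch elected = new CountDownLatch(1);
    private final AtomicBoolean active = new AtomicBoolean(false);

    PassiveBrokerNode(String name) {
        this.name = name;
    }

    /** Called by the (assumed) election mechanism when this node becomes master. */
    void electedMaster() {
        elected.countDown();
    }

    /**
     * Blocks until this node is elected or the timeout expires.
     * Returns true if the node became active, false if it stays passive.
     */
    boolean awaitActivation(long timeout, TimeUnit unit) {
        try {
            if (!elected.await(timeout, unit)) {
                return false; // not elected yet: remain a passive replica
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
        // In a real broker this is where virtual hosts, queues and durable
        // messages would be recovered from the now-master store.
        active.set(true);
        return true;
    }

    boolean isActive() {
        return active.get();
    }

    public static void main(String[] args) throws Exception {
        PassiveBrokerNode node = new PassiveBrokerNode("broker-b");
        Thread startup = new Thread(() ->
                System.out.println(node.name + " active: "
                        + node.awaitActivation(5, TimeUnit.SECONDS)));
        startup.start();
        Thread.sleep(100);   // simulate the primary failing and an election happening
        node.electedMaster();
        startup.join();
    }
}
```

A latch fits here because the election is a one-shot event per start-up; a node whose wait times out simply stays passive and can call `awaitActivation` again.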