From: Rob Davies
Subject: Re: improve master/slave topology
Date: Wed, 8 Mar 2006 20:42:50 +0000
To: activemq-dev@geronimo.apache.org

It's not that simple - while the slave is syncing, there are also
running clients that are acknowledging messages (and hence they get
deleted). We could record all the message exchanges (adds/deletes/new
durable subscribers/delete subscribers etc.) - but is it really likely
that the slave will ever catch up without a pause? This type of
synchronization gets very difficult very quickly. We haven't even gone
through edge cases (fail-over scenarios whilst the master/slave are
still syncing, for example).

Which is why my preference is to pause processing whilst a bulk transfer
happens. In reality, as we prefer shared-nothing architectures, this
involves copying journal files and database files from one machine to
another - which can be done relatively quickly - so pausing the clients
won't be too onerous.

cheers,

Rob

On 8 Mar 2006, at 19:28, Sridhar Komandur wrote:

> On 3/8/06, Ning Li wrote:
>>
>> Bulk synch is a good idea, I think we can find a way to do it in the
>> current system, like create a topic and every message that comes in
>> will be sent to that topic, when the secondary comes up, it can pull
>> those messages. Or we can find other ways to do it.
>
> Yes, an internally created (persisted) queue at the primary
> to store stuff when the secondary is not in sight. When the
> secondary comes up it drains from that subject? Sounds like a
> good idea to me.
>
>> One difficulty is we cannot pause the primary broker, it is hard for
>> the secondary to catch up with both the historic and ongoing messages,
>> I think there is a timing issue in it. I guess that is why James
>> recommended pausing the primary broker.
>>
>> I am not sure if we can find a way to do both dynamic synch and bulk
>> synch at the same time in the current system; that would be great.
>
> It can be done - we need a notion of ordering among all the
> messages (coming from both dynamic as well as bulk synch). This
> ordering can be provided by the message arrival time stamp at the
> primary.
>
> Once we do this it is a matter of inserting the incoming messages
> (without worrying about the source) to the same target store. We can
> even have the bulk synch proceed in a lazy fashion - a background task
> at the primary (and possibly at the secondary) for a couple of reasons:
> - latest messages are more relevant/important
> - latest messages could in fact be retransmissions of the old, so it
>   is ok to process the old messages later for recovery purposes
>
> Regards
> - Sridhar
>
>> Thanks.
>>
>> Ning
>> -----Original Message-----
>> From: sridharkomandur@gmail.com [mailto:sridharkomandur@gmail.com] On
>> Behalf Of Sridhar Komandur
>> Sent: Wednesday, March 08, 2006 9:59 AM
>> To: activemq-dev@geronimo.apache.org
>> Subject: Re: improve master/slave topology
>>
>> I like the idea of broker-broker synchronization. One of the issues to
>> resolve is how reliable this synch activity needs to be? A
>> transactional approach is too heavy weight for the common case.
>>
>> I think a middle ground based on TCP may be good enough.
>> We can divide the synchronization into two phases:
>> - dynamic synch: messages are sent to the partner on an ongoing basis
>> - bulk synch: a new secondary comes up and its state needs to be
>>   brought up to par with the primary
>>
>> Thanks
>> Regards
>> - Sridhar
>>
>> On 3/6/06, Ning Li wrote:
>>>
>>> Hi,
>>>
>>> This is a continued discussion about dynamically reintroducing the
>>> master after a failure; the original discussion is here.
>>>
>>> http://forums.activemq.org/posts/list/468.page#1653
>>>
>>> James's idea about pausing the slave and synchronizing the two DBs is
>>> better than stopping the slave and doing a manual sync. But I doubt
>>> this is acceptable to us, as in a real production environment we
>>> won't be able to pause the only message broker unless for a really
>>> short interval (I guess it has to be less than one minute, otherwise
>>> the end user will notice it).
>>>
>>> Maybe a broker-broker synchronization protocol is the ultimate
>>> solution, just we are not sure how to get there. Any recommendations
>>> or suggestions?
>>>
>>> Thanks
>>>
>>> Ning
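
For illustration only, here is a minimal sketch of the ordering idea raised in the thread: events from dynamic synch and from bulk synch carry the primary's arrival time stamp and are inserted into the same target store regardless of which path delivered them. Every name here (SyncEvent, SecondaryStore, the "add"/"delete" kinds) is hypothetical and not part of ActiveMQ. The tombstone handling for acknowledged messages is an added assumption, needed so that a late bulk-synch copy does not resurrect a message a client has already acknowledged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncEvent:
    # Hypothetical record shipped from the primary to the secondary.
    arrival_ts: int   # arrival time stamp assigned by the primary
    message_id: str
    kind: str         # "add", or "delete" for an acknowledged message
    body: str = ""


class SecondaryStore:
    """Toy target store that accepts events from either synch path."""

    def __init__(self):
        self.messages = {}  # message_id -> (arrival_ts, body)
        self.deleted = {}   # message_id -> arrival_ts of the delete (tombstone)

    def apply(self, event):
        if event.kind == "delete":
            # Remember the delete so a later, older "add" is ignored.
            self.deleted[event.message_id] = event.arrival_ts
            self.messages.pop(event.message_id, None)
        elif event.kind == "add":
            if event.message_id in self.deleted:
                return  # stale copy of an already acknowledged message
            # setdefault keeps the first copy, so retransmissions are harmless.
            self.messages.setdefault(
                event.message_id, (event.arrival_ts, event.body)
            )

    def ordered_bodies(self):
        # Order by the primary's arrival time stamp, not by delivery path.
        return [body for _, body in sorted(self.messages.values())]


store = SecondaryStore()
# Dynamic synch delivers the most recent traffic first ...
store.apply(SyncEvent(30, "m3", "add", "third"))
store.apply(SyncEvent(40, "m1", "delete"))
# ... while lazy bulk synch delivers older history afterwards.
store.apply(SyncEvent(10, "m1", "add", "first"))
store.apply(SyncEvent(20, "m2", "add", "second"))
store.apply(SyncEvent(30, "m3", "add", "third"))  # duplicate from bulk synch
print(store.ordered_bodies())
```

The sketch covers only out-of-order and duplicate delivery into one store. It does not show whether the secondary can ever catch up without a pause, nor what happens on fail-over while synch is in progress, which are the concerns raised in the reply at the top of this message.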