Message-ID: <43E22375.1090501@coredevelopers.net>
Date: Thu, 02 Feb 2006 15:21:25 +0000
From: Jules Gosnell
To: wadi-dev@incubator.apache.org
CC: dev@geronimo.apache.org, dev@wadi.codehaus.org
Subject: Re: Replication using totem protocol
In-Reply-To: <6.2.5.6.2.20060202123257.02709900@bea.com>

Andy Piper wrote:

> At 09:25 AM 1/18/2006, Jules Gosnell wrote:
>
>> I haven't been able to convince myself to take the quorum approach
>> because...
>>
>> shared-something approach:
>> - the shared something is a Single Point of Failure (SPoF) - although
>> you could use an HA something.
>
> That's how WAS and WLS do it. Use an HA database, SAN or dual-ported
> SCSI. The latter is cheap. The former are probably already available
> to customers if they really care about availability.

Well, I guess we will have to consider making something along these
lines available - I guess we need a pluggable QuorumStrategy.

>> - If the node holding the lock 'goes crazy', but does not die, the
>> rest of the

> This is generally why you use leases.
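As an aside, the lease idea might be sketched like this (the `Lease` class and its method names are invented for illustration - neither WADI nor WLS is known to use this exact shape). A lock claim is only honoured for a fixed period, so a holder that 'goes crazy' is ignored once it stops renewing:

```java
// Illustrative sketch only - names invented for this thread, not WADI code.
final class Lease {
    private final long expiresAtMillis;

    Lease(long grantedAtMillis, long durationMillis) {
        this.expiresAtMillis = grantedAtMillis + durationMillis;
    }

    // The holder's claim is believed only until the lease expires; a live
    // holder must renew (i.e. take a fresh Lease) before that point.
    boolean isValid(long nowMillis) {
        return nowMillis < expiresAtMillis;
    }
}
```

A hung or 'crazy' node simply stops renewing, so its claim is only believed for the lease duration, after which the rest of the cluster can safely re-elect a lock holder.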
> Then your craziness is only believed for a fixed amount of time.

Understood.

>> cluster becomes a fragment - so it becomes an SPoF as well.
>> - used in isolation, it does not take into account that the lock may
>> be held by the smallest cluster fragment

> You generally solve this again with leases, i.e. a lock that is valid
> for some period.

I don't follow you here - but we have lost quite a bit of context. I
think that I was saying that if the fragment that owned the
shared-something was the smaller of the two, then 'freezing' the larger
fragment would not be optimal - but, I guess you could use the
shared-something to negotiate between the two fragments and decide
which to freeze and which to allow to continue... I don't see leases
helping here - but maybe I have mistaken the context?

>> shared-nothing approach:

> Nice in theory but tricky to implement well. Consensus works well here.

>> - I prefer this approach, but, as you have stated, if the two halves
>> are equally sized...
>> - What if there are two concurrent fractures (does this happen?)
>> - ActiveCluster notifies you of one membership change at a time - so
>> you would have to decide on an algorithm for 'chunking' node loss, so
>> that you could decide when a fragmentation had occurred...

> If you really want to do this reliably you have to assume that AC will
> send you bogus notifications. Ideally you want to achieve a consensus
> on membership to avoid this. It sounds like totem solves some of these
> issues.

Totem does seem to have some advanced consensus stuff which, I am
*assuming*, relies on its virtual synchrony. This stuff would probably
be very useful under ActiveCluster to manage membership change and
partition notifications, as it would, I understand, guarantee that
every node received a consistent view of what was going on. For the
peer->peer messaging aspect of AC (1->1 and 1->all), I don't think VS
is required. In fact it might be an unwelcome overhead.
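To make the 'which fragment survives' question concrete: the pluggable QuorumStrategy suggested earlier might, in its simplest shared-nothing form, just be a majority test. The interface and names below are invented for this sketch - nothing like this exists in WADI yet:

```java
import java.util.Set;

// Hypothetical sketch - QuorumStrategy is the pluggable abstraction
// suggested in this thread, not an existing WADI API.
interface QuorumStrategy {
    boolean hasQuorum(Set<String> fragmentMembers, int lastKnownClusterSize);
}

// Shared-nothing strategy: a fragment keeps running only if it holds a
// strict majority of the last agreed membership. Two equal halves both
// freeze - exactly the weakness discussed above.
class MajorityQuorumStrategy implements QuorumStrategy {
    public boolean hasQuorum(Set<String> fragmentMembers, int lastKnownClusterSize) {
        return fragmentMembers.size() * 2 > lastKnownClusterSize;
    }
}
```

A shared-something variant (HA database, SAN, dual-ported SCSI) could sit behind the same interface and break the tie when the halves are equally sized.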
I don't know enough about the internals of AC and Totem to know if it
would be possible to reuse Totem's VS/consensus stuff
on-top-of/along-side AMQ's e.g. peer:// protocol stack and underneath
AC's membership notification API, but it seems to me that ultimately
the best solution would be a hybrid that uses these approaches where
needed and not where not... Have I got the right end of the stick?

Perhaps you can choose which messages are virtually synchronous and
which are not in Totem? I am pretty sure, though, that it was using
multicast, so it is not the best solution for 1->1 messaging...

Jules

> andy

--
"Open Source is a self-assembling organism. You dangle a piece of
string into a super-saturated solution and a whole operating-system
crystallises out around it."

/**********************************
 * Jules Gosnell
 * Partner
 * Core Developers Network (Europe)
 *
 * www.coredevelopers.net
 *
 * Open Source Training & Support.
 **********************************/