Mailing-List: contact dev-help@geronimo.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@geronimo.apache.org
Received-SPF: pass (asf.osuosl.org: domain of andyp@bea.com designates
 63.96.162.5 as permitted sender)
Message-Id: <6.2.5.6.2.20060202123257.02709900@bea.com>
Date: Thu, 02 Feb 2006 12:41:28 +0000
To: dev@geronimo.apache.org
From: Andy Piper <andyp@bea.com>
Subject: Re: Replication using totem protocol
Cc: wadi-dev@incubator.apache.org, dev@geronimo.apache.org,
   dev@wadi.codehaus.org
In-Reply-To: <43CE09A6.2010108@coredevelopers.net>
References: <59575.68.101.239.161.1136400404.squirrel@68.101.239.161>
 <01CD8D63-412D-4B7C-B7C9-C77228829439@iq80.com>
 <58577.68.101.239.161.1137097683.squirrel@68.101.239.161>
 <B02DF16E-025E-4522-8937-27A03683E0DB@iq80.com>
 <60263.68.101.239.161.1137109409.squirrel@68.101.239.161>
 <0A4CB6D2-AE7F-43E5-9A7C-D463F67E1FA0@iq80.com>
 <Pine.BSO.4.58.0601131638050.32175@ida.bway.net>
 <43CB94D4.9000904@coredevelopers.net>
 <Pine.BSO.4.58.0601161313450.7694@ida.bway.net>
 <43CC2744.8050105@coredevelopers.net>
 <Pine.BSO.4.58.0601161829110.7694@ida.bway.net>
 <43CC362B.3080900@coredevelopers.net>
 <Pine.BSO.4.58.0601161913030.7694@ida.bway.net>
 <43CCADAB.5020707@coredevelopers.net>
 <Pine.BSO.4.58.0601171234080.31016@ida.bway.net>
 <43CE09A6.2010108@coredevelopers.net>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed

At 09:25 AM 1/18/2006, Jules Gosnell wrote:
>I haven't been able to convince myself to take the quorum approach because...
>
>shared-something approach:
>- the shared something is a Single Point of Failure (SPoF) - 
>although you could use an HA something.

That's how WAS and WLS do it: use an HA database, a SAN, or 
dual-ported SCSI. Dual-ported SCSI is cheap, and customers who really 
care about availability probably already have an HA database or a SAN.

>- If the node holding the lock 'goes crazy', but does not die, the 
>rest of the

This is generally why you use leases: a node that goes crazy is then 
only believed for a fixed amount of time.

>cluster becomes a fragment - so it becomes an SPoF as well.
>- used in isolation, it does not take into account that the lock may 
>be held by the smallest cluster fragment

Again, you generally solve this with leases, i.e. a lock that is 
valid only for some fixed period.
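
To make the lease idea concrete, here is a minimal sketch in Java. The 
class and method names (LeaseLock, tryAcquire, isHeldBy) are made up for 
illustration and are not taken from WAS, WLS, WADI or ActiveCluster. It 
also assumes the lock record lives in the shared HA store and that 
timestamps come from one clock, which a real deployment would have to 
arrange.

```java
import java.util.Objects;

/** Hypothetical sketch of a lease-based lock record kept in the shared HA store. */
final class LeaseLock {
    private String holder;        // node currently believed to hold the lock, or null
    private long expiresAtMillis; // time after which the holder is no longer believed

    /**
     * Try to acquire or renew the lock for the given node at time nowMillis.
     * Succeeds if the lock is free, the lease has expired, or the node
     * already holds it (a renewal).
     */
    synchronized boolean tryAcquire(String node, long nowMillis, long leaseMillis) {
        Objects.requireNonNull(node, "node");
        boolean expired = holder == null || nowMillis >= expiresAtMillis;
        if (expired || holder.equals(node)) {
            holder = node;
            expiresAtMillis = nowMillis + leaseMillis;
            return true;
        }
        return false;
    }

    /** True only while the node's lease is still within its validity period. */
    synchronized boolean isHeldBy(String node, long nowMillis) {
        return node.equals(holder) && nowMillis < expiresAtMillis;
    }
}
```

A holder that hangs or goes crazy simply stops renewing, so once the 
lease period has elapsed another node's tryAcquire succeeds without 
anyone having to decide whether the old holder is really dead.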

>shared-nothing approach:

Nice in theory but tricky to implement well. Consensus works well here.

>- I prefer this approach, but, as you have stated, if the two halves 
>are equally sized...
>- What if there are two concurrent fractures (does this happen?)
>- ActiveCluster notifies you of one membership change at a time - so 
>you would have to decide on an algorithm for 'chunking' node loss, 
>so that you could decide when a fragmentation had occurred...

If you really want to do this reliably you have to assume that 
ActiveCluster (AC) will send you bogus notifications. Ideally you want 
to reach a consensus on membership so that you do not act on them. It 
sounds like totem solves some of these issues.
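
As a rough illustration of agreeing on membership rather than trusting 
any single notification, the sketch below accepts a new view only when 
a strict majority of the previous members report the same view. This is 
not totem and not a full consensus protocol, and MembershipVote and 
agreedView are hypothetical names, not ActiveCluster API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch: majority agreement on a proposed membership view. */
final class MembershipVote {
    /**
     * Returns the view reported by a strict majority of the previous
     * members, or empty if no view has such a majority.
     * reportedViews maps each reporting node to the member set it believes in.
     */
    static Optional<Set<String>> agreedView(Set<String> previousMembers,
                                            Map<String, Set<String>> reportedViews) {
        Map<Set<String>, Integer> votes = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : reportedViews.entrySet()) {
            if (previousMembers.contains(e.getKey())) { // ignore reports from non-members
                votes.merge(e.getValue(), 1, Integer::sum);
            }
        }
        for (Map.Entry<Set<String>, Integer> e : votes.entrySet()) {
            // Strict majority of the previous view: an even split yields no winner.
            if (2 * e.getValue() > previousMembers.size()) {
                return Optional.of(e.getKey());
            }
        }
        return Optional.empty();
    }
}
```

With a strict majority, two equally sized halves both get an empty 
result, so neither fragment proceeds on its own; that case still needs 
a tie-breaker such as the leased lock.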

andy