Message-ID: <43E22375.1090501@coredevelopers.net>
Date: Thu, 02 Feb 2006 15:21:25 +0000
From: Jules Gosnell
To: wadi-dev@incubator.apache.org
CC: dev@geronimo.apache.org, dev@wadi.codehaus.org
Subject: Re: Replication using totem protocol
In-Reply-To: <6.2.5.6.2.20060202123257.02709900@bea.com>

Andy Piper wrote:

> At 09:25 AM 1/18/2006, Jules Gosnell wrote:
>
>> I haven't been able to convince myself to take the quorum approach
>> because...
>>
>> shared-something approach:
>> - the shared something is a Single Point of Failure (SPoF) - although
>> you could use an HA something.
>
> That's how WAS and WLS do it. Use an HA database, SAN or dual-ported
> SCSI. The latter is cheap. The former are probably already available
> to customers if they really care about availability.

Well, I guess we will have to consider making something along these
lines available - I guess we need a pluggable QuorumStrategy.

>> - If the node holding the lock 'goes crazy', but does not die, the
>> rest of the

> This is generally why you use leases.
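As an aside, the lease idea might be sketched like this (the `Lease` class and its method names are invented for illustration - neither WADI nor WLS is known to use this exact shape). A lock claim is only honoured for a fixed period, so a holder that 'goes crazy' is ignored once it stops renewing:

```java
// Illustrative sketch only - names invented for this thread, not WADI code.
final class Lease {
    private final long expiresAtMillis;

    Lease(long grantedAtMillis, long durationMillis) {
        this.expiresAtMillis = grantedAtMillis + durationMillis;
    }

    // The holder's claim is believed only until the lease expires; a live
    // holder must renew (i.e. take a fresh Lease) before that point.
    boolean isValid(long nowMillis) {
        return nowMillis < expiresAtMillis;
    }
}
```

A hung or 'crazy' node simply stops renewing, so its claim is only believed for the lease duration, after which the rest of the cluster can safely re-elect a lock holder.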
> Then your craziness is only believed for a fixed amount of time.

Understood.

>> cluster becomes a fragment - so it becomes an SPoF as well.
>> - used in isolation, it does not take into account that the lock may
>> be held by the smallest cluster fragment

> You generally solve this again with leases, i.e. a lock that is valid
> for some period.

I don't follow you here - but we have lost quite a bit of context. I
think that I was saying that if the fragment that owned the
shared-something was the smaller of the two, then 'freezing' the larger
fragment would not be optimal - but, I guess you could use the
shared-something to negotiate between the two fragments and decide
which to freeze and which to allow to continue... I don't see leases
helping here - but maybe I have mistaken the context?

>> shared-nothing approach:

> Nice in theory but tricky to implement well. Consensus works well here.

>> - I prefer this approach, but, as you have stated, if the two halves
>> are equally sized...
>> - What if there are two concurrent fractures (does this happen?)
>> - ActiveCluster notifies you of one membership change at a time - so
>> you would have to decide on an algorithm for 'chunking' node loss, so
>> that you could decide when a fragmentation had occurred...

> If you really want to do this reliably you have to assume that AC will
> send you bogus notifications. Ideally you want to achieve a consensus
> on membership to avoid this. It sounds like totem solves some of these
> issues.

Totem does seem to have some advanced consensus stuff which, I am
*assuming*, relies on its virtual synchrony. This stuff would probably
be very useful under ActiveCluster to manage membership change and
partition notifications, as it would, I understand, guarantee that
every node received a consistent view of what was going on. For the
peer->peer messaging aspect of AC (1->1 and 1->all), I don't think VS
is required. In fact it might be an unwelcome overhead.
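To make the 'which fragment survives' question concrete: the pluggable QuorumStrategy suggested earlier might, in its simplest shared-nothing form, just be a majority test. The interface and names below are invented for this sketch - nothing like this exists in WADI yet:

```java
import java.util.Set;

// Hypothetical sketch - QuorumStrategy is the pluggable abstraction
// suggested in this thread, not an existing WADI API.
interface QuorumStrategy {
    boolean hasQuorum(Set<String> fragmentMembers, int lastKnownClusterSize);
}

// Shared-nothing strategy: a fragment keeps running only if it holds a
// strict majority of the last agreed membership. Two equal halves both
// freeze - exactly the weakness discussed above.
class MajorityQuorumStrategy implements QuorumStrategy {
    public boolean hasQuorum(Set<String> fragmentMembers, int lastKnownClusterSize) {
        return fragmentMembers.size() * 2 > lastKnownClusterSize;
    }
}
```

A shared-something variant (HA database, SAN, dual-ported SCSI) could sit behind the same interface and break the tie when the halves are equally sized.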
I don't know enough about the internals of AC and Totem to know if it
would be possible to reuse Totem's VS/consensus stuff
on-top-of/along-side AMQ's e.g. peer:// protocol stack and underneath
AC's membership notification API, but it seems to me that ultimately
the best solution would be a hybrid that uses these approaches where
needed and not where not... Have I got the right end of the stick?

Perhaps you can choose which messages are virtually synchronous and
which are not in Totem? I am pretty sure, though, that it was using
multicast, so it is not the best solution for 1->1 messaging...

Jules

> andy

--
"Open Source is a self-assembling organism. You dangle a piece of
string into a super-saturated solution and a whole operating-system
crystallises out around it."

/**********************************
 * Jules Gosnell
 * Partner
 * Core Developers Network (Europe)
 *
 * www.coredevelopers.net
 *
 * Open Source Training & Support.
 **********************************/