incubator-wadi-dev mailing list archives

From Jules Gosnell
Subject Re: Replication using totem protocol
Date Wed, 18 Jan 2006 09:25:58 GMT
lichtner wrote:

>On Tue, 17 Jan 2006, Jules Gosnell wrote:
>>just when you thought that this thread would die :-)
>I think Jeff Genender wanted a discussion to be sparked, and it worked.
>>So, I am wondering how might I use e.g. a shared disc or majority voting
>>in this situation ? In order to decide which fragment was the original
>>cluster and which was the piece that had broken off ? but then what
>>would the piece that had broken off do ? shutdown ?
>Wait to rejoin the cluster. Since it is not "the" cluster, it waits. It is
>not safe to make any updates.
>_How_ a group decides it is "the" cluster can be done in several ways.
>A shared-disk cluster can do it by a locking operation on a disk (I would have
>to research the details on this), a cluster with a database can get a lock
>from the database (and keep the connection open). And one way to do this
>in a shared-nothing cluster is to use a quorum of N/2 + 1, where N is the
>maximum number of nodes. Clearly it has to be the majority or else you can
>have a split-brain cluster.
I haven't been able to convince myself to take the quorum approach, for
the following reasons:

shared-something approach:
- the shared something is a Single Point of Failure (SPoF) - although 
you could use an HA something.
- If the node holding the lock 'goes crazy', but does not die, the rest 
of the cluster becomes a fragment - so it becomes an SPoF as well.
- used in isolation, it does not take into account that the lock may be 
held by the smallest cluster fragment

shared-nothing approach:
- I prefer this approach, but, as you have stated, if the two halves are 
equally sized...
- What if there are two concurrent fractures (does this happen?)
- ActiveCluster notifies you of one membership change at a time - so you 
would have to decide on an algorithm for 'chunking' node loss, so that 
you could decide when a fragmentation had occurred...
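
One possible way to 'chunk' node loss, sketched under stated assumptions: treat every single-node failure notification that arrives within a quiet window of the previous one as part of the same fragmentation event. The class name `NodeLossAggregator`, its methods, and the quiet-window parameter are all hypothetical; nothing here is ActiveCluster or EVS4J API. Timestamps are passed in explicitly so the behaviour is easy to reason about.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper: groups one-at-a-time node-loss notifications into a single chunk. */
final class NodeLossAggregator {
    private final long quietWindowMillis;              // assumed tuning knob
    private final Set<String> pending = new LinkedHashSet<>();
    private long lastLossAt;

    NodeLossAggregator(long quietWindowMillis) {
        this.quietWindowMillis = quietWindowMillis;
    }

    /** Call once per "node failed" notification from the membership layer. */
    synchronized void onNodeFailed(String nodeId, long nowMillis) {
        pending.add(nodeId);
        lastLossAt = nowMillis;                        // every new loss restarts the quiet window
    }

    /**
     * Returns the chunk of lost nodes once no further loss has been reported
     * for a full quiet window. Returns an empty set while losses are still
     * arriving, or when nothing is pending.
     */
    synchronized Set<String> poll(long nowMillis) {
        if (pending.isEmpty() || nowMillis - lastLossAt < quietWindowMillis) {
            return Collections.emptySet();
        }
        Set<String> chunk = new LinkedHashSet<>(pending);
        pending.clear();
        return chunk;
    }
}
```

A caller would feed each notification into `onNodeFailed(...)` and call `poll(...)` from a timer; a non-empty result is the set of nodes to treat as one fracture. Choosing the window length is the hard part and is left open here.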

Perhaps a hybrid of the two would be able to cover more bases: 
shared-nothing, falling back to shared-something if your fragment is 
sized exactly N/2.
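
The hybrid rule could be sketched roughly as follows. This is only an illustration: `FragmentPolicy`, `Decision`, and the `tryAcquireSharedLock` callback are hypothetical names, and the callback stands in for whatever shared resource (disk lock, database lock) is available. A fragment with a strict majority of the N configured nodes stays active without touching the shared resource; a fragment of exactly N/2 consults it as a tie-breaker; anything smaller pauses.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of "shared-nothing, falling back to shared-something". */
final class FragmentPolicy {
    enum Decision { ACTIVE, PAUSED }

    /**
     * @param liveNodes            nodes visible in this fragment (including self)
     * @param maxNodes             N, the maximum (configured) cluster size
     * @param tryAcquireSharedLock assumed tie-breaker, e.g. a disk or database lock
     */
    static Decision decide(int liveNodes, int maxNodes, BooleanSupplier tryAcquireSharedLock) {
        // Strict majority: equivalent to liveNodes >= N/2 + 1 with integer division.
        if (2 * liveNodes > maxNodes) {
            return Decision.ACTIVE;
        }
        // Exactly half: quorum alone cannot decide, so ask the shared resource.
        if (2 * liveNodes == maxNodes && tryAcquireSharedLock.getAsBoolean()) {
            return Decision.ACTIVE;
        }
        // Minority fragment, or the half that lost the tie-break.
        return Decision.PAUSED;
    }
}
```

Because the lock is consulted only in the exact-half case, the shared resource stops being a single point of failure for every other split, though the 'crazy lock holder' concern above still applies to the tie-break itself.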

As far as my plans for WADI go, I think I am happy to stick with the 
'rely on affinity and keep going' approach.

As for situations where a distributed object may have more than one 
client, I can see that quorum offers the hope of a solution but, 
without some very careful thought, I would still be hesitant to stake my 
shirt on it :-) for the reasons given above...

I hadn't really considered 'pausing' a cluster fragment, so this is a 
useful idea. I guess that I have been thinking more in terms of 
long-lived fractures, rather than short-lived ones. If the latter are 
that much more common, then this is great input and I need to take it 
into account.
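
'Pausing' a fragment might look something like the gate below: updates block while the fragment is not "the" cluster and proceed once it rejoins, or are turned down when a timeout expires. This is a sketch only; `UpdateGate` and its methods are hypothetical and not part of WADI, and it assumes some membership listener calls `pause()` and `resume()` at the right moments.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical gate that holds back updates while a fragment is paused. */
final class UpdateGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition resumed = lock.newCondition();
    private boolean paused = false;    // assumption: the cluster starts out whole

    /** Called when this fragment decides it is not "the" cluster. */
    void pause() {
        lock.lock();
        try { paused = true; } finally { lock.unlock(); }
    }

    /** Called when the fragment has rejoined the cluster. */
    void resume() {
        lock.lock();
        try { paused = false; resumed.signalAll(); } finally { lock.unlock(); }
    }

    /**
     * Blocks until updates are allowed or the timeout expires.
     * Returns true if the caller may proceed, false if it should turn the request down.
     */
    boolean awaitOpen(long timeout, TimeUnit unit) {
        long nanos = unit.toNanos(timeout);
        lock.lock();
        try {
            while (paused) {
                if (nanos <= 0L) return false;
                nanos = resumed.awaitNanos(nanos);   // returns the remaining wait time
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();      // preserve the interrupt for the caller
            return false;
        } finally {
            lock.unlock();
        }
    }
}
```

A short timeout suits short-lived fractures, where blocking briefly hides the split from clients; for long-lived fractures a caller would more likely pass a zero timeout and reject the update outright.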

The issue about 'chunking' node loss interests me... I see that the 
EVS4J Listener returns a set of members, so it is possible to express 
the loss of more than one node. How is membership decided and node loss 
aggregated ?

Thanks again for your time,


>>Do you think that we need to worry about situations where a piece of
>>state has more than one client, so a network partition may result in two
>>copies diverging in different and incompatible directions, rather than
>>only one diverging ?
>If you use a quorum or quorum-resource as above you do not have this
>problem. You can turn down the requests or let them block until the
>cluster re-discovers the 'failed' nodes.
>>I can imagine this happening in an Entity Bean (but
>>we should be able to use the DB to resolve this) or an application POJO.
>>I haven't considered the latter case and it looks pretty hopeless to me,
>>unless you have some alternative route over which the two fragments can
>>communicate... but then, if you did, would you not pair it with your
>>original network, so that the one failed over to the other or replicated
>>its activity, so that you never perceived a split in the first place ?
>>Is this a common solution, or do people use other mechanisms here ?
>I do believe that membership and quorum is all you need.

"Open Source is a self-assembling organism. You dangle a piece of
string into a super-saturated solution and a whole operating-system
crystallises out around it."

 * Jules Gosnell
 * Partner
 * Core Developers Network (Europe)
 * Open Source Training & Support.
