geronimo-dev mailing list archives

From Rajith Attapattu
Subject Re: Replication using totem protocol
Date Tue, 17 Jan 2006 19:29:26 GMT
Can you guys talk more about the pros and cons of locking mechanisms with
respect to in-memory replication and storage-backed replication?

Also, what if a node goes down while the lock is acquired? I assume there is
a timeout.
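
To make the timeout assumption concrete, here is a minimal sketch of a lock
held under a lease, so that a crashed holder's lock simply expires. The
LeaseLock class, its method names, and the lease length are all hypothetical
and are not taken from Geronimo or any Totem implementation. The current time
is passed in explicitly to keep the example deterministic.

```java
import java.util.Objects;

// Hypothetical illustration only: a lock whose ownership expires after a lease.
final class LeaseLock {
    private final long leaseMillis; // how long a grant stays valid
    private String owner;           // null while the lock is free
    private long expiresAt;         // time at which the current lease ends

    LeaseLock(long leaseMillis) {
        if (leaseMillis <= 0) {
            throw new IllegalArgumentException("leaseMillis must be positive");
        }
        this.leaseMillis = leaseMillis;
    }

    // Grants the lock if it is free, if its lease has expired, or if the
    // caller already holds it (which renews the lease).
    synchronized boolean tryAcquire(String nodeId, long nowMillis) {
        Objects.requireNonNull(nodeId, "nodeId");
        boolean free = owner == null || nowMillis >= expiresAt;
        if (free || owner.equals(nodeId)) {
            owner = nodeId;
            expiresAt = nowMillis + leaseMillis;
            return true;
        }
        return false;
    }

    // Releases the lock only if the caller still holds an unexpired lease.
    synchronized boolean release(String nodeId, long nowMillis) {
        if (nodeId.equals(owner) && nowMillis < expiresAt) {
            owner = null;
            return true;
        }
        return false;
    }
}
```

A holder that is still alive would have to renew before the lease runs out,
and a holder that was merely slow (not dead) can find its lease taken over,
so the lease length is a trade-off between recovery time and false expiry.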

When it comes to partition (either network/power failure or virtual) or
healing (the same or new nodes coming up as well?), what are some of the
algorithms and strategies that are widely used to handle those situations?
Any pointers would be great.
(All I know is that no algorithm guarantees 100% recovery and fail-over,
but a reasonable expectation is that not everything is lost and work can
continue from somewhere.)

So if you are in the middle of filling in a 10-page application on the web
and the server goes down while you are on the 9th page, then being able to
restart from the 7th or 8th page (a reasonable percentage of the data was
preserved through merge/split/change) would, I guess, be tolerable if not
excellent on a very busy server.

I guess the success of any clustering framework is not to solve all concerns
regarding every possible solution, but to have a good abstraction of the
high-level concerns (and defer the implementation concerns to as late as the
application level if possible) BUT!!! bundle it with a few sensible
implementations/strategies, so that people with very specific situations can
decide for themselves how they are going to manage performance vs. compliance
vs. scalability and HA.



On 1/17/06, lichtner wrote:
> On Tue, 17 Jan 2006, Jules Gosnell wrote:
> > just when you thought that this thread would die :-)
> I think Jeff Genender wanted a discussion to be sparked, and it worked.
> > So, I am wondering how I might use e.g. a shared disc or majority voting
> > in this situation? In order to decide which fragment was the original
> > cluster and which was the piece that had broken off? But then what
> > would the piece that had broken off do? Shut down?
> Wait to rejoin the cluster. Since it is not "the" cluster, it waits. It is
> not safe to make any updates.
> _How_ a group decides it is "the" cluster can be done in several ways.
> A shared-disk cluster can do it with a locking operation on a disk (I would
> have to research the details on this); a cluster with a database can get a
> lock from the database (and keep the connection open). And one way to do this
> in a shared-nothing cluster is to use a quorum of N/2 + 1, where N is the
> maximum number of nodes. Clearly it has to be a majority or else you can
> have a split-brain cluster.
> > Do you think that we need to worry about situations where a piece of
> > state has more than one client, so a network partition may result in two
> > copies diverging in different and incompatible directions, rather than
> > only one diverging?
> If you use a quorum or quorum-resource as above you do not have this
> problem. You can turn down the requests or let them block until the
> cluster re-discovers the 'failed' nodes.
> > I can imagine this happening in an Entity Bean (but
> > we should be able to use the DB to resolve this) or an application POJO.
> > I haven't considered the latter case and it looks pretty hopeless to me,
> > unless you have some alternative route over which the two fragments can
> > communicate... but then, if you did, would you not pair it with your
> > original network, so that the one failed over to the other or replicated
> > its activity, so that you never perceived a split in the first place?
> > Is this a common solution, or do people use other mechanisms here ?
> I do believe that membership and quorum is all you need.
> Guglielmo
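
The N/2 + 1 quorum rule from the quoted reply can be sketched as below. This
is only an illustration under stated assumptions: the Quorum class and its
method names are hypothetical, N is taken to be the configured maximum number
of nodes, and the division is integer division.

```java
// Hypothetical helper, not part of Geronimo or any Totem implementation.
final class Quorum {
    private Quorum() {}

    // Smallest group size that is a strict majority of maxNodes: N/2 + 1,
    // using integer division (so 4 nodes need 3, and 5 nodes need 3).
    static int quorumSize(int maxNodes) {
        if (maxNodes < 1) {
            throw new IllegalArgumentException("maxNodes must be >= 1");
        }
        return maxNodes / 2 + 1;
    }

    // A fragment may treat itself as "the" cluster and accept updates only
    // if it can see at least a quorum of the configured nodes.
    static boolean mayAcceptUpdates(int reachableNodes, int maxNodes) {
        return reachableNodes >= quorumSize(maxNodes);
    }
}
```

With an even N, a clean half-and-half split leaves neither fragment with a
quorum, so both sides wait. That is the price of ruling out two fragments
that each believe they are "the" cluster.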
