geronimo-dev mailing list archives

From lichtner <>
Subject Re: Replication using totem protocol
Date Tue, 17 Jan 2006 19:41:14 GMT

On Tue, 17 Jan 2006, Rajith Attapattu wrote:

> Can you guys talk more about locking mechanisms, pros and cons, w.r.t.
> in-memory replication and storage-backed replication?

I don't know what you have in mind here by 'storage-backed'.

> Also, what if a node goes down while the lock is acquired? I assume there is
> a timeout.

Which architecture do you have in mind here? I think the question is
relevant if you use a standalone lock server, but if you don't then you
just put the lock queue with the data item in question.
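To make that concrete, here is a minimal sketch of a per-item lock queue, assuming each replicated data item carries its own FIFO queue of waiting lock requests instead of relying on a standalone lock server. The names (`DataItem`, `acquire`, `release`, `evict`) are illustrative, not from any Geronimo API; the point is that a crashed node is simply dropped from the queue, so no central lock-server timeout is needed.

```python
# Hypothetical sketch: the lock queue lives with the data item itself.
from collections import deque

class DataItem:
    def __init__(self, value):
        self.value = value
        self.owner = None          # node currently holding the lock
        self.waiters = deque()     # FIFO queue of nodes waiting for it

    def acquire(self, node):
        if self.owner is None:
            self.owner = node
            return True            # lock granted immediately
        self.waiters.append(node)
        return False               # caller waits in the item's queue

    def release(self, node):
        assert self.owner == node
        self.owner = self.waiters.popleft() if self.waiters else None
        return self.owner          # next holder, if any

    def evict(self, dead_node):
        # On a membership change, drop the dead node from the queue
        # (or hand the lock on) instead of waiting for a timeout.
        self.waiters = deque(n for n in self.waiters if n != dead_node)
        if self.owner == dead_node:
            self.owner = self.waiters.popleft() if self.waiters else None
```

With this layout, the membership notification from the group-communication layer drives `evict`, so lock recovery is just local queue cleanup on each surviving node.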

> When it comes to partitioning (either network/power failure or virtual) or
> healing (the same or new nodes coming up as well?), what are some of the
> algorithms and strategies that are widely used to handle those situations?
> Any pointers would be great.

I believe the best strategy depends on what type of state the application
has. Clearly, if the state took zero time to transfer, you could compare
version numbers, transfer the state to the nodes that happen to be
out-of-date, and you would be back in business. On the other hand, if the
state is 1 GB you will need a different approach. There is not much to look
up here. Think about it carefully and you can come up with the best state
transfer for your application.
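The version-number approach above can be sketched in a few lines, assuming each node tags its copy of the state with a monotonically increasing version; after a partition heals, stale nodes receive the newest copy whole. The function name and the `(version, state)` layout are assumptions for illustration, and this only pays off when the state is small enough to ship in one piece.

```python
# Sketch: after a merge, bring out-of-date nodes up to the newest version.
def sync_after_merge(nodes):
    """nodes: dict of node name -> (version, state); mutated in place."""
    latest_version = max(v for v, _ in nodes.values())
    latest_state = next(s for v, s in nodes.values() if v == latest_version)
    stale = [n for n, (v, _) in nodes.items() if v < latest_version]
    for n in stale:
        nodes[n] = (latest_version, latest_state)   # whole-state transfer
    return stale
```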

Session state is easier than other kinds because it consists of myriad
small, independent data items that are not accessed concurrently.

> So if you are in the middle of filling out a 10-page application on the web
> and the server goes down while you are on the 9th page, if you can restart
> again at the 7th or 8th page (a reasonable percentage of the data was
> preserved through merge/split/change), I guess it would be tolerable, if
> not excellent, on a very busy server.

Since this is a question about availability, consider a cluster of, say, 4
nodes with a minimum replication factor R=2, where all the sessions are
replicated on _each_ node. If you want to guarantee that the user's work is
_never_ lost, just send all session updates to yourself in a totem-protocol
'safe' message, which is delivered only after the message has been received
(but not yet delivered) by all the nodes, and wait for your own message to
arrive. This takes between 1 and 2 token rotations, which on 4 nodes I would
guess is between 10 and 20 milliseconds, not a lot as HTTP request
latencies go.
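The 'send to yourself and wait' pattern can be sketched as below, with the totem transport faked by an in-process bus (a real implementation would circulate the message on the token ring). The key property being modeled is that a 'safe' message is delivered only once every node has received it, so when the sender sees its own update come back, every replica already holds the data. `FakeSafeBus` and `update_session` are hypothetical names for this sketch.

```python
# Sketch: block the HTTP request until our own 'safe' update is delivered.
import threading

class FakeSafeBus:
    def __init__(self, nodes):
        self.nodes = nodes                      # node name -> session store
        self.delivered = threading.Event()

    def send_safe(self, key, value):
        # Fake 'safe' semantics: delivery happens only after every node
        # has received the message.
        for store in self.nodes.values():
            store[key] = value                  # all nodes receive it...
        self.delivered.set()                    # ...then it is delivered

def update_session(bus, key, value, timeout=2.0):
    bus.delivered.clear()
    bus.send_safe(key, value)
    # Wait for our own message; once it arrives, the update is everywhere.
    return bus.delivered.wait(timeout)
```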

As a result, after an HTTP request returns, the work done is likely to
survive up to 4 - R = 2 node crashes.
