geronimo-dev mailing list archives

From Jules Gosnell <>
Subject Web State Replication... (long)
Date Thu, 30 Oct 2003 11:10:03 GMT

I thought I would present my current design for state replication. 
Please see
the enclosed diagram.

First, some terms :

n - the number of nodes in the cluster
s - a single copy of one node's state (a small coloured box)
b - the number of 'buddies' in a partition, and therefore the number
of copies of each 's'



(1) steady state

State is replicated clockwise:
Red's state is backed up by Green and Yellow
Green's state is backed up by Yellow and Red
Yellow's state is backed up by Red and Green
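Picking the backups clockwise round the ring can be sketched as follows (an illustration only, not Geronimo code; all names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of clockwise buddy selection: each node's state is backed up
// by the next (b - 1) nodes round the ring.
public class BuddyRing {
    // Returns the nodes that back up the node at 'index', for a
    // partition of b buddies (the primary plus b - 1 backups).
    public static List<String> backupsFor(List<String> ring, int index, int b) {
        List<String> backups = new ArrayList<>();
        for (int i = 1; i < b; i++) {
            backups.add(ring.get((index + i) % ring.size()));
        }
        return backups;
    }
}
```

With a ring of Red, Green, Yellow and b = 3, Red's backups come out as Green and Yellow, matching the diagram.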

(2) introduction of a new node


Blue joins cluster
Yellow and Red take responsibility for backing up Blue's state
(currently empty).
Blue takes responsibility for backing up Green and Red's state.
A copy of Green's state 'migrates' from Red to Blue
A copy of Red's state 'migrates' to Blue
Since Blue entered with no state, the total migration cost is 2s

(3) node loss

Blue leaves the cluster
The lost copy of Red's state is replicated from Green to Yellow
The lost copy of Green's state is replicated from Yellow to Red
The lost copy of Blue's state is replicated from Red/Yellow to Green
The total migration cost is 3s
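The same clockwise rule determines which copies each node holds, and therefore what must be re-replicated when a node is lost: each node holds its own state plus a backup of each of its (b-1) predecessors. A sketch (hypothetical names, not Geronimo code):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: the set of state copies a node holds under the clockwise
// rule - its own state, plus backups of its (b - 1) ring predecessors.
public class StateHeld {
    public static List<String> copiesHeldBy(List<String> ring, int index, int b) {
        List<String> held = new ArrayList<>();
        held.add(ring.get(index)); // its own primary state
        int n = ring.size();
        for (int i = 1; i < b; i++) {
            held.add(ring.get((index - i + n) % n)); // backup of a predecessor
        }
        return held;
    }
}
```

Comparing the copies held before and after a node departs yields exactly the replications listed above.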

In summary

a node entering the cluster costs (b-1)*s
a node leaving the cluster costs b*s

These costs are constant, regardless of the value of n (the number of
nodes in the cluster).
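The two formulas can be captured directly (a trivial sketch; the names are hypothetical):

```java
// Migration costs in units of s (one node's state).
public class MigrationCost {
    // A joining node must receive a backup copy from each of the
    // (b - 1) nodes it now buddies for.
    public static int joinCost(int b) { return b - 1; }

    // A leaving node held b copies (its own primary plus b - 1
    // backups), each of which must be re-replicated elsewhere.
    public static int leaveCost(int b) { return b; }
}
```

For b = 3 as in the diagram, joining costs 2s and leaving costs 3s, whatever n is.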

Breaking each node's state into more than the one piece suggested here
and scattering these pieces more widely across a larger cluster will
complicate the implementation and not reduce these costs.

You'll note that in (3) each node is now carrying the state of 4, not
3, nodes and that furthermore, if there were other nodes in the
cluster, they would not be sharing this burden.

The solution is to balance the state around the cluster.

How/Whether this can be done depends upon your load-balancer as
discussed in a previous posting.

If you can explicitly inform your load-balancer to 'unstick' a session
from one node and 'restick' it to another, then you have a mechanism
whereby you could split the state in the blue boxes up into smaller
pieces, migrate them equally to all nodes in the cluster and notify
the load-balancer of their new locations (i.e. divide the contents of the
blue boxes and subsume them into R,G and Y boxes).

If you cannot coordinate with your load-balancer like this, you may
have to wait for it to try to deliver a request to Blue and failover
to another node. If you can predict which node this will be, you can
proactively migrate the state to it. If you cannot predict where
this will be, you have to wait for the request to arrive and then
either reactively migrate the state in underneath it or
forward/redirect the request to the node that has the state.

This piece of decision making needs to be abstracted into e.g. a
LoadBalancerPolicy class which can be plugged in to provide the
correct behaviour for your deployment environment.
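One possible shape for such a policy - purely illustrative, not an actual Geronimo API - capturing the decision tree above:

```java
// Illustrative sketch of a pluggable load-balancer policy: decides
// whether migration after node loss can be proactive or must be reactive.
public class LoadBalancerPolicy {
    private final boolean canRestick;              // balancer supports unstick/restick
    private final String predictedFailoverTarget;  // null if unpredictable

    public LoadBalancerPolicy(boolean canRestick, String predictedFailoverTarget) {
        this.canRestick = canRestick;
        this.predictedFailoverTarget = predictedFailoverTarget;
    }

    // - If the balancer can be told to re-stick sessions, migrate state
    //   proactively and notify it of the new locations.
    // - Else, if the failover target is predictable, migrate there
    //   before the request arrives.
    // - Otherwise wait for the request and migrate reactively.
    public String migrationStrategy() {
        if (canRestick) return "proactive-restick";
        if (predictedFailoverTarget != null) return "proactive-predicted";
        return "reactive";
    }
}
```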

The passivation of aging sessions into a shared store can mitigate
migration costs by reducing the number of active (in-vm) sessions that
make up each node's state, whilst maintaining their availability.
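A minimal sketch of such passivation, assuming a simple in-vm map of last-access times and a shared store keyed by session id (all names hypothetical; the timestamp stands in for the session's full record):

```java
import java.util.Iterator;
import java.util.Map;

// Sketch: move sessions idle longer than maxIdleMillis out of the
// in-vm map into a shared store, shrinking the state that would have
// to migrate while keeping the sessions available.
public class Passivator {
    public static int passivate(Map<String, Long> lastAccess,
                                Map<String, Long> sharedStore,
                                long now, long maxIdleMillis) {
        int moved = 0;
        Iterator<Map.Entry<String, Long>> it = lastAccess.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (now - e.getValue() > maxIdleMillis) {
                sharedStore.put(e.getKey(), e.getValue()); // persist to shared store
                it.remove();                               // drop from in-vm state
                moved++;
            }
        }
        return moved;
    }
}
```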

Migration should always be sourced from a non-primary node since there
will be less contention on backed up state (which will only be
receiving writes) than on primary state (which will be the subject of
reads and writes). Furthermore, using the scheme I presented for lazy
demarshalling, you can see that the cost of marshalling backed-up
state will be significantly less than that of marshalling primary
state as potentially large object trees will already have been
flattened into byte[].
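The difference in marshalling cost can be illustrated: under lazy demarshalling a backup already holds the session as a flat byte[], so migrating from it is just a copy, while the primary must first serialize a live object graph. A sketch using plain java.io serialization (names hypothetical):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Sketch: why sourcing a migration from a backup is cheaper than from
// the primary when backups keep sessions in marshalled form.
public class SessionCopy {
    // A backup keeps the session in the already-marshalled form it
    // arrived in; migrating it is just handing the bytes on.
    public static byte[] migrateFromBackup(byte[] marshalled) {
        return marshalled.clone();
    }

    // The primary holds a live object graph and must flatten it first.
    public static byte[] migrateFromPrimary(Serializable liveSession) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(liveSession);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return buf.toByteArray();
    }
}
```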

The example above has only considered the web tier. If this is
colocated with e.g. an EJB tier, then the cost of migration goes up,
since web state is likely to have associations with EJB state which
are best maintained locally. This means that if you migrate web state
you should consider migrating the related EJB state along with it. So
a holistic approach which considers all tiers together is required.

I hope that the model that I am proposing here is sufficiently generic
to be of use for the clustering of other tiers/services as well.

That's enough for now,


 * Jules Gosnell
 * Partner
 * Core Developers Network (Europe)
