geronimo-dev mailing list archives

From Jules Gosnell <>
Subject Re: Web State Replication... (long)
Date Fri, 31 Oct 2003 00:52:43 GMT

I've given the more-than-one-replication-bucket-per-node idea a little
more thought ...

I'm not sure that the extra complexity will merit the perceived gain -
namely, spreading the load of cluster growth/shrinkage around the whole
cluster instead of concentrating it on the nodes immediately surrounding
the point of node entry/exit - but this is an area that we should
consider more closely. Perhaps we could even generalise the algorithm to
make the number of replication buckets per node configurable....
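To make this concrete, here is a minimal sketch of what such a mapping
might look like (all names here are illustrative - none of this is
existing Geronimo code). The total number of buckets is fixed, sessions
hash to buckets, and buckets are dealt out to the nodes clockwise:

import java.util.ArrayList;
import java.util.List;

public class BucketRing {
    private final int numBuckets;   // fixed for the cluster's lifetime
    private final List<String> nodes = new ArrayList<String>();

    public BucketRing(int numBuckets) {
        this.numBuckets = numBuckets;
    }

    public void join(String node)  { nodes.add(node); }
    public void leave(String node) { nodes.remove(node); }

    /**
     * A session always hashes to the same bucket (the mask keeps the
     * hash non-negative).
     */
    public int bucketFor(String sessionId) {
        return (sessionId.hashCode() & 0x7fffffff) % numBuckets;
    }

    /**
     * Bucket i lives on node (i mod n), so each node owns roughly
     * numBuckets/n buckets, and only bucket ownership - not session
     * identity - changes as nodes join and leave.
     */
    public String ownerOf(int bucket) {
        return nodes.get(bucket % nodes.size());
    }
}

With numBuckets equal to the node count this degenerates into the
one-bucket-per-node scheme I have been describing; set it higher and the
rehoming work on entry/exit is sliced more finely.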

I'll keep on it.


Jules Gosnell wrote:

> Guys,
> I understand exactly what you are both saying and you can relax - at 
> migration time, I am working at the session level - that is one 
> bucket=one session - so if you have 10 sessions and you want to leave 
> a cluster of 11 nodes, provided that your load-balancer can handle it, 
> you can migrate 1 session to each node.
> However, at replication time 1 bucket=the whole state of the node -
> i.e. replication groups are organised at the node level, not at the
> single-session level. Having each session remember where each of its
> copies lives is just too much overhead, and as I pointed out in my last
> mail, I can't see any advantage in terms of resilience in every node
> holding (n*1/n)*b*s or 1*b*s sessions, or some division in between - the
> point is that it will always add up to b*s, which is the number of
> sessions and backups that will need to be rehomed if you lose a node. It
> is the granularity at which the rehoming takes place that is important,
> and as I have shown, this is the most granular it can be.
> Of course, there is no reason why all migration should be done at the
> single-session level - load-balancer allowing - a node could put in a
> bid for several thousand sessions and have them all batched and
> migrated across in a single transaction.
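> A rough sketch of what such a contract might look like (the names are
> invented for illustration - this is not an existing Geronimo interface):
>
> import java.util.Collection;
>
> public interface SessionMigrationService {
>     /** Migrate a single session to the given node. */
>     void migrate(String sessionId, String targetNode);
>
>     /**
>      * Batch form: a node bids for many sessions and they move as one
>      * unit - either they all arrive at the target or none do.
>      */
>     void migrateAll(Collection<String> sessionIds, String targetNode);
> }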
> We are describing pretty much the same thing in different terms.
> Happier :-)  ?
> Jules
> Dain Sundstrom wrote:
>> Jules,
>> IIRC James' point is that having a lot more buckets than nodes makes
>> adding and reorganizing state much easier.  Of course in the case of a
>> failure you still have a bulk transfer of data, but the transfer is
>> spread across the cluster.  This helps avoid a dominoes-style cascading
>> failure, where the first node dies, then its backup dies from the bulk
>> transfer load, then that backup's backup dies, and so on.
>> Anyway, I think the big benefit is the ease of redistributing
>> sessions.  Instead of a new node saying "I'll take these 3k sessions",
>> it says "I'll take these three buckets".  The load is much lower, but I
>> think the biggest benefit is that the code should be easier to debug,
>> understand and write.
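>> For example, redistribution then reduces to updating a small
>> bucket-to-node table (sketch only; the names are made up, not real
>> Geronimo code):
>>
>> import java.util.HashMap;
>> import java.util.Map;
>>
>> public class BucketTable {
>>     private final Map<Integer, String> ownerByBucket =
>>             new HashMap<Integer, String>();
>>
>>     /**
>>      * A joining node takes over whole buckets: one map update plus one
>>      * bulk state copy per bucket, and no per-session bookkeeping.
>>      */
>>     public void nodeJoined(String newNode, int[] bucketsToTake) {
>>         for (int bucket : bucketsToTake) {
>>             ownerByBucket.put(bucket, newNode);
>>         }
>>     }
>> }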
>> It is not important now.  As long as we keep the interface simple and 
>> clean we can try many implementations until something fits.
>> -dain
>> On Thursday, October 30, 2003, at 11:43 AM, Jules Gosnell wrote:
>>> James Strachan wrote:
>>>> On Thursday, October 30, 2003, at 12:19 pm, gianny DAMOUR wrote:
>>>>> Hello,
>>>>> Just a couple of questions regarding this design:
>>>>> - Is it possible to configure the weight of a node? If so, is the
>>>>> same auto-partitioning policy applicable? My concern is that a
>>>>> "clockwise" policy may place a significant load on nodes hosted on
>>>>> low-spec machines.
>>>> This is partly a problem for the sticky load balancer to deal with,
>>>> i.e. it should route requests to primary machines based on their
>>>> spec/power.
>>>> If we partitioned the session data into buckets (rather than one
>>>> big lump), then the buckets of session data could be distributed
>>>> evenly around the cluster so that each session bucket has N buddies
>>>> (replicas), and a load-balancing algorithm could be used to
>>>> distribute the buckets based on (say) a host-spec weighting or
>>>> whatnot - e.g. nodes in the cluster could limit how many buckets they
>>>> accept due to their lack of resources etc.
>>>> Imagine having 1 massive box and 2 small ones in a cluster - you'd 
>>>> probably want to give the big box more buckets than the smaller 
>>>> ones. The previous model Jules described still holds (that was a 
>>>> view of 1 session bucket) - it's just that the total session state
>>>> for a machine might be spread over many buckets.
>>>> Having multiple buckets could also help spread the load of 
>>>> recovering from a node failure in larger clusters.
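>>>> A quick sketch of the weighted deal (illustrative names only): give
>>>> each node a capacity weight and hand out buckets in proportion.
>>>>
>>>> import java.util.HashMap;
>>>> import java.util.Map;
>>>>
>>>> public class WeightedDeal {
>>>>     /** Buckets each node should own, in proportion to its weight. */
>>>>     public static Map<String, Integer> quotas(Map<String, Integer> weights,
>>>>                                               int numBuckets) {
>>>>         int total = 0;
>>>>         for (int w : weights.values()) {
>>>>             total += w;
>>>>         }
>>>>         Map<String, Integer> quota = new HashMap<String, Integer>();
>>>>         for (Map.Entry<String, Integer> e : weights.entrySet()) {
>>>>             // integer division; any remainder buckets can be handed
>>>>             // to the heaviest node
>>>>             quota.put(e.getKey(), numBuckets * e.getValue() / total);
>>>>         }
>>>>         return quota;
>>>>     }
>>>> }
>>>>
>>>> With the big box weighted 4 and the two small ones weighted 1 each,
>>>> 60 buckets come out as 40/10/10.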
>>> James, I have given this quite a bit of thought... and whilst it was
>>> initially appealing and seemed a sensible extension of my train of
>>> thought, I have not been able to find any advantage in splitting one
>>> node's state into multiple buckets....
>>> If a node joins or leaves, you still have exactly the same amount of 
>>> state to shift around the cluster.
>>> If you back up your sessions off-node, then whether these all live on
>>> one backup node or are spread over 10 makes no difference: in the
>>> first case, if you lose the backup node, you have to shift 100% of 1
>>> node's state; in the second case you have to shift 10% of 10 nodes'
>>> state (since the backup node will be carrying 10% of the state of
>>> another 9 nodes as well as your own). Initially it looks more
>>> resilient, but...
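>>> To put numbers on it (my own figures, purely for illustration): 10
>>> nodes, 1000 sessions each, 1 backup copy per session. Lose a backup
>>> node and 1000 session-backups must be rehomed either way - as one
>>> 1000-session transfer from a single node, or as ten 100-session
>>> transfers spread across ten nodes. The total is identical; only the
>>> distribution of the copying work differs.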
>>> So I am sticking, by virtue of Occam's razor, to the simpler
>>> approach for the moment, until someone can draw attention to a
>>> situation where the extra complexity of a finer-grained
>>> replication strategy is worth the gain.
>>> Thinking about it, my current design is probably a hybrid - since
>>> whilst a node's state is all held in a single bucket, individual
>>> sessions may be migrated out of that bucket and into another one on
>>> another node. So replication granularity is set at the node level,
>>> but migration granularity is at the session level. I guess you are
>>> suggesting that a bucket sits somewhere between the two, and is the
>>> level at which state is both replicated and migrated?
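>>> In code the split might look something like this (invented names
>>> again, just to pin the idea down):
>>>
>>> import java.util.Map;
>>>
>>> /** Replication granularity: a node's whole state bucket at once. */
>>> interface Replicator {
>>>     void replicate(Map<String, Object> wholeNodeState);
>>> }
>>>
>>> /** Migration granularity: one session leaving the bucket at a time. */
>>> interface Migrator {
>>>     void migrate(String sessionId, Map<String, Object> sessionState,
>>>                  String targetNode);
>>> }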
>>> I'll give it some more thought :-)
>>> Jules
>>>>> - I have the feeling that one cannot configure a preferred
>>>>> replication group for the primary sessions of a specific node: if
>>>>> four nodes are available, I would like to configure that the
>>>>> sessions of the first node should be replicated by the third node,
>>>>> if available, or else by the fourth one.
>>>>> - Is it not an overhead to have b-1 replicas? AFAIK, a single
>>>>> secondary should be enough.
>>>> It all depends on your risk profile, I suppose. 1 backup is usually
>>>> enough, but you may want 2 for extra resilience - especially as one
>>>> of them could be in a separate DR zone for really serious
>>>> fail-over scenarios.
>>>> James
>>>> -------
>>> -- 
>>> /*************************************
>>> * Jules Gosnell
>>> * Partner
>>> * Core Developers Network (Europe)
>>> *
>>> *************************************/
>> /*************************
>>  * Dain Sundstrom
>>  * Partner
>>  * Core Developers Network
>>  *************************/

/*************************************
 * Jules Gosnell
 * Partner
 * Core Developers Network (Europe)
 *************************************/
