geronimo-dev mailing list archives

From Jules Gosnell <ju...@coredevelopers.net>
Subject Re: Web State Replication... (long)
Date Thu, 30 Oct 2003 18:51:48 GMT
Guys,

I understand exactly what you are both saying and you can relax - at 
migration time, I am working at the session level - that is, one 
bucket = one session - so if you have 10 sessions and you want to leave a 
cluster of 11 nodes, provided that your load-balancer can handle it, you 
can migrate 1 session to each of the remaining nodes.

However, at replication time 1 bucket = the whole state of the node - i.e. 
replication groups are organised at the node level, not at the 
single-session level. Having each session remember where each of its copies is 
is just too much overhead, and as I pointed out in my last mail, I can't 
see any advantage in terms of resilience in every node holding 
(n * 1/n) * b * s or 1 * b * s sessions, or some division in between - the point 
is that it will always add up to b * s, which is the number of sessions and 
backups that will need to be rehomed if you lose a node. It is the 
granularity at which the rehoming takes place that is important, and as 
I have shown, this is the most granular it can be.
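
To make that arithmetic concrete, here is a throwaway sketch (the 
numbers are invented, purely for illustration):

    // n nodes, s sessions per node, b backups per session. Whether a
    // node's backups are spread over all n peers ((n * 1/n) * b * s)
    // or piled onto a single peer (1 * b * s), losing a node always
    // leaves b * s backups to rehome.
    public class RehomeCost {
        public static void main(String[] args) {
            int n = 10, s = 1000, b = 2;
            int spread = n * ((b * s) / n); // (n * 1/n) * b * s
            int single = 1 * (b * s);       // 1 * b * s
            System.out.println(spread + " == " + single); // 2000 == 2000
        }
    }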

Of course, there is no reason why all migration should be done at the 
single-session level - load-balancer allowing - a node could put in a 
bid for several thousand sessions and have them all batched and migrated 
across in a single transaction.
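
Something like this hypothetical interface could express that (none of 
these names exist anywhere yet - it is only a sketch):

    // A node bids for a batch of sessions; they are migrated across
    // in a single transaction - all of them rehome, or none do.
    public interface SessionMigrator {
        class MigrationFailedException extends Exception {}
        void migrateBatch(java.util.Collection sessionIds, String targetNode)
            throws MigrationFailedException;
    }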

We are describing pretty much the same thing in different terms.

Happier :-)  ?


Jules


Dain Sundstrom wrote:

> Jules,
>
> IIRC James' point is that having a lot more buckets than nodes makes adding 
> and reorganizing state much easier.  Of course in the case of a 
> failure you still have bulk transfer of data, but the bulk transfer is 
> spread across the cluster.  This helps avoid a dominoes-style cascading 
> failure, where the first node dies, then its backup dies from the bulk 
> transfer load, and then that backup dies, and so on.
>
> Anyway, I think the big benefit is the ease of redistributing 
> sessions.  Instead of a new node saying "I'll take these 3k sessions", 
> it says "I'll take these three buckets".  The load is much less, but I 
> think the biggest benefit is that the code should be easier to debug, 
> understand and write.
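>
> Something like this, say (a rough sketch - none of these names exist, 
> they are just to show the shape of the idea):
>
>     // With buckets a handoff is "give me bucket 7", not an
>     // enumeration of thousands of session ids.
>     public interface BucketManager {
>         // buckets this node currently owns
>         int[] ownedBuckets();
>         // transfer ownership of one bucket (and every session in
>         // it) to another node - one small negotiation, not 3k
>         void transferBucket(int bucketId, String targetNode);
>     }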
>
> It is not important now.  As long as we keep the interface simple and 
> clean, we can try many implementations until something fits.
>
> -dain
>
> On Thursday, October 30, 2003, at 11:43 AM, Jules Gosnell wrote:
>
>> James Strachan wrote:
>>
>>> On Thursday, October 30, 2003, at 12:19 pm, gianny DAMOUR wrote:
>>>
>>>> Hello,
>>>>
>>>> Just a couple of questions regarding this design:
>>>>
>>>> - Is it possible to configure the weight of a node? If yes, is the 
>>>> same auto-partitioning policy applicable? My concern is that a 
>>>> "clockwise" policy may add a significant load on nodes hosted by 
>>>> low-spec hosts.
>>>
>>>
>>>
>>> This is partly a problem for the sticky load-balancer to deal with, 
>>> i.e. it should route requests to primary machines based on spec/power.
>>>
>>> If we partitioned the session data into buckets (rather than one big 
>>> lump), then the buckets of session data could be distributed evenly 
>>> around the cluster so that each session bucket has N buddies 
>>> (replicas), but a load-balancing algorithm could be used to 
>>> distribute the buckets based on (say) a host-spec weighting or 
>>> whatnot - e.g. nodes in the cluster could limit how many buckets they 
>>> accept due to their lack of resources etc.
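>>>
>>> For instance (a rough sketch - the weights and numbers are invented, 
>>> purely illustrative):
>>>
>>>     // Deal buckets out in proportion to each node's declared
>>>     // weight - a box with weight 4 gets four times the buckets
>>>     // of a box with weight 1.
>>>     public class WeightedBuckets {
>>>         public static void main(String[] args) {
>>>             int[] weights = { 4, 1, 1 }; // one entry per node
>>>             int totalWeight = 4 + 1 + 1;
>>>             int buckets = 60;
>>>             for (int node = 0; node < weights.length; node++) {
>>>                 int share = buckets * weights[node] / totalWeight;
>>>                 System.out.println("node " + node + " -> " + share);
>>>             }
>>>             // prints: node 0 -> 40, node 1 -> 10, node 2 -> 10
>>>         }
>>>     }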
>>>
>>> Imagine having 1 massive box and 2 small ones in a cluster - you'd 
>>> probably want to give the big box more buckets than the smaller 
>>> ones. The previous model Jules described still holds (that was a 
>>> view of 1 session bucket) - it's just that the total session state 
>>> for a machine might be spread over many buckets.
>>>
>>> Having multiple buckets could also help spread the load of 
>>> recovering from a node failure in larger clusters.
>>
>>
>> James, I have given this quite a bit of thought... and whilst it was 
>> initially appealing and seemed a sensible extension of my train of 
>> thought, I have not been able to find any advantage in splitting one 
>> node's state into multiple buckets....
>>
>> If a node joins or leaves, you still have exactly the same amount of 
>> state to shift around the cluster.
>>
>> If you back up your sessions off-node, then whether these are all on 
>> one backup node or spread over 10 makes no difference: in the 
>> first case, if you lose the backup node, you have to shift 100% of 1 
>> node's state; in the second case, you have to shift 10% of 10 nodes' 
>> state (since each backup node will be carrying 10% of the state of 
>> another 9 nodes as well as your own). Initially it looks more 
>> resilient, but...
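>>
>> (Concretely: 10 nodes, 1000 sessions each, 1 backup per session. All 
>> on one peer: lose that peer and you shift 100% x 1000 = 1000 backups 
>> from one node. Spread over 10 peers: lose any node and you still 
>> shift 10 x 10% x 1000 = 1000 backups - the same total, just sourced 
>> from ten places.)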
>>
>> So I am sticking, by virtue of Occam's razor, to the simpler approach 
>> for the moment, until someone can draw attention to a situation 
>> where the extra complexity of a finer-grained replication 
>> strategy is worth the gain.
>>
>>
>> Thinking about it, my current design is probably a hybrid - since, 
>> whilst a node's state is all held in a single bucket, individual 
>> sessions may be migrated out of that bucket and into another one on 
>> another node. So replication granularity is set to 
>> node level, but migration granularity is at session level. I guess 
>> you are suggesting that a bucket is somewhere between the two of 
>> these, and is the level at which both replication and migration happen? 
>> I'll give it some more thought :-)
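>>
>> Roughly, the hybrid looks like this (hypothetical names - just a 
>> sketch of the two granularities):
>>
>>     // Replication moves whole-node state; migration moves
>>     // individual sessions out of one node's bucket into another's.
>>     public interface ClusterNode {
>>         // node-level granularity: replicate this node's entire
>>         // session state to its replication group
>>         void replicateTo(ClusterNode[] replicationGroup);
>>         // session-level granularity: rehome a single session
>>         void migrateSession(String sessionId, ClusterNode target);
>>     }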
>>
>>
>> Jules
>>
>>
>>>
>>>
>>>
>>>>
>>>> - I have the feeling that one cannot configure a preferred 
>>>> replication group for the primary sessions of a specific node: if four 
>>>> nodes are available, I would like to configure that sessions of the 
>>>> first node should be replicated by the third node, if available, or 
>>>> else by the fourth one.
>>>>
>>>> - Is it not an overhead to have b-1 replicas? AFAIK, a single 
>>>> secondary should be enough.
>>>
>>>
>>>
>>> It all depends on your risk profile, I suppose. 1 backup is usually 
>>> enough, but you may want 2 for extra resilience - especially as one 
>>> of those could be in a separate DR zone for really serious fail-over 
>>> scenarios.
>>>
>>> James
>>> -------
>>> http://radio.weblogs.com/0112098/
>>>
>>
>>
>> -- 
>> /*************************************
>> * Jules Gosnell
>> * Partner
>> * Core Developers Network (Europe)
>> * http://www.coredevelopers.net
>> *************************************/
>>
>>
>>
>
> /*************************
>  * Dain Sundstrom
>  * Partner
>  * Core Developers Network
>  *************************/
>


-- 
/*************************************
 * Jules Gosnell
 * Partner
 * Core Developers Network (Europe)
 * http://www.coredevelopers.net
 *************************************/


