From "n. alex rupp" <rupp0...@umn.edu>
Subject Re: Web State Replication... (long)
Date Sat, 01 Nov 2003 20:53:08 GMT
For those mundane individuals amongst us, the definition of a *tranche*
follows, shamelessly pilfered from the OED.  And it is, I might warn you,
*anything* but quotidian.

Jules, weren't you supposed to be translating the Vulgate or something this
weekend?  : )

Nequaquam Vacuum,
--
N.




[a. F. tranche, f. trancher to cut: see TRENCH.]
    1. A cutting, a cut; a piece cut off, a slice.
    2. transf. and fig. Esp. in Econ., spec. an instalment of a loan, a
quota, a block of bonds or (esp. government) stock.
    3. tranche de vie [lit. 'slice of life'], a representation of quotidian
existence, spec. in literature or painting; also attrib.








----- Original Message ----- 
From: "Jules Gosnell" <jules@coredevelopers.net>
To: <geronimo-dev@incubator.apache.org>
Sent: Saturday, November 01, 2003 7:55 AM
Subject: Re: Web State Replication... (long)


> Further thoughts:
>
> In (2) you'll notice that each node ends up carrying ((n-1)*(b-1))+b
> tranches, so the number of tranches will grow along with the number of
> nodes in the cluster. This, you might think, will lead to scalability
> issues - it would, except for the fact that, as n increases, the size of
> each tranche decreases accordingly.
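>
> (For example, just plugging in the numbers from (2): with n=4 and b=3,
> each node carries ((4-1)*(3-1))+3 = 9 tranches - its own 3 primaries
> plus (b-1)=2 backup tranches for each of the other 3 nodes - but since
> each node's primary state is split (n-1) ways, each individual tranche
> shrinks as the cluster grows.)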
>
> In (3), where we are left with state that needs to be rebalanced after
> the loss of a node, there is at least one further option which we have
> not considered:
>
> - since the Blue state is already nicely balanced across the cluster,
> there is no immediate need to move it. If we adopt the strategy of
> waiting for sessions to be pulled out of it and adopted elsewhere as and
> when needed, and in the meantime another node joins the cluster, we could
> simply call it 'Blue' and allow it to replace the node we just lost with
> minimum fuss. If these tranches eventually became empty of sessions due
> to timeout, passivation and migration, we could drop them from the
> cluster anyway.
>
>
>
> Jules
>
>
>
> Jules Gosnell wrote:
>
> >
> > OK,
> >
> > Here is the latest and greatest.
> >
> > I have introduced a new parameter - 't' - the number of 'tranches'
> > that each node's state is cut into so that it can be replicated across
> > a collection of different nodes instead of just going to one single
> > one. This is in response to feedback on this thread.
> >
> > Initially, I didn't see much benefit in this extra complexity, but now
> > I am coming round to it :-)
> >
> > If you want to fully understand the contents of this posting, you will
> > need to have digested the previous posting I made with a similar
> > diagram.
> >
> > I'll walk through the same scenario that I presented earlier with t=1,
> > except now I will go to the opposite end of the spectrum and make
> > t=(n-1) - i.e. each node splits its state into the same number of
> > tranches as there are nodes in the cluster (excepting itself) and
> > stores one tranche with each. This is further complicated by the
> > parameter 'b' (number of buddies in a partition), which now
> > effectively becomes the number of copies of each tranche present in
> > the cluster.
> >
> > (1)
> >
> > n=3
> > b=3
> > t=(n-1)=2
> >
> > Red splits its primary state into (n-1) tranches and, starting with
> > the first tranche, replicates a copy of it to the following 'b'-1
> > nodes. Then it takes the next tranche, starts one node further out and
> > does the same thing. It excludes itself from this process and simply
> > wraps around the clock if it runs out of nodes.
> >
> > Each node does exactly the same thing, resulting in state being
> > equally balanced around the cluster.
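> >
> > (Purely by way of illustration - this isn't Geronimo code and the names
> > are invented - the clockwise placement described above might be
> > sketched roughly like this:)
> >
> > public class TranchePlacement {
> >
> >     // For owner node 'owner' (numbered 0..n-1 around the clock),
> >     // tranche 'j' is copied to the next b-1 nodes clockwise, starting
> >     // one node further out for each successive tranche, skipping the
> >     // owner itself and wrapping around.
> >     static int[] backupsFor(int owner, int j, int n, int b) {
> >         int[] backups = new int[b - 1];
> >         int count = 0;
> >         int next = owner + 1 + j;
> >         while (count < b - 1) {
> >             int node = next % n;
> >             if (node != owner) {
> >                 backups[count++] = node;
> >             }
> >             next++;
> >         }
> >         return backups;
> >     }
> >
> >     public static void main(String[] args) {
> >         int n = 3, b = 3, t = n - 1;            // scenario (1)
> >         for (int owner = 0; owner < n; owner++) {
> >             for (int j = 0; j < t; j++) {
> >                 int[] backups = backupsFor(owner, j, n, b);
> >                 StringBuffer line = new StringBuffer();
> >                 line.append("node ").append(owner);
> >                 line.append(", tranche ").append(j);
> >                 line.append(" -> backups on");
> >                 for (int k = 0; k < backups.length; k++) {
> >                     line.append(' ').append(backups[k]);
> >                 }
> >                 System.out.println(line);
> >             }
> >         }
> >     }
> > }
> >
> > With n=3 and b=3 this puts node 0's first tranche on nodes 1 and 2 and
> > its second on nodes 2 and 1, and likewise for the other nodes - state
> > equally balanced around the cluster.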
> >
> >
> > (2)
> >
> > n=4
> > b=3
> > t=(n-1)=3
> >
> > Blue joins the cluster.
> >
> > Everyone increases their number of primary tranches by one and
> > allocates (b-1) backup tranches for Blue.
> >
> > The cluster reorganises the replication relationships between nodes
> > and tranches in accordance with the algorithm specified above. (I have
> > to work out the nitty-gritty of how efficiently I can do this.)
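> >
> > (Using the illustrative placement sketch above, the reshuffle on a
> > join is just a re-run with the new n: with n=4, node 0's first tranche
> > now goes to nodes 1 and 2, its second to nodes 2 and 3, and its new
> > third tranche to nodes 3 and 1 - every node gives up and takes on a
> > little, rather than a single node taking the whole hit.)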
> >
> >
> > (3)
> >
> > n=3
> > b=3
> > t=(n-1)=2
> >
> > Blue leaves the cluster.
> >
> > The diagram shows the state immediately after Blue has left - before a
> > rebalancing of state.
> >
> > I suggest that Red, Green and Yellow tranches rearrange themselves as
> > efficiently as possible back to the same layout as in (1). We are then
> > left with every node carrying an extra b-1 Blue tranches.
> >
> > The sessions contained in these can either be proactively merged into
> > other nearby primary tranches and replicated according to their
> > partitioning, or lazily pulled onto the next node to receive a request
> > that requires them and assimilated at that point, or perhaps pushed
> > straight out to shared store, to be lazily loaded and adopted, as
> > required, by whichever node first needs them.
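> >
> > (A rough sketch of the lazy option - the Partition/Cluster/SessionStore
> > interfaces below are hypothetical, not real Geronimo APIs:)
> >
> > interface Partition {                 // this node's own primary state
> >     Object get(String id);
> >     void adopt(String id, Object session);
> > }
> >
> > interface Cluster {                   // whoever still holds a Blue tranche
> >     Object requestMigration(String id);
> > }
> >
> > interface SessionStore {              // shared store fallback
> >     Object load(String id);
> > }
> >
> > class LazyAdoption {
> >     private final Partition primary;
> >     private final Cluster cluster;
> >     private final SessionStore store;
> >
> >     LazyAdoption(Partition primary, Cluster cluster, SessionStore store) {
> >         this.primary = primary;
> >         this.cluster = cluster;
> >         this.store = store;
> >     }
> >
> >     // Called when a request arrives for a session this node does not
> >     // yet own.
> >     Object getSession(String id) {
> >         Object session = primary.get(id);
> >         if (session == null) {
> >             session = cluster.requestMigration(id);  // pull from a surviving tranche
> >             if (session == null) {
> >                 session = store.load(id);            // or from the shared store
> >             }
> >             if (session != null) {
> >                 // assimilate into our own primary state; normal
> >                 // replication then applies as usual
> >                 primary.adopt(id, session);
> >             }
> >         }
> >         return session;
> >     }
> > }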
> >
> >
> >
> > I shall look at genericising my initial design to allow the
> > parameterisation of 't' to a value between 1 and n-1.
> >
> > 't' will basically control whether the joining/leaving of a node
> > wreaks a large amount of havoc on a small number of nodes or a small
> > amount of havoc on a large number of nodes.
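> >
> > (For instance - numbers purely for illustration - with ten nodes, t=1
> > means a leaving node's state lands as one lump on its immediate
> > buddies, so a couple of neighbours take the whole hit, whereas t=n-1=9
> > means every survivor re-homes roughly a ninth of it.)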
> >
> >
> > That's it for now,
> >
> >
> > Jules
> >
> >
> >
> > Jules Gosnell wrote:
> >
> >> Hmmm...
> >>
> >> I've given the more-than-one-replication-bucket-per-node idea a
> >> little more thought...
> >>
> >> I'm not sure that the extra complexity is merited by the perceived
> >> gains - spreading the load associated with cluster growing/shrinking
> >> around the whole cluster, instead of just between the nodes
> >> immediately surrounding the point of node entry/exit - but this is an
> >> area that we should consider more closely. Perhaps we could even
> >> generalise the algorithm to allow the configuration of
> >> replication-buckets-per-node...
> >>
> >> I'll keep on it.
> >>
> >> Jules
> >>
> >>
> >>
> >> Jules Gosnell wrote:
> >>
> >>> Guys,
> >>>
> >>> I understand exactly what you are both saying and you can relax - at
> >>> migration time, I am working at the session level - that is one
> >>> bucket=one session - so if you have 10 sessions and you want to
> >>> leave a cluster of 11 nodes, provided that your load-balancer can
> >>> handle it, you can migrate 1 session to each node.
> >>>
> >>> However, at replication time 1 bucket=the whole state of the node -
> >>> i.e. replication groups are organised at the node level - not at the
> >>> single-session level. Having each session remember where each of its
> >>> copies lives is just too much overhead and, as I pointed out in my
> >>> last mail, I can't see any advantage in terms of resilience in every
> >>> node holding (n*1/n)*b*s or 1*b*s sessions, or some division in
> >>> between - the point is it will always add up to b*s, which is the
> >>> number of sessions and backups that will need to be rehomed if you
> >>> lose a node. It is the granularity at which the rehoming takes place
> >>> that is important, and as I have shown, this is the most granular it
> >>> can be.
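> >>>
> >>> (To put numbers on it: with s=1000 sessions per node and b=3, losing
> >>> a node always leaves 3 x 1000 = 3000 session copies to be rehomed,
> >>> however those copies happened to be spread beforehand.)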
> >>>
> >>> Of course, there is no reason why all migration should be done at
> >>> the single session level - load-balancer allowing - a node could put
> >>> in a bid for several thousand sessions and have them all batched and
> >>> migrated across in a single transaction.
> >>>
> >>> We are describing pretty much the same thing in different terms.
> >>>
> >>> Happier :-)  ?
> >>>
> >>>
> >>> Jules
> >>>
> >>>
> >>> Dain Sundstrom wrote:
> >>>
> >>>> Jules,
> >>>>
> >>>> IIRC James' point is that having a lot more buckets than nodes makes
> >>>> adding and reorganizing state much easier.  Of course in the case
> >>>> of a failure you still have bulk transfer of data, but the bulk
> >>>> transfer is spread across the cluster.  This helps avoid a
> >>>> domino-style cascading failure where the first node dies, then its
> >>>> backup dies from the bulk transfer load, and then that backup dies
> >>>> and so on.
> >>>>
> >>>> Anyway, I think the big benefit is the ease of redistributing
> >>>> sessions.  Instead of a new node saying "I'll take these 3k
> >>>> sessions", it says "I'll take these three buckets".  The load is
> >>>> much less, but I think the biggest benefit is that the code should
> >>>> be easier to debug, understand and write.
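> >>>>
> >>>> (A throwaway sketch of that idea - the names are invented, not a
> >>>> real API: sessions hash to a fixed set of buckets, and rebalancing
> >>>> just moves bucket ownership, so the sessions inside a bucket move
> >>>> as one unit.)
> >>>>
> >>>> import java.util.HashMap;
> >>>> import java.util.Map;
> >>>>
> >>>> class BucketTable {
> >>>>     static final int NUM_BUCKETS = 1024;       // many more buckets than nodes
> >>>>     private final Map owners = new HashMap();  // Integer bucket -> String node name
> >>>>
> >>>>     int bucketFor(String sessionId) {
> >>>>         return Math.abs(sessionId.hashCode() % NUM_BUCKETS);
> >>>>     }
> >>>>
> >>>>     String ownerOf(String sessionId) {
> >>>>         return (String) owners.get(new Integer(bucketFor(sessionId)));
> >>>>     }
> >>>>
> >>>>     // A joining node takes over whole buckets rather than negotiating
> >>>>     // thousands of individual sessions.
> >>>>     void reassign(int bucket, String newNode) {
> >>>>         owners.put(new Integer(bucket), newNode);
> >>>>     }
> >>>> }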
> >>>>
> >>>> It is not important now.  As long as we keep the interface simple
> >>>> and clean we can try many implementations until something fits.
> >>>>
> >>>> -dain
> >>>>
> >>>> On Thursday, October 30, 2003, at 11:43 AM, Jules Gosnell wrote:
> >>>>
> >>>>> James Strachan wrote:
> >>>>>
> >>>>>> On Thursday, October 30, 2003, at 12:19 pm, gianny DAMOUR wrote:
> >>>>>>
> >>>>>>> Hello,
> >>>>>>>
> >>>>>>> Just a couple of questions regarding this design:
> >>>>>>>
> >>>>>>> - Is it possible to configure the weight of a node? If yes, is
> >>>>>>> the same auto-partitioning policy applicable? My concern is that
> >>>>>>> a "clockwise" policy may add a significant load on nodes hosted
> >>>>>>> by low spec hosts.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> This is partly a problem for the sticky load balancer to deal
> >>>>>> with, i.e. it should route requests to primary machines based on
> >>>>>> spec/power.
> >>>>>>
> >>>>>> If we partitioned the session data into buckets (rather than one
> >>>>>> big lump), then the buckets of session data could be distributed
> >>>>>> evenly around the cluster so that each session bucket has N
> >>>>>> buddies (replicas), but a load-balancing algorithm could be used
> >>>>>> to distribute the buckets based on (say) a host-spec weighting or
> >>>>>> whatnot, e.g. nodes in the cluster could limit how many buckets
> >>>>>> they accept due to their lack of resources etc.
> >>>>>>
> >>>>>> Imagine having 1 massive box and 2 small ones in a cluster -
> >>>>>> you'd probably want to give the big box more buckets than the
> >>>>>> smaller ones. The previous model Jules described still holds
> >>>>>> (that was a view of 1 session bucket) - it's just that the total
> >>>>>> session state for a machine might be spread over many buckets.
> >>>>>>
> >>>>>> Having multiple buckets could also help spread the load of
> >>>>>> recovering from a node failure in larger clusters.
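> >>>>>>
> >>>>>> (Purely as an illustration of the weighting idea: with, say, 12
> >>>>>> buckets and a 2:1:1 host-spec weighting, the big box would take 6
> >>>>>> of the buckets and each small box 3.)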
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> James, I have given this quite a bit of thought... and whilst it
> >>>>> was initially appealing and seemed a sensible extension of my
> >>>>> train of thought, I have not been able to find any advantage in
> >>>>> splitting one node's state into multiple buckets....
> >>>>>
> >>>>> If a node joins or leaves, you still have exactly the same amount
> >>>>> of state to shift around the cluster.
> >>>>>
> >>>>> If you back up your sessions off-node, then whether these are all
> >>>>> on one backup node, or spread over 10 makes no difference, since
> >>>>> in the first case if you lose the backup node you have to shift
> >>>>> 100% x 1 node's state. In the second case you have to shift 10% x
> >>>>> 10 nodes' state (since the backup node will be carrying 10% of the
> >>>>> state of another 9 nodes as well as your own). Initially it looks
> >>>>> more resilient but...
> >>>>>
> >>>>> So I am sticking, by virtue of Occam's razor, to the simpler
> >>>>> approach for the moment, until someone can draw attention to a
> >>>>> situation where the extra complexity of a higher granularity
> >>>>> replication strategy is worth the gain.
> >>>>>
> >>>>>
> >>>>> Thinking about it, my current design is probably hybrid - since
> >>>>> whilst a node's state is all held in a single bucket, individual
> >>>>> sessions may be migrated out of that bucket and into another one
> >>>>> on another node. So it is replication granularity that is set to
> >>>>> node-level, but migration granularity is at session level. I guess
> >>>>> you are suggesting that a bucket is somewhere between the two of
> >>>>> these and is the level at which both are replicated and migrated?
> >>>>> I'll give it some more thought :-)
> >>>>>
> >>>>>
> >>>>> Jules
> >>>>>
> >>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>>
> >>>>>>> - I have the feeling that one cannot configure a preferred
> >>>>>>> replication group for the primary sessions of a specific node:
> >>>>>>> if four nodes are available, I would like to specify that
> >>>>>>> sessions of the first node should be replicated by the third
> >>>>>>> node, if available, or else by the fourth one.
> >>>>>>>
> >>>>>>> - Is it not an overhead to have b-1 replicas? AFAIK, a single
> >>>>>>> secondary should be enough.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> It all depends on your risk profile, I suppose. 1 backup is
> >>>>>> usually enough, but you may want 2 for extra resilience -
> >>>>>> especially as one of those could be in a separate DR zone for
> >>>>>> really serious fail-over scenarios.
> >>>>>>
> >>>>>> James
> >>>>>> -------
> >>>>>> http://radio.weblogs.com/0112098/
> >>>>>>
> >>>>>
> >>>>>
> >>>>> -- 
> >>>>> /*************************************
> >>>>> * Jules Gosnell
> >>>>> * Partner
> >>>>> * Core Developers Network (Europe)
> >>>>> * http://www.coredevelopers.net
> >>>>> *************************************/
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>> /*************************
> >>>>  * Dain Sundstrom
> >>>>  * Partner
> >>>>  * Core Developers Network
> >>>>  *************************/
> >>>>
> >>>
> >>>
> >>
> >>
> >
> >
> >
> > ------------------------------------------------------------------------
> >
>
>
> -- 
> /*************************************
>  * Jules Gosnell
>  * Partner
>  * Core Developers Network (Europe)
>  * http://www.coredevelopers.net
>  *************************************/
>
>
>


