helix-user mailing list archives

From Kanak Biscuitwala <kana...@hotmail.com>
Subject RE: TaskRebalancer
Date Mon, 20 Jan 2014 04:49:48 GMT

This sounds a lot like what we did in AutoRebalanceStrategy. There's an interface called ReplicaPlacementScheme
that the algorithm calls into, and a DefaultPlacementScheme that just does evenly balanced
assignment.
The simplest thing we could do is add a switch to the task rebalancer config for which
placement scheme to use. The current task rebalancer config already has to specify things
like the DAG, so this could just be another field.
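
To make that concrete, here is a minimal sketch of the idea. The interface and
method signatures below are illustrative only, not the actual
AutoRebalanceStrategy.ReplicaPlacementScheme API:

import java.util.List;

// Sketch only: a pluggable placement scheme plus a config switch.
interface PlacementScheme {
  // Pick the instance that should host the given task partition.
  String getLocation(int partitionId, List<String> liveInstances);
}

// Evenly balanced assignment, in the spirit of DefaultPlacementScheme.
class EvenPlacementScheme implements PlacementScheme {
  @Override
  public String getLocation(int partitionId, List<String> liveInstances) {
    return liveInstances.get(partitionId % liveInstances.size());
  }
}

class TaskRebalancerConfigSketch {
  // Hypothetical extra field, stored alongside the DAG in the config.
  String placementScheme = "EVEN";

  PlacementScheme createPlacementScheme() {
    // A real implementation would map each scheme name to an implementation.
    return new EvenPlacementScheme();
  }
}
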
> Date: Sun, 19 Jan 2014 13:14:33 -0800
> Subject: Re: TaskRebalancer
> From: g.kishore@gmail.com
> To: dev@helix.apache.org
> CC: dev@helix.incubator.apache.org; user@helix.incubator.apache.org
> 
> Thanks Jason, I was looking at the rebalancer. It looks like the target resource
> is mandatory. What do you suggest is the right way to make the target resource
> optional?
> 
> This is my understanding of what task rebalancer is doing today.
> 
> It assumes that the system is already hosting a resource, something like a
> database, an index, etc. Now one can use the task framework to launch arbitrary
> tasks on the nodes hosting these resources. For example, let's say there is a
> database MyDB with 3 partitions and 2 replicas, using the MasterSlave state
> model on 3 nodes N1, N2, and N3. In a happy state the cluster might look like
> this:
> 
> {
>   "id":"MyDB",
>   "mapFields":{
>     "MyDB_0":{
>       "N1":"MASTER",
>       "N2":"SLAVE"
>     },
>     "MyDB_1":{
>       "N2":"MASTER",
>       "N3":"SLAVE"
>     },
>     "MyDB_2":{
>       "N1":"SLAVE",
>       "N3":"MASTER"
>     }
>   }
> }
> 
> Let's say one wants to take backups of these databases but run them only on
> the SLAVEs. One can define a backup task and launch 3 backup tasks (one for
> each partition), only on the SLAVEs.
> 
> What we have currently works perfectly for this scenario. One simply has to
> define the target resource and state for the backup tasks, and they will be
> launched in the appropriate places. So in this scenario, the backup tasks for
> partitions 0, 1, and 2 will be launched on N2, N3, and N1 respectively.
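>
> For illustration, the config for such a targeted backup task might look
> something like this (a sketch; the builder and method names are assumptions
> modeled on the task framework, not verified API):
>
> import java.util.Collections;
>
> // Sketch only: names modeled on the task framework config, not verified.
> TaskConfig backupCfg = new TaskConfig.Builder()
>     .setCommand("Backup")                       // the task to run
>     .setTargetResource("MyDB")                  // co-locate with MyDB partitions
>     .setTargetPartitionStates(Collections.singleton("SLAVE")) // SLAVEs only
>     .build();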
> 
> But what if the tasks don't have any target resource and can run on any node
> (N1, N2, or N3), and the only requirement is to distribute the tasks evenly?
> 
> We should decouple the logic of where a task is placed from the logic of
> distributing the tasks. For example, we can abstract the placement constraint
> out of the rebalancer logic. Then we can have a placement provider that
> computes placement randomly, one that computes placement based on another
> resource, and probably another that computes placement based on data
> locality.
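>
> Concretely, the abstraction might look something like this (a hypothetical
> interface, just to make the idea concrete; none of these names exist in
> Helix today):
>
> import java.util.HashMap;
> import java.util.List;
> import java.util.Map;
> import java.util.Random;
>
> // Hypothetical placement provider; names are illustrative.
> interface PlacementProvider {
>   // Map each task partition (e.g. "Backup_0") to an instance name.
>   Map<String, String> computePlacement(List<String> taskPartitions,
>       List<String> liveInstances);
> }
>
> // Random placement: any live node may run any task.
> class RandomPlacementProvider implements PlacementProvider {
>   private final Random rand = new Random();
>
>   public Map<String, String> computePlacement(List<String> taskPartitions,
>       List<String> liveInstances) {
>     Map<String, String> placement = new HashMap<String, String>();
>     for (String taskPartition : taskPartitions) {
>       placement.put(taskPartition,
>           liveInstances.get(rand.nextInt(liveInstances.size())));
>     }
>     return placement;
>   }
> }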
> 
> What is the right way to approach this?
> 
> thanks,
> Kishore G
> 
> 
> On Sun, Jan 19, 2014 at 10:12 AM, Zhen Zhang <nehzgnahz@gmail.com> wrote:
> 
> > TestTaskRebalancer and TestTaskRebalancerStopResume are examples.
> >
> > Thanks,
> > Jason
> >
> >
> > On Sun, Jan 19, 2014 at 9:20 AM, kishore g <g.kishore@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I am trying to use TaskRebalancer but am not able to understand how it
> > > works. Is there any example I can try?
> > >
> > > thanks,
> > > Kishore G
> > >
> >