helix-user mailing list archives

From kishore g <g.kish...@gmail.com>
Subject Re: TaskRebalancer
Date Mon, 20 Jan 2014 06:00:11 GMT
Actually that makes a lot of sense. Let me look at that.


On Sun, Jan 19, 2014 at 8:49 PM, Kanak Biscuitwala <kanak.b@hotmail.com> wrote:

>
> This sounds a lot like what we did in AutoRebalanceStrategy. There's an
> interface called ReplicaPlacementScheme that the algorithm calls into, and
> a DefaultPlacementScheme that just does evenly balanced assignment.
>
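> For reference, the hook has roughly this shape (paraphrased from memory,
> not the exact ReplicaPlacementScheme signature, so treat the names as
> approximate):
>
> import java.util.List;
>
> public interface PlacementScheme {
>   // Pick the node that should host the given replica of the given partition.
>   String getLocation(int partitionId, int replicaId, List<String> nodes);
> }
>
> class EvenPlacementScheme implements PlacementScheme {
>   public String getLocation(int partitionId, int replicaId, List<String> nodes) {
>     // Round-robin across the node list so replicas spread out evenly.
>     return nodes.get((partitionId + replicaId) % nodes.size());
>   }
> }
>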
> The simplest thing we could do is have a task rebalancer config and set a
> switch for which placement scheme to use. The current task rebalancer
> already has to specify things like the DAG, so this could just be another
> field to add on.
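>
> To make it concrete, the switch could be just another entry in the task
> config map. "PlacementScheme" below is a field I am making up to
> illustrate the idea; it does not exist in the code today:
>
> import java.util.HashMap;
> import java.util.Map;
>
> public class BackupTaskConfigSketch {
>   static Map<String, String> backupTaskConfig() {
>     Map<String, String> cfg = new HashMap<String, String>();
>     cfg.put("Command", "Backup");
>     cfg.put("TargetResource", "MyDB");   // would become optional
>     cfg.put("PlacementScheme", "EVEN");  // e.g. EVEN, TARGET_RESOURCE, RANDOM
>     return cfg;
>   }
> }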
>
> > Date: Sun, 19 Jan 2014 13:14:33 -0800
> > Subject: Re: TaskRebalancer
> > From: g.kishore@gmail.com
> > To: dev@helix.apache.org
> > CC: dev@helix.incubator.apache.org; user@helix.incubator.apache.org
>
> >
> > Thanks Jason, I was looking at the rebalancer. It looks like the target
> > resource is mandatory. What do you suggest is the right way to make the
> > target resource optional?
> >
> > This is my understanding of what the task rebalancer does today.
> >
> > It assumes that the system is already hosting a resource, something like a
> > database, an index, etc. One can then use the task framework to launch
> > arbitrary tasks on the nodes hosting these resources. For example, let's
> > say there is a database MyDB with 3 partitions, 2 replicas, a MasterSlave
> > state model, and 3 nodes N1, N2, N3. In a happy state the cluster might
> > look like this:
> >
> > {
> >   "id" : "MyDB",
> >   "mapFields" : {
> >     "MyDB_0" : {
> >       "N1" : "MASTER",
> >       "N2" : "SLAVE"
> >     },
> >     "MyDB_1" : {
> >       "N2" : "MASTER",
> >       "N3" : "SLAVE"
> >     },
> >     "MyDB_2" : {
> >       "N1" : "SLAVE",
> >       "N3" : "MASTER"
> >     }
> >   }
> > }
> >
> > Let's say one wants to take a backup of this database but run it only on
> > the SLAVEs. One can define a backup task and launch 3 backup tasks (one
> > for each partition) only on the SLAVEs.
> >
> > What we have currently works perfectly for this scenario. One simply has
> > to define the target resource and state for the backup tasks and they will
> > be launched in the appropriate place. So in this scenario, the backup
> > tasks for partitions 0, 1, and 2 will be launched on N2, N3, and N1
> > respectively.
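> >
> > To make that placement concrete, here is a small standalone sketch (no
> > Helix APIs, just the mapping shown above) that picks the SLAVE replica of
> > each partition as the host for its backup task:
> >
> > import java.util.LinkedHashMap;
> > import java.util.Map;
> >
> > public class SlaveBackupPlacement {
> >   public static void main(String[] args) {
> >     // Same mapping as the MyDB external view above.
> >     Map<String, Map<String, String>> mapFields = new LinkedHashMap<>();
> >     mapFields.put("MyDB_0", Map.of("N1", "MASTER", "N2", "SLAVE"));
> >     mapFields.put("MyDB_1", Map.of("N2", "MASTER", "N3", "SLAVE"));
> >     mapFields.put("MyDB_2", Map.of("N1", "SLAVE", "N3", "MASTER"));
> >
> >     // For each DB partition, run the backup task where the SLAVE lives.
> >     mapFields.forEach((partition, replicas) ->
> >         replicas.forEach((node, state) -> {
> >           if ("SLAVE".equals(state)) {
> >             System.out.println("backup task for " + partition + " -> " + node);
> >           }
> >         }));
> >   }
> > }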
> >
> > But what if the tasks don't have any target resource and can run on any
> > node (N1, N2, or N3), and the only requirement is to distribute the tasks
> > evenly?
> >
> > We should decouple the logic of where a task is placed from the logic of
> > distributing the tasks. For example, we can abstract the placement
> > constraint out of the rebalancer logic. We could then have one placement
> > provider that computes placement randomly, one that computes placement
> > based on another resource, and probably another that computes placement
> > based on data locality.
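> >
> > A rough sketch of the abstraction I have in mind (all names invented,
> > nothing like this exists in the code yet):
> >
> > import java.util.List;
> > import java.util.Random;
> >
> > public interface TaskPlacementProvider {
> >   // Decide which live instance should run the given task partition.
> >   String place(String taskPartition, List<String> liveInstances);
> > }
> >
> > // Used when the task has no target resource: just spread tasks around.
> > class RandomPlacementProvider implements TaskPlacementProvider {
> >   private final Random rnd = new Random();
> >   public String place(String taskPartition, List<String> liveInstances) {
> >     return liveInstances.get(rnd.nextInt(liveInstances.size()));
> >   }
> > }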
> >
> > What is the right way to approach this?
> >
> > thanks,
> > Kishore G
> >
> >
> > On Sun, Jan 19, 2014 at 10:12 AM, Zhen Zhang <nehzgnahz@gmail.com> wrote:
> >
> > > TestTaskRebalancer and TestTaskRebalancerStopResume are examples.
> > >
> > > Thanks,
> > > Jason
> > >
> > >
> > > On Sun, Jan 19, 2014 at 9:20 AM, kishore g <g.kishore@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > I am trying to use TaskRebalancer but am not able to understand how it
> > > > works. Is there an example I can try?
> > > >
> > > > thanks,
> > > > Kishore G
> > > >
> > >
>
