From: kishore g <g.kishore@gmail.com>
To: user@helix.apache.org
Date: Sun, 19 Jan 2014 22:00:11 -0800
Subject: Re: TaskRebalancer

Actually, that makes a lot of sense. Let me look at that.

On Sun, Jan 19, 2014 at 8:49 PM, Kanak Biscuitwala <kanak.b@hotmail.com> wrote:

> This sounds a lot like what we did in AutoRebalanceStrategy. There's an
> interface called ReplicaPlacementScheme that the algorithm calls into, and
> a DefaultPlacementScheme that just does an evenly balanced assignment.
>
> The simplest thing we could do is have a task rebalancer config and set a
> switch for which placement scheme to use. The current task rebalancer
> already has to specify things like the DAG, so this could just be another
> field to add on.
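
For illustration, a minimal sketch of what a pluggable placement scheme for
tasks could look like, modeled on that pattern. The names below are
hypothetical, not the existing Helix API:

import java.util.List;

// Hypothetical interface, modeled on the ReplicaPlacementScheme that
// AutoRebalanceStrategy calls into; not part of the current Helix API.
interface TaskPlacementScheme {
  // Decide which live instance should run the given task partition.
  String getLocation(String taskPartition, List<String> liveInstances);
}

// A default scheme that spreads task partitions evenly across the live
// instances, analogous to DefaultPlacementScheme.
class EvenTaskPlacementScheme implements TaskPlacementScheme {
  @Override
  public String getLocation(String taskPartition, List<String> liveInstances) {
    // Stable assignment: hash the partition name onto the instance list.
    int index = Math.floorMod(taskPartition.hashCode(), liveInstances.size());
    return liveInstances.get(index);
  }
}

A field on the task rebalancer config, next to the existing DAG
specification, could then select which scheme to instantiate.
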
> > Date: Sun, 19 Jan 2014 13:14:33 -0800
> > Subject: Re: TaskRebalancer
> > From: g.kishore@gmail.com
> > To: dev@helix.apache.org
> > CC: dev@helix.incubator.apache.org; user@helix.incubator.apache.org
> >
> > Thanks Jason, I was looking at the rebalancer. It looks like the target
> > resource is mandatory. What do you suggest is the right way to make the
> > target resource optional?
> >
> > This is my understanding of what the task rebalancer is doing today.
> >
> > It assumes that the system is already hosting a resource: something like
> > a database, an index, etc. One can then use the task framework to launch
> > arbitrary tasks on the nodes hosting these resources. For example, let's
> > say there is a database MyDB with 3 partitions and 2 replicas, using the
> > MasterSlave state model, on 3 nodes N1, N2, N3. In a happy state the
> > cluster might look like this:
> > {
> >   "id": "MyDB",
> >   "mapFields": {
> >     "MyDB_0": {
> >       "N1": "MASTER",
> >       "N2": "SLAVE"
> >     },
> >     "MyDB_1": {
> >       "N2": "MASTER",
> >       "N3": "SLAVE"
> >     },
> >     "MyDB_2": {
> >       "N1": "SLAVE",
> >       "N3": "MASTER"
> >     }
> >   }
> > }
> >
> > Let's say one wants to take backups of these databases, but run them
> > only on the SLAVEs. One can define the backup task and launch 3 backup
> > tasks (one for each partition) only on the SLAVEs.
> >
> > What we have currently works perfectly for this scenario. One simply has
> > to define the target resource and state for the backup tasks, and they
> > will be launched in the appropriate places. So in this scenario, the
> > backup tasks for partitions 0, 1, 2 will be launched at N2, N3, and N1.
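
To make that concrete, here is a self-contained sketch of the two pieces of
information that drive this targeted placement. TaskDefinition is a
hypothetical stand-in for the task framework's actual config class, not its
real API:

import java.util.Collections;
import java.util.Set;

// Hypothetical config object for a targeted task; the real class and
// setter names in the task framework may differ.
class TaskDefinition {
  final String id;
  String targetResource;
  Set<String> targetPartitionStates;

  TaskDefinition(String id) { this.id = id; }

  TaskDefinition setTargetResource(String resource) {
    this.targetResource = resource;
    return this;
  }

  TaskDefinition setTargetPartitionStates(Set<String> states) {
    this.targetPartitionStates = states;
    return this;
  }

  public static void main(String[] args) {
    // Run the backup tasks only where MyDB's replica is currently a SLAVE.
    TaskDefinition backup = new TaskDefinition("MyDBBackup")
        .setTargetResource("MyDB")
        .setTargetPartitionStates(Collections.singleton("SLAVE"));
    // Given the ideal state above, the backup tasks for partitions 0, 1,
    // and 2 would land on N2, N3, and N1 respectively.
    System.out.println(backup.id + " targets " + backup.targetResource
        + " in states " + backup.targetPartitionStates);
  }
}
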
> >
> > But what if the tasks don't have any target resource and can be run on
> > any node (N1, N2, or N3), and the only requirement is to distribute the
> > tasks evenly?
> >
> > We should decouple the logic of where a task is placed from the logic of
> > distributing the tasks. For example, we can abstract the placement
> > constraint out of the rebalancer logic. We could then have a placement
> > provider that computes placement randomly, one that computes placement
> > based on another resource, and probably another that computes placement
> > based on data locality.
> >
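
For illustration, the proposed decoupling might look roughly like the
following. All names here are hypothetical, since this provider abstraction
does not exist yet:

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// The rebalancer would ask a provider where each task partition should go,
// instead of hard-coding target-resource-based placement.
interface PlacementProvider {
  Map<String, String> computePlacement(List<String> taskPartitions,
                                       List<String> liveInstances);
}

// Places each task partition on a random live instance.
class RandomPlacementProvider implements PlacementProvider {
  private final Random random = new Random();

  @Override
  public Map<String, String> computePlacement(List<String> taskPartitions,
                                              List<String> liveInstances) {
    Map<String, String> placement = new HashMap<String, String>();
    for (String partition : taskPartitions) {
      placement.put(partition,
          liveInstances.get(random.nextInt(liveInstances.size())));
    }
    return placement;
  }
}

// Places each task partition wherever the corresponding partition of a
// target resource is in the required state, e.g. MyDB's SLAVE replicas in
// the backup scenario above.
class TargetResourcePlacementProvider implements PlacementProvider {
  // Target partition name -> instance currently in the desired state.
  private final Map<String, String> targetStateLocations;

  TargetResourcePlacementProvider(Map<String, String> targetStateLocations) {
    this.targetStateLocations = targetStateLocations;
  }

  @Override
  public Map<String, String> computePlacement(List<String> taskPartitions,
                                              List<String> liveInstances) {
    Map<String, String> placement = new HashMap<String, String>();
    for (String partition : taskPartitions) {
      placement.put(partition, targetStateLocations.get(partition));
    }
    return placement;
  }
}

The existing target-resource behavior and the even-distribution case then
become two providers behind one interface, which is the decoupling proposed
above; a data-locality provider would be a third implementation.
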
> > What is the right way to approach this?
> >
> > thanks,
> > Kishore G
> >
> > On Sun, Jan 19, 2014 at 10:12 AM, Zhen Zhang <nehzgnahz@gmail.com> wrote:
> >
> > > TestTaskRebalancer and TestTaskRebalancerStopResume are examples.
> > >
> > > Thanks,
> > > Jason
> > >
> > > On Sun, Jan 19, 2014 at 9:20 AM, kishore g <g.kishore@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > I am trying to use the TaskRebalancer but am not able to understand
> > > > how it works. Is there any example I can try?
> > > >
> > > > thanks,
> > > > Kishore G