Mailing-List: contact user-help@helix.incubator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@helix.incubator.apache.org
Received-SPF: pass (nike.apache.org: domain of kanak.b@hotmail.com designates
 65.54.190.214 as permitted sender)
Message-ID: <BAY173-W3D0E66A0006D1E5541D5DEDCB0@phx.gbl>
Content-Type: multipart/alternative;
	boundary="_d8bd00c6-7252-4e96-a55f-e24739b4f988_"
From: Kanak Biscuitwala <kanak.b@hotmail.com>
To: "user@helix.incubator.apache.org" <user@helix.incubator.apache.org>
CC: "vusilly@gmail.com" <vusilly@gmail.com>
Subject: RE: helix rebalancing for multiple resources
Date: Wed, 1 Jan 2014 21:26:00 -0800
Importance: Normal
In-Reply-To: 
 <CALT6Rfddz==R8WjtvhK26rftGAsGn5pMYqrmn1gGiNtUsNOmSA@mail.gmail.com>
References: 
 <CALT6RfeRhKsq3t0jR3frmVp2dHpm2Et392BsSexBiU5HRATPQg@mail.gmail.com>,<BAY173-W15A8B09D8478C95AC334CCEDCB0@phx.gbl>,<CALT6Rfddz==R8WjtvhK26rftGAsGn5pMYqrmn1gGiNtUsNOmSA@mail.gmail.com>
MIME-Version: 1.0

--_d8bd00c6-7252-4e96-a55f-e24739b4f988_
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

Not sure I follow. Is your problem that Helix creates the cluster as a child of the root node (e.g. /clusterName) while you would like it to be something else (e.g. /path/to/custom/root/clusterName)?

I'm also unclear about what you mean about discovering ZK servers. How would you be able to leverage a path in ZK to discover ZK?

Right now Helix requires long-running ZK servers and assumes that you as the application know how to connect to them (i.e. you know the hosts/ports). If that assumption holds, I believe it should work independent of deployment (cloud provider, private datacenter, or anything else).
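
On the custom top-level path alone: a ZooKeeper connection string can carry a chroot suffix, so that every path the client uses is resolved under that prefix. The sketch below only builds such a string. The host names and the /appname prefix are placeholders, ZkConnectString and withChroot are hypothetical names, and whether the ZK client inside a given Helix version honors the suffix is an assumption that would need to be checked.

```java
import java.util.List;

final class ZkConnectString {
    /** Joins host:port pairs with commas and appends a chroot path such as "/appname". */
    static String withChroot(List<String> hostPorts, String chroot) {
        if (hostPorts.isEmpty()) {
            throw new IllegalArgumentException("need at least one host:port");
        }
        if (!chroot.startsWith("/") || chroot.endsWith("/")) {
            throw new IllegalArgumentException("chroot must look like /appname");
        }
        return String.join(",", hostPorts) + chroot;
    }

    public static void main(String[] args) {
        // Placeholder hosts; the real ensemble addresses come from your deployment.
        System.out.println(withChroot(
                List.of("zk1.example.com:2181", "zk2.example.com:2181"), "/appname"));
    }
}
```

This does not address server discovery; it only covers the path prefix.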

I'm not really sure what you're trying to adapt with the adapter. Could you clarify?

I'm on #apachehelix on freenode if that's more convenient.

Thanks,
Kanak

Date: Wed, 1 Jan 2014 21:07:36 -0800
Subject: Re: helix rebalancing for multiple resources
From: vusilly@gmail.com
To: kanak.b@hotmail.com
CC: user@helix.incubator.apache.org

Yes, that is helpful.

Another big requirement that I forgot to mention is running this on a cloud service provider, like AWS. We already have a shared ZooKeeper setup there with our own client. Ideally, I could inject a custom client for Helix to use for its operations. The main differences we would require are a custom top-level path (/appname), which our client requires, and a client that handles discovering and connecting to the ZooKeeper servers.

Is support for AWS and other cloud providers on the roadmap?

Also, for the short term, do you see any complications in us creating an adapter client that Helix would use to bridge that gap? Or would it be much more complicated than I am hoping for?

Thanks
Vu


On Wed, Jan 1, 2014 at 8:36 PM, Kanak Biscuitwala <kanak.b@hotmail.com> wrote:

Resending since I realized you might not be registered on the user list yet. By the way, for your specific use case, I would personally lean towards the CustomCodeRunner along with the CUSTOMIZED IdealState rebalance mode. Then when nodes enter and exit, you can change the IdealState yourself and Helix will fire the transitions. This will most easily give you the policy-driven global view you're looking for.
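
The policy side of that approach can be sketched without any Helix classes. The following is a minimal illustration, under assumed names (OneTaskPerNodePolicy, recompute, and the task and node identifiers are all hypothetical), of recomputing a cluster-wide mapping with at most one task per node whenever the set of live nodes changes. Writing the result into the IdealState from the custom code callback is left out.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

final class OneTaskPerNodePolicy {
    /**
     * Recomputes a task -> node mapping for the whole cluster.
     * Tasks keep their current node while it is alive; the rest take free nodes.
     * Tasks that cannot be placed (more tasks than nodes) are left out of the result.
     */
    static Map<String, String> recompute(List<String> tasks,
                                         Map<String, String> current,
                                         List<String> liveNodes) {
        TreeSet<String> free = new TreeSet<>(liveNodes);
        Map<String, String> next = new LinkedHashMap<>();
        // Pass 1: keep placements whose node is still alive and not already taken.
        for (String task : tasks) {
            String node = current.get(task);
            if (node != null && free.remove(node)) {
                next.put(task, node);
            }
        }
        // Pass 2: hand the remaining free nodes to unplaced tasks, in task order.
        for (String task : tasks) {
            if (!next.containsKey(task) && !free.isEmpty()) {
                next.put(task, free.pollFirst());
            }
        }
        return next;
    }

    public static void main(String[] args) {
        List<String> tasks = List.of("A_0", "A_1", "B", "C");
        Map<String, String> before =
                recompute(tasks, Map.of(), List.of("n1", "n2", "n3", "n4"));
        // n2 leaves and n5 joins: only the task that lived on n2 should move.
        Map<String, String> after =
                recompute(tasks, before, List.of("n1", "n3", "n4", "n5"));
        System.out.println(before);
        System.out.println(after);
    }
}
```

Keeping existing placements in the first pass is a design choice of this sketch: it limits the number of transitions triggered by a single node entering or leaving.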

---

Hi Vu,

Your understanding is basically correct. The controller will rebalance each resource in sequence, at most one controller pipeline execution is going on at any one time, and there is no parallelism within the controller pipeline (other than batch reading and writing the cluster at the beginning and end).
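
To illustrate why a strictly sequential pass removes the concurrency concern, here is a small model in plain Java. It is not the Helix controller: SequentialPass and run are hypothetical names, and whether a rebalancer can actually read placements made earlier in the same pipeline run is an assumption not confirmed in this thread. The point is only that shared state needs no locking when resources are processed one after another.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class SequentialPass {
    /**
     * One single-threaded pass: resources are handled one after another,
     * and each partition only takes a node that nothing earlier has claimed.
     * Partitions with no unclaimed node left are omitted.
     */
    static Map<String, Map<String, String>> run(Map<String, List<String>> partitionsByResource,
                                                List<String> liveNodes) {
        // Shared across resources; safe without locks because nothing runs concurrently.
        Set<String> occupied = new HashSet<>();
        Map<String, Map<String, String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> resource : partitionsByResource.entrySet()) {
            Map<String, String> placement = new LinkedHashMap<>();
            for (String partition : resource.getValue()) {
                for (String node : liveNodes) {
                    if (occupied.add(node)) { // true only for a node nobody holds yet
                        placement.put(partition, node);
                        break;
                    }
                }
            }
            result.put(resource.getKey(), placement);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> resources = new LinkedHashMap<>();
        resources.put("A", List.of("A_0", "A_1"));
        resources.put("B", List.of("B_0"));
        System.out.println(run(resources, List.of("n1", "n2", "n3")));
    }
}
```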

Here are some things that may be of use to know:

1. You can plug in your own code to help decide how to rebalance your cluster in one of two ways:
   - Using the CustomCodeRunner on the participant side so that you can update the IdealState whenever the cluster changes: https://github.com/apache/incubator-helix/blob/helix-0.6.2-release/helix-core/src/main/java/org/apache/helix/participant/HelixCustomCodeRunner.java?source=c
   - Implementing a Rebalancer with USER_DEFINED rebalance mode: https://github.com/apache/incubator-helix/blob/helix-0.6.2-release/helix-core/src/main/java/org/apache/helix/controller/rebalancer/Rebalancer.java?source=c

In either case, Helix will still fire transitions according to constraints and react to node entry/exit.

2. Helix supports adding tags to nodes (via InstanceConfig), and specifying tags in each resource IdealState. Then, a tagged resource will only be assigned to nodes with the corresponding tag present.

3. You can specify max partitions per resource per node in the IdealState of the resource (this should be 1 in your case).

4. You can combine any of the above 3 if that makes sense (e.g. change node tags whenever a cluster change happens, thus constraining how Helix will assign everything).
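
The combined effect of points 2 and 3 can be modeled in a few lines. This is a sketch of the constraint logic only, not of how Helix implements it: TaggedCappedAssignment, assign, and the tag and node names are hypothetical, and nodes are tried in the iteration order of the map that is passed in.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TaggedCappedAssignment {
    /**
     * Places each partition on the first node that carries the required tag
     * and still holds fewer than maxPerNode partitions of this resource.
     * Partitions that fit nowhere are omitted from the result.
     */
    static Map<String, String> assign(List<String> partitions,
                                      String requiredTag,
                                      Map<String, Set<String>> tagsByNode,
                                      int maxPerNode) {
        Map<String, Integer> load = new HashMap<>();
        Map<String, String> placement = new LinkedHashMap<>();
        for (String partition : partitions) {
            for (Map.Entry<String, Set<String>> node : tagsByNode.entrySet()) {
                boolean tagged = node.getValue().contains(requiredTag);
                int used = load.getOrDefault(node.getKey(), 0);
                if (tagged && used < maxPerNode) {
                    placement.put(partition, node.getKey());
                    load.put(node.getKey(), used + 1);
                    break;
                }
            }
        }
        return placement;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> tagsByNode = new LinkedHashMap<>();
        tagsByNode.put("n1", Set.of("taskA"));
        tagsByNode.put("n2", Set.of("taskB"));
        tagsByNode.put("n3", Set.of("taskA"));
        // With a cap of 1, only the two nodes tagged taskA can each take one partition.
        System.out.println(assign(List.of("A_0", "A_1", "A_2"), "taskA", tagsByNode, 1));
    }
}
```

Changing the tags in tagsByNode between calls corresponds to the idea in point 4: the tags steer which nodes are eligible, and the cap bounds how much each eligible node receives.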

Is that helpful?

Kanak

Date: Wed, 1 Jan 2014 20:31:56 -0800
Subject: helix rebalancing for multiple resources
From: vusilly@gmail.com
To: user@helix.incubator.apache.org

Hi,

We're looking into creating something like a distributed task processing cluster. We already have existing code for the processing task on a single host, which results in stronger restrictions on what we're doing:

- partitioned task A: a single partition needs to be assigned to a single node, and a node may have only a single partitioned task

- another set of non-partitioned tasks (e.g. B, C, D) also needs to be assigned to nodes, but it would be most efficient if those tasks are assigned to separate nodes so any single node has at most 1 task (either partitioned A, B, C, D, etc.)

This seems to require a global view of the tasks. However, from the examples and the Rebalancer code, it appears that the resource mappings/assignments are independent of one another. Is that correct? If so, is Apache Helix the right framework for us, given the requirements above?

I saw that it might be possible to find the current resource assignment for other resources during the rebalancing calculation methods, but I was then concerned about concurrency issues if the rebalances for task A and task B were computed at the same time.

Thanks for any and all feedback.

Vu Nguyen

--_d8bd00c6-7252-4e96-a55f-e24739b4f988_--