From: Kanak Biscuitwala <kanak.b@hotmail.com>
To: user@helix.incubator.apache.org
CC: vusilly@gmail.com
Subject: RE: helix rebalancing for multiple resources
Date: Wed, 1 Jan 2014 21:26:00 -0800

Not sure I follow. Is your problem that Helix creates the cluster as a child of the root node (e.g. /clusterName) while you would like it to be something else (e.g. /path/to/custom/root/clusterName)?

I'm also unclear about what you mean about discovering ZK servers. How would you be able to leverage a path in ZK to discover ZK?

Right now Helix requires long-running ZK servers and assumes that you as the application know how to connect to them (i.e. you know the hosts/ports). If that assumption holds, I believe it should work independent of deployment (cloud provider, private datacenter, or anything else).

I'm not really sure what you're trying to adapt with the adapter. Could you clarify?

I'm on #apachehelix on freenode if that's more convenient.

Thanks,
Kanak

Date: Wed, 1 Jan 2014 21:07:36 -0800
Subject: Re: helix rebalancing for multiple resources
From: vusilly@gmail.com
To: kanak.b@hotmail.com
CC: user@helix.incubator.apache.org

Yes, that is helpful.

Another big requirement that I forgot to mention is running this on a cloud service provider, like AWS. We already have a shared ZooKeeper setup there with our own client. Ideally, I could inject a custom client for Helix to use for operations, where the main differences we would require are a custom top-level path (/appname) that is required by our client, and that the client would handle discovering and connecting to the ZooKeeper servers.

Is support for AWS and other cloud providers on the roadmap?

Also, for the short term, do you see any complications in us creating an adapter client that Helix would use to bridge that gap? Or would it be much more complicated than I am hoping for?

Thanks
Vu

On Wed, Jan 1, 2014 at 8:36 PM, Kanak Biscuitwala <kanak.b@hotmail.com> wrote:

Resending since I realized you might not be registered on the user list yet. By the way, for your specific use case, I would personally lean towards the CustomCodeRunner along with the CUSTOMIZED IdealState rebalance mode. Then when nodes enter and exit, you can change the IdealState yourself and Helix will fire the transitions. This will most easily give you the policy-driven global view you're looking for.

---

Hi Vu,

Your understanding is basically correct. The controller will rebalance each resource in sequence, at most one controller pipeline execution is going on at any one time, and there is no parallelism within the controller pipeline (other than batch reading and writing the cluster at the beginning and end).

Here are some things that may be of use to know:

1. You can plug in your own code to help decide how to rebalance your cluster in one of two ways:
   - Using the CustomCodeRunner on the participant side so that you can update the IdealState whenever the cluster changes: https://github.com/apache/incubator-helix/blob/helix-0.6.2-release/helix-core/src/main/java/org/apache/helix/participant/HelixCustomCodeRunner.java?source=c
   - Implementing a Rebalancer with USER_DEFINED rebalance mode: https://github.com/apache/incubator-helix/blob/helix-0.6.2-release/helix-core/src/main/java/org/apache/helix/controller/rebalancer/Rebalancer.java?source=c
   In either case, Helix will still fire transitions according to constraints and react to node entry/exit.

2. Helix supports adding tags to nodes (via InstanceConfig), and specifying tags in each resource IdealState. Then, a tagged resource will only be assigned to nodes with the corresponding tag present.

3. You can specify max partitions per resource per node in the IdealState of the resource (this should be 1 in your case).

4. You can combine any of the above 3 if that makes sense (e.g. change node tags whenever a cluster change happens, thus constraining how Helix will assign everything).

Is that helpful?

Kanak

Date: Wed, 1 Jan 2014 20:31:56 -0800
Subject: helix rebalancing for multiple resources
From: vusilly@gmail.com
To: user@helix.incubator.apache.org

Hi,

We're looking into creating something like a distributed task processing cluster. We already have existing code for the processing task on a single host. So that results in stronger restrictions on what we're doing:

- partitioned task A: a single partition needs to be assigned to a single node, and a node may have only a single partitioned task

- another set of non-partitioned tasks (e.g. B, C, D) also needs to be assigned nodes, but it would be most efficient if those tasks are assigned to separate nodes so any single node has at most 1 task (either partitioned A, B, C, D, etc.)

This seems to require a global view of the tasks. However, from the examples and the Rebalancer code, it appears that the resource mappings/assignments are independent of each other. Is that correct? If so, is Apache Helix the right framework for us, given the requirements above?

I saw that it might be possible to find the current resource assignment for other resources during the rebalancing calculation methods, but I was then concerned about concurrency issues--if the rebalance for task A and the rebalance for B were computed at the same time.

Thanks for any and all feedback.

Vu Nguyen
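For reference, the policy discussed in this thread (each node runs at most one task, whether that is a partition of A or one of B, C, D) is the kind of global decision that could be computed in one place and then written into a CUSTOMIZED IdealState. The following is only a minimal sketch of that computation in plain Java, under stated assumptions: the class `OneTaskPerNodePlanner`, the method `plan`, and the task and node names are all hypothetical and are not part of Helix, and the step of writing the result back to Helix is deliberately left out.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper (not a Helix class): plans at most one task per node. */
final class OneTaskPerNodePlanner {

    private OneTaskPerNodePlanner() {}

    /**
     * Computes a task -> node placement in which no node appears twice.
     * Existing placements are kept when their node is still live; tasks that
     * cannot get a node of their own are left out of the result.
     */
    static Map<String, String> plan(List<String> tasks,
                                    List<String> liveNodes,
                                    Map<String, String> current) {
        Map<String, String> next = new LinkedHashMap<>();
        Set<String> freeNodes = new LinkedHashSet<>(liveNodes);

        // Pass 1: keep a task where it is if its node is still live and unclaimed.
        for (String task : tasks) {
            String node = current.get(task);
            if (node != null && freeNodes.remove(node)) {
                next.put(task, node);
            }
        }

        // Pass 2: give each remaining task its own free node, while any are left.
        Iterator<String> spare = freeNodes.iterator();
        for (String task : tasks) {
            if (!next.containsKey(task) && spare.hasNext()) {
                next.put(task, spare.next());
            }
        }
        return next;
    }

    public static void main(String[] args) {
        Map<String, String> current = new HashMap<>();
        current.put("A_0", "node2"); // node2 is still live, so A_0 stays put
        current.put("B", "node9");   // node9 has left the cluster
        System.out.println(plan(
                Arrays.asList("A_0", "A_1", "B", "C"),
                Arrays.asList("node1", "node2", "node3"),
                current));
    }
}
```

With the sample data in `main`, A_0 stays on node2, A_1 and B take node1 and node3, and C is left unassigned because no free node remains. Keeping existing placements first is a design choice made here to limit task movement when nodes enter or exit; it is not something the thread prescribes.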