helix-user mailing list archives

From: Vu Nguyen <vusi...@gmail.com>
Subject: Re: helix rebalancing for multiple resources
Date: Fri, 03 Jan 2014 08:14:51 GMT
The main issue is that we already have an infrastructure here for ZooKeeper
that has a separate mechanism for clients to discover the ZK server hosts.
 That's provided by our platform team.  So client applications don't
actually provide the ZooKeeper hosts at this point.  I likely could get
access to that information somehow, though.  However, I would prefer to
re-use what our platform team provides in case they make any modifications
to how hosts are discovered.

By using our platform libraries, we get a ZooKeeper client that's ready to
use directly.  I was thinking that we could get Helix to use this for any
ZooKeeper operations.  If we get disconnected from ZooKeeper, the discovery
mechanism would be re-used automatically for reconnecting, without requiring
us to explicitly provide the hosts/ports.
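
As a fallback, if injecting the client turns out not to be possible, the hosts returned by discovery could be turned into a plain ZooKeeper connect string once at startup. The sketch below is only an illustration of that idea: `ZkConnectStrings`, `build`, and the example host names are hypothetical and come from neither Helix nor our platform library. It relies on the standard ZooKeeper connect-string format, in which an optional chroot path such as /appname is appended after the host list. It does not cover re-discovery on reconnect, and whether Helix behaves correctly under a chroot would need to be verified.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical helper; none of these names exist in Helix or ZooKeeper.
final class ZkConnectStrings {
    private ZkConnectStrings() {}

    /**
     * Builds a connect string of the form "host1:port,host2:port/chroot".
     *
     * @param discovery stand-in (assumption) for the platform's host discovery
     * @param chroot    optional root path such as "/appname"; null, "" or "/" means none
     */
    static String build(Supplier<List<String>> discovery, String chroot) {
        List<String> hosts = discovery.get();
        if (hosts == null || hosts.isEmpty()) {
            throw new IllegalStateException("discovery returned no ZooKeeper hosts");
        }
        String root = (chroot == null) ? "" : chroot.trim();
        if (root.equals("/")) {
            root = ""; // "/" is the default root, so no suffix is needed
        }
        if (!root.isEmpty() && !root.startsWith("/")) {
            throw new IllegalArgumentException("chroot must start with '/': " + chroot);
        }
        return String.join(",", hosts) + root;
    }

    public static void main(String[] args) {
        // Example hosts are made up for illustration.
        Supplier<List<String>> discovery =
                () -> Arrays.asList("zk1.example.com:2181", "zk2.example.com:2181");
        System.out.println(build(discovery, "/appname"));
    }
}
```

The resulting string could then be passed wherever a ZooKeeper address is expected, at the cost of losing the automatic re-discovery described above.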



On Wed, Jan 1, 2014 at 9:26 PM, Kanak Biscuitwala <kanak.b@hotmail.com> wrote:

> Not sure I follow. Is your problem that Helix creates the cluster as a
> child of the root node (e.g. /clusterName) while you would like it to be
> something else (e.g. /path/to/custom/root/clusterName)?
> I'm also unclear about what you mean about discovering ZK servers. How
> would you be able to leverage a path in ZK to discover ZK?
> Right now Helix requires long-running ZK servers and assumes that you as
> the application know how to connect to them (i.e. you know the
> hosts/ports). If that assumption holds, I believe it should work
> independent of deployment (cloud provider, private datacenter, or anything
> else).
> I'm not really sure what you're trying to adapt with the adapter. Could
> you clarify?
> I'm on #apachehelix on freenode if that's more convenient.
> Thanks,
> Kanak
> ------------------------------
> Date: Wed, 1 Jan 2014 21:07:36 -0800
> Subject: Re: helix rebalancing for multiple resources
> From: vusilly@gmail.com
> To: kanak.b@hotmail.com
> CC: user@helix.incubator.apache.org
> Yes, that is helpful.
> Another big requirement that I forgot to mention is running this on a
> cloud service provider, like AWS.  We already have shared zookeeper setup
> there with our own client.  Ideally, I could inject a custom client for
> helix to use for operations.  The main differences we would require are a
> custom top-level path (/appname), which our client requires, and a client
> that handles discovering and connecting to the zookeeper servers.
> Is support for AWS and other cloud providers on the roadmap?
> Also, for the short-term, do you see any complications in us creating an
> adapter client that helix would use to bridge that gap?  Or would it be
> much more complicated than I am hoping for?
> Thanks
> Vu
> On Wed, Jan 1, 2014 at 8:36 PM, Kanak Biscuitwala <kanak.b@hotmail.com> wrote:
> Resending since I realized you might not be registered on the user list
> yet. By the way, for your specific use case, I would personally lean
> towards the CustomCodeRunner along with the CUSTOMIZED IdealState rebalance
> mode. Then when nodes enter and exit, you can change the IdealState
> yourself and Helix will fire the transitions. This will most easily give
> you the policy-driven global view you're looking for.
> ---
> Hi Vu,
> Your understanding is basically correct. The controller will rebalance
> each resource in sequence; at most one controller pipeline execution is
> running at any one time; and there is no parallelism within the controller
> pipeline (other than batch reading and writing the cluster at the beginning
> and end).
> Here are some things that may be of use to know:
> 1. You can plug in your own code to help decide how to rebalance your
> cluster in one of two ways:
>    - Using the CustomCodeRunner on the participant side so that you can
> update the IdealState whenever the cluster changes:
> https://github.com/apache/incubator-helix/blob/helix-0.6.2-release/helix-core/src/main/java/org/apache/helix/participant/HelixCustomCodeRunner.java?source=c
>    - Implementing a Rebalancer with USER_DEFINED rebalance mode:
> https://github.com/apache/incubator-helix/blob/helix-0.6.2-release/helix-core/src/main/java/org/apache/helix/controller/rebalancer/Rebalancer.java?source=c
> In either case, Helix will still fire transitions according to constraints
> and react to node entry/exit.
> 2. Helix supports adding tags to nodes (via InstanceConfig), and
> specifying tags in each resource IdealState. Then, a tagged resource will
> only be assigned to nodes with the corresponding tag present.
> 3. You can specify max partitions per resource per node in the IdealState
> of the resource (this should be 1 in your case)
> 4. You can combine any of the above 3 if that makes sense (e.g. change
> node tags whenever a cluster change happens, thus constraining how Helix
> will assign everything)
> Is that helpful?
> Kanak
> ------------------------------
> Date: Wed, 1 Jan 2014 20:31:56 -0800
> Subject: helix rebalancing for multiple resources
> From: vusilly@gmail.com
> To: user@helix.incubator.apache.org
> Hi,
> We're looking into creating something like a distributed task processing
> cluster.  We already have existing code for the processing task on a single
> host.  So that results in stronger restrictions on what we're doing:
> - partitioned task A: single partition needs to be assigned to a single
> node and a node may have only a single partitioned task
> - another set of non-partitioned tasks (e.g. B, C, D) also needs to be
> assigned nodes, but it would be most efficient if those tasks are assigned
> to separate nodes so any single node has at most 1 task (either partitioned
> A, B, C, D, etc.)
> This seems to require a global view of the tasks.  However, from the
> examples and the Rebalancer code, it appears that the resource
> mappings/assignments are independent of one another.  Is that correct?  If
> so, is Apache Helix the right framework for us, given the requirements
> above?
> I saw that it might be possible to find the current resource assignment
> for other resources during the rebalancing calculation methods, but I was
> then concerned about concurrency issues--if the rebalance for task A and
> the rebalance for B were computed at the same time.
> Thanks for any and all feedback.
> Vu Nguyen
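
For reference, the "at most one task per node" constraint from the original question can be computed from a single global view, as suggested in the reply about changing the IdealState yourself. The following is a minimal sketch under stated assumptions: `OneTaskPerNode` and `assign` are hypothetical names, tasks and nodes are plain strings, and writing the result into a Helix IdealState is deliberately left out. It keeps a task on its previous node when that node is still alive, then hands the remaining free nodes to unplaced tasks in order; tasks left over when nodes run out stay unassigned.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; not part of Helix.
final class OneTaskPerNode {
    private OneTaskPerNode() {}

    /**
     * Returns a task -> node mapping in which every node hosts at most one task.
     *
     * @param tasks     all tasks across resources, e.g. partitions of A plus B, C, D
     * @param liveNodes nodes currently alive
     * @param previous  last known task -> node placement (may be empty)
     */
    static Map<String, String> assign(List<String> tasks,
                                      List<String> liveNodes,
                                      Map<String, String> previous) {
        Map<String, String> result = new LinkedHashMap<>();
        Set<String> freeNodes = new LinkedHashSet<>(liveNodes);

        // Pass 1: keep earlier placements whose node is still alive and unused,
        // so that node entry/exit moves as few tasks as possible.
        for (String task : tasks) {
            String node = previous.get(task);
            if (node != null && freeNodes.remove(node)) {
                result.put(task, node);
            }
        }

        // Pass 2: give each remaining task its own free node, in list order.
        Iterator<String> free = freeNodes.iterator();
        for (String task : tasks) {
            if (result.containsKey(task)) {
                continue;
            }
            if (!free.hasNext()) {
                break; // more tasks than nodes: the rest stay unassigned
            }
            result.put(task, free.next());
        }
        return result;
    }
}
```

Because all tasks are placed in one call, there is no need to coordinate separate per-resource rebalance computations.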
