helix-user mailing list archives

From kishore g <g.kish...@gmail.com>
Subject Re: helix rebalancing for multiple resources
Date Sat, 11 Jan 2014 17:00:15 GMT
Hi Vu,

What mechanism did you choose to balance across multiple resources? I was
planning to add a recipe/rebalancer to solve this use case.

thanks,
Kishore G


On Fri, Jan 3, 2014 at 1:19 PM, Vu Nguyen <vusilly@gmail.com> wrote:

> I contacted our platform team.  They're making changes that will make the
> hosts' IPs more static, so I can use those and pass them to Helix.
>
> We'll have extra nodes allocated and available to pick up a task that was
> previously assigned to a failed node.  In addition, our usual AWS setup
> will replace the failed node for us.  But we'll still have the extra
> standby nodes because that should be faster than waiting for the
> replacement node--I think it tends to take a few minutes or more.
>
> Thanks
>
> Vu
>
>
>
> On Fri, Jan 3, 2014 at 8:32 AM, kishore g <g.kishore@gmail.com> wrote:
>
>> Hi Vu,
>>
>> Currently, Helix does not have the ability to take a ZooKeeper client from
>> outside. It's possible to add that feature, but I need to think more about
>> the ZooKeeper state changes like disconnect/connect, session expiry, etc.
>>
>> Getting the ZK host/ports from your platform and passing them to Helix
>> looks like a possible option for now. Meanwhile, we will look into what it
>> takes to accept a ZooKeeper client as input.
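>>
>> On your side that could look roughly like this (an untested sketch; the
>> discovery call and the cluster/instance names are placeholders for whatever
>> your platform library actually exposes):
>>
>> // look up "host1:2181,host2:2181,..." via your platform's discovery mechanism
>> String zkAddr = PlatformDiscovery.lookupZkHostPorts();  // hypothetical platform call
>> HelixManager manager = HelixManagerFactory.getZKHelixManager(
>>     "MyCluster",              // cluster name
>>     "node_1",                 // this participant's instance name
>>     InstanceType.PARTICIPANT,
>>     zkAddr);
>> manager.connect();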
>>
>> Regarding the rebalancing for multiple resources: of the options Kanak
>> provided, start with #2 first and then implement #1 using a USER_DEFINED
>> rebalancer. This functionality is generic enough that we could provide a
>> default implementation in Helix, or if you implement one we can add it to
>> helix-core.
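>>
>> For the USER_DEFINED route, wiring up a resource would look roughly like
>> this (a sketch against the 0.6.2 API; the cluster, resource, and rebalancer
>> class names are just examples):
>>
>> HelixAdmin admin = new ZKHelixAdmin(zkAddr);
>> IdealState idealState = admin.getResourceIdealState("MyCluster", "TaskA");
>> // hand control of this resource's partition mapping to your own rebalancer class
>> idealState.setRebalanceMode(RebalanceMode.USER_DEFINED);
>> idealState.setRebalancerClassName(CrossResourceRebalancer.class.getName());
>> admin.setResourceIdealState("MyCluster", "TaskA", idealState);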
>>
>> Let us know if you need help implementing a rebalancer that works
>> across resources.
>>
>> Another question: what is the expected behavior when a node fails? Will
>> you have standby nodes to pick up the task, or will you assign it to a node
>> that is already running another task?
>>
>> thanks,
>> Kishore G
>>
>>
>>
>> On Fri, Jan 3, 2014 at 12:14 AM, Vu Nguyen <vusilly@gmail.com> wrote:
>>
>>> The main issue is that we already have an infrastructure here for
>>> ZooKeeper that has a separate mechanism for clients to discover the ZK
>>> server hosts.  That's provided by our platform team.  So client
>>> applications don't actually provide the ZooKeeper hosts at this point.  I
>>> likely could get access to that information somehow, though.  However, I
>>> would prefer to re-use what our platform team provides in case they make
>>> any modifications to how hosts are discovered.
>>>
>>> By using our platform libraries, we get a ZooKeeper client that's ready
>>> to use directly.  I was thinking that we could get Helix to use this for
>>> any ZooKeeper operations.  If we get disconnected from ZooKeeper, the
>>> discovery mechanism would automatically be reused for reconnecting, without
>>> requiring us to explicitly provide the hosts/ports.
>>>
>>> Thanks
>>>
>>> Vu
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Jan 1, 2014 at 9:26 PM, Kanak Biscuitwala <kanak.b@hotmail.com> wrote:
>>>
>>>> Not sure I follow. Is your problem that Helix creates the cluster as a
>>>> child of the root node (e.g. /clusterName) while you would like it to be
>>>> something else (e.g. /path/to/custom/root/clusterName)?
>>>>
>>>> I'm also unclear on what you mean by discovering ZK servers. How
>>>> would you be able to leverage a path in ZK to discover ZK?
>>>>
>>>> Right now Helix requires long-running ZK servers and assumes that you
>>>> as the application know how to connect to them (i.e. you know the
>>>> hosts/ports). If that assumption holds, I believe it should work
>>>> independent of deployment (cloud provider, private datacenter, or anything
>>>> else).
>>>>
>>>> I'm not really sure what you're trying to adapt with the adapter. Could
>>>> you clarify?
>>>>
>>>> I'm on #apachehelix on freenode if that's more convenient.
>>>>
>>>> Thanks,
>>>> Kanak
>>>> ------------------------------
>>>> Date: Wed, 1 Jan 2014 21:07:36 -0800
>>>> Subject: Re: helix rebalancing for multiple resources
>>>> From: vusilly@gmail.com
>>>> To: kanak.b@hotmail.com
>>>> CC: user@helix.incubator.apache.org
>>>>
>>>>
>>>> Yes, that is helpful.
>>>>
>>>> Another big requirement that I forgot to mention is running this on a
>>>> cloud service provider, like AWS.  We already have a shared ZooKeeper setup
>>>> there with our own client.  Ideally, I could inject a custom client for
>>>> Helix to use for its operations; the main differences we would require are
>>>> a custom top-level path (/appname) that our client requires, and that the
>>>> client would handle discovering and connecting to the ZooKeeper servers.
>>>>
>>>> Is support for AWS and other cloud providers on the roadmap?
>>>>
>>>> Also, for the short-term, do you see any complications in us creating
>>>> an adapter client that helix would use to bridge that gap?  Or would it be
>>>> much more complicated than I am hoping for?
>>>>
>>>> Thanks
>>>>
>>>> Vu
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Jan 1, 2014 at 8:36 PM, Kanak Biscuitwala <kanak.b@hotmail.com> wrote:
>>>>
>>>> Resending since I realized you might not be registered on the user list
>>>> yet. By the way, for your specific use case, I would personally lean
>>>> towards the CustomCodeRunner along with the CUSTOMIZED IdealState rebalance
>>>> mode. Then when nodes enter and exit, you can change the IdealState
>>>> yourself and Helix will fire the transitions. This will most easily give
>>>> you the policy-driven global view you're looking for.
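>>>>
>>>> Registering the custom code would look roughly like this (a sketch against
>>>> the 0.6.2 API; the resource name "taskAssigner" is just an example). Helix
>>>> runs the callback on exactly one live participant at a time:
>>>>
>>>> HelixCustomCodeRunner codeRunner = new HelixCustomCodeRunner(manager, zkAddr)
>>>>     .invoke(new CustomCodeCallbackHandler() {
>>>>       @Override
>>>>       public void onCallback(NotificationContext context) {
>>>>         // recompute the global task -> node assignment here and rewrite the
>>>>         // CUSTOMIZED IdealState of each resource (e.g. via HelixAdmin)
>>>>       }
>>>>     })
>>>>     .on(ChangeType.LIVE_INSTANCE)
>>>>     .usingLeaderStandbyModel("taskAssigner");
>>>> codeRunner.start();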
>>>>
>>>> ---
>>>>
>>>> Hi Vu,
>>>>
>>>> Your understanding is basically correct. The controller rebalances
>>>> each resource in sequence; at most one controller pipeline execution is
>>>> running at any one time; and there is no parallelism within the controller
>>>> pipeline (other than batch reads and writes of cluster state at the beginning
>>>> and end).
>>>>
>>>> Here are some things that may be of use to know:
>>>>
>>>> 1. You can plug in your own code to help decide how to rebalance your
>>>> cluster in one of two ways:
>>>>    - Using the CustomCodeRunner on the participant side so that you can
>>>> update the IdealState whenever the cluster changes:
>>>> https://github.com/apache/incubator-helix/blob/helix-0.6.2-release/helix-core/src/main/java/org/apache/helix/participant/HelixCustomCodeRunner.java?source=c
>>>>    - Implementing a Rebalancer with USER_DEFINED rebalance mode:
>>>> https://github.com/apache/incubator-helix/blob/helix-0.6.2-release/helix-core/src/main/java/org/apache/helix/controller/rebalancer/Rebalancer.java?source=c
>>>>
>>>> In either case, Helix will still fire transitions according to
>>>> constraints and react to node entry/exit.
>>>>
>>>> 2. Helix supports adding tags to nodes (via InstanceConfig), and
>>>> specifying tags in each resource IdealState. Then, a tagged resource will
>>>> only be assigned to nodes with the corresponding tag present (see the sketch
>>>> after this list, which also covers #3).
>>>>
>>>> 3. You can specify max partitions per resource per node in the
>>>> IdealState of the resource (this should be 1 in your case)
>>>>
>>>> 4. You can combine any of the above 3 if that makes sense (e.g. change
>>>> node tags whenever a cluster change happens, thus constraining how Helix
>>>> will assign everything)
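>>>>
>>>> A configuration sketch for (2) and (3) (the cluster, node, resource, and
>>>> tag names are just examples):
>>>>
>>>> HelixAdmin admin = new ZKHelixAdmin(zkAddr);
>>>>
>>>> // (2) tag a node, and restrict a resource to nodes carrying that tag
>>>> admin.addInstanceTag("MyCluster", "node_1", "taskA");
>>>> IdealState idealState = admin.getResourceIdealState("MyCluster", "TaskA");
>>>> idealState.setInstanceGroupTag("taskA");
>>>>
>>>> // (3) allow at most one partition of this resource on any single node
>>>> idealState.setMaxPartitionsPerInstance(1);
>>>> admin.setResourceIdealState("MyCluster", "TaskA", idealState);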
>>>>
>>>> Is that helpful?
>>>>
>>>> Kanak
>>>> ------------------------------
>>>> Date: Wed, 1 Jan 2014 20:31:56 -0800
>>>> Subject: helix rebalancing for multiple resources
>>>> From: vusilly@gmail.com
>>>> To: user@helix.incubator.apache.org
>>>>
>>>>
>>>> Hi,
>>>> We're looking into creating something like a distributed task
>>>> processing cluster.  We already have existing code for the processing task
>>>> on a single host.  So that places stronger restrictions on what we're
>>>> doing:
>>>> - partitioned task A: a single partition needs to be assigned to a single
>>>> node, and a node may have only a single partitioned task
>>>> - another set of non-partitioned tasks (e.g. B, C, D) also needs to be
>>>> assigned to nodes, but it would be most efficient if those tasks were
>>>> assigned to separate nodes, so that any single node has at most one task
>>>> (either partitioned A, or B, C, D, etc.)
>>>>
>>>> This seems to require a global view of all tasks.  However, from the
>>>> examples and the Rebalancer code, it appears that the resource
>>>> mappings/assignments are independent of one another.  Is that correct?  If
>>>> so, is Apache Helix the right framework for us, given the requirements
>>>> above?
>>>>
>>>> I saw that it might be possible to find the current resource assignments
>>>> for other resources during the rebalancing calculation methods, but I was
>>>> then concerned about concurrency issues--if the rebalance for task A and the
>>>> rebalance for task B were computed at the same time.
>>>>
>>>> Thanks for any and all feedback.
>>>>
>>>> Vu Nguyen
>>>>
>>>>
>>>>
>>>
>>
>
