helix-user mailing list archives

From Vu Nguyen <vusi...@gmail.com>
Subject Re: helix rebalancing for multiple resources
Date Mon, 13 Jan 2014 20:15:40 GMT
We haven't started on that part yet.  We were doing some planning and
prototyping (without the cluster management part) before we started the
full implementation.  We've started preparing the rest of our code to be
able to run in a distributed fashion.  We should have an implementation
with helix by mid/late Feb.  I'll respond again with whatever code that we
can share when we're closer to the final implementation.

Thanks for all your help!

Vu



On Sat, Jan 11, 2014 at 9:00 AM, kishore g <g.kishore@gmail.com> wrote:

> Hi Vu,
>
> What mechanism did you choose to balance across multiple resources? I was
> planning to add a recipe/rebalancer to solve this use case.
>
> thanks,
> Kishore G
>
>
> On Fri, Jan 3, 2014 at 1:19 PM, Vu Nguyen <vusilly@gmail.com> wrote:
>
>> I contacted our platform team.  They're making changes that will make the
>> hosts' IPs more static, so I can use those and pass them to Helix.
>>
>> We'll have extra nodes allocated and available to pick up a task that was
>> previously assigned to a failed node.  In addition, our usual AWS setup
>> will replace the failed node for us.  But we'll still have the extra
>> standby nodes because that should be faster than waiting for the
>> replacement node--I think it tends to take a few minutes or more.
>>
>> Thanks
>>
>> Vu
>>
>>
>>
>> On Fri, Jan 3, 2014 at 8:32 AM, kishore g <g.kishore@gmail.com> wrote:
>>
>>> Hi Vu,
>>>
>>> Currently, Helix does not have the ability to take a ZooKeeper client from
>>> outside. It's possible to add that feature, but I need to think more about
>>> the ZooKeeper state changes like disconnect/connect, session expiry, etc.
>>>
>>> Looks like getting the ZK hosts/ports from your platform and passing them
>>> to Helix is a possible option for now. Meanwhile, we will look into what it
>>> takes to accept a ZooKeeper client as input.
>>>
>>> Regarding the rebalancing for multiple resources, of the options Kanak
>>> provided, start with #2 first and then implement #1 using USER defined
>>> rebalancer. This functionality is generic enough that we can provide a
>>> default implementation in Helix or if you implement one we can add it to
>>> helix-core.
>>>
>>> Let us know if you need help on implementing a rebalancer that works
>>> across resources.
>>>
>>> Another question: what is the expected behavior when a node fails? Will
>>> you have standby nodes to pick up the task, or will you assign it to a node
>>> that is already running another task?
>>>
>>> thanks,
>>> Kishore G
>>>
>>>
>>>
>>> On Fri, Jan 3, 2014 at 12:14 AM, Vu Nguyen <vusilly@gmail.com> wrote:
>>>
>>>> The main issue is that we already have an infrastructure here for
>>>> ZooKeeper that has a separate mechanism for clients to discover the ZK
>>>> server hosts.  That's provided by our platform team.  So client
>>>> applications don't actually provide the ZooKeeper hosts at this point.  I
>>>> likely could get access to that information somehow, though.  However, I
>>>> would prefer to re-use what our platform team provides in case they make
>>>> any modifications to how hosts are discovered.
>>>>
>>>> By using our platform libraries, we get a ZooKeeper client that's ready
>>>> to use directly.  I was thinking that we could get Helix to use this for
>>>> any ZooKeeper operations.  If we get disconnected from ZooKeeper, the
>>>> discovery mechanism would be re-used automatically for reconnecting without
>>>> requiring us to explicitly provide the hosts/ports.
>>>>
>>>> Thanks
>>>>
>>>> Vu
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Jan 1, 2014 at 9:26 PM, Kanak Biscuitwala <kanak.b@hotmail.com> wrote:
>>>>
>>>>> Not sure I follow. Is your problem that Helix creates the cluster as a
>>>>> child of the root node (e.g. /clusterName) while you would like it to be
>>>>> something else (e.g. /path/to/custom/root/clusterName)?
>>>>>
>>>>> I'm also unclear about what you mean about discovering ZK servers. How
>>>>> would you be able to leverage a path in ZK to discover ZK?
>>>>>
>>>>> Right now Helix requires long-running ZK servers and assumes that you
>>>>> as the application know how to connect to them (i.e. you know the
>>>>> hosts/ports). If that assumption holds, I believe it should work
>>>>> independent of deployment (cloud provider, private datacenter, or anything
>>>>> else).
>>>>>
>>>>> I'm not really sure what you're trying to adapt with the adapter.
>>>>> Could you clarify?
>>>>>
>>>>> I'm on #apachehelix on freenode if that's more convenient.
>>>>>
>>>>> Thanks,
>>>>> Kanak
>>>>> ------------------------------
>>>>> Date: Wed, 1 Jan 2014 21:07:36 -0800
>>>>> Subject: Re: helix rebalancing for multiple resources
>>>>> From: vusilly@gmail.com
>>>>> To: kanak.b@hotmail.com
>>>>> CC: user@helix.incubator.apache.org
>>>>>
>>>>>
>>>>> Yes, that is helpful.
>>>>>
>>>>> Another big requirement that I forgot to mention is running this on a
>>>>> cloud service provider, like AWS.  We already have a shared ZooKeeper setup
>>>>> there with our own client.  Ideally, I could inject a custom client for
>>>>> Helix to use for operations, where the main differences we would require
>>>>> are a custom top-level path (/appname) that is required by our client, and
>>>>> that the client would handle discovering and connecting to the ZooKeeper
>>>>> servers.
>>>>>
>>>>> Is support for AWS and other cloud providers on the roadmap?
>>>>>
>>>>> Also, for the short-term, do you see any complications in us creating
>>>>> an adapter client that helix would use to bridge that gap?  Or would it be
>>>>> much more complicated than I am hoping for?
>>>>>
>>>>> Thanks
>>>>>
>>>>> Vu
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Jan 1, 2014 at 8:36 PM, Kanak Biscuitwala <kanak.b@hotmail.com> wrote:
>>>>>
>>>>> Resending since I realized you might not be registered on the user
>>>>> list yet. By the way, for your specific use case, I would personally lean
>>>>> towards the CustomCodeRunner along with the CUSTOMIZED IdealState rebalance
>>>>> mode. Then when nodes enter and exit, you can change the IdealState
>>>>> yourself and Helix will fire the transitions. This will most easily give
>>>>> you the policy-driven global view you're looking for.
>>>>>
>>>>> ---
>>>>>
>>>>> Hi Vu,
>>>>>
>>>>> Your understanding is basically correct. The controller will rebalance
>>>>> each resource in sequence, at most one controller pipeline execution is
>>>>> going on at any one time, and there is no parallelism within the controller
>>>>> pipeline (other than batch reading and writing the cluster at the beginning
>>>>> and end).
>>>>>
>>>>> Here are some things that may be of use to know:
>>>>>
>>>>> 1. You can plug in your own code to help decide how to rebalance your
>>>>> cluster in one of two ways:
>>>>>    - Using the CustomCodeRunner on the participant side so that you
>>>>> can update the IdealState whenever the cluster changes:
>>>>> https://github.com/apache/incubator-helix/blob/helix-0.6.2-release/helix-core/src/main/java/org/apache/helix/participant/HelixCustomCodeRunner.java?source=c
>>>>>    - Implementing a Rebalancer with USER_DEFINED rebalance mode:
>>>>> https://github.com/apache/incubator-helix/blob/helix-0.6.2-release/helix-core/src/main/java/org/apache/helix/controller/rebalancer/Rebalancer.java?source=c
>>>>>
>>>>> In either case, Helix will still fire transitions according to
>>>>> constraints and react to node entry/exit.
>>>>>
>>>>> 2. Helix supports adding tags to nodes (via InstanceConfig), and
>>>>> specifying tags in each resource IdealState. Then, a tagged resource will
>>>>> only be assigned to nodes with the corresponding tag present.
>>>>>
>>>>> 3. You can specify max partitions per resource per node in the
>>>>> IdealState of the resource (this should be 1 in your case)
>>>>>
>>>>> 4. You can combine any of the above 3 if that makes sense (e.g. change
>>>>> node tags whenever a cluster change happens, thus constraining how Helix
>>>>> will assign everything)
>>>>>
>>>>> Is that helpful?
>>>>>
>>>>> Kanak
>>>>> ------------------------------
>>>>> Date: Wed, 1 Jan 2014 20:31:56 -0800
>>>>> Subject: helix rebalancing for multiple resources
>>>>> From: vusilly@gmail.com
>>>>> To: user@helix.incubator.apache.org
>>>>>
>>>>>
>>>>> Hi,
>>>>> We're looking into creating something like a distributed task
>>>>> processing cluster.  We already have existing code for the processing task
>>>>> on a single host.  So that results in stronger restrictions on what we're
>>>>> doing:
>>>>> - partitioned task A: single partition needs to be assigned to a
>>>>> single node and a node may have only a single partitioned task
>>>>> - another set of non-partitioned tasks (e.g. B, C, D) also needs to be
>>>>> assigned nodes, but it would be most efficient if those tasks are assigned
>>>>> to separate nodes so any single node has at most 1 task (either partitioned
>>>>> A, B, C, D, etc.)
>>>>>
>>>>> This seems to require a global view of the tasks.  However, from the
>>>>> examples and the Rebalancer code, it appears that the resource
>>>>> mappings/assignments are independent of one another.  Is that correct?
>>>>> If so, is Apache Helix the right framework for us, given the requirements
>>>>> above?
>>>>>
>>>>> I saw that it might be possible to find the current resource
>>>>> assignment for other resources during the rebalancing calculation methods,
>>>>> but I was then concerned about concurrency issues, e.g. if the rebalances
>>>>> for task A and task B were computed at the same time.
>>>>>
>>>>> Thanks for any and all feedback.
>>>>>
>>>>> Vu Nguyen
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
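
The placement policy discussed in this thread (at most one task per node across partitioned task A and the non-partitioned tasks B, C, D, with standby nodes picking up work from a failed node) can be sketched independently of Helix. The following is only an illustration under stated assumptions: the class GlobalTaskAssigner, its assign method, and the task/node names are hypothetical and are not part of the Helix API. In a Helix deployment, logic along these lines would presumably live in the custom code or user-defined rebalancer mentioned above, which would then write the resulting mapping into the IdealStates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper (not a Helix class): one task per node, across all resources. */
final class GlobalTaskAssigner {

    /**
     * @param tasks     every task across resources, e.g. "A_0", "A_1", "B", "C"
     * @param liveNodes nodes that are currently alive, in preference order
     * @param previous  last known task-to-node mapping (may mention dead nodes)
     * @return task-to-node mapping with at most one task per node; tasks are
     *         left out if no free node remains for them
     */
    static Map<String, String> assign(List<String> tasks,
                                      List<String> liveNodes,
                                      Map<String, String> previous) {
        Map<String, String> result = new LinkedHashMap<>();
        Set<String> free = new LinkedHashSet<>(liveNodes);
        List<String> unplaced = new ArrayList<>();

        // Pass 1: keep a task on its previous node if that node is alive and unclaimed.
        for (String task : tasks) {
            String node = previous.get(task);
            if (node != null && free.remove(node)) {
                result.put(task, node);
            } else {
                unplaced.add(task);
            }
        }

        // Pass 2: give each remaining task its own free (standby) node.
        Iterator<String> spare = free.iterator();
        for (String task : unplaced) {
            if (!spare.hasNext()) {
                break; // more tasks than nodes: the rest stay unassigned
            }
            result.put(task, spare.next());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tasks = Arrays.asList("A_0", "A_1", "B", "C");

        // Initial placement on five nodes; the fifth is left idle as a standby.
        Map<String, String> first = assign(
                tasks,
                Arrays.asList("n1", "n2", "n3", "n4", "n5"),
                Collections.<String, String>emptyMap());
        System.out.println(first);

        // n2 fails: only its task moves, and it moves to the standby node.
        Map<String, String> second = assign(
                tasks, Arrays.asList("n1", "n3", "n4", "n5"), first);
        System.out.println(second);
    }
}
```

Because a single call sees all tasks at once, the one-task-per-node constraint holds across resources without any coordination between per-resource computations, and surviving assignments are left in place so that a node failure causes only the affected task to move.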
