helix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Abhishek Rai <abhishek...@gmail.com>
Subject Re: Question about participant dropping a subset of assigned partitions
Date Thu, 07 Feb 2013 18:02:22 GMT
Thanks again for the thoughtful feedback Kishore.

On Sat, Feb 2, 2013 at 12:20 AM, kishore g <g.kishore@gmail.com> wrote:

> Thanks Abhishek, you are thinking in the right direction.
> Your point about disable-enable happening too quickly is valid. However,
> how fast can you detect the c++ process crash and restart. Is C++ process
> running in a daemon mode and it is restarted automatically or the
> helix-java-proxy agent will be responsible to start the c++ process.

Both options are possible at this time, but I was evaluating the
disable-partition suggestion for soundness.  However, I agree that it's a
highly unlikely scenario.

> The messaging solution will work but the alternative of modeling each c++
> process as a participant makes sense and is the right thing to do.
> Can you provide more details on "its ok if the java-proxy dies" why does
> it not effect the correctness? when the agent is restarted does it
> re-register the c++ instances, do you plan to store the c++ pid in Helix so
> that the agent can remember the processes it had started?

Death or restarts of the java proxy may result in temporarily
unavailability of some data until the controller rebalances the lost
partitions.  Death or restarts of the C++ DDS process also affects
availability but if the Java proxy stays up, then (1) controller may not
notice the unavailability, and (2) DDS clients may continue to think that
their data is reachable when it's not.  Sorry I misused the term
"correctness" for describing the weaker availability in the latter case.

> The reason I am asking these questions is there is a similar effort on
> writing a stand alone helix agent that acts as proxies for other processes
> started on the node. In general this approach seems to be quite useful and
> might have some common functionality that can be leverage across multiple
> implementations.

That's great!  The proxy agent will be very useful.  I wonder if a good
goal would be to enable recursion in the proxy such that the participant
itself can be a controller for another Helix cluster.  Thus the
proxy-participant could delegate its set of assigned resources to another
set of participants.  This may be trivially true.

> As for the writing the custom rebalancer, you have two options 1) as you
> mentioned you can write inside the controller 2) there is another feature
> called CustomCodeInvoker. You can basically write your logic to change the
> idealstate in this and simply run it along with your java proxy and helix
> will ensure it is actively running on only one node. This has an overhead
> of around 50-100ms on reacting to failure but is much cleaner. If you are
> doing 1) you need to be careful to not change existing code but simply add
> a new stage in the pipeline. That way you will be able to upgrade Helix to
> get new features without breaking your functionality.

Thanks for the suggestions.  I'm doing 1) but in a slightly different way,
please let me know if I'm totally off in the wrong direction :-)  Or if I'm
likely to run into problems with future Helix upgrades.

I've subclassed GenericHelixController and implemented all listener
callbacks.  This subclass registers for all events and ensures that
GenericHelixController listeners run for each event.  Internally, it
implements the scheduling logic that it needs and applies it via
ZKHelixAdmin.setResourceIdealState().  Do you see any clear benefits of
changing this to insert a new stage in GenericHelixController's pipeline?

I'm following a similar scheme in the custom participant except that it
directly registers the listeners with Helix without using a
GenericHelixController.  I will take a closer look at CustomCodeInvoker,
looks very useful.

> On the topic of "Helix does not have a c++ library", do you think it would
> make it easy if there was a c++ library?. This may not that hard to write
> because only thing that needs to be written in c++ is participant code
> which simply acts on the message from controller. Majority of the code is
> in controller and it can still be run as java. We are working on a python
> agent and i hope some one will write a c++ agent.

Thanks for the update, yeah I am thinking of taking a stab at it in 1-2
months if it still seems useful.

> One of the good things of modeling each c++ process as an instance is that
> in future if there is a c++ helix agent then you can easily migrate to it.

Thanks again!

> Hope this helps.
> thanks,
> Kishore G
> On Fri, Feb 1, 2013 at 9:44 PM, Terence Yim <chtyim@gmail.com> wrote:
>> Hi,
>> What do mean by "fake" live instance that you mentioned? I think the Java
>> proxy could simply creates one HelixManager participant per C++ instance
>> (hence there are N HelixManager instances in the Java proxy) and disconnect
>> them accordingly based on the liveness of the C++ process.
>> Terence
>> On Fri, Feb 1, 2013 at 7:19 PM, Abhishek Rai <abhishekrai@gmail.com>wrote:
>>> Thanks for the quick and thoughtful response Kishore!  Comments inline.
>>> On Fri, Feb 1, 2013 at 6:33 PM, kishore g <g.kishore@gmail.com> wrote:
>>>> Hi Abhishek,
>>>> Thanks for the good question. We have two options(listed later in the
>>>> email) for allowing a partition to drop the partitions.  However, It works
>>>> only in two modes (auto, custom) of three mode(auto_rebalance, auto,
>>>> custom) the controller can support.
>>>> More info about modes here
>>>> http://helix.incubator.apache.org/Features.html
>>>> Can you let me know which mode you are running it in?.
>>> We are planning to use CUSTOM mode since we have some specific
>>> requirements about (1) the desired state for each partition, and (2)
>>> scheduling partitions to instances.  Our requirements for (1) are not
>>> expressible in the FSM framework.
>>> Also is it sufficient if the disabled partitions are re-assigned
>>>> uniformly to other nodes or you want to other partitions from other nodes
>>>> to be assigned to this node.
>>> Once a participant disables some partitions, it's alright for the
>>> default rebalancing logic to kick in.
>>>> Also it will help us if you can tell the use case when you need this
>>>> feature.
>>> Sure, I'm still trying to hash things out but here is a summary.  The
>>> DDS nodes are C++ processes, which is the crux of the problem.  AFAIK Helix
>>> does not have a C++ library, so I'm planning to use a participant written
>>> in Java, which runs as a separate process on the same node, receives state
>>> transitions from the controller, and proxies them to the C++ process via an
>>> RPC interface.  The problem is that the C++ process and the Java-Helix
>>> proxy can fail independently.  I'm not worried about the Java-Helix proxy
>>> crashing since that would knock of all partitions in the C++ process from
>>> Helix view, which does not affect correctness.
>>> But when the C++ process crashes, the Java-Helix proxy needs to let the
>>> controller know asap, so the Helix "external view" can be updated,
>>> rebalancing can start, etc.  One alternative is to invoke
>>> "manager.disconnect()" from the Helix proxy.  But this would knock off all
>>> partitions managed by the proxy (I want to retain the ability for the proxy
>>> to manage multiple C++ programs).  Hence the question about selectively
>>> dropping certain partitions, viz., the ones in a crashed C++ program.
>>>> To summarize, you can achieve this in AUTO and CUSTOM but not in
>>>> AUTO_REBALANCE mode because the goal of controller is always to assign the
>>>> partitions evenly among nodes. But you bring up a good use case, depending
>>>> the  behavior we might be able to support it easily.
>>>> 1. Disable a partition on a given node: Disabling a partition on a
>>>> particular node should automatically trigger rebalancing. This can be done
>>>> either by admin using command line tool
>>>> helix-admin.sh --zkSvr <ZookeeperServerAddress(Required)>
>>>> --enablePartition <clusterName instanceName resourceName partitionName
>>>> true/false>
>>>> or programmatically if you have the access to manager, you can invoke
>>>> this
>>>> manager.getClusterManagementTool().enablePartition(enabled,
>>>> clusterName,instanceName,resourceName,partitionNames);
>>>> This can be done in auto and custom.
>>> I am not sure this will have the right effect in the scenario described
>>> above.  Specifically, the Java proxy would need to disable all the crashed
>>> partitions, and then re-enable them when the C++ DDS process reboots
>>> successfully.  If the disable-enable transitions happen too quickly, could
>>> the controller possibly miss the transition for some partition and not do
>>> anything?
>>>> 2. The other option is to change the mapping of partition --> node in
>>>> the ideal state. ( You can do this programmatically in custom modes and in
>>>> some what in auto mode as well). Doing this will send transitions to the
>>>> node to drop the partitions and reassign it to other nodes.
>>> Yes, this seems like the most logical thing.  The Java proxy will
>>> probably need to send a message to the controller to trigger this change in
>>> the ideal states of all crashed partitions.  The messaging API would
>>> probably be useful here.
>>> Another alternative I'm considering is for the Java proxy to add a
>>> "fake" instance for each C++ process that it spawns locally.  The custom
>>> rebalancer (that I'd write inside the controller) would then schedule the
>>> C++ DDS partitions on to these "fake" live instances.  When the C++ process
>>> crashes, the Java proxy would simply disconnect the corresponding fake
>>> instance's manager.  Does this make sense to you?  Or do you have any other
>>> thoughts?
>>> Thanks again for your thoughtful feedback!
>>> Abhishek
>>>> Thanks,
>>>> Kishore G
>>>> On Fri, Feb 1, 2013 at 5:44 PM, Abhishek Rai <abhishekrai@gmail.com>wrote:
>>>>> Hi Helix users!
>>>>> I'm a Helix newbie and need some advice about a use case.  I'm using
>>>>> Helix to manage a storage system which fits the description of a DDS
>>>>> ("distributed data service" as defined in the Helix SOCC paper).  Each
>>>>> participant hosts a bunch of partitions of a resource, as assigned by
>>>>> controller.  The set of partitions assigned to a participant changes
>>>>> dynamically as the controller rebalances partitions, nodes join or leave,
>>>>> etc.
>>>>> Additionally, I need the ability for a participant to "drop" a subset
>>>>> of partitions currently assigned to it.  When a partition is dropped
by a
>>>>> participant, Helix would remove the partition from the current state
of the
>>>>> instance, update the external view, and make the partition available
>>>>> rebalancing by the controller.  Does the Java API provide a way of
>>>>> accomplishing this?  If not, are there any workarounds?  Or, was there
>>>>> design rationale to disallow such actions from the participant?
>>>>> Thanks,
>>>>> Abhishek

View raw message