helix-user mailing list archives

From Varun Sharma <va...@pinterest.com>
Subject Re: Questions about custom helix rebalancer/controller/agent
Date Wed, 06 Aug 2014 22:56:35 GMT
I am attempting to attach the HelixCustomCodeRunner to a controller
instance; I am not actually running a controller alongside each of my nodes.
HelixCustomCodeRunner.start() is failing as above with a NullPointerException
at line 120. Is it not possible to attach the
HelixCustomCodeRunner to a controller instance?

Thanks !
Varun


On Wed, Aug 6, 2014 at 3:46 PM, Kanak Biscuitwala <kanak.b@hotmail.com>
wrote:

> I would suggest maintaining 2 HelixManager connections: one for
> CONTROLLER, one for PARTICIPANT (I'm assuming you're running a controller
> instance alongside each of your nodes). It's wasteful, but you should just
> leave the controller one alone, and then attach the state model factory and
> custom code runner to the participant one.
>
> Kanak
>
> ------------------------------
> Date: Wed, 6 Aug 2014 15:43:16 -0700
>
> Subject: Re: Questions about custom helix rebalancer/controller/agent
> From: varun@pinterest.com
> To: user@helix.apache.org
>
> Without this I was getting a null pointer exception in the
> CustomCodeRunner - Helix 0.6.3 - Lines 120 and 121
>
>
>
> 120     StateMachineEngine stateMach = _manager.getStateMachineEngine();
> 121     stateMach.registerStateModelFactory(LEADER_STANDBY, _stateModelFty,
>             _resourceName);
>
>
> My code calls the HelixCustomCodeRunner in the following way:
>
>
>    new HelixCustomCodeRunner(helixManager, zookeeperQuorum).
>         on(HelixConstants.ChangeType.LIVE_INSTANCE).invoke(myCallback).
>         usingLeaderStandbyModel("HDFS_rebalancer").start();
>
>
>
>
> On Wed, Aug 6, 2014 at 3:08 PM, Kanak Biscuitwala <kanak.b@hotmail.com>
> wrote:
>
> Hi Varun,
>
> getStateMachineEngine is only supported for InstanceType.PARTICIPANT. May
> I ask why you need your controller to have state transition callbacks?
>
> In future releases, we're creating separate classes for each role, so
> hopefully that will resolve confusions like this moving forward.
>
> Kanak
>
> ------------------------------
> Date: Wed, 6 Aug 2014 15:04:36 -0700
>
> Subject: Re: Questions about custom helix rebalancer/controller/agent
> From: varun@pinterest.com
> To: user@helix.apache.org
>
> I am getting a weird NullPointerException while instantiating the
> controller. Here is the code:
>
> this.helixManager = HelixManagerFactory.getZKHelixManager(this.clusterName,
>         InetAddress.getLocalHost().getHostName() + ":" + thriftPort,
>         InstanceType.CONTROLLER,
>         zookeeperQuorum);
> StateMachineEngine machineEngine = helixManager.getStateMachineEngine();
>
> machineEngine.registerStateModelFactory("HDFS_state_machine",
>     new OnlineOfflineStateModelFactory(1000));
> this.helixManager.connect();
>
> I get a NullPointerException on the registerStateModelFactory() call
> because getStateMachineEngine() returns null. Is that supposed to happen?
>
> Thanks
> Varun
>
>
> On Fri, Aug 1, 2014 at 11:23 AM, Zhen Zhang <zzhang@linkedin.com> wrote:
>
>  Hi Varun,
>
>  The state transitions will be independent. Helix controller may send
> MASTER->OFFLINE to all three nodes, for example, and if node1 completes the
> MASTER->OFFLINE transition first, controller will send OFFLINE->DROPPED to
> node1 first. Or if all three nodes complete MASTER->OFFLINE at the same
> time, controller may send OFFLINE->DROPPED to all three nodes together.
>
>  Thanks,
> Jason
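The independence Jason describes can be illustrated with a toy model. This is not Helix source: it assumes one partition with a replica per node, a drop path of MASTER -> OFFLINE -> DROPPED, and that the controller holds back a new message for a replica while an earlier one is still in flight. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model, not Helix code: each replica of one partition walks
// MASTER -> OFFLINE -> DROPPED on its own schedule.
class DropSimulation {

    // Next state on the drop path, or null once the replica is DROPPED.
    static String nextState(String current) {
        switch (current) {
            case "MASTER":
                return "OFFLINE";
            case "OFFLINE":
                return "DROPPED";
            default:
                return null;
        }
    }

    // Messages to send now: one per replica that still has work left and is
    // not already waiting on an earlier message (assumed controller behaviour).
    static List<String> messagesToSend(Map<String, String> replicaStates,
                                       Set<String> inFlight) {
        List<String> messages = new ArrayList<>();
        for (Map.Entry<String, String> e : replicaStates.entrySet()) {
            String node = e.getKey();
            String next = nextState(e.getValue());
            if (next != null && !inFlight.contains(node)) {
                messages.add(node + ":" + e.getValue() + "->" + next);
            }
        }
        return messages;
    }
}
```

With node1 already OFFLINE and node2/node3 still processing MASTER->OFFLINE, the only message produced is `node1:OFFLINE->DROPPED`; node1 does not wait for the other two masters.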
>
>   From: Varun Sharma <varun@pinterest.com>
> Reply-To: "user@helix.apache.org" <user@helix.apache.org>
> Date: Friday, August 1, 2014 11:10 AM
> To: "user@helix.apache.org" <user@helix.apache.org>
>
> Subject: Re: Questions about custom helix rebalancer/controller/agent
>
>   Thanks a lot. Most of my questions are answered, except I have one
> follow up question.
>
>  Let's say I have a situation with 3 masters per partition. For partition
> X, these are on node1, node2 and node3. Upon dropping the resource, would
> the partition X be offlined on all three nodes and then dropped or can that
> be independent as in, node1 offlines and drops, followed by node2 and so
> on. Just want to check if we first wait for all the masters to offline and
> then initiate the offline->drop or the other way round.
>
>  Thanks !
> Varun
>
>
> On Fri, Aug 1, 2014 at 10:33 AM, Kanak Biscuitwala <kanak.b@hotmail.com>
> wrote:
>
>
> Dropping a resource will cause the controller to first send MASTER -->
> OFFLINE for all partitions, and then OFFLINE --> DROPPED.
>
>  Kanak
>  ------------------------------
> Date: Fri, 1 Aug 2014 10:30:54 -0700
>
> Subject: Re: Questions about custom helix rebalancer/controller/agent
> From: varun@pinterest.com
> To: user@helix.apache.org
>
> In my case, I will have many resources, say up to 100 resources.
> Each of them will have partitions in the range of 100-5K. So I guess I do
> require the bucket size. 300K partitions is the sum of partitions across
> all resources, rather than the # of partitions within a single resource.
>
>  Another question I had was regarding removing a resource in Helix. When
> a resource is dropped via HelixAdmin (dropResource), would it trigger
> MASTER->OFFLINE transitions for the respective partitions before the
> resource is removed? To concretize my use case, we have many resources
> with a few thousand partitions being loaded every day. New versions of the
> resources keep getting loaded as brand new resources into Helix and the
> older versions are decommissioned/garbage collected. So we would be
> issuing up to 100 or so resource additions per day and up to 100 or so
> resource deletions every day. Just want to check that deleting a resource
> would also trigger the appropriate MASTER->OFFLINE transitions.
>
>  Thanks
> Varun
>
>
> On Fri, Aug 1, 2014 at 10:18 AM, Kanak Biscuitwala <kanak.b@hotmail.com>
> wrote:
>
>  a) By default, there is one znode per resource, which as you know is a
> grouping of partitions. The biggest limitation is that ZK has a 1MB limit
> on znode sizing. To get around this, Helix has the concept of bucketizing,
> where in your ideal state, you can set a bucket size, which will
> effectively create that many znodes to fully represent all your partitions.
> I believe that you can have ~2k partitions before you start needing to
> bucketize.
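The bucketizing idea can be pictured as splitting one resource's partitions into groups, each stored as its own znode. The sketch below assumes partitions are named `<resource>_<index>` (Helix's usual naming) and that the bucket size is the number of partitions per bucket; the `_bucket<N>` naming is hypothetical and is not how Helix names its znodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: groups partitions named "<resource>_<index>" into
// buckets of at most bucketSize partitions, one znode per bucket.
class Bucketizer {

    // Ceiling division: number of znodes needed for numPartitions.
    static int bucketCount(int numPartitions, int bucketSize) {
        return (numPartitions + bucketSize - 1) / bucketSize;
    }

    // Maps a partition name to its (hypothetically named) bucket.
    static String bucketFor(String partitionName, int bucketSize) {
        int sep = partitionName.lastIndexOf('_'); // resource names may contain '_'
        String resource = partitionName.substring(0, sep);
        int index = Integer.parseInt(partitionName.substring(sep + 1));
        return resource + "_bucket" + (index / bucketSize);
    }

    // Builds the bucket -> partitions grouping for one resource.
    static Map<String, List<String>> bucketize(String resource, int numPartitions,
                                               int bucketSize) {
        Map<String, List<String>> buckets = new TreeMap<>();
        for (int i = 0; i < numPartitions; i++) {
            String partition = resource + "_" + i;
            buckets.computeIfAbsent(bucketFor(partition, bucketSize),
                    k -> new ArrayList<>()).add(partition);
        }
        return buckets;
    }
}
```

For example, with an illustrative bucket size of 1000, a 5K-partition resource would span `bucketCount(5000, 1000) == 5` znodes.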
>
>  300k may cause you separate issues, and you may want to consider doing
> things like enabling batch message mode in your ideal state so that each
> message we send to an instance contains transitions for all partitions
> hosted on that instance, rather than creating a znode per partition state
> change. However, in theory (we've never played with this many in practice),
> Helix should be able to function correctly with that many partitions.
>
>  b) Yes, if you have a hard limit of 1 master per partition, Helix will
> transition the first node to OFFLINE before sending the MASTER transition
> to the new master.
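The ordering in (b) can be sketched as a per-round decision for one partition: while any replica must step down from MASTER, only demotions are sent; promotions wait for a later round. This is a simplified, hypothetical model (it jumps OFFLINE -> MASTER directly, whereas a real MasterSlave model goes through SLAVE) and assumes the target never asks for two masters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of a "max 1 MASTER" constraint for a single partition.
class SingleMasterHandoff {

    // Transitions to send in this round, given current and target states.
    static List<String> nextRound(Map<String, String> current,
                                  Map<String, String> target) {
        List<String> demotions = new ArrayList<>();
        for (Map.Entry<String, String> e : current.entrySet()) {
            String want = target.getOrDefault(e.getKey(), "OFFLINE");
            if (e.getValue().equals("MASTER") && !want.equals("MASTER")) {
                demotions.add(e.getKey() + ":MASTER->OFFLINE");
            }
        }
        if (!demotions.isEmpty()) {
            return demotions; // old master steps down before anyone is promoted
        }
        List<String> promotions = new ArrayList<>();
        for (Map.Entry<String, String> e : target.entrySet()) {
            String have = current.getOrDefault(e.getKey(), "OFFLINE");
            if (e.getValue().equals("MASTER") && !have.equals("MASTER")) {
                promotions.add(e.getKey() + ":" + have + "->MASTER");
            }
        }
        return promotions;
    }
}
```

Moving the master from node1 to node2 takes two rounds: first `node1:MASTER->OFFLINE`, then, once node1 reports OFFLINE, `node2:OFFLINE->MASTER`.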
>
>  Kanak
>
>  ------------------------------
> Date: Fri, 1 Aug 2014 10:09:24 -0700
>
> Subject: Re: Questions about custom helix rebalancer/controller/agent
> From: varun@pinterest.com
> To: user@helix.apache.org
>
> Sounds fine to me. I can work without the FINALIZE notification for now,
> but I hope it's going to come out soon. A few more questions:
>
>  a) How well does Helix scale with partitions - is each partition a
> separate znode inside helix ? If I have 300K partitions in Helix would that
> be an issue ?
> b) If a partition which was assigned as a master on node1 is now assigned
> as a master on node2, will node1 get a callback execution for transition
> from MASTER-->OFFLINE
>
>  Thanks
> Varun
>
>
> On Thu, Jul 31, 2014 at 11:18 PM, Kanak Biscuitwala <kanak.b@hotmail.com>
> wrote:
>
>  s/run/start/g -- sorry about that, fixed in javadocs for future releases
>
> You may need to register for a notification type; I believe
> HelixCustomCodeRunner complains if you don't. However, you can simply
> ignore that notification type, and just check for INIT and FINALIZE
> notification types in your callback to track whether or not you're the
> leader. On INIT, you start your 30 minute timer, and on FINALIZE you stop
> it. You may need to wait for us to make a 0.6.4 release (we will likely do
> this soon) to get the FINALIZE notification.
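A minimal sketch of the "start on INIT, stop on FINALIZE" pattern, using a `ScheduledExecutorService` for the 30-minute period. The `NotificationType` enum is a local stand-in mirroring the types named above so the sketch compiles without helix-core; how a Helix callback would forward its notification type into `onNotification` is an assumption, and `LeaderTimer` is a hypothetical name.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Local stand-in for the notification types discussed in the thread.
enum NotificationType { INIT, CALLBACK, FINALIZE }

// Hypothetical helper: runs a task periodically only while we are leader.
class LeaderTimer {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final Runnable rebalanceTask;
    private final long periodMillis;
    private ScheduledFuture<?> running; // non-null while leadership is held

    LeaderTimer(Runnable rebalanceTask, long periodMillis) {
        this.rebalanceTask = rebalanceTask;
        this.periodMillis = periodMillis;
    }

    synchronized void onNotification(NotificationType type) {
        switch (type) {
            case INIT: // became leader: start the periodic task once
                if (running == null) {
                    running = scheduler.scheduleWithFixedDelay(
                            rebalanceTask, 0, periodMillis, TimeUnit.MILLISECONDS);
                }
                break;
            case FINALIZE: // lost leadership: stop so only the new leader writes
                if (running != null) {
                    running.cancel(false);
                    running = null;
                }
                break;
            default: // the registered change type (e.g. LIVE_INSTANCE) is ignored
                break;
        }
    }

    synchronized boolean isRunning() {
        return running != null;
    }

    void shutdown() {
        scheduler.shutdownNow();
    }
}
```

In use, the period would be `TimeUnit.MINUTES.toMillis(30)`; the scheduler thread is non-daemon, so `shutdown()` should be called when the process stops.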
>
> Here is an example of a custom code runner usage:
> Registration:
> https://github.com/kishoreg/fullmatix/blob/master/mysql-cluster/src/main/java/org/apache/fullmatix/mysql/MySQLAgent.java
> Callback:
> https://github.com/kishoreg/fullmatix/blob/master/mysql-cluster/src/main/java/org/apache/fullmatix/mysql/MasterSlaveRebalancer.java
>
> Regarding setting up the Helix controller, you actually don't need to
> instantiate a GenericHelixController. If you create a HelixManager with
> InstanceType.CONTROLLER, then ZKHelixManager automatically creates a
> GenericHelixController and sets it up with leader election. We really
> should update the documentation to clarify that.
>
>  ------------------------------
> Date: Thu, 31 Jul 2014 23:00:13 -0700
>
> Subject: Re: Questions about custom helix rebalancer/controller/agent
> From: varun@pinterest.com
> To: user@helix.apache.org
>
>  Thanks for the suggestions..
>
>  Seems like the HelixCustomCodeRunner could do it. However, it seems like
> the CustomCodeRunner only provides hooks for plugging into notifications.
> The documentation example in the above link suggests a run() method, which
> does not seem to exist.
>
>  However, this may be sufficient for my case. I essentially hook in an
> empty CustomCodeRunner into my helix manager. Then I can instantiate my own
> thread which would run above snippet and keep writing ideal states every 30
> minutes. I guess I would still need to attach the GenericHelixController
> with the following code snippet to take action whenever the ideal state
> changes ??
>
>  GenericHelixController controller = new GenericHelixController();
>  manager.addConfigChangeListener(controller);
>  manager.addLiveInstanceChangeListener(controller);
>  manager.addIdealStateChangeListener(controller);
>  manager.addExternalViewChangeListener(controller);
>  manager.addControllerListener(controller);
>
>
>
>
>
> On Thu, Jul 31, 2014 at 6:01 PM, kishore g <g.kishore@gmail.com> wrote:
>
>  List resourceList = helixAdmin.getResourceList();
> for each resource:
>    Compute target ideal state
>     helixAdmin.setIdealState(resource, targetIdealState);
>
>  Thread.sleep(30minutes);
>
>  This can work right. This code can be as part of CustomCodeRunner.
> http://helix.apache.org/javadocs/0.6.3/reference/org/apache/helix/participant/HelixCustomCodeRunner.html.
> You can say you are interested in notifications but can ignore that.
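The loop body in the pseudocode above can be sketched as a single pass behind a small interface. `IdealStateStore` and its in-memory implementation are hypothetical seams so the sketch runs without a cluster; a real implementation would presumably delegate to `HelixAdmin.getResourcesInCluster(clusterName)` and `HelixAdmin.setResourceIdealState(clusterName, resource, idealState)`. Instead of `Thread.sleep` in a loop, `runOnce` could be handed to a `ScheduledExecutorService` with a 30-minute delay.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical seam over the two admin calls used in the pseudocode.
interface IdealStateStore {
    List<String> listResources();
    void writeIdealState(String resource, Map<String, Map<String, String>> mapFields);
}

// In-memory stand-in so the sketch runs without ZooKeeper.
class InMemoryIdealStateStore implements IdealStateStore {
    final Map<String, Map<String, Map<String, String>>> idealStates = new LinkedHashMap<>();

    InMemoryIdealStateStore(List<String> resources) {
        for (String r : resources) {
            idealStates.put(r, new HashMap<>());
        }
    }

    @Override
    public List<String> listResources() {
        return new ArrayList<>(idealStates.keySet());
    }

    @Override
    public void writeIdealState(String resource, Map<String, Map<String, String>> mapFields) {
        idealStates.put(resource, mapFields);
    }
}

class PeriodicRebalancePass {
    // One pass of the loop: compute a target for every resource and write it.
    static int runOnce(IdealStateStore store,
                       Function<String, Map<String, Map<String, String>>> computeTarget) {
        int written = 0;
        for (String resource : store.listResources()) {
            store.writeIdealState(resource, computeTarget.apply(resource));
            written++;
        }
        return written;
    }
}
```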
>
>  thanks,
> Kishore G
>
>
> On Thu, Jul 31, 2014 at 5:45 PM, Kanak Biscuitwala <kanak.b@hotmail.com>
> wrote:
>
>  i.e. helixAdmin.enableCluster(clusterName, false);
>
>   ------------------------------
> From: kanak.b@hotmail.com
> To: user@helix.apache.org
> Subject: RE: Questions about custom helix rebalancer/controller/agent
> Date: Thu, 31 Jul 2014 17:44:40 -0700
>
>
> Unfortunately HelixAdmin#rebalance is a misnomer, and it is a function of
> all the configured instances and not the live instances. The closest you
> can get to that is to use the third option I listed related to CUSTOMIZED
> mode, where you write the mappings yourself based on what is live.
>
>  Another thing you could do is pause the cluster controller and unpause
> it for a period every 30 minutes. That will essentially enforce that the
> controller will not send transitions (or do anything else, really) during
> the time it is paused. This sounds a little like a hack to me, but it may
> do what you want.
>
>  Kanak
>
>  ------------------------------
> Date: Thu, 31 Jul 2014 17:39:40 -0700
> Subject: Re: Questions about custom helix rebalancer/controller/agent
> From: varun@pinterest.com
> To: user@helix.apache.org
>
> Thanks Kanak for your detailed response; this is really very helpful.
> I was wondering if it's possible for me to do something like the following:
>
>  List resourceList = helixAdmin.getResourceList();
> for each resource:
>    Compute target ideal state
>    helixAdmin.rebalance(resource);
>
>  Thread.sleep(30minutes);
>
>  So, the above happens inside a while loop thread and this is the only
> place where we do the rebalancing ?
>
>  Thanks
> Varun
>
>
> On Thu, Jul 31, 2014 at 5:25 PM, Kanak Biscuitwala <kanak.b@hotmail.com>
> wrote:
>
>  Hi Varun,
>
>  Sorry for the delay.
>
>  1 and 3) There are a number of ways to do this, with various tradeoffs.
>
>  - You can write a user-defined rebalancer. In helix 0.6.x, it involves
> implementing the following interface:
>
>
> https://github.com/apache/helix/blob/helix-0.6.x/helix-core/src/main/java/org/apache/helix/controller/rebalancer/Rebalancer.java
>
>  Essentially what it does is given an existing ideal state, compute a new
> ideal state. For 0.6.x, this will read the preference lists in the output
> ideal state and compute a state mapping based on them. If you need more
> control, you can also implement:
>
>
> https://github.com/apache/helix/blob/helix-0.6.x/helix-core/src/main/java/org/apache/helix/controller/rebalancer/internal/MappingCalculator.java
>
>  which will allow you to create a mapping from partition to map of
> participant and state. In 0.7.x, we consolidated these into a single method.
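To make the "preference lists to state mapping" step concrete, here is an illustrative pure function: for each partition, the first live instance in its preference list becomes MASTER and the next live ones SLAVE, up to the replica count. This is a hypothetical sketch of the idea, not Helix's own calculator and not an implementation of the `Rebalancer` or `MappingCalculator` interfaces linked above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: preference lists + live instances -> partition -> (instance -> state).
class PreferenceListMapper {

    static Map<String, Map<String, String>> computeMapping(
            Map<String, List<String>> preferenceLists,
            Set<String> liveInstances,
            int replicas) {
        Map<String, Map<String, String>> mapping = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : preferenceLists.entrySet()) {
            Map<String, String> states = new LinkedHashMap<>();
            for (String instance : entry.getValue()) {
                if (states.size() == replicas) {
                    break; // replica count reached
                }
                if (liveInstances.contains(instance)) {
                    // First live instance in preference order leads, the rest follow.
                    states.put(instance, states.isEmpty() ? "MASTER" : "SLAVE");
                }
            }
            mapping.put(entry.getKey(), states);
        }
        return mapping;
    }
}
```

If n1 is down, a partition with preference list [n1, n2, n3] and 2 replicas maps to {n2=MASTER, n3=SLAVE}.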
>
>  Here is a tutorial on the user-defined rebalancer:
> http://helix.apache.org/0.6.3-docs/tutorial_user_def_rebalancer.html
>
>  Now, running this every 30 minutes is tricky because by default the
> controller responds to all cluster events (and really it needs to because
> it aggregates all participant current states into the external view --
> unless you don't care about that).
>
>  - Combined with the user-defined rebalancer (or not), you can have a
> GenericHelixController that doesn't listen on any events, but calls
> startRebalancingTimer(), into which you can pass 30 minutes. The problem
> with this is that the instructions at
> http://helix.apache.org/0.6.3-docs/tutorial_controller.html won't work as
> described because of a known issue. The workaround is to connect
> HelixManager as role ADMINISTRATOR instead of CONTROLLER.
>
>  However, if you connect as ADMINISTRATOR, you have to set up leader
> election yourself (assuming you want a fault-tolerant controller). See
> https://github.com/apache/helix/blob/helix-0.6.x/helix-core/src/main/java/org/apache/helix/manager/zk/DistributedLeaderElection.java for
> a controller change listener that can do leader election, but your version
> will have to be different, as you actually don't want to add listeners, but
> rather set up a timer.
>
>  This also gives you the benefit of plugging in your own logic into the
> controller pipeline. See
> https://github.com/apache/helix/blob/helix-0.6.x/helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java
> createDefaultRegistry() for how to create an appropriate PipelineRegistry.
>
>  - You can take a completely different approach and put your ideal state
> in CUSTOMIZED rebalance mode. Then you can have a meta-resource where one
> participant is a leader and the others are followers (you can create an
> ideal state in SEMI_AUTO mode, where the replica count and the preference
> list of resourceName_0 are "ANY_LIVEINSTANCE"). When one participant is
> told to become leader, you can set a timer for 30 minutes and update and
> write the map fields of the ideal state accordingly.
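What the leader might write into the CUSTOMIZED map fields can be sketched as a round-robin assignment over the live instances. The spreading policy (partition i gets its MASTER on sorted live instance i mod n, SLAVEs on the following ones) and the class name are hypothetical choices for illustration; each returned entry would become one map field (partition -> instance -> state) of the ideal state.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical round-robin placement for a CUSTOMIZED-mode ideal state.
class CustomizedMapFields {

    static Map<String, Map<String, String>> assign(String resource, int numPartitions,
                                                   List<String> liveInstances, int replicas) {
        List<String> live = new ArrayList<>(liveInstances);
        Collections.sort(live); // deterministic placement across runs
        int n = live.size();
        int copies = Math.min(replicas, n); // cannot place more replicas than live nodes
        Map<String, Map<String, String>> mapFields = new TreeMap<>();
        for (int p = 0; p < numPartitions; p++) {
            Map<String, String> states = new TreeMap<>();
            for (int r = 0; r < copies; r++) {
                states.put(live.get((p + r) % n), r == 0 ? "MASTER" : "SLAVE");
            }
            mapFields.put(resource + "_" + p, states);
        }
        return mapFields;
    }
}
```

With live instances [n2, n1] and 2 replicas, db_0 gets {n1=MASTER, n2=SLAVE} and db_1 gets {n2=MASTER, n1=SLAVE}; with no live instances every partition gets an empty map.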
>
>  2) I'm not sure I understand the question. If you're in the JVM, you
> simply need to connect as a PARTICIPANT for your callbacks, but that can
> just be something you do at the beginning of your node startup. The rest of
> your code is more or less governed by your transitions, but if there are
> things you need to do on the side, there is nothing in Helix preventing you
> from doing so. See
> http://helix.apache.org/0.6.3-docs/tutorial_participant.html for
> participant logic.
>
>  4) The current state is per-instance and is literally called
> CurrentState. For a given participant, you can query a current state by
> doing something like:
>
>  HelixDataAccessor accessor = helixManager.getHelixDataAccessor();
> CurrentState currentState = accessor.getProperty(
>     accessor.keyBuilder().currentState(instanceName, sessionId, resourceName));
>
>  If you implement a user-defined rebalancer as above, we automatically
> aggregate all these current states into a CurrentStateOutput object.
>
>  5) You can use a Helix spectator:
>
>  http://helix.apache.org/0.6.3-docs/tutorial_spectator.html
>
>  This basically gives you a live-updating routing table for the mappings
> of the Helix-managed resource. However, it requires the external view to be
> up to date, going back to my other point of perhaps separating the concept
> of changing mappings every 30 minutes from the frequency at which the
> controller runs.
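The lookup a spectator provides can be illustrated with a pure-Java stand-in over an external-view-shaped map (partition -> instance -> state). In Helix itself this role is played by `RoutingTableProvider`, whose `getInstances(resource, partition, state)` returns instance configs; `RoutingLookup` below is a hypothetical simplification that returns instance names only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a spectator's routing table over one resource.
class RoutingLookup {
    private final Map<String, Map<String, String>> externalView; // partition -> (instance -> state)

    RoutingLookup(Map<String, Map<String, String>> externalView) {
        this.externalView = externalView;
    }

    // Instances currently hosting the partition in the given state, sorted by name.
    List<String> instancesFor(String partition, String state) {
        List<String> result = new ArrayList<>();
        Map<String, String> states = externalView.getOrDefault(partition, Collections.emptyMap());
        for (Map.Entry<String, String> e : states.entrySet()) {
            if (e.getValue().equals(state)) {
                result.add(e.getKey());
            }
        }
        Collections.sort(result);
        return result;
    }
}
```

As noted above, the answers are only as fresh as the external view the lookup is built from.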
>
>  Hopefully this helps.
>
>  Kanak
>
>   ------------------------------
> Date: Thu, 31 Jul 2014 12:13:27 -0700
> Subject: Questions about custom helix rebalancer/controller/agent
> From: varun@pinterest.com
> To: user@helix.apache.org
>
>
> Hi,
>
>  I am trying to write a customized rebalancing algorithm. I would like to
> run the rebalancer every 30 minutes inside a single thread. I would also
> like to completely disable Helix triggering the rebalancer.
>
>  I have a few questions:
> 1) What's the best way to run the custom controller ? Can I simply
> instantiate a ZKHelixAdmin object and then keep running my rebalancer
> inside a thread or do I need to do something more.
>
>  Apart from rebalancing, I want to do other things inside the
> controller, so it would be nice if I could simply fire up the controller
> through code. I could not find this in the documentation.
>
>  2) Same question for the Helix agent. My Helix Agent is a JVM process
> which does other things apart from exposing the callbacks for state
> transitions. Is there a code sample for the same ?
>
>  3) How do I disable Helix triggered rebalancing once I am able to run
> the custom controller ?
>
>  4) During my custom rebalance run, how can I get the current cluster
> state? Is it through ClusterDataCache.getIdealState()?
>
>  5) For clients talking to the cluster, does helix provide an easy
> abstraction to find the partition distribution for a helix resource ?
>
>  Thanks
>
>
>
>
>
>
>
>
>
>
