helix-user mailing list archives

From Zhen Zhang <zzh...@linkedin.com>
Subject Re: Controller fault tolerance
Date Fri, 21 Jun 2013 20:49:18 GMT
Yes. Using different names for the controllers is a quick workaround.
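
One way to apply this workaround is to derive each controller's name from something unique to the process. The following is only a minimal sketch: the class ControllerNames and the helper uniqueControllerName are hypothetical names made up for illustration, and the host-plus-UUID scheme is an assumption, not something Helix prescribes. The startHelixController call in the comment is the one quoted later in this thread.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.UUID;

/** Hypothetical helper (not part of Helix) that builds a distinct name per controller process. */
final class ControllerNames {

    /** Returns prefix_host_uuid; the random UUID keeps names distinct even on one host. */
    static String uniqueControllerName(String prefix) {
        String host;
        try {
            host = InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            // Fall back to a placeholder; the UUID still makes the name unique.
            host = "unknown-host";
        }
        return prefix + "_" + host + "_" + UUID.randomUUID();
    }

    public static void main(String[] args) {
        // Assumed usage, replacing the fixed "controller" name from this thread:
        //   HelixControllerMain.startHelixController(zkAddress, clusterName,
        //       uniqueControllerName("controller"), HelixControllerMain.STANDALONE);
        System.out.println(uniqueControllerName("controller"));
        System.out.println(uniqueControllerName("controller"));
    }
}
```

With distinct names, a name-only leader comparison can match at most one controller.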

From: Lance Co Ting Keh <lance@box.com>
Reply-To: "user@helix.incubator.apache.org" <user@helix.incubator.apache.org>
Date: Friday, June 21, 2013 1:47 PM
To: "user@helix.incubator.apache.org" <user@helix.incubator.apache.org>
Subject: Re: Controller fault tolerance

Okay, thank you. But for now, is the quick fix to make sure the controllers are named differently?


On Fri, Jun 21, 2013 at 1:44 PM, Zhen Zhang <zzhang@linkedin.com> wrote:
This is a known bug in Helix.
https://issues.apache.org/jira/browse/HELIX-123

The problem is that we compare the instance name of the controller but not the session id,
so if you start two controllers with the same name, isLeader() returns true for both. We will fix it shortly.
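
To make the difference concrete, here is a toy model of the two comparisons. It is only a sketch and not the Helix source: the class LeaderCheckSketch, the Controller type, and both method names are hypothetical, and the session ids are made-up strings standing in for per-connection session ids.

```java
/** Toy model of a leader check (hypothetical, not Helix code). */
final class LeaderCheckSketch {

    /** Identity of one running controller: a configured name plus a per-connection session id. */
    static final class Controller {
        final String instanceName;
        final String sessionId;

        Controller(String instanceName, String sessionId) {
            this.instanceName = instanceName;
            this.sessionId = sessionId;
        }
    }

    /** Name-only check: every controller sharing the leader's name looks like the leader. */
    static boolean isLeaderByName(Controller self, Controller leader) {
        return self.instanceName.equals(leader.instanceName);
    }

    /** Stricter check: the session id must match as well, so only the leader's own session passes. */
    static boolean isLeaderByNameAndSession(Controller self, Controller leader) {
        return self.instanceName.equals(leader.instanceName)
                && self.sessionId.equals(leader.sessionId);
    }

    public static void main(String[] args) {
        Controller first = new Controller("controller", "session-A");  // assumed election winner
        Controller second = new Controller("controller", "session-B"); // same name, different session

        System.out.println(isLeaderByName(first, first));            // prints true
        System.out.println(isLeaderByName(second, first));           // prints true (the reported symptom)
        System.out.println(isLeaderByNameAndSession(second, first)); // prints false
    }
}
```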

Thanks,
Jason

From: Lance Co Ting Keh <lance@box.com>
Reply-To: "user@helix.incubator.apache.org" <user@helix.incubator.apache.org>
Date: Friday, June 21, 2013 1:39 PM
To: "user@helix.incubator.apache.org" <user@helix.incubator.apache.org>
Subject: Re: Controller fault tolerance

Hi Kishore,

I tried starting two controllers programmatically like you mentioned:


controllerManager = HelixControllerMain.startHelixController(zkAddress,
          clusterName, "controller", HelixControllerMain.STANDALONE);


I then called isLeader() on both managers (http://helix.incubator.apache.org/apidocs/reference/org/apache/helix/HelixManager.html#isLeader())
and both of them returned true. They are both on the same ZooKeeper instance and in the
same cluster. The controllers are running, so I'm not sure whether leader election is not
working properly or I'm misinterpreting the isLeader() function.


Thanks
Lance



On Mon, Jun 17, 2013 at 9:22 AM, Manikumar Reddy <kumar@nmsworks.co.in> wrote:
Hi Kishore,

Thanks for the quick response.

Regards,
Kumar


On Mon, Jun 17, 2013 at 8:18 PM, kishore g <g.kishore@gmail.com> wrote:
Hi Kumar,

You can start multiple controllers; only one of them will be active and the rest will
be in standby mode. If the active controller fails, one of the standbys will become active
and start managing the cluster.
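
The active/standby behaviour can be pictured with a toy in-memory model. This is only a sketch under stated assumptions: the class FailoverSketch and its methods are hypothetical, the "oldest live controller is active" rule is chosen purely for illustration, and none of this is how Helix actually elects a leader.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy active/standby model (hypothetical, not Helix's election mechanism). */
final class FailoverSketch {
    // Live controllers in join order; the head of the queue is treated as active.
    private final Deque<String> live = new ArrayDeque<>();

    /** A controller starts and queues up as a standby (or becomes active if it is first). */
    void join(String controllerId) {
        live.addLast(controllerId);
    }

    /** A controller dies; if it was active, the next standby moves to the head. */
    void fail(String controllerId) {
        live.remove(controllerId);
    }

    /** Returns the active controller, or null when no controller is alive. */
    String active() {
        return live.peekFirst();
    }

    public static void main(String[] args) {
        FailoverSketch cluster = new FailoverSketch();
        cluster.join("controller-1");
        cluster.join("controller-2");
        System.out.println(cluster.active()); // prints controller-1
        cluster.fail("controller-1");
        System.out.println(cluster.active()); // prints controller-2
    }
}
```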

You can start the controllers either from the command line or programmatically.

Command line:

./run-helix-controller.sh --zkSvr localhost:2199 --cluster <clustername>

Using the Helix API:

controllerManager = HelixControllerMain.startHelixController(zkAddress,
          clusterName, "controller", HelixControllerMain.STANDALONE);

Hope this helps.

thanks,
Kishore G



On Mon, Jun 17, 2013 at 7:01 AM, Manikumar Reddy <kumar@nmsworks.co.in> wrote:
Hi,

I am trying to understand the fault tolerance mechanism of the Helix controller/cluster manager.
A single controller is a single point of failure. What options or techniques are available
to achieve controller fault tolerance? Any pointers, recipes, or code snippets?

Regards,
Kumar




