From: kishore g <g.kishore@gmail.com>
To: user <user@helix.incubator.apache.org>
Date: Fri, 21 Jun 2013 13:48:20 -0700
Subject: Re: Controller fault tolerance
In-Reply-To: <23CA11DC8830BA44A37C6B44B14D013C51A63194@ESV4-MBX01.linkedin.biz>

Thanks Jason. I am guessing it's only the isLeader() method that returns
wrong results, since it compares the name, and that there is actually only one
active controller. Is my understanding correct? If yes, then giving each
controller a different name should work, right?

On Fri, Jun 21, 2013 at 1:44 PM, Zhen Zhang <zzhang@linkedin.com> wrote:

> This is a known bug in Helix.
> https://issues.apache.org/jira/browse/HELIX-123
>
> The problem is that we are comparing the instance name of the controller but
> not the session id, so if you start two controllers with the same name,
> isLeader() returns true for both. We will fix it shortly.
>
> Thanks,
> Jason
>
> From: Lance Co Ting Keh <lance@box.com>
> Reply-To: "user@helix.incubator.apache.org" <user@helix.incubator.apache.org>
> Date: Friday, June 21, 2013 1:39 PM
> To: "user@helix.incubator.apache.org" <user@helix.incubator.apache.org>
> Subject: Re: Controller fault tolerance
>
> Hi Kishore,
>
> I tried starting two controllers programmatically like you mentioned:
>
>     controllerManager = HelixControllerMain.startHelixController(zkAddress,
>         clusterName, "controller", HelixControllerMain.STANDALONE);
>
> I then called isLeader() on both managers
> (http://helix.incubator.apache.org/apidocs/reference/org/apache/helix/HelixManager.html#isLeader())
> and both of them returned true. They're obviously both on the same ZooKeeper
> instance and on the same cluster. The controllers are running, so I'm not
> sure whether leader election is actually working properly or I'm
> misinterpreting the isLeader() function.
>
> Thanks
> Lance
>
> On Mon, Jun 17, 2013 at 9:22 AM, Manikumar Reddy <kumar@nmsworks.co.in> wrote:
>
>> Hi Kishore,
>>
>> Thanks for the quick response.
>>
>> Regards,
>> Kumar
>>
>> On Mon, Jun 17, 2013 at 8:18 PM, kishore g <g.kishore@gmail.com> wrote:
>>
>>> Hi Kumar,
>>>
>>> You can start multiple controllers; only one of them will be active and
>>> the rest will be in standby mode. If the active controller fails, one of
>>> the standby controllers will become active and start managing the cluster.
>>>
>>> You can start the controllers either from the command line or
>>> programmatically.
>>>
>>> Command line:
>>>
>>>     ./run-helix-controller.sh --zkSvr localhost:2199 --cluster <clustername>
>>>
>>> Using the Helix API:
>>>
>>>     controllerManager = HelixControllerMain.startHelixController(zkAddress,
>>>         clusterName, "controller", HelixControllerMain.STANDALONE);
>>>
>>> Hope this helps.
>>>
>>> thanks,
>>> Kishore G
>>>
>>> On Mon, Jun 17, 2013 at 7:01 AM, Manikumar Reddy <kumar@nmsworks.co.in> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am trying to understand the Helix controller/cluster manager fault
>>>> tolerance mechanism. A single controller would be a single point of
>>>> failure, so what options or techniques are available to achieve
>>>> controller fault tolerance? Any pointers, recipes, or code snippets?
>>>>
>>>> Regards,
>>>> Kumar
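To make the name-versus-session point from this thread concrete, here is a minimal, self-contained Java sketch. It does not use Helix at all: LeaderCheckSketch, LeaderRecord, Controller, isLeaderByName and isLeaderBySession are hypothetical names made up for illustration, and the session-based check is only one plausible reading of "compare the session id", not the actual HELIX-123 fix. It models two controller processes that share the instance name "controller" and shows why a name-only comparison reports both as leader.

```java
public class LeaderCheckSketch {

    /** Hypothetical stand-in for the record the elected controller leaves behind. */
    public static final class LeaderRecord {
        public final String instanceName;
        public final String sessionId;

        public LeaderRecord(String instanceName, String sessionId) {
            this.instanceName = instanceName;
            this.sessionId = sessionId;
        }
    }

    /** Hypothetical stand-in for one controller process (not a Helix class). */
    public static final class Controller {
        public final String instanceName;
        public final String sessionId;

        public Controller(String instanceName, String sessionId) {
            this.instanceName = instanceName;
            this.sessionId = sessionId;
        }

        /** Name-only comparison, the behaviour described in the thread. */
        public boolean isLeaderByName(LeaderRecord leader) {
            return instanceName.equals(leader.instanceName);
        }

        /** Name plus session id: tells apart two processes that share a name. */
        public boolean isLeaderBySession(LeaderRecord leader) {
            return instanceName.equals(leader.instanceName)
                    && sessionId.equals(leader.sessionId);
        }
    }

    public static void main(String[] args) {
        // Two processes started with the same instance name, as in Lance's test.
        Controller first = new Controller("controller", "session-1");
        Controller second = new Controller("controller", "session-2");

        // Assume the first process won the election.
        LeaderRecord leader = new LeaderRecord(first.instanceName, first.sessionId);

        // Name-only check: both claim leadership.
        System.out.println(first.isLeaderByName(leader));     // true
        System.out.println(second.isLeaderByName(leader));    // true

        // Session-aware check: only the real leader matches.
        System.out.println(first.isLeaderBySession(leader));  // true
        System.out.println(second.isLeaderBySession(leader)); // false

        // Distinct names make even the name-only check unambiguous.
        Controller renamed = new Controller("controller-2", "session-3");
        System.out.println(renamed.isLeaderByName(leader));   // false
    }
}
```

In this toy model, giving each controller its own name makes the name-only check correct, which is the workaround asked about above. Whether that holds for a real Helix cluster is not confirmed anywhere in this thread.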