From: Lance Co Ting Keh <lance@box.com>
To: user@helix.incubator.apache.org
Date: Thu, 25 Jul 2013 15:57:26 -0700
Subject: Re: helix alert when zookeeper temporary/permanent session loss

Thank you for the response.

I will definitely file a ticket once I have a good understanding of how the participant does it, just so I can phrase the ticket properly.

You mentioned that you detect the disconnection from ZK in the participant. How should I best be informed of this disconnection (in advance of the ephemeral node in /LIVEINSTANCES going away)?

1. Looking at ZkStateChangeListener line 76, it looks like manager.isConnected() will be false when the state goes into Disconnected, even before Expired, which works for me. Should I then be periodically calling manager.isConnected()?

2. The addHealthStateChangeListener on line 358 of ZkHelixManager only seems to listen for EventTypes and not KeeperStates.
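Rather than polling manager.isConnected() on a timer, a participant can react to session-state callbacks as they arrive. ZooKeeper's own `Watcher` interface delivers `WatchedEvent`s whose `getState()` returns `KeeperState` values such as `SyncConnected`, `Disconnected`, and `Expired`, and the I0Itec `ZkClient` that Helix builds on offers `subscribeStateChanges(IZkStateListener)`. Whether ZkHelixManager exposes such a hook to user code depends on the Helix version, so this sketch stays library-free: `SessionState`, `SessionAlertHandler`, and `SessionStateAlerter` are hypothetical names, and the enum only mirrors the three KeeperState values discussed in this thread.

```java
// Hypothetical stand-in for ZooKeeper's Watcher.Event.KeeperState; only the
// three states discussed in this thread are modeled here.
enum SessionState { SYNC_CONNECTED, DISCONNECTED, EXPIRED }

// Hypothetical alerting hook (e.g. page someone, bump a metric).
interface SessionAlertHandler {
    void onAlert(String participant, SessionState state);
}

// Pushes an alert on every transition away from SYNC_CONNECTED instead of
// polling isConnected() on a timer.
class SessionStateAlerter {
    private final String participant;
    private final SessionAlertHandler handler;
    private SessionState last = SessionState.SYNC_CONNECTED;

    SessionStateAlerter(String participant, SessionAlertHandler handler) {
        this.participant = participant;
        this.handler = handler;
    }

    // Wire this to whatever session-state callback the ZooKeeper client exposes.
    synchronized void handleStateChanged(SessionState state) {
        if (state == last) {
            return; // ignore repeated notifications for the same state
        }
        last = state;
        if (state != SessionState.SYNC_CONNECTED) {
            // DISCONNECTED: temporary loss, the ephemeral LIVEINSTANCES node still exists.
            // EXPIRED: permanent loss, ZooKeeper has already deleted the ephemeral node.
            handler.onAlert(participant, state);
        }
    }
}
```

The alert fires on the first Disconnected event, which is earlier than anything the controller can observe through LIVEINSTANCES.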

You also mentioned that "if we notice many disconnects in a short period we disable the node." When the node is disabled, do you call the @Transition(from = "OFFLINE", to = "ONLINE") method?

Sincerely,
Lance






On Wed, Jul 24, 2013 at 12:45 PM, kishore g <g.kishore@gmail.com> wrote:
Hi Lance,

Unfortunately, the controller does not know about the disconnection from ZK. However, we detect that in the participant, and if we notice many disconnects in a short period we disable the node.

After we detect a disconnect, we can potentially inform the controller about it and raise an alert. Can you please file a JIRA for this?

thanks,
Kishore G
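Disabling a node after "many disconnects in a short period" is essentially a sliding-window counter. This is one way to express that check; it is not Helix's actual code, and the class name and the `maxDisconnects`/`windowMs` parameters are hypothetical. The caller passes in timestamps so the window logic stays deterministic and testable.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical flap detector: counts disconnects inside a sliding time window.
class DisconnectFlapDetector {
    private final int maxDisconnects;
    private final long windowMs;
    private final Deque<Long> disconnectTimes = new ArrayDeque<>();

    DisconnectFlapDetector(int maxDisconnects, long windowMs) {
        this.maxDisconnects = maxDisconnects;
        this.windowMs = windowMs;
    }

    // Records a disconnect at nowMs and returns true when the number of
    // disconnects within the last windowMs milliseconds exceeds maxDisconnects.
    synchronized boolean recordDisconnect(long nowMs) {
        disconnectTimes.addLast(nowMs);
        // Evict timestamps that have slid out of the window.
        while (!disconnectTimes.isEmpty() && nowMs - disconnectTimes.peekFirst() > windowMs) {
            disconnectTimes.removeFirst();
        }
        return disconnectTimes.size() > maxDisconnects;
    }
}
```

A participant could call `recordDisconnect(System.currentTimeMillis())` from its Disconnected callback and, when it returns true, disable itself or raise an alert; the alert Kishore describes would be reported from this same point.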


On Tue, Jul 23, 2013 at 6:50 PM, Lance Co Ting Keh <lance@box.com> wrote:
I see what you mean by alerts on live instances. In fact, there is an "onLiveInstanceChange" under GenericHelixController (http://helix.incubator.apache.org/apidocs/reference/org/apache/helix/controller/GenericHelixController.html).

The question is: can I register for an alert on myself? If the agent being alerted is the one that loses its connection to ZK, does the alert still trigger?

More importantly, it seems that an alert set on onLiveInstanceChange fires when the ZooKeeper session expires (by which point the master controller has already remapped), not immediately when the ZK connection falters (while the ephemeral node in LIVEINSTANCES is still there). I was hoping to get an alert not when the ephemeral node expires, but right when the ZK connection falters.


Thank you
Lance
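The limitation described here follows from what a live-instance listener works with: each notification carries the instances currently registered under LIVEINSTANCES, so a controller-side alert is a diff between consecutive snapshots. The hypothetical `LiveInstanceDiff` class sketches that diff on plain instance-name sets rather than Helix's own live-instance records.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical controller-side helper: remembers the previous snapshot of live
// instance names and reports which ones have disappeared since then.
class LiveInstanceDiff {
    private Set<String> previous = new HashSet<>();

    // Call with the instance names from each live-instance change notification.
    // A name only shows up as lost once its ephemeral LIVEINSTANCES node is gone
    // (session expiry or clean shutdown), never on a mere temporary disconnect.
    synchronized Set<String> update(Set<String> current) {
        Set<String> lost = new HashSet<>(previous);
        lost.removeAll(current);
        previous = new HashSet<>(current);
        return lost;
    }
}
```

Because the input only changes when the ephemeral node is deleted, no amount of controller-side logic can turn this into an early warning for a faltering connection.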


On Tue, Jul 23, 2013 at 6:00 PM, Shi Lu <lushi04@gmail.com> wrote:
Hi Lance:

The Helix controller exposes JMX beans that reflect the number of live instances under the JMX domain ClusterStatus:cluster=<clusterName>, where it reports the number of down instances, disabled instances, and disabled partitions. You can set alerts on those JMX beans.
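The JMX route can be turned into an external check run by a monitoring agent. This sketch uses only the standard `javax.management` API; the object-name pattern follows the `ClusterStatus:cluster=<clusterName>` domain quoted above, while the exact gauge attribute names (for the down-instance, disabled-instance, and disabled-partition counts) are an assumption to confirm in jconsole before relying on them. `ClusterStatusJmxCheck` and its methods are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Set;
import javax.management.JMException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

// Hypothetical external check against the controller's JMX beans.
class ClusterStatusJmxCheck {
    // ASSUMPTION: bean names follow "ClusterStatus:cluster=<clusterName>";
    // the trailing ",*" also tolerates any extra key properties.
    static final String CLUSTER_STATUS_PATTERN = "ClusterStatus:cluster=*,*";

    // Parses an ObjectName, turning the checked exception into an unchecked one.
    static ObjectName name(String s) {
        try {
            return new ObjectName(s);
        } catch (JMException e) {
            throw new IllegalArgumentException(s, e);
        }
    }

    // Lists every cluster-status bean visible through the connection.
    static Set<ObjectName> findClusterStatusBeans(MBeanServerConnection conn) {
        try {
            return conn.queryNames(name(CLUSTER_STATUS_PATTERN), null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads a numeric gauge; ASSUMPTION: pass the attribute name shown in jconsole.
    static long readGauge(MBeanServerConnection conn, ObjectName bean, String attribute) {
        try {
            return ((Number) conn.getAttribute(bean, attribute)).longValue();
        } catch (JMException e) {
            throw new IllegalStateException(bean + "/" + attribute, e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Against a remote controller JVM, the connection would come from `JMXConnectorFactory.connect(new JMXServiceURL(...)).getMBeanServerConnection()`; the check then lists the matching beans and compares the chosen gauge against an alert threshold.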




On Tue, Jul 23, 2013 at 2:32 PM, Lance Co Ting Keh <lance@box.com> wrote:
Hi guys,

I was trying to find out how I can most cleanly get alerted when a Helix participant temporarily or permanently loses its session with ZooKeeper. What is the best way to do this?


Sincerely,
Lance
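To tell a temporary loss from a permanent one on the participant side, one option is to escalate by elapsed time: the ZooKeeper server expires a session once it has heard nothing from the client for the session timeout, but a disconnected client typically only sees `Expired` after it reconnects. The hypothetical `SessionLossClassifier` treats a disconnect lasting at least the session timeout as a presumed expiry. This is a heuristic, since the client's clock and the server's view of the session can differ, so the real Expired event remains authoritative.

```java
// Hypothetical helper: classifies a ZooKeeper connection loss as temporary or
// presumed permanent, based only on how long the client has been disconnected.
class SessionLossClassifier {
    enum Status { CONNECTED, TEMPORARY_LOSS, PRESUMED_EXPIRED }

    private final long sessionTimeoutMs;
    private long disconnectedSinceMs = -1; // -1 means "currently connected"

    SessionLossClassifier(long sessionTimeoutMs) {
        this.sessionTimeoutMs = sessionTimeoutMs;
    }

    // Call on every Disconnected notification; only the first one starts the clock.
    synchronized void onDisconnected(long nowMs) {
        if (disconnectedSinceMs < 0) {
            disconnectedSinceMs = nowMs;
        }
    }

    // Call on SyncConnected after a disconnect.
    synchronized void onReconnected() {
        disconnectedSinceMs = -1;
    }

    // Evaluate from a periodic timer so escalation happens even with no new events.
    synchronized Status status(long nowMs) {
        if (disconnectedSinceMs < 0) {
            return Status.CONNECTED;
        }
        return nowMs - disconnectedSinceMs >= sessionTimeoutMs
                ? Status.PRESUMED_EXPIRED
                : Status.TEMPORARY_LOSS;
    }
}
```

An alert on TEMPORARY_LOSS gives the early warning, and one on PRESUMED_EXPIRED covers the window before the participant can reconnect and learn its session is gone.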



