helix-user mailing list archives

From Lance Co Ting Keh <la...@box.com>
Subject Re: helix alert when zookeeper temporary/permanent session loss
Date Thu, 25 Jul 2013 22:57:26 GMT
Thank you for the response.

I will definitely file a ticket once I have a good understanding of how the
participant does it, just so I can phrase the ticket properly.

You mentioned that you detect the disconnection from ZK in the participant.
How can I best be informed of this disconnection (ahead of the
ephemeral node in /LIVEINSTANCES going away)?

1. Looking at ZkStateChangeListener line 76, it looks like
manager.isConnected() will be false when the state goes into *Disconnected*,
even before *Expired*, which works for me. Should I then be periodically
calling manager.isConnected()?

2. The addHealthStateChangeListener on line 358 of ZkHelixManager only
seems to be listening for EventTypes and not KeeperStates.
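
To make option 1 concrete, here is a minimal sketch of the polling I have in
mind. It is only an illustration under my own assumptions: ConnectionPoller
and its callback are hypothetical names, not Helix classes, and the
BooleanSupplier stands in for something like manager::isConnected.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;

/** Hypothetical helper: polls a connectivity check and reports only state changes. */
class ConnectionPoller implements AutoCloseable {
    private final BooleanSupplier isConnected;  // assumed to wrap e.g. manager::isConnected
    private final Consumer<Boolean> onChange;   // receives the new state (true = connected)
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private boolean lastState = true;           // assumption: connected when polling starts

    ConnectionPoller(BooleanSupplier isConnected, Consumer<Boolean> onChange) {
        this.isConnected = isConnected;
        this.onChange = onChange;
    }

    /** One poll step, kept separate so it can be driven without a timer. */
    public synchronized void pollOnce() {
        boolean now = isConnected.getAsBoolean();
        if (now != lastState) {
            lastState = now;
            onChange.accept(now);
        }
    }

    /** Starts polling at a fixed rate. */
    public void start(long periodMillis) {
        scheduler.scheduleAtFixedRate(() -> {
            try {
                pollOnce();
            } catch (RuntimeException e) {
                // Swallow so that one failing poll does not cancel later runs.
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

With this, the alert would fire within one polling period of the state going
to Disconnected, at the cost of an extra thread per participant.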

You also mentioned that "if we notice many disconnects in a short period we
disable the node". When the node is disabled do you call the
@Transition(from = "OFFLINE", to = "ONLINE") method?
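
For reference, this is how I picture the "many disconnects in a short period"
check. I have not read how Helix actually implements it, so treat it purely
as a sketch: DisconnectWindow, maxDisconnects and windowMillis are names and
parameters I made up.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: detects too many disconnects within a sliding time window. */
class DisconnectWindow {
    private final int maxDisconnects;   // disconnects tolerated inside the window
    private final long windowMillis;    // length of the sliding window
    private final Deque<Long> timestamps = new ArrayDeque<>();

    DisconnectWindow(int maxDisconnects, long windowMillis) {
        this.maxDisconnects = maxDisconnects;
        this.windowMillis = windowMillis;
    }

    /** Records a disconnect at nowMillis; returns true once the limit is exceeded. */
    synchronized boolean recordDisconnect(long nowMillis) {
        timestamps.addLast(nowMillis);
        // Drop disconnects that are older than the window.
        while (!timestamps.isEmpty() && nowMillis - timestamps.peekFirst() > windowMillis) {
            timestamps.removeFirst();
        }
        return timestamps.size() > maxDisconnects;
    }
}
```

A caller would invoke recordDisconnect(System.currentTimeMillis()) on each
Disconnected event and disable the node when it returns true.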


On Wed, Jul 24, 2013 at 12:45 PM, kishore g <g.kishore@gmail.com> wrote:

> Hi Lance,
> Unfortunately the controller does not know about the disconnection from
> ZK. However we detect that in the participant and if we notice many
> disconnects in a short period we disable the node.
> After we detect a disconnect we can potentially inform the controller
> about it and have an alert. Can you please file a jira for this.
> thanks,
> Kishore G
> On Tue, Jul 23, 2013 at 6:50 PM, Lance Co Ting Keh <lance@box.com> wrote:
>> I see what you mean by alerts on live instances. In fact there is an
>> "onLiveInstanceChange" under GenericHelixController (
>> http://helix.incubator.apache.org/apidocs/reference/org/apache/helix/controller/GenericHelixController.html
>> )
>> The question is, can I register for an alert to myself? If the agent that
>> is being alerted is the one that loses connection to zk, does the alert
>> trigger?
>> More importantly, it seems that an alert set on onLiveInstanceChange
>> fires when the zookeeper session expires (in which case the master controller
>> already remaps), and not immediately when a zk connection falters (but the
>> ephemeral node on LIVEINSTANCES is still there). I was hoping to get an
>> alert not when the ephemeral node expires but right when a zk
>> connection falters.
>> Thank you
>> Lance
>> On Tue, Jul 23, 2013 at 6:00 PM, Shi Lu <lushi04@gmail.com> wrote:
>>> Hi Lance:
>>> The helix controller exposes jmx beans that reflect the number of
>>> liveInstances under the jmx domain ClusterStatus:cluster=<clusterName>.
>>> They report the number of down instances, disabled instances and
>>> disabled partitions.
>>> You can set alerts on those jmx beans.
>>> On Tue, Jul 23, 2013 at 2:32 PM, Lance Co Ting Keh <lance@box.com> wrote:
>>>> Hi guys,
>>>> I was trying to find out how I can most cleanly get alerted when a
>>>> helix participant temporarily or permanently loses its session with
>>>> Zookeeper. What is the best way to do this?
>>>> Sincerely,
>>>> Lance
