helix-user mailing list archives

From kishore g <g.kish...@gmail.com>
Subject Re: helix alert when zookeeper temporary/permanent session loss
Date Fri, 26 Jul 2013 16:26:49 GMT
Hi Lance,

Yes, manager.isConnected() will be false. For now you can poll it
periodically.
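
A minimal sketch of such polling, in case it helps. ConnectionWatcher,
pollOnce and start are hypothetical names, not part of Helix. The connection
check is passed in as a BooleanSupplier so that manager::isConnected can be
supplied by the caller, and the sketch assumes the manager starts out
connected. The callback fires only when the observed state changes, not on
every poll.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;

/** Hypothetical helper (not a Helix class): polls a connection check. */
class ConnectionWatcher {
    private final BooleanSupplier isConnected; // e.g. manager::isConnected
    private final Consumer<Boolean> onChange;  // receives the new state
    private boolean lastState = true;          // assumption: connected at start

    ConnectionWatcher(BooleanSupplier isConnected, Consumer<Boolean> onChange) {
        this.isConnected = isConnected;
        this.onChange = onChange;
    }

    /** Runs one check and invokes the callback only if the state changed. */
    synchronized void pollOnce() {
        boolean now = isConnected.getAsBoolean();
        if (now != lastState) {
            lastState = now;
            onChange.accept(now);
        }
    }

    /** Starts periodic polling; the caller shuts the returned executor down. */
    ScheduledExecutorService start(long periodMillis) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleAtFixedRate(() -> {
            try {
                pollOnce();
            } catch (RuntimeException e) {
                // Swallow so that one failed poll does not cancel later polls.
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
        return ses;
    }
}
```

Usage would look something like
new ConnectionWatcher(manager::isConnected, up -> { /* alert */ }).start(1000),
where the one-second period is only an example value. Note that a long GC
pause also stalls the polling thread, so this mostly helps with the network
case.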


What are you planning to do after you detect a disconnect? Two scenarios
might result in this: a network partition and GC. If it's a network
partition, you may not be able to reach any other box in the cluster; in the
case of GC, the process is mostly not responding.

Yes, when the node is disabled we invoke the transitions so that partitions
come back to OFFLINE.

Thanks,
Kishore G


On Jul 25, 2013 3:57 PM, "Lance Co Ting Keh" <lance@box.com> wrote:

> Thank you for the response.
>
> I will definitely file a ticket once I have a good understanding of how
> the participant does it, just so I can phrase the ticket properly.
>
> You mentioned that you detect the disconnection from ZK in the
> participant. How should I best be informed of this disconnection (in
> advance of the ephemeral node in /LIVEINSTANCES going away)?
>
> 1. Looking at ZkStateChangeListener line 76, it looks like
> manager.isConnected() will be false when the state goes into
> *Disconnected*, even before *Expired*, which works for me. Should I then be
> periodically calling manager.isConnected()?
>
> 2. The addHealthStateChangeListener on line 358 of ZkHelixManager only
> seems to be listening for EventTypes and not KeeperStates
>
> You also mentioned that "if we notice many disconnects in a short period
> we disable the node". When the node is disabled do you call the
> @Transition(from = "OFFLINE", to = "ONLINE") method?
>
> Sincerely,
> Lance
>
>
>
>
>
>
> On Wed, Jul 24, 2013 at 12:45 PM, kishore g <g.kishore@gmail.com> wrote:
>
>> Hi Lance,
>>
>> Unfortunately the controller does not know about the disconnection from
>> ZK. However we detect that in the participant and if we notice many
>> disconnects in a short period we disable the node.
>>
>> After we detect a disconnect we can potentially inform the controller
>> about it and have an alert. Can you please file a jira for this.
>>
>> thanks,
>> Kishore G
>>
>>
>> On Tue, Jul 23, 2013 at 6:50 PM, Lance Co Ting Keh <lance@box.com> wrote:
>>
>>> I see what you mean by alerts on live instances. In fact there is an
>>> "onLiveInstanceChange" under GenericHelixController (
>>> http://helix.incubator.apache.org/apidocs/reference/org/apache/helix/controller/GenericHelixController.html
>>> )
>>>
>>> The question is: can I register for an alert on myself? If the agent that
>>> is being alerted is the one that loses its connection to ZK, does the
>>> alert trigger?
>>>
>>> More importantly, it seems that an alert set on onLiveInstanceChange
>>> fires when the ZooKeeper session expires (in which case the master
>>> controller already remaps), and not immediately when a ZK connection
>>> falters (while the ephemeral node in LIVEINSTANCES is still there). I
>>> was hoping to get an alert not when the ephemeral node expires but
>>> right when a ZK connection falters.
>>>
>>>
>>> Thank you
>>> Lance
>>>
>>>
>>> On Tue, Jul 23, 2013 at 6:00 PM, Shi Lu <lushi04@gmail.com> wrote:
>>>
>>>> Hi Lance:
>>>>
>>>> The Helix controller exposes JMX beans that reflect the number of
>>>> liveInstances under the JMX domain ClusterStatus:cluster=<clusterName>,
>>>> in which it will report the number of down instances, disabled
>>>> instances and disabled partitions.
>>>> You can set alerts on those JMX beans.
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Jul 23, 2013 at 2:32 PM, Lance Co Ting Keh <lance@box.com> wrote:
>>>>
>>>>> Hi guys,
>>>>>
>>>>> I am looking for the cleanest way to get alerted when a
>>>>> Helix participant temporarily or permanently loses its session with
>>>>> ZooKeeper. What is the best way to do this?
>>>>>
>>>>>
>>>>> Sincerely,
>>>>> Lance
>>>>>
>>>>
>>>>
>>>
>>
>
