helix-commits mailing list archives

From "kishore gopalakrishna (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HELIX-621) Missing listener notification of LiveInstances changes (and possibly other state change)
Date Wed, 13 Jan 2016 05:36:39 GMT

    [ https://issues.apache.org/jira/browse/HELIX-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15095639#comment-15095639
] 

kishore gopalakrishna commented on HELIX-621:
---------------------------------------------

That's interesting. The code handling all ZK changes is here:

https://github.com/apache/helix/blob/helix-0.6.x/helix-core/src/main/java/org/apache/helix/manager/zk/CallbackHandler.java

Can you turn on info logging and paste the log?

> Missing listener notification of LiveInstances changes (and possibly other state change)
> ----------------------------------------------------------------------------------------
>
>                 Key: HELIX-621
>                 URL: https://issues.apache.org/jira/browse/HELIX-621
>             Project: Apache Helix
>          Issue Type: Bug
>          Components: helix-core
>    Affects Versions: 0.6.5
>            Reporter: Marco P.
>
> I noticed sometimes my LiveInstanceChangeListener was not notified of an instance disconnecting.
> Digging a little bit I found out:
>  - A reliable way to consistently reproduce this problem
>  - The problem does not seem to be limited to LiveInstances; it can happen to other listeners using the same strategy
> This is bad because an application relies on notifications, and its view of the system (LiveInstances or otherwise) can become very outdated.
> The problem at the core is this logic:
> 1) Set watch W on some path P
> 2) Event E1 modifies P triggering W
> 3) The callback for W re-sets W on P
> If, however, a second event E2 modifies P between steps 2 and 3, W will not trigger (until P is modified again).
> An example of why this is bad:
>  - 2 live instances L1, L2 and a spectator S watching them.
> 1) L1 disconnects
> 2) S's watch on LIVEINSTANCES fires
> 3) S reads the children of LIVEINSTANCES: {L2}
> 4) L2 disconnects
> 5) S notifies its LiveInstanceChangeListeners and goes back to watching LIVEINSTANCES
> The application receives a notification that the live instances now consist of {L2}.
> And no further notification until another instance joins.
> The reality is that no instances are live.
> Again, this is not limited to LIVEINSTANCES, although that's the one I can reliably reproduce.
> Fixing this is not trivial: it requires firing the watch again when re-setting it IF the version of the watched node has changed since the last time the watch fired.
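
The race and the proposed fix quoted above can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions: `WatchRaceSketch`, `Node`, and `simulate` are hypothetical names, the one-shot watch is modeled by a plain `Runnable`, and nothing here is Helix's CallbackHandler or the ZooKeeper client API. The second disconnect is injected deterministically between the read and the watch re-set to mimic the interleaving in the report.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** In-memory stand-in for a watched node; NOT the real ZooKeeper or Helix API. */
class WatchRaceSketch {

    /** Hypothetical node: children plus a version bumped on every change. */
    static final class Node {
        final Set<String> children = new TreeSet<>();
        int version = 0;
        Runnable watch; // at most one pending one-shot watch

        void setWatch(Runnable w) { watch = w; }

        void remove(String child) {
            children.remove(child);
            version++;
            Runnable w = watch;
            watch = null; // one-shot: cleared before it fires
            if (w != null) w.run();
        }
    }

    /**
     * Replays the L1/L2 scenario and returns every view the listener was given.
     * With recheckVersion=true the callback fires again when the node version
     * moved while no watch was set (the fix suggested in the report).
     */
    static List<Set<String>> simulate(boolean recheckVersion) {
        Node live = new Node();
        live.children.add("L1");
        live.children.add("L2");

        List<Set<String>> notifications = new ArrayList<>();
        boolean[] raced = {false};
        Runnable[] callback = new Runnable[1];

        callback[0] = () -> {
            // The spectator reads the children and remembers the version it saw.
            Set<String> snapshot = new TreeSet<>(live.children);
            int seenVersion = live.version;

            // Injected race: L2 disconnects while no watch is set.
            if (!raced[0]) {
                raced[0] = true;
                live.remove("L2");
            }

            notifications.add(snapshot);  // notify the listener
            live.setWatch(callback[0]);   // go back to watching

            // Proposed fix: if the node changed in the gap, fire again.
            if (recheckVersion && live.version != seenVersion) {
                Runnable w = live.watch;
                live.watch = null;
                w.run();
            }
        };

        live.setWatch(callback[0]);
        live.remove("L1"); // L1 disconnects, the watch fires
        return notifications;
    }

    public static void main(String[] args) {
        System.out.println(simulate(false)); // prints [[L2]]
        System.out.println(simulate(true));  // prints [[L2], []]
    }
}
```

Without the version check the listener's last view is {L2} although no instance is live; with it, a second notification delivers the empty set. A real client would also have to handle concurrency and session events, which this sketch ignores.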



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
