helix-commits mailing list archives

From "dafu (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HELIX-195) Race condition between FINALIZE callbacks and Zk Callbacks
Date Mon, 05 Aug 2013 22:14:47 GMT

     [ https://issues.apache.org/jira/browse/HELIX-195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dafu updated HELIX-195:
-----------------------

    Description: 
FINALIZE callbacks are sent asynchronously via CallbackHandler#reset(), while Zk callbacks are queued
in ZkEventThread. A FINALIZE callback can therefore be handled before all queued Zk callbacks
have been cleaned up, which creates race conditions. For example, on zk session expiry, a GenericController
that gets a FINALIZE callback cleans up all listeners using ZkClient#unsubscribe(), but Zk
callbacks left over in ZkEventThread are handled later and re-subscribe all listeners, leaking
zk watchers.

This was observed by setting up two controllers and expiring the leader (by simulating a long
gc). The second controller takes the leadership and adds all listeners, but when the former
leader recovers from gc, it gets the leftover Zk callbacks and re-subscribes the live-instance
listener, so it reacts to all live-instance changes even though it has not acquired the leadership.
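
The ordering problem described above can be sketched in a few lines of plain Java. This is only an illustrative model, not Helix or ZkClient code: the class FinalizeRaceSketch, its methods, and the path "/LIVEINSTANCES" are hypothetical names, the queue stands in for ZkEventThread, and the sketch assumes that handling a queued Zk callback re-subscribes its listener, as the description says. Threads are replaced by an explicit ordering flag so the outcome is deterministic.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

// Hypothetical model of the race; none of these names are Helix APIs.
public class FinalizeRaceSketch {
    // Stand-in for the watches currently registered with ZooKeeper.
    private final Set<String> watches = new HashSet<>();
    // Stand-in for the callbacks waiting in ZkEventThread.
    private final Queue<Runnable> zkEventQueue = new ArrayDeque<>();

    void subscribe(String path) {
        watches.add(path);
    }

    // Assumption: handling a queued Zk callback re-subscribes its listener.
    void enqueueZkCallback(String path) {
        zkEventQueue.add(() -> subscribe(path));
    }

    // Stand-in for the FINALIZE handling that removes all listeners.
    void handleFinalize() {
        watches.clear();
    }

    void drainZkEvents() {
        Runnable callback;
        while ((callback = zkEventQueue.poll()) != null) {
            callback.run();
        }
    }

    // Returns the watches left after both steps ran in the chosen order.
    public static Set<String> simulate(boolean finalizeBeforeDrain) {
        FinalizeRaceSketch controller = new FinalizeRaceSketch();
        controller.subscribe("/LIVEINSTANCES");          // hypothetical path
        controller.enqueueZkCallback("/LIVEINSTANCES");  // callback still queued
        if (finalizeBeforeDrain) {
            controller.handleFinalize();  // FINALIZE overtakes the queue
            controller.drainZkEvents();   // leftover callback re-subscribes
        } else {
            controller.drainZkEvents();
            controller.handleFinalize();  // cleanup runs last, nothing leaks
        }
        return controller.watches;
    }

    public static void main(String[] args) {
        System.out.println("FINALIZE before queued callbacks: " + simulate(true));
        System.out.println("FINALIZE after queued callbacks:  " + simulate(false));
    }
}
```

In this model, handling FINALIZE first leaves one watch registered after cleanup, while draining the queued callbacks first leaves none. That leftover watch corresponds to the watcher leak reported here.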

  was:
FINALIZE callbacks are sent async via CallbackHandler#reset(), while Zk callbacks are queued
in ZkEventThread. It's possible that we are handling a FINALIZE callback before all Zk callbacks
are cleaned up. This creates race conditions, for example, in zk session expiry, when a GenericController
gets a FINALIZE callback, it cleans up all listeners using ZkClient#unsubscribe(), but Zk
callbacks  leftover in ZkEventThread comes later, and re-subscribe all listeners, causing
zk watcher leaking.

This is observed by setting up two controllers and expire the leader (by simulating a long
gc). The second controller takes the leadership and add all listeners, but when the former
leader recovers from gc, it gets leftover Zk callbacks and re-subscribe then live-instance
listener hence react to all live-instance changes, though it doesn't acquire the leadership.

    
> Race condition between FINALIZE callbacks and Zk Callbacks
> ----------------------------------------------------------
>
>                 Key: HELIX-195
>                 URL: https://issues.apache.org/jira/browse/HELIX-195
>             Project: Apache Helix
>          Issue Type: Sub-task
>            Reporter: dafu
>            Assignee: dafu
>
> FINALIZE callbacks are sent asynchronously via CallbackHandler#reset(), while Zk callbacks are
queued in ZkEventThread. A FINALIZE callback can therefore be handled before all queued
Zk callbacks have been cleaned up, which creates race conditions. For example, on zk session expiry,
a GenericController that gets a FINALIZE callback cleans up all listeners using ZkClient#unsubscribe(),
but Zk callbacks left over in ZkEventThread are handled later and re-subscribe all listeners, leaking
zk watchers.
> This was observed by setting up two controllers and expiring the leader (by simulating a
long gc). The second controller takes the leadership and adds all listeners, but when the former
leader recovers from gc, it gets the leftover Zk callbacks and re-subscribes the live-instance
listener, so it reacts to all live-instance changes even though it has not acquired the leadership.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
