curator-dev mailing list archives

From "Jian Fang (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CURATOR-311) SharedValue could hold stall data when quourm membership changes
Date Mon, 28 Mar 2016 23:52:25 GMT

    [ https://issues.apache.org/jira/browse/CURATOR-311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15207595#comment-15207595
] 

Jian Fang edited comment on CURATOR-311 at 3/28/16 11:51 PM:
-------------------------------------------------------------

I don't have time to create unit tests to reproduce this because it won't be easy, but I did
observe this behavior very often in my clusters.

For example, we have three EC2 instances, i.e., m1, m2, and m3, each running a ZooKeeper
peer. For some reason, m3 is terminated and a new EC2 instance, m4, is provisioned to replace
it. We called the ZooKeeper reconfig() API to update the membership. Unfortunately, from time
to time we observed that a process using the Curator client read the stale data from before
the replacement (we updated the data after the replacement). Thus, that process was stuck with
old data and caused the whole system to fail. Manually restarting the process, or using the
mechanism I described above to force SharedValue to call readValue() when the connection state
changed, did resolve this issue.
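As a minimal sketch of that workaround (assuming an already-started CuratorFramework client and a hypothetical znode path; SharedValue's readValue() is not public, so a direct getData() read stands in for it here), a connection state listener can re-read the znode on RECONNECTED instead of trusting the watcher-maintained in-memory copy:

```java
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.state.ConnectionState;
import org.apache.curator.framework.state.ConnectionStateListener;

// Sketch only: when the connection state becomes RECONNECTED, read the
// znode directly with getData() rather than trusting SharedValue's
// watcher-maintained in-memory copy.
public class SharedValueRefresh {

    // States after which the cached value is considered suspect.
    static boolean shouldRefresh(ConnectionState state) {
        return state == ConnectionState.RECONNECTED;
    }

    // Install a listener on an already-started CuratorFramework client.
    // 'path' is the znode backing the SharedValue (hypothetical name).
    static void installRefreshListener(CuratorFramework client, String path) {
        client.getConnectionStateListenable().addListener(new ConnectionStateListener() {
            @Override
            public void stateChanged(CuratorFramework c, ConnectionState newState) {
                if (shouldRefresh(newState)) {
                    try {
                        byte[] fresh = c.getData().forPath(path);
                        // Compare 'fresh' with SharedValue.getValue() and
                        // react (e.g. log, or rebuild the SharedValue) if they differ.
                    } catch (Exception e) {
                        // A failed read here means the session is still unhealthy;
                        // the next state change will trigger another attempt.
                    }
                }
            }
        });
    }
}
```

This only papers over the lost-watcher case; it does not fix the underlying gap in SharedValue itself.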

I looked at the code. SharedValue only uses the watcher to update the in-memory value. That
is why I suspect that the watcher may be lost, or that the session reconnection logic does
not handle the watcher properly.

Anyway, I wonder why SharedValue relies only on the watcher for value updates. There are always
race conditions in a distributed system that can lose events or lose the watcher, since the
watcher is set per ZooKeeper API call. For example, if a ZooKeeper session expires and a new
session cannot be established in time, the events would be lost. Furthermore, if the session
handling logic fails to re-register the watcher properly, the value would never be updated
again. From my own experience, anything can happen in production. Shouldn't a backup mechanism
check the data under certain conditions, or ensure that the watcher is re-registered properly
after a new session is established?





> SharedValue could hold stall data when quourm membership changes
> ----------------------------------------------------------------
>
>                 Key: CURATOR-311
>                 URL: https://issues.apache.org/jira/browse/CURATOR-311
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Recipes
>    Affects Versions: 3.1.0
>         Environment: Linux
>            Reporter: Jian Fang
>
> We run a ZooKeeper 3.5.1-alpha quorum on EC2 instances, and the quorum members can change; for example, one peer can be replaced by a new EC2 instance after an EC2 instance termination. We use Apache Curator 3.1.0 as the ZooKeeper client. During our testing, we found that the SharedValue data structure could hold stale data during and after a peer replacement, which led to system failure.
> We looked into the SharedValue code. It seems it always returns the value from an in-memory reference variable, and that value is only updated by a watcher. If for any reason the watch is lost, the value would never get a chance to be updated again.
>  
> Right now, we work around this issue by adding a connection state listener that forces SharedValue to call readValue(), i.e., read the data from ZooKeeper directly, when the connection state changes to RECONNECTED.
> It would be great if this issue could be fixed in Curator directly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
