Date: Wed, 23 Mar 2016 00:18:25 +0000 (UTC)
From: "Jian Fang (JIRA)"
To: dev@curator.apache.org
Reply-To: dev@curator.apache.org
Subject: [jira] [Comment Edited] (CURATOR-311) SharedValue could hold stall data when quourm membership changes

    [ https://issues.apache.org/jira/browse/CURATOR-311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15207595#comment-15207595 ]

Jian Fang edited comment on CURATOR-311 at 3/23/16 12:17 AM:
-------------------------------------------------------------

I don't have time to create unit tests to reproduce this because it won't be easy. But I did observe this behavior very often in my clusters. For example, we have three EC2 instances: m1, m2, and m3.
For some reason, m3 is terminated, and a new EC2 instance m4 is provisioned to replace m3. We called the ZooKeeper reconfig() API to update the membership, but from time to time we observed that one old instance always read the stale data from before the replacement (we updated the data after the replacement). Manually restarting the JVM, or using the mechanism that forces SharedValue to call readValue() when the connection state changes, did resolve this issue. I looked at the code, and SharedValue only uses the watcher to update the in-memory value. That is why I suspected that the watcher may be lost or that the session reconnection logic did not handle the watcher properly.

Anyway, I wonder why SharedValue only uses the watcher for value updates. In a distributed system there are always race conditions that can lose events or lose the watcher, since the watcher is set on each API call. Shouldn't a backup mechanism be used to prevent that from happening?


was (Author: john.jian.fang):
I don't have time to create unit tests to reproduce this because it won't be easy. But I did observed this behavior very often in my clusters. For example, we have three EC2 instance m1, m2, and m3. For some reason, m3 is terminated, and a new EC2 instance m4 is provisioned to replace m3. We called zookeeper reconfig() API to update the membership, but unfortunately, from time to time, we observed that one old instance always read the stall data before the replacement (we updated the data after the replacement). Manually restarted the JVM or used the mechanism to force SharedValue to call readValue() when connection state changed did resolve this issue. I looked at the code and SharedValue only used the watcher to update the value in-memory. That is why I suspected that the watcher may be lost or the session reconnection logic did not handled the watcher properly. Anyway, I wonder why SharedValue only used the watcher for value updates.
There are always race conditions in a distributed system to lose events or lose the watcher since the watcher is set based on each API call. Shouldn't a backup mechanism be used to prevent that from happening?

> SharedValue could hold stall data when quourm membership changes
> ----------------------------------------------------------------
>
>                 Key: CURATOR-311
>                 URL: https://issues.apache.org/jira/browse/CURATOR-311
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Recipes
>    Affects Versions: 3.1.0
>         Environment: Linux
>            Reporter: Jian Fang
>
> We run a ZooKeeper 3.5.1-alpha quorum on EC2 instances, and the quorum members can change; for example, one peer could be replaced by a new EC2 instance due to EC2 instance termination. We use Apache Curator 3.1.0 as the ZooKeeper client. During our testing, we found that the SharedValue data structure could hold stale data during and after the replacement of a peer, which led to system failure.
> We looked into the SharedValue code. It seems that it always returns the value from an in-memory reference variable and that the value is only updated by a watcher. If the watch is lost for any reason, the value never gets a chance to be updated again.
>
> Right now, to work around this issue, we have added a connection state listener that forces SharedValue to call readValue(), i.e., read the data from ZooKeeper directly, when the connection state changes to RECONNECTED.
> It would be great if this issue could be fixed in Curator directly.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
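
The workaround described in the issue (re-reading the value when the connection state becomes RECONNECTED, rather than relying only on the watcher) can be illustrated with a minimal, self-contained Java sketch. It does not use Curator or ZooKeeper: FakeStore, CachedValue, ConnState, and their methods are hypothetical stand-ins invented for this illustration, and the lost watch is simulated simply by never calling onWatchFired().

```java
import java.util.concurrent.atomic.AtomicReference;

final class StaleCacheSketch {

    /** Hypothetical stand-in for the authoritative data held in ZooKeeper. */
    static final class FakeStore {
        private volatile String data;
        FakeStore(String initial) { this.data = initial; }
        String read() { return data; }
        void write(String value) { this.data = value; }
    }

    /** Hypothetical connection states, loosely modelled on the RECONNECTED state named in the issue. */
    enum ConnState { CONNECTED, SUSPENDED, LOST, RECONNECTED }

    /** In-memory copy that is normally refreshed only when a watch fires. */
    static final class CachedValue {
        private final FakeStore store;
        private final AtomicReference<String> cached;

        CachedValue(FakeStore store) {
            this.store = store;
            this.cached = new AtomicReference<>(store.read());
        }

        /** Returns the in-memory copy; never touches the store. */
        String getValue() { return cached.get(); }

        /** Watch callback; if the watch is lost, this is never invoked. */
        void onWatchFired() { cached.set(store.read()); }

        /** Backup path: read the store directly once the connection is re-established. */
        void onConnectionStateChanged(ConnState newState) {
            if (newState == ConnState.RECONNECTED) {
                cached.set(store.read());
            }
        }
    }

    public static void main(String[] args) {
        FakeStore store = new FakeStore("before-replacement");
        CachedValue value = new CachedValue(store);

        // The store changes, but the watch event never arrives (simulated lost watch).
        store.write("after-replacement");
        System.out.println("without refresh:   " + value.getValue());

        // The reconnect notification triggers a direct read and repairs the cache.
        value.onConnectionStateChanged(ConnState.RECONNECTED);
        System.out.println("after RECONNECTED: " + value.getValue());
    }
}
```

In this sketch the cached value stays at "before-replacement" until the RECONNECTED notification arrives, which mirrors the stale reads reported above. A real fix would also have to decide whether to re-register the watch at that point.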