From: Henry Robinson
Date: Thu, 9 Dec 2010 15:10:13 -0800
Subject: Re: Failure scenarios and consequences
To: user@zookeeper.apache.org
Reply-To: user@zookeeper.apache.org
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=001636c5a4e557c90e0497025893

--001636c5a4e557c90e0497025893
Content-Type: text/plain;
charset=ISO-8859-1

Hi Jeremy -

One note in-line:

On 9 December 2010 12:04, Mahadev Konar wrote:

> Hi Jeremy,
> Responses in line below:
>
> On 12/9/10 11:53 AM, "Jeremy Hanna" wrote:
>
> I looked around on the wiki and in the user list archives and couldn't find
> something definitive about certain failure scenarios.
>
> A partition splits the ensemble where a quorum is on one side of the
> partition
> -- if the leader is on the quorum side of the partition, what happens to
> reads/writes that go to the non-quorum side? I assume writes return errors
> because it can't get to the leader. Reads?
>
> The reads will also fail on all the quorum nodes until a new quorum is
> elected.
>

This is true, but since reads are served locally and are not serialised by
the leader I believe there is a small time window during which a network
partition may have occurred and a follower may not have realised it, so the
follower keeps on serving reads for slightly longer than it would serve
writes for. In most cases the time of failure detection is very short, so
this wouldn't be obvious, but if you turned down the ping frequency from
followers to the leader then you could engineer an arbitrarily large gap
when reads would be served.

Note that no consistency guarantees are violated here because it's legal to
serve a stale value as long as you yourself haven't overwritten it.
Overwriting it would trigger a failure detection and no subsequent reads
would be served.

Writes are guaranteed not to get through on the smaller side of the
partition, because every write must be acknowledged by a quorum of nodes
before it is committed. In the case of a network partition, this is
obviously not possible on the smaller side.

Henry

> -- if the leader is on the non-quorum side of the partition, I would assume
> that the quorum side of the partition would elect a new leader for those
> clients on its side of the partition.
> However, is there the possibility for
> the leader on the non-quorum side to accept writes before it realizes that
> there's no longer a quorum? Just wondering about the possibility of
> corruption and then when the cluster syncs back up how the cluster would
> handle that data.
>
> No, there isn't. The leader relinquishes its right as a leader as soon as
> it realizes a quorum isn't committing the changes it proposed.
>
> (I would be happy to create a wiki page for failure scenarios if one
> doesn't exist that people could add to, but maybe this is just common
> knowledge.)
>
> Please do!
>
> thanks
> mahadev
>

-- 
Henry Robinson
Software Engineer
Cloudera
415-994-6679

--001636c5a4e557c90e0497025893--
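
The quorum rule discussed in this thread (a write commits only once a
quorum of servers has acknowledged it) can be illustrated with a minimal
sketch. It assumes simple majority quorums and a fixed ensemble size. The
function names has_quorum and partition_sides are hypothetical and are not
part of the ZooKeeper API.

```python
from typing import List


def has_quorum(reachable: int, ensemble_size: int) -> bool:
    """Return True if `reachable` servers are a strict majority of the ensemble.

    Assumption: simple majority quorums (more than half of all servers).
    """
    return reachable > ensemble_size // 2


def partition_sides(side_sizes: List[int]) -> List[bool]:
    """For each side of a partition, report whether it could still commit writes.

    The ensemble size is taken to be the sum of all sides, i.e. every server
    is assumed to be alive on exactly one side of the partition.
    """
    ensemble_size = sum(side_sizes)
    return [has_quorum(n, ensemble_size) for n in side_sizes]


if __name__ == "__main__":
    # A 5-server ensemble split 3/2: only the 3-server side holds a majority.
    print(partition_sides([3, 2]))  # [True, False]
```

Under the same assumption, an even split such as 2/2 in a 4-server
ensemble leaves neither side with a majority, so no writes could be
committed on either side until the partition heals.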