From: Mahadev Konar
To: user@zookeeper.apache.org
Date: Thu, 9 Dec 2010 13:42:00 -0800
Subject: Re: Failure scenarios and consequences

I just read through the wiki. It seems fine to me. Please feel free to add documentation to that wiki and get it reviewed by folks on the list.

Thanks
mahadev

On 12/9/10 1:29 PM, "Jeremy Hanna" wrote:

I created a link off of the main wiki and the page itself:
http://wiki.apache.org/hadoop/ZooKeeper/FailureScenarios

Would someone please review it? Specifically, I am curious to know about this:

"if the leader is in the non-quorum side of the partition, that side of the partition will recognize that it no longer has a quorum of the ensemble. The leader will be demoted to being a regular ZooKeeper server and those nodes will no longer accept reads or writes."

I just wanted to clarify: in the time it takes the non-quorum side to recognize that it is no longer a quorum, will any writes ever get through? Is it guaranteed that it won't accept writes after the partition? I don't think that guarantee can exist, but I wondered how to handle that.

On Dec 9, 2010, at 2:04 PM, Mahadev Konar wrote:

> Hi Jeremy,
> Responses inline below:
>
> On 12/9/10 11:53 AM, "Jeremy Hanna" wrote:
>
> I looked around on the wiki and in the user list archives and couldn't find anything definitive about certain failure scenarios.
>
> A partition splits the ensemble where a quorum is on one side of the partition:
> - If the leader is on the quorum side of the partition, what happens to reads/writes that go to the non-quorum side? I assume writes return errors because they can't get to the leader. Reads?
>
>> The reads will also fail on all the quorum nodes until a new quorum is elected.
>
> - If the leader is on the non-quorum side of the partition, I would assume that the quorum side of the partition would elect a new leader for the clients on its side of the partition. However, is there a possibility for the leader on the non-quorum side to accept writes before it realizes that there's no longer a quorum? I'm just wondering about the possibility of corruption, and how the cluster would handle that data when it syncs back up.
>
>> No, there isn't. The leader relinquishes its right as a leader as soon as it realizes a quorum isn't committing the changes it proposed.
>
> (I would be happy to create a wiki page for failure scenarios, if one doesn't exist, that people could add to, but maybe this is just common knowledge.)
>
>> Please do!
>
> thanks
> mahadev
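The majority rule behind the answer above can be illustrated with a toy model. This is a minimal sketch, not ZooKeeper's implementation: `ToyLeader`, `propose`, and the `reachable` set are hypothetical names, and acknowledgements, timeouts, and leader election are collapsed into a single count. It assumes a write commits only once a strict majority of the full ensemble (leader included) has acknowledged it, so a leader stranded on the minority side commits nothing. Stepping down immediately on a failed proposal is also a simplification of "as soon as it realizes a quorum isn't committing the changes it proposed".

```python
from dataclasses import dataclass, field


@dataclass
class ToyLeader:
    """Hypothetical toy model of a quorum-based leader (not ZooKeeper code)."""

    ensemble_size: int  # total number of servers in the ensemble
    reachable: set      # server ids this leader can reach, itself included
    is_leader: bool = True
    committed: list = field(default_factory=list)

    def quorum(self) -> int:
        # Strict majority of the whole ensemble: 3 of 5, 2 of 3, and so on.
        return self.ensemble_size // 2 + 1

    def propose(self, txn: str) -> bool:
        """Return True only if txn was acknowledged by a majority and committed."""
        if not self.is_leader:
            return False  # a demoted server accepts no writes
        # Simplification: every reachable server acknowledges immediately.
        acks = len(self.reachable)
        if acks >= self.quorum():
            self.committed.append(txn)
            return True
        # Too few acknowledgements: the txn is never committed and,
        # in this model, the leader steps down right away.
        self.is_leader = False
        return False


# Leader cut off with one follower out of five servers: no quorum.
minority = ToyLeader(ensemble_size=5, reachable={1, 2})
print(minority.propose("set /a 1"), minority.committed, minority.is_leader)
# → False [] False

# Leader with two followers out of five servers: quorum of 3 reached.
majority = ToyLeader(ensemble_size=5, reachable={1, 2, 3})
print(majority.propose("set /a 1"), majority.committed, majority.is_leader)
# → True ['set /a 1'] True
```

The key line is `acks >= self.quorum()`: two disjoint sides of a partition cannot both contain a strict majority, so at most one side can ever commit a write.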