From: Benjamin Reed
To: "zookeeper-user@hadoop.apache.org"
Date: Thu, 18 Dec 2008 14:59:37 -0800
Subject: RE: What happens when a server loses all its state?

I have opened ZOOKEEPER-261 for this issue. it shouldn't be too hard to fix, and it would be nice to target it for 3.1.

ben

-----Original Message-----
From: Thomas.Johnson@Sun.COM [mailto:Thomas.Johnson@Sun.COM]
Sent: Wednesday, December 17, 2008 2:52 PM
To: zookeeper-user@hadoop.apache.org
Subject: Re: What happens when a server loses all its state?

Thanks for all the responses.

Benjamin Reed wrote:
> Thomas,
>
> in the scenario you give, you have two simultaneous failures with 3 nodes, so it will not recover correctly. A has failed because it is not up. B has failed because it lost all its data.
>
> it would be good for ZooKeeper to not come up in that scenario. perhaps what we need is something similar to your safe state proposal. basically, a server that has forgotten everything should not be allowed to vote in the leader election. that would avoid your scenario. we just need to put a flag file in the data directory to say that the data is valid and thus the server can vote.
>
> ben

Would this feature be something you'd consider implementing in the short to medium term?
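For illustration only, the flag-file idea from Ben's reply can be sketched as follows. This is a hypothetical sketch, not ZooKeeper code: the marker file name "dataValid" and the class and method names are assumptions, and it glosses over when exactly the flag should be written (presumably only once the server's state is known to be good). It also shows why the 3-node scenario should not come up: with A down and B wiped, only one of three servers is eligible to vote, which is not a majority.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical sketch: a server may vote in leader election only if its
 * data directory contains a marker file saying its data is valid.
 * The file name "dataValid" is an assumption, not part of ZooKeeper.
 */
public class VoteEligibility {
    // Hypothetical marker file name.
    static final String VALID_FLAG = "dataValid";

    /** Records that the on-disk state in dataDir is valid. */
    static void markDataValid(Path dataDir) throws IOException {
        Files.createDirectories(dataDir);
        Path flag = dataDir.resolve(VALID_FLAG);
        if (!Files.exists(flag)) {
            Files.createFile(flag);
        }
    }

    /** A server whose data directory was wiped has no flag and must not vote. */
    static boolean mayVote(Path dataDir) {
        return Files.isRegularFile(dataDir.resolve(VALID_FLAG));
    }

    /** A quorum needs a strict majority of the ensemble among eligible voters. */
    static boolean hasQuorum(int eligibleVoters, int ensembleSize) {
        return eligibleVoters > ensembleSize / 2;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("zk-data");
        System.out.println(mayVote(dir));       // false: fresh (or wiped) directory
        markDataValid(dir);
        System.out.println(mayVote(dir));       // true
        // A is down and B lost its data, so only C may vote: 1 of 3.
        System.out.println(hasQuorum(1, 3));    // false: ensemble stays down
    }
}
```

Refusing to form a quorum here trades availability for safety: the ensemble stays down rather than electing a leader from a server that has forgotten committed state.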