From: Mahadev Konar
Date: Tue, 16 Dec 2008 14:47:41 -0800
Subject: Re: What happens when a server loses all its state?
Reply-To: zookeeper-user@hadoop.apache.org
In-Reply-To: <494817DD.70800@sun.com>

Hi Thomas,

> More generally, is it a safe assumption to make that the ZooKeeper
> service will maintain all its guarantees if a minority of servers lose
> persistent state (due to bad disks, etc) and restart at some point in
> the future?

Yes, that is true.

mahadev

>
> Thanks.
> Mahadev Konar wrote:
>> Hi Thomas,
>>
>> If a zookeeper server loses all state and there are enough servers in the
>> ensemble to continue a zookeeper service (like 2 servers in the case of an
>> ensemble of 3), then the server will get the latest snapshot from the leader
>> and continue.
>>
>> The idea of zookeeper persisting its state on disk is just so that it does
>> not lose state. All the guarantees that zookeeper makes are based on the
>> understanding that we do not lose the state of the data we store on the disk.
>>
>> There might be problems if we lose the state that we stored on the disk.
>> We might lose transactions that have been committed, and the ensemble might
>> start with some snapshot in the past.
>>
>> You might want to read through how zookeeper internals work.
>> This will help you understand why the persistence guarantees are required.
>>
>> http://wiki.apache.org/hadoop-data/attachments/ZooKeeper(2f)ZooKeeperPresentations/attachments/zk-talk-upc.pdf
>>
>> mahadev
>>
>> On 12/16/08 9:45 AM, "Thomas Vinod Johnson" wrote:
>>
>>> What is the expected behavior if a server in a ZooKeeper service
>>> restarts with all its prior state lost? Empirically, everything seems to
>>> work*. Is this something that one can count on, as part of ZooKeeper
>>> design, or are there known conditions under which this could cause
>>> problems, either liveness or violation of ZooKeeper guarantees?
>>>
>>> I'm really most interested in a situation where a single server loses
>>> state, but insights into issues when more than one server loses state
>>> and other interesting failure scenarios are appreciated.
>>>
>>> Thanks.
>>>
>>> * The restarted server appears to catch up to the latest snapshot (from
>>> the current leader?).
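
For reference, here is a small sketch of the majority arithmetic behind the "2 servers in an ensemble of 3" example in the thread. The function names are hypothetical and not part of any ZooKeeper API. The sketch only assumes, as the thread suggests, that the service can continue as long as the servers that kept their state still form a strict majority of the ensemble.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble (hypothetical helper)."""
    if ensemble_size < 1:
        raise ValueError("ensemble must have at least one server")
    return ensemble_size // 2 + 1


def tolerates_state_loss(ensemble_size: int, servers_lost: int) -> bool:
    """True if the servers that kept their state still form a majority.

    Assumption: only a minority of servers may lose persistent state.
    """
    if not 0 <= servers_lost <= ensemble_size:
        raise ValueError("servers_lost must be between 0 and ensemble_size")
    return ensemble_size - servers_lost >= quorum_size(ensemble_size)


if __name__ == "__main__":
    for size, lost in [(3, 1), (3, 2), (5, 2)]:
        print(size, lost, tolerates_state_loss(size, lost))
```

With an ensemble of 3, one server losing its state leaves 2 intact servers, which is still a majority; two servers losing state does not.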