Date: Tue, 16 Dec 2008 10:39:37 -0800
Subject: Re: What happens when a server loses all its state?
From: Mahadev Konar

Hi Thomas,

If a ZooKeeper server loses all its state and there are enough servers
in the ensemble to keep the ZooKeeper service running (for example, 2
servers in an ensemble of 3), then the server will get the latest
snapshot from the leader and continue.

ZooKeeper persists its state on disk precisely so that it does not lose
state. All the guarantees that ZooKeeper makes are based on the
understanding that we do not lose the data we store on disk. There
might be problems if we lose the state that we stored on disk: we might
lose transactions that have been committed, and the ensemble might start
from some snapshot in the past.

You might want to read through how the ZooKeeper internals work. This
will help you understand why the persistence guarantees are required.

http://wiki.apache.org/hadoop-data/attachments/ZooKeeper(2f)ZooKeeperPresentations/attachments/zk-talk-upc.pdf

mahadev

On 12/16/08 9:45 AM, "Thomas Vinod Johnson" wrote:

> What is the expected behavior if a server in a ZooKeeper service
> restarts with all its prior state lost? Empirically, everything seems to
> work*. Is this something that one can count on, as part of ZooKeeper
> design, or are there known conditions under which this could cause
> problems, either liveness or violation of ZooKeeper guarantees?
>
> I'm really most interested in a situation where a single server loses
> state, but insights into issues when more than one server loses state
> and other interesting failure scenarios are appreciated.
>
> Thanks.
>
> * The restarted server appears to catch up to the latest snapshot (from
> the current leader?).
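
As a rough illustration of the "2 servers in an ensemble of 3" remark in
the reply above, here is a minimal sketch of the arithmetic, assuming a
plain majority quorum. The function names quorum_size and
can_keep_serving are hypothetical helpers made up for this note; they
are not part of any ZooKeeper API, and the sketch does not model
snapshots or the transaction log.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble (assumes majority quorums)."""
    if ensemble_size < 1:
        raise ValueError("ensemble must contain at least one server")
    return ensemble_size // 2 + 1


def can_keep_serving(ensemble_size: int, servers_with_state: int) -> bool:
    """True if the servers that still hold their state form a majority.

    Hypothetical helper: only when this holds can a server that lost its
    state expect to catch up from the rest of the ensemble.
    """
    return servers_with_state >= quorum_size(ensemble_size)


if __name__ == "__main__":
    for size in (3, 5):
        for lost in range(size + 1):
            intact = size - lost
            print(
                f"ensemble={size} lost_state={lost} "
                f"quorum={quorum_size(size)} "
                f"can_keep_serving={can_keep_serving(size, intact)}"
            )
```

With an ensemble of 3, one server losing its state still leaves 2 intact
servers, which meets the quorum of 2. If two servers lose their state,
the remaining one is no longer a majority, which is the kind of case
where committed transactions could be lost.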