Mailing-List: contact zookeeper-user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: zookeeper-user@hadoop.apache.org
Date: Tue, 16 Dec 2008 10:39:37 -0800
Subject: Re: What happens when a server loses all its state?
From: Mahadev Konar <mahadev@yahoo-inc.com>
To: <zookeeper-user@hadoop.apache.org>
Message-ID: <C56D35E9.15C77%mahadev@yahoo-inc.com>
Thread-Topic: What happens when a server loses all its state?
Thread-Index: AclfraWn6mUi487zN0C+kgk501/STw==
In-Reply-To: <4947E942.3020209@sun.com>
Mime-version: 1.0
Content-type: text/plain;
	charset="US-ASCII"
Content-transfer-encoding: 7bit

Hi Thomas,

If a ZooKeeper server loses all its state and there are enough servers in the
ensemble to continue the ZooKeeper service (for example, 2 servers in an
ensemble of 3), then the server will get the latest snapshot from the leader
and continue.
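
As a rough illustration of the "enough servers" condition, here is a small
sketch of the majority arithmetic. The function names quorum_size and
can_serve are made up for this example and are not part of any ZooKeeper API;
the sketch only assumes a simple majority quorum.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble must contain at least one server")
    return ensemble_size // 2 + 1


def can_serve(ensemble_size: int, servers_up: int) -> bool:
    """True if the servers that are up still form a majority."""
    return servers_up >= quorum_size(ensemble_size)


if __name__ == "__main__":
    # Ensemble of 3: two healthy servers are enough, one is not.
    print(can_serve(3, 2))
    print(can_serve(3, 1))
```

With an ensemble of 3, the two remaining servers keep the service running
while the third one resyncs from the leader.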


The idea of ZooKeeper persisting its state on disk is just so that it does
not lose state. All the guarantees that ZooKeeper makes are based on the
understanding that we do not lose the state of the data we store on disk.


There might be problems if we lose the state that we stored on the disk.
We might lose transactions that have been committed, and the ensemble might
start from some snapshot in the past.
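
To make that concrete, here is a toy model of how a committed transaction
could disappear. It is only a sketch and not ZooKeeper's actual recovery
protocol: each server's disk is modeled as a set of transaction ids, and the
helper names (quorum_size, recover) and the server names A, B, C are
hypothetical.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def recover(disks: dict, live: list):
    """Union of the transactions found on the live servers' disks.

    Returns None when the live servers do not form a majority.
    """
    if len(live) < quorum_size(len(disks)):
        return None
    recovered = set()
    for name in live:
        recovered |= disks[name]
    return recovered


# Transaction 2 was written to disk by A and B (a majority), so it is committed.
disks = {"A": {1, 2}, "B": {1, 2}, "C": {1}}
committed = {1, 2}

# Assumption for this example: B loses its disk while A is unreachable.
disks["B"] = set()
state = recover(disks, live=["B", "C"])

# Committed transactions that the new quorum can no longer see.
lost = committed - state
```

In this model B and C still form a majority, but neither of them holds
transaction 2 any more, so the ensemble comes back with older state even
though only one server lost its disk.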

You might want to read through how ZooKeeper internals work. This will help
you understand why the persistence guarantees are required.

http://wiki.apache.org/hadoop-data/attachments/ZooKeeper(2f)ZooKeeperPresentations/attachments/zk-talk-upc.pdf

mahadev


On 12/16/08 9:45 AM, "Thomas Vinod Johnson" <Thomas.Johnson@Sun.COM> wrote:

> What is the expected behavior if a server in a ZooKeeper service
> restarts with all its prior state lost? Empirically, everything seems to
> work*.  Is this something that one can count on, as part of ZooKeeper
> design, or are there known conditions under which this could cause
> problems, either liveness or violation of ZooKeeper guarantees?
> 
> I'm really most interested in a situation where a single server loses
> state, but insights into issues when more than one server loses state
> and other interesting failure scenarios are appreciated.
> 
> Thanks.
> 
> * The restarted server appears to catch up to the latest snapshot (from
> the current leader?).