hadoop-zookeeper-user mailing list archives

From Mahadev Konar <maha...@yahoo-inc.com>
Subject Re: What happens when a server loses all its state?
Date Tue, 16 Dec 2008 22:47:41 GMT
Hi Thomas,



> More generally, is it a safe assumption to make that the ZooKeeper
> service will maintain all its guarantees if a minority of servers lose
> persistent state (due to bad disks, etc) and restart at some point in
> the future?
Yes, that is true.

mahadev

> 
> Thanks.
> Mahadev Konar wrote:
>> Hi Thomas,
>> 
>> If a zookeeper server loses all state and there are enough servers in the
>> ensemble to continue a zookeeper service (for example, 2 servers in an
>> ensemble of 3), then the server will get the latest snapshot from the leader
>> and continue.
>> 
>> 
>> The idea of zookeeper persisting its state on disk is just so that it does
>> not lose state. All the guarantees that zookeeper makes are based on the
>> understanding that we do not lose the state of the data we store on disk.
>> 
>> 
>> There might be problems if we lose the state that we stored on the disk.
>> We might lose transactions that have been committed and the ensemble might
>> start with some snapshot in the past.
>> 
>> You might want to read through how zookeeper internals work. This will help
>> you understand why the persistence guarantees are required.
>> 
>> http://wiki.apache.org/hadoop-data/attachments/ZooKeeper(2f)ZooKeeperPresentations/attachments/zk-talk-upc.pdf
>> 
>> mahadev
>> 
>> 
>> 
>> On 12/16/08 9:45 AM, "Thomas Vinod Johnson" <Thomas.Johnson@Sun.COM> wrote:
>> 
>>   
>>> What is the expected behavior if a server in a ZooKeeper service
>>> restarts with all its prior state lost? Empirically, everything seems to
>>> work*.  Is this something that one can count on, as part of ZooKeeper
>>> design, or are there known conditions under which this could cause
>>> problems, either liveness or violation of ZooKeeper guarantees?
>>> 
>>> I'm really most interested in a situation where a single server loses
>>> state, but insights into issues when more than one server loses state
>>> and other interesting failure scenarios are appreciated.
>>> 
>>> Thanks.
>>> 
>>> * The restarted server appears to catch up to the latest snapshot (from
>>> the current leader?).
>>>     
>> 
>>   
> 
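
The "2 servers in an ensemble of 3" condition quoted above is ordinary
majority-quorum arithmetic. A minimal sketch of that arithmetic follows. The
function names are hypothetical and are not part of any ZooKeeper API, and the
sketch only counts servers; it does not model leader election, snapshot
transfer, or which transactions each surviving server has actually logged.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def can_serve(ensemble_size: int, servers_with_state: int) -> bool:
    """True if the servers that still hold their state form a majority."""
    return servers_with_state >= quorum_size(ensemble_size)


def tolerated_losses(ensemble_size: int) -> int:
    """How many servers can be missing while a majority still remains."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (3, 5, 7):
        print(n, quorum_size(n), tolerated_losses(n))
```

Under these assumptions, an ensemble of 3 keeps a majority when one server
loses its state, while an ensemble of 5 keeps one when two do.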

