zookeeper-user mailing list archives

From Benjamin Reed <br...@yahoo-inc.com>
Subject RE: What happens when a server loses all its state?
Date Wed, 17 Dec 2008 19:47:35 GMT

In the scenario you give, you have two simultaneous failures with 3 nodes, so it will not recover
correctly. A has failed because it is not up. B has failed because it lost all its data.
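
The arithmetic behind this can be sketched as follows. This is only an illustration that assumes simple majority quorums; the function names are made up for the example and are not ZooKeeper APIs.

```python
def quorum_size(n):
    """Smallest majority of an n-server ensemble."""
    return n // 2 + 1


def tolerated_failures(n):
    """How many servers can fail while a majority still survives."""
    return n - quorum_size(n)


ensemble = 3
# The two failures in the scenario: A is down, B lost its data.
failed = {"A": "not up", "B": "lost all its data"}

print(quorum_size(ensemble))                        # 2
print(tolerated_failures(ensemble))                 # 1
print(len(failed) > tolerated_failures(ensemble))   # True: one failure too many
```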

It would be good for ZooKeeper not to come up in that scenario. Perhaps what we need is something
similar to your safe state proposal. Basically, a server that has forgotten everything should
not be allowed to vote in the leader election. That would avoid your scenario. We just need
to put a flag file in the data directory to say that the data is valid and that the server can therefore vote.
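
A rough sketch of the flag-file idea, to make the proposal concrete. Everything here is hypothetical: the file name `data.valid`, the helper names, and the majority check are assumptions for illustration, not existing ZooKeeper behavior. How a wiped server would later regain its flag is not covered, since the proposal above does not say.

```python
import os
import tempfile

VALID_FLAG = "data.valid"  # hypothetical flag file name


def mark_data_valid(data_dir):
    """Write the flag file and force it to disk."""
    path = os.path.join(data_dir, VALID_FLAG)
    with open(path, "w") as f:
        f.write("valid\n")
        f.flush()
        os.fsync(f.fileno())


def may_vote(data_dir):
    """A server may vote only if its data directory carries the flag."""
    return os.path.isfile(os.path.join(data_dir, VALID_FLAG))


def can_elect(voters, ensemble_size):
    """An election needs a majority of the whole ensemble."""
    return len(voters) >= ensemble_size // 2 + 1


# The scenario from this thread: A is still down, B came back with an
# empty data directory, C came back with its disk intact.
with tempfile.TemporaryDirectory() as b_dir, tempfile.TemporaryDirectory() as c_dir:
    mark_data_valid(c_dir)  # C's flag survived the restart
    dirs = {"B": b_dir, "C": c_dir}
    voters = [name for name, d in dirs.items() if may_vote(d)]
    print(voters)                # ['C']
    print(can_elect(voters, 3))  # False: the ensemble waits for A
```

With B excluded from voting, B and C alone cannot form a quorum, so the ensemble does not come up until A returns.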

From: Thomas.Johnson@Sun.COM [Thomas.Johnson@Sun.COM]
Sent: Tuesday, December 16, 2008 4:02 PM
To: zookeeper-user@hadoop.apache.org
Subject: Re: What happens when a server loses all its state?

Mahadev Konar wrote:
> Hi Thomas,
>> More generally, is it a safe assumption to make that the ZooKeeper
>> service will maintain all its guarantees if a minority of servers lose
>> persistent state (due to bad disks, etc) and restart at some point in
>> the future?
> Yes that is true.
Great - thanks Mahadev.

Not to drag this on more than necessary, please bear with me for one
more example of 'amnesia' that comes to mind. I have a set of ZooKeeper
servers A, B, C.
- C is currently not running, A is the leader, B is the follower.
- A proposes zxid1 to A and B, both acknowledge.
- A asks A to commit (which it persists), but before the same commit
request reaches B, all servers go down (say a power failure).
- Later, B and C come up (A is slow to reboot), but B has lost all state
due to disk failure.
- C becomes the new leader and perhaps continues with some more new

Likely I'm misunderstanding the protocol, but have I effectively lost
zxid1 at this point? What would happen when A comes back up?
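
The steps above can be replayed in a toy model. It assumes that a follower logs a proposal to disk before acknowledging it, and it only asks whether any member of the voting quorum still remembers zxid1. It does not model the real election or recovery protocol, and the names are made up for the example.

```python
def quorum_remembers(zxid, voters, logs, ensemble_size):
    """True if at least one voter in a majority quorum has zxid in its log."""
    if len(voters) < ensemble_size // 2 + 1:
        raise ValueError("voters do not form a majority")
    return any(zxid in logs[v] for v in voters)


# State after the power failure. C was down and never saw zxid1.
healthy = {"A": ["zxid1"], "B": ["zxid1"], "C": []}  # B kept its disk
amnesia = {"A": ["zxid1"], "B": [], "C": []}         # B lost all state

# B and C come up first; A is slow to reboot.
print(quorum_remembers("zxid1", ["B", "C"], healthy, 3))  # True
print(quorum_remembers("zxid1", ["B", "C"], amnesia, 3))  # False
```

In the second case B and C form a majority in which no server has zxid1, even though A persisted the commit.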

