zookeeper-user mailing list archives

From Yang <teddyyyy...@gmail.com>
Subject question on ZAB protocol
Date Wed, 13 Jul 2011 01:45:50 GMT
I had read the ZAB paper before without ever thinking about this question,
but today I found that I can't answer it, so I'm bringing it up here.

According to the paper

B. Reed and F. P. Junqueira. A simple totally ordered broadcast
protocol. In LADIS ’08: Proceedings of the 2nd Workshop
on Large-Scale Distributed Systems and Middleware, pages 1–6,
New York, NY, USA, 2008. ACM.

the leader broadcasts a write to all replicas and then waits for a
quorum of them to reply before sending out the COMMIT.
Why is the quorum necessary? That is, why can't the leader just wait for
one reply and then start sending the COMMIT?
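As a rough illustration of the commit rule described above, here is a minimal Python sketch of leader-side ACK counting. It is not ZooKeeper's code: the `Leader` and `Proposal` classes, `on_ack`, and `ack_threshold` are hypothetical names, and networking, disk logging, and zxid ordering are all omitted. The default threshold is a strict majority of the ensemble; passing `ack_threshold=1` models the one-reply alternative asked about here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Proposal:
    """One write the leader has broadcast and is collecting ACKs for."""
    zxid: int
    acks: set[str] = field(default_factory=set)
    committed: bool = False


class Leader:
    """Hypothetical leader-side ACK counting (not ZooKeeper's implementation).

    ack_threshold is how many distinct ACKs are needed before COMMIT.
    Whether the leader's own log write counts as an ACK is an assumption
    left to the caller (it would simply be passed to on_ack like any other).
    """

    def __init__(self, ensemble_size: int, ack_threshold: int | None = None):
        self.ensemble_size = ensemble_size
        # Default: a strict majority of the ensemble, i.e. a quorum.
        if ack_threshold is None:
            ack_threshold = ensemble_size // 2 + 1
        self.ack_threshold = ack_threshold
        self.pending: dict[int, Proposal] = {}

    def propose(self, zxid: int) -> None:
        # The broadcast of the write to all replicas is omitted here.
        self.pending[zxid] = Proposal(zxid)

    def on_ack(self, zxid: int, server_id: str) -> bool:
        """Record an ACK; return True only for the ACK that triggers COMMIT."""
        proposal = self.pending[zxid]
        if proposal.committed:
            return False
        proposal.acks.add(server_id)  # duplicate ACKs are not double-counted
        if len(proposal.acks) >= self.ack_threshold:
            proposal.committed = True  # COMMIT would be broadcast here
            return True
        return False


if __name__ == "__main__":
    leader = Leader(ensemble_size=5)  # majority threshold = 3
    leader.propose(1)
    print(leader.on_ack(1, "s1"), leader.on_ack(1, "s2"), leader.on_ack(1, "s3"))
```

With five servers the third distinct ACK is the one that returns `True`, so the script prints `False False True`; with `Leader(5, ack_threshold=1)` the very first ACK would trigger the COMMIT.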

Now that I think about it, it seems that waiting for just one reply is
enough: because the connections from the leader to the replicas are
FIFO, as long as the replicas do not die, they will eventually get the
writes, even if the writes reach them after the leader has started
the COMMIT.

The only reason I can think of for using a quorum is to tolerate more
failures: if the only replica that replied dies, and the leader dies
too, then we lose that latest write.
By requiring f ACKs, you can tolerate f-1 failures. But then you don't
really need 2f+1 nodes in the ZK cluster; f+1 would be enough.
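The counting argument above can be checked by brute force. This is a minimal sketch with hypothetical helper names; it assumes crash failures only and models just whether some copy of the write survives, not leader election or recovery. It enumerates every possible set of crashed nodes and reports the largest number of crashes that a write stored on k of n nodes always survives.

```python
from __future__ import annotations

from itertools import combinations


def write_survives(holders: frozenset[int], crashed: frozenset[int]) -> bool:
    """The write is still present somewhere if at least one holder is alive."""
    return bool(holders - crashed)


def max_tolerated_crashes(n: int, k: int) -> int:
    """Largest t such that a write stored on any k of n nodes survives
    every possible set of t crashed nodes (exhaustive enumeration)."""
    nodes = range(n)
    best = -1
    for t in range(n + 1):
        ok = all(
            write_survives(frozenset(holders), frozenset(crashed))
            for holders in combinations(nodes, k)
            for crashed in combinations(nodes, t)
        )
        if not ok:
            break
        best = t
    return best


if __name__ == "__main__":
    for n, k in [(3, 2), (5, 3), (4, 3)]:
        print(n, k, max_tolerated_crashes(n, k))
```

For example, `max_tolerated_crashes(5, 3)` gives 2, i.e. k-1, and the result depends only on k, not on the cluster size n, which matches the f ACKs / f-1 failures count above.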

Thanks a lot
