zookeeper-user mailing list archives

From Yang <teddyyyy...@gmail.com>
Subject Re: question on ZAB protocol
Date Wed, 13 Jul 2011 01:46:46 GMT
btw, to give proper credit, I thought about this question after reading

http://www.vldb.org/pvldb/vol4/p243-rao.pdf

which actually just waits for 1 reply

On Tue, Jul 12, 2011 at 6:45 PM, Yang <teddyyyy123@gmail.com> wrote:
> I read the ZAB paper a while ago and never thought about this question, but
> found today that I can't answer it, so I'm bringing it up here.
>
> according to the paper
>
> B. Reed and F. P. Junqueira. A simple totally ordered broadcast
> protocol. In LADIS ’08: Proceedings of the 2nd Workshop
> on Large-Scale Distributed Systems and Middleware, pages 1–6,
> New York, NY, USA, 2008. ACM.
>
> the leader broadcasts a write to all replicas and then waits for a
> quorum to reply before sending out the COMMIT.
> why is the quorum necessary, i.e. why can't the leader just wait for
> one reply and then start sending the COMMIT?
>
> now that I think about it, it seems that waiting for just one reply is
> enough: because the connections from the leader to the replicas are FIFO,
> as long as the replicas do not die
> they will eventually get the writes, even if the writes arrive at
> them after the leader has started sending the COMMIT.
>
> the only reason I can think of for using a quorum is to tolerate more
> failures: if the only replica that replied dies and the leader dies, then we
> lose that latest write.
> by requiring f ACKs, you can tolerate f-1 failures. but then you don't
> really need 2f+1 nodes in the ZK cluster; f+1 is enough.
>
>
> Thanks a lot
> Yang
>
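
A minimal sketch of the durability argument in the quoted message, written in Python. It is not taken from the ZAB paper or from ZooKeeper. The helper name survives_all_crashes and its worst-case model are assumptions for illustration: a committed write is stored on `copies` nodes (the leader counted as one of them), and any `failures` nodes may crash. It only checks whether some live node still holds the write; it says nothing about leader election or liveness.

```python
from itertools import combinations


def survives_all_crashes(n, copies, failures):
    """Return True if a write stored on `copies` of `n` nodes is still held
    by at least one live node, for every choice of holders and every set of
    `failures` crashed nodes (hypothetical helper, worst-case model)."""
    nodes = range(n)
    for holders in combinations(nodes, copies):
        for crashed in combinations(nodes, failures):
            # The write is lost when every node holding it has crashed.
            if set(holders) <= set(crashed):
                return False
    return True


# 5 nodes, leader + 1 ACK = 2 copies, 2 crashes: the write can be lost.
print(survives_all_crashes(5, 2, 2))  # False
# 5 nodes, leader + 2 ACKs = 3 copies (a majority), 2 crashes: it survives.
print(survives_all_crashes(5, 3, 2))  # True
```

Under this model the write survives exactly when copies > failures, which is the "leader and the only replica that replied both die" case described in the quoted message.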
