zookeeper-user mailing list archives

From daidong <daidon...@gmail.com>
Subject Re: RE: RE: Problems about Zab protocol
Date Sat, 23 Apr 2011 04:55:27 GMT
Hi, Alex

Thanks for your reply, and for Flavio's.

I think I finally get the idea. :)

Would it be appropriate to view ZAB as a 3PC without the READY/WAIT states, since all the
participants will reply VOTE_COMMIT (they never abort...)?

I will read the source code and hope I can do some work with ZAB. Thanks a lot for all the
replies.
-- 
daidong
On Friday, April 22, 2011 at 3:54 AM, Alexander Shraer [via zookeeper-user] wrote:
>  Hi Daidong, 
> 
> In addition to Flavio's response, I'll try to address some of your specific questions.
> 
> > In my opinion, an atomic broadcast protocol must guarantee that all the
> > non-faulty servers eventually have the same state. So in the 2PC protocol,
> > the coordinator must block until "all" the servers reply "ok".
> 
> Designed this way, the protocol wouldn't be able to tolerate any failures - the leader could block
> waiting for a response from a server that had crashed. The idea is to receive enough "ok" messages
> to guarantee that even if a minority of servers crash, the information is still not lost. That's why
> the leader waits for a majority of acks. Messages are still sent to all followers, so they will eventually
> get them (or if they disconnect they will later reconnect and synch with the leader automatically).
>
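A minimal sketch of the majority-ack rule described above, using a hypothetical in-memory model rather than ZooKeeper's actual classes. The proposal goes to every follower, but the leader commits as soon as a strict majority of the ensemble (assumed here to count the leader's own ack) has acknowledged it, so one slow or crashed follower cannot block progress. `Ensemble`, `propose` and `alive` are illustrative names only.

```python
from dataclasses import dataclass, field


@dataclass
class Ensemble:
    """Toy leader-plus-followers model (hypothetical, not ZooKeeper code)."""
    size: int                                   # total servers, leader included
    alive: set = field(default_factory=set)     # follower ids able to ack (1..size-1)
    committed: list = field(default_factory=list)

    def quorum(self) -> int:
        # Strict majority of the whole ensemble.
        return self.size // 2 + 1

    def propose(self, zxid: int, op: str) -> bool:
        # Server 0 is the leader; the proposal is sent to every follower,
        # but only live followers send an ack back.
        followers = range(1, self.size)
        acks = 1 + sum(1 for f in followers if f in self.alive)  # leader acks itself
        if acks >= self.quorum():
            # Majority reached: commit without waiting for the remaining servers.
            self.committed.append((zxid, op))
            return True
        # No majority yet: nothing is committed (in this toy model we simply report it).
        return False


e = Ensemble(size=5, alive={1, 2})
print(e.propose(1, "create /a"))   # True: leader + 2 followers = 3 of 5
e.alive = {1}
print(e.propose(2, "create /b"))   # False: only 2 of 5 acks
```

The slow followers still receive every proposal; they simply are not on the critical path for committing it.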
> Regarding your second question - formally, sequential consistency guarantees that operations of each client take effect in the order
> they were submitted by the client - so a client's read is guaranteed to see its own last complete write.
> In the example you mention, the client first executes a create() and then getChildren(). If clients C1 and C2 both submit a create()
> concurrently, one of these requests will reach the leader and will be scheduled by the leader before the other one, say, the create() request of C1.
> Then, when C2 is notified about the completion of its own create, FIFO ensures that it also finds out about any operation that completed before that create()
> (these messages were sent by the leader earlier). So when C2 finally runs getChildren(), its local state will already reflect every operation that was scheduled
> by the leader before its own create() completed.
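A toy simulation of the FIFO argument above, assuming C2 is connected to a single follower that receives the leader's messages over one FIFO channel. `Follower`, `wait_for` and the node names are hypothetical; the real server pipeline is more involved, but the ordering argument is the same.

```python
from collections import deque


class Follower:
    """Toy follower applying leader messages strictly in FIFO order (hypothetical)."""

    def __init__(self):
        self.channel = deque()    # FIFO link from the leader
        self.children = []        # local view of the lock node's children
        self.completed = set()    # zxids applied locally

    def deliver_one(self):
        zxid, path = self.channel.popleft()
        self.children.append(path)
        self.completed.add(zxid)

    def wait_for(self, zxid):
        # A client's create() "completes" only once its own zxid is applied here.
        # Because the channel is FIFO, every earlier message is applied first.
        while zxid not in self.completed:
            self.deliver_one()


leader_log = [(1, "c1-node"), (2, "c2-node")]   # leader scheduled C1 before C2
f = Follower()
f.channel.extend(leader_log)
f.wait_for(2)           # C2's create() returns
print(f.children)       # ['c1-node', 'c2-node']: C1's node is already visible
```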
> 
> In general, ZAB implements state-machine replication by executing consensus on each operation. To understand the general idea,
> I recommend reading Lamport's "Paxos made simple" paper I sent earlier - it has a constructive explanation of this
> (although the algorithm is somewhat different from ZAB).
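The sketch below illustrates state-machine replication in general, not ZAB's actual code: if every replica starts from the same state and applies the same operations in the same agreed order through a deterministic transition function, all replicas end in the same state. Consensus is what fixes that single order. The operation tuples and function names are hypothetical.

```python
def apply(state: dict, op: tuple) -> dict:
    """Deterministic transition: the same state and op always give the same result."""
    kind, key, value = op
    new = dict(state)
    if kind == "set":
        new[key] = value
    elif kind == "delete":
        new.pop(key, None)
    return new


def replay(log: list) -> dict:
    state = {}
    for op in log:              # apply ops strictly in the agreed order
        state = apply(state, op)
    return state


# Consensus fixes one order; every replica replays exactly that order.
agreed_log = [("set", "/a", 1), ("set", "/b", 2), ("delete", "/a", None)]
replicas = [replay(agreed_log) for _ in range(3)]
print(all(r == {"/b": 2} for r in replicas))   # True

# The same ops in a different order lead to a different state,
# which is why the replicas must agree on the order, not just the set of ops.
reordered = [("delete", "/a", None), ("set", "/a", 1), ("set", "/b", 2)]
print(replay(reordered))                       # {'/a': 1, '/b': 2}
```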
> 
> Alex 
> 
> > -----Original Message----- 
> > From: daidong [mailto:] 
> > Sent: Wednesday, April 20, 2011 11:31 PM 
> > To: [hidden email] 
> > Subject: Re: RE: Problems about Zab protocol 
> > 
> > Hi, Alex 
> > 
> > Thanks for your reply. :) 
> > 
> > I knew ZAB has two modes, but the things I do not quite understand concern
> > the broadcast mode. In the ZAB paper, the authors say ZAB is a simplified
> > version of the two-phase commit protocol because followers have no abort
> > action. I do not quite understand this.
> > 
> > In my opinion, an atomic broadcast protocol must guarantee that all the
> > non-faulty servers eventually have the same state. So in the 2PC protocol,
> > the coordinator must block until "all" the servers reply "ok". If there
> > is no abort either, consider the situation where we have a very slow
> > follower F that processes messages more slowly than the other followers.
> > Given TCP and FIFO channels, we can say all the messages will be
> > processed in order at F; however, messages will pile up if the
> > coordinator keeps broadcasting. What happens if the receive
> > buffer in F overflows?
> >
> > Is there any mechanism I have not noticed that avoids this situation in
> > ZAB?
> > 
> > About my second question, I read the consistency guarantees section;
> > thanks for the tip. I still have a question: if ZooKeeper does not make
> > sure that all the clients see the latest value, how does the lock
> > mechanism work? I checked the recipe example code in ZooKeeper 3.3.3:
> > when a client tries to get the write lock, it does not call sync() before
> > calling getChildren(). If another client has created an ephemeral node with
> > the lowest-numbered suffix, this client may not get this information, as
> > getChildren() does not sync with the leader. Is it possible that two
> > clients both think they got the lock?
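For reference, the decision a client makes in the lock recipe after getChildren() (lowest sequence number holds the lock; otherwise watch the node just below yours) can be sketched as a pure function. This assumes node names end in ZooKeeper's 10-digit, zero-padded sequence suffix; `lock_decision`, `sequence_of` and the `lock-` prefix are hypothetical, and the actual 3.3.3 recipe code is structured differently.

```python
def sequence_of(name: str) -> int:
    # Sequential znodes end in a 10-digit, zero-padded counter.
    return int(name[-10:])


def lock_decision(my_node: str, children: list[str]) -> tuple:
    """Return ("acquired", None) if my_node has the lowest sequence number,
    otherwise ("wait", predecessor), the node the client would then watch."""
    ordered = sorted(children, key=sequence_of)
    idx = ordered.index(my_node)
    if idx == 0:
        return ("acquired", None)
    return ("wait", ordered[idx - 1])


children = ["lock-0000000003", "lock-0000000001", "lock-0000000002"]
print(lock_decision("lock-0000000002", children))   # ('wait', 'lock-0000000001')
print(lock_decision("lock-0000000001", children))   # ('acquired', None)
```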
> > 
> > Thanks for any help. :)
> > -- 
> > daidong 
> > On Thursday, April 21, 2011 at 2:30 AM, Alexander Shraer [via zookeeper-user] wrote:
> > > Hi, 
> > > 
> > > Regarding your first question - ZAB has two parts - the broadcast protocol you mention,
> > > which is executed by a leader, and the leader election protocol, which recovers from a leader failure.
> > > This is similar to the way other state-machine replication algorithms work, where you have
> > > a fast normal mode and a slower recovery mode (you don't need to execute both all the time - only when the leader fails).
> > > See Paxos state-machine replication for example (section 3):
> > > http://research.microsoft.com/en-us/um/people/lamport/pubs/pubs.html#paxos-simple
> > > 
> > > Regarding your second question - ZooKeeper basically guarantees so-called "sequential consistency" semantics.
> > > This guarantees that the real execution looks to clients like some sequential execution in which
> > > the operations of every client appear in the order they were submitted. It does not guarantee that a read of one client
> > > returns the latest value written by another client. This allows reads to be executed locally. If you need to return the latest
> > > state, you can use the sync() call which flushes the pending updates between the leader and a follower.
> > > See also the "consistency guarantees" section here:
> > >
> > > http://hadoop.apache.org/zookeeper/docs/r3.3.1/zookeeperProgrammers.html
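A toy model of the behaviour described above, assuming a single follower that applies the leader's committed log lazily. The class and method names are hypothetical, and `sync()` here only mimics the effect Alex describes (flushing pending updates between leader and follower). It is not the real client API, where sync() is asynchronous and the read must be issued after it completes.

```python
class Leader:
    def __init__(self):
        self.log = []            # committed (key, value) updates, in zxid order


class Follower:
    """Toy follower that serves reads from its local copy (hypothetical model)."""

    def __init__(self, leader: Leader):
        self.leader = leader
        self.applied = 0         # number of committed updates applied locally
        self.data = {}

    def catch_up(self, upto: int):
        while self.applied < upto:
            key, value = self.leader.log[self.applied]
            self.data[key] = value
            self.applied += 1

    def read(self, key):
        # Local read: fast, but may miss updates still in flight from the leader.
        return self.data.get(key)

    def sync(self):
        # Stand-in for sync(): apply everything the leader had committed
        # at the moment the sync request reached it.
        self.catch_up(len(self.leader.log))


leader = Leader()
f = Follower(leader)
leader.log.append(("/config", "v1"))
f.catch_up(1)
leader.log.append(("/config", "v2"))   # committed, not yet applied at f
print(f.read("/config"))               # v1 (stale, but allowed by sequential consistency)
f.sync()
print(f.read("/config"))               # v2
```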
> > > 
> > > Alex 
> > > 
> > > > -----Original Message----- 
> > > > From: daidong [mailto:[hidden email]] 
> > > > Sent: Wednesday, April 20, 2011 2:38 AM 
> > > > To: [hidden email] 
> > > > Subject: Problems about Zab protocol 
> > > > 
> > > > Hi, everyone. 
> > > > 
> > > > Recently, I read the paper "A simple totally ordered broadcast protocol"
> > > > and there are some problems I cannot figure out. I hope someone can help
> > > > me... :P
> > > >
> > > > The paper describes Zab as a two-phase commit protocol when the system
> > > > is in broadcast mode. However, some papers (Skeen 82, "A Quorum-Based
> > > > Commit Protocol") mention that if we want to extend 2PC into a
> > > > quorum-based commit protocol, we must introduce a three-phase commit
> > > > protocol (in fact, I haven't quite understood this :( ). Yet according
> > > > to the Zab paper, this can still be done. Why, and how?
> > > > 
> > > > Secondly, even though ZooKeeper guarantees that the state of different
> > > > followers is consistent, this consistency only holds among the quorum
> > > > of followers that have acked the COMMIT. Since a client can connect to
> > > > any follower when performing a read, what happens if the client
> > > > happens to connect to a follower that has not acked the COMMIT? I cannot
> > > > find this information in the paper...
> > > >
> > > > If I am asking a naive question, I hope somebody can tell me where I can
> > > > find the answer or give me some suggestions, thanks :)
> > > > 
> > > > 
> > > > --
> > > > View this message in context: http://zookeeper-user.578899.n2.nabble.com/Problems-about-Zab-protocol-tp6290102p6290102.html
> > > > Sent from the zookeeper-user mailing list archive at Nabble.com.
> > > 
> > > 
> > > 
> > > 
> > > 
> > 
> > 
> > 
> 
> 
> 
> 
> 


