zookeeper-user mailing list archives

From Rakesh R <rake...@huawei.com>
Subject RE: Question about the two-phrase commit
Date Mon, 05 Jan 2015 10:55:25 GMT

In your case only A and E have committed the latest transaction; let's call it txid=1000.
Servers B, C, and D are down at this time and don't have the changes from txid=1000.
Also, when B, C, and D are restarted, servers A and E are not available. The newly elected Leader
has therefore seen at most txid=999, and when A and E rejoin the quorum they will each 'truncate'
their own log by deleting txid=1000. As you said, the write operation will be lost in this case.
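
As a rough illustration of the sequence above, here is a minimal Python sketch. It is a toy model, not ZooKeeper code: the logs are plain lists of txids, the helper truncate_to_leader is a hypothetical name, and the starting state (only A and E holding txid=1000) is simply the premise of this scenario.

```python
def truncate_to_leader(follower_log, leader_last_txid):
    """Drop every entry newer than the leader's last txid (simplified truncate)."""
    return [txid for txid in follower_log if txid <= leader_last_txid]


# Premise of the scenario: only A and E hold txid=1000.
logs = {
    "A": [998, 999, 1000],
    "B": [998, 999],
    "C": [998, 999],
    "D": [998, 999],
    "E": [998, 999, 1000],
}

# B, C, D restart while A and E are unavailable; the server with the
# highest last txid among them becomes leader (ties broken arbitrarily here).
restarted = ["B", "C", "D"]
new_leader = max(restarted, key=lambda name: logs[name][-1])
leader_last = logs[new_leader][-1]

# A and E rejoin later and discard whatever the new leader has never seen.
for name in ("A", "E"):
    logs[name] = truncate_to_leader(logs[name], leader_last)

print(new_leader, leader_last, logs["A"], logs["E"])
```

After the loop, no server in this toy model still holds txid=1000, which is the lost write described above.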

I can see this is a rather tricky case of double or multiple failures, but I agree
this can happen.
My point is that if a user wants to maintain a reliable cluster, they should keep in mind that
more failures than the tolerated number may lead to unexpected results like

Best Regards,
-----Original Message-----
From: bit1129@163.com [mailto:bit1129@163.com] 
Sent: 05 January 2015 15:56
To: user@zookeeper.apache.org
Subject: Re: Question about the two-phrase commit

Could someone help on this question? Thanks.

From: bit1129@163.com
Date: 2015-01-05 15:05
To: user@zookeeper.apache.org
Subject: Question about the two-phrase commit


I have a question about the two-phase commit in ZooKeeper. When a write operation happens:

1. The Leader proposes the change to all the followers (proposal/vote phase).
2. Followers ack the proposal and write the change to disk (but not persisted yet?).
3. When the Leader receives acks from a majority of followers, the Leader asks the followers
to commit the change.
4. When each follower receives the commit request, it commits the change (persisting
the change forever?).
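
The majority rule in step 3 can be sketched with a little arithmetic. This is only an illustration in Python under stated assumptions: the function names are hypothetical, and the sketch assumes the Leader's own ack counts toward the majority.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def can_commit(ack_count, ensemble_size):
    """True once enough servers (leader included, by assumption) have acked."""
    return ack_count >= quorum_size(ensemble_size)


def tolerated_failures(ensemble_size):
    """How many servers can be down while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


n = 5
print(quorum_size(n), tolerated_failures(n))
print(can_commit(2, n), can_commit(3, n))
```

For a 5-node ensemble this gives a quorum of 3 and 2 tolerated failures, so two acks are not enough to commit while three are.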

In the above process, something rare could happen:
a. Say there are 5 nodes in the quorum (1 leader E, 4 followers A, B, C, D).
b. The write operation is issued by a client connected to follower A.
c. A commits the change and responds to the client that the write succeeded.
d. Assume that by the time the response from A reaches the client, telling it that the write
is successful, the other followers (B, C, D) haven't even received the commit
request, and B, C, D go down without getting a chance to commit the change.

Then shut down A and E.
Restart B, C, D, making sure that they elect a leader, and start A later (A's latest transactions
will be lost, because A will sync with the Leader).

When this is done, is the write operation performed earlier lost?

Is there anything I missed in the above process? Thanks.

