zookeeper-user mailing list archives

From "Ibrahim El-sanosi (PGR)" <i.s.el-san...@newcastle.ac.uk>
Subject RE: 3-server Zab cluster
Date Mon, 05 Oct 2015 17:50:21 GMT
> I'm not entirely sure what the optimization is and if you are proposing a change or what.
> Are you looking for a blessing from this community? I'd like to understand what you're trying
> to achieve.


As Zab uses reliable FIFO channels, it is possible to remove the commit round. As soon as a
follower receives a proposal, it logs it, sends an ACK and commits locally. Upon receiving an
ACK from any follower, the leader commits the proposal locally; no COMMIT message needs to be
sent to followers. In this case, all servers commit a proposal in two communication steps
instead of three, reducing latency, particularly at followers.
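To make the latency claim concrete, here is a toy count of one-way message delays until each server commits, comparing standard Zab with the proposed scheme. It is a sketch under loudly stated assumptions: every one-way message takes the same delay `d`, disk writes are free, and no server fails; the function name `commit_delays` is hypothetical.

```python
# Toy latency model (hypothetical): count one-way message delays until each
# server commits. Assumes a uniform delay d per message, free disk writes,
# and no failures.

def commit_delays(optimized: bool, d: float = 1.0) -> dict[str, float]:
    """Return the time (in units of d) at which the leader and a follower commit."""
    proposal_arrives = d                 # leader -> follower: PROPOSAL
    ack_arrives = proposal_arrives + d   # follower -> leader: ACK
    if optimized:
        # Proposed scheme: the follower commits as soon as it logs and ACKs;
        # the leader commits on the first follower ACK; no COMMIT message.
        return {"leader": ack_arrives, "follower": proposal_arrives}
    # Standard Zab: the leader commits once a quorum has ACKed, then sends COMMIT.
    commit_arrives = ack_arrives + d     # leader -> follower: COMMIT
    return {"leader": ack_arrives, "follower": commit_arrives}

print(commit_delays(optimized=False))  # {'leader': 2.0, 'follower': 3.0}
print(commit_delays(optimized=True))   # {'leader': 2.0, 'follower': 1.0}
```

Under these assumptions the leader's commit time is unchanged, while a follower's drops from three message delays to one, which is where the claimed gain comes from.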

Note that this optimization can only work in a 3-server cluster (a follower reaches a majority
as soon as it acks).
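The 3-server restriction follows from majority-quorum arithmetic. This minimal sketch assumes the leader has already logged the proposal, so the leader's copy plus the acking follower's copy is what must reach a majority; the helper names are hypothetical.

```python
# Majority-quorum arithmetic behind "a follower reaches a majority as soon as it acks".
# Assumption: the leader's own logged copy counts toward the quorum.

def majority(n: int) -> int:
    """Smallest number of servers that forms a majority of an n-server ensemble."""
    return n // 2 + 1

def one_follower_ack_is_quorum(n: int) -> bool:
    """True if the leader plus a single acking follower already form a majority."""
    return 2 >= majority(n)

for n in (3, 5, 7):
    print(n, majority(n), one_follower_ack_is_quorum(n))
# 3 2 True
# 5 3 False
# 7 4 False
```

For five or more servers a single follower's ACK cannot complete a quorum. The count also only holds if the leader's own copy is actually durable, which is the premise questioned elsewhere in this thread.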

The proposal:

A 3-server ZK cluster is, I think, a common deployment compared to 5- or 7-server ensembles.
For clients who use a 3-server ZK ensemble and want better latency, we could provide this
optimization (the algorithm above) as an option.

I hope my aim is clear now.

Ibrahim 

-----Original Message-----
From: Flavio Junqueira [mailto:fpj@apache.org] 
Sent: Monday, October 05, 2015 06:23 PM
To: user@zookeeper.apache.org
Subject: Re: 3-server Zab cluster


> On 05 Oct 2015, at 18:13, Ibrahim El-sanosi (PGR) <i.s.el-sanosi@newcastle.ac.uk> wrote:
> 
> Hi Rakesh,
> 
> In Zab, before the end of the synchronization phase, the new leader will not commit any
> proposals in its transaction log that have not got a majority of ACKs from the previous
> ensemble (that is what you are saying).

That's not accurate. Being recorded by a quorum guarantees that a txn will be in the initial
state of future epochs, but a prospective leader might have txns in its log that haven't been
recorded by a quorum. The prospective leader needs to make sure that such txns are recorded by
a quorum before establishing a new epoch, though.

> I think what Zab does is that, before the end of the synchronization phase, with L and F2 as
> the new quorum, L (the prospective leader) will sync F2 with its own state, which serves as the
> initial state. Referring to my scenario, zxid=10 is part of the initial state and, as a result,
> it will be delivered in the new quorum (L and F2) before any new proposals of the new epoch are processed.

Yes, this is right.
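A toy model of the rule described above, applied to the confirmed scenario: the prospective leader's log becomes the initial history, and the new epoch starts only once a quorum holds it. Assumptions: logs are plain lists of zxids, zxids 8 and 9 are hypothetical earlier txns, and epochs, NEWLEADER/ACK exchanges and disk persistence are not modeled; `establish_epoch` is a hypothetical helper, not a ZooKeeper API.

```python
# Toy sketch (hypothetical model): the prospective leader's log becomes the
# initial history, and the new epoch starts only once a quorum holds it.

def majority(n: int) -> int:
    return n // 2 + 1

def establish_epoch(leader_log: list[int], follower_logs: list[list[int]], n: int) -> bool:
    """Sync each connected follower to the leader's log; return True if the
    leader plus the synced followers form a majority of the n servers."""
    for log in follower_logs:
        # Send txns the follower is missing and truncate any the leader lacks,
        # leaving an exact copy of the leader's history.
        log[:] = leader_log
    return 1 + len(follower_logs) >= majority(n)

# The thread's scenario: L logged zxid 10 but got no ACKs; F2 never received it.
L_log = [8, 9, 10]
F2_log = [8, 9]
print(establish_epoch(L_log, [F2_log], n=3))  # True
print(F2_log)                                  # [8, 9, 10]
```

In this model zxid 10 survives because it sits in the prospective leader's log, and syncing F2 makes it recorded by a quorum (L and F2) before the new epoch begins.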

> 
> You can read this thread
> http://zookeeper-user.578899.n2.nabble.com/Zab-Failure-scenario-td7581583.html for more info
> 
> What do you think? Does anyone have any questions or concerns about such a (small) optimization?

I'm not entirely sure what the optimization is and if you are proposing a change or what.
Are you looking for a blessing from this community? I'd like to understand what you're trying
to achieve.

-Flavio

> 
> Ibrahim
> 
> From: Rakesh Radhakrishnan [mailto:rakeshr.apache@gmail.com]
> Sent: Thursday, October 01, 2015 06:15 PM
> To: Ibrahim El-sanosi (PGR)
> Subject: Re: 3-server Zab cluster
> 
> >> (***) OK, I thought that when F2 forms a quorum with L, and before serving clients, L
> >> synchronizes its state with F2, so zxid=10 will be committed in L and F2 as well. I also
> >> thought this process is the same as in Zab, isn't it?
> 
> Since L didn't receive any ACK responses from F1 or F2 before losing its leader status,
> L won't commit transaction zxid=10. IIUC, after re-forming the new quorum, L will not have
> any mechanism to re-initiate the proposal (active messaging phase) for the previous
> zxid=10.
> 
> -Rakesh
> 
> On Thu, Oct 1, 2015 at 10:19 PM, Ibrahim El-sanosi (PGR) <i.s.el-sanosi@newcastle.ac.uk> wrote:
> Thank you Rakesh.
> 
> >> In your case, the zk client sees a successful response from F1. Then assume F2 joins the
> >> quorum first and L becomes the leader again. But the newly formed quorum will not have the
> >> zxid=10 transaction. This will make the cluster inconsistent, won't it?
> 
> (***) OK, I thought that when F2 forms a quorum with L, and before serving clients, L
> synchronizes its state with F2, so zxid=10 will be committed in L and F2 as well. I also
> thought this process is the same as in Zab, isn't it?
> 
> 
> >> Apart from the above case, I'm not seeing any other problems with a 3-node cluster. The
> >> above data-loss case can be avoided by making the assumption that more than a tolerated
> >> number of server failures may affect cluster consistency and result in data loss.
> 
> Yes, if the solution above (***) is not correct, your assumption makes sense.
> 
> Ibrahim
> 
> From: Rakesh Radhakrishnan [mailto:rakeshr.apache@gmail.com]
> Sent: 01 October 2015 17:26
> To: user@zookeeper.apache.org; Ibrahim El-sanosi (PGR)
> 
> Subject: Re: 3-server Zab cluster
> 
> Hi Ibrahim,
> 
> The example below is taken from your older mail thread.
> 
> >> 1. The leader (L) sends a proposal p with zxid=10 to F1 and F2.
> >> 2. F1 logs, sends an ACK, commits, replies to clients and crashes.
> >>    F2 crashes before receiving P10. L has not received any ACKs.
> 
> My thoughts on the above scenario are:
> 
> In your case, the zk client sees a successful response from F1. Then assume F2 joins the
> quorum first and L becomes the leader again. But the newly formed quorum will not have the
> zxid=10 transaction. This will make the cluster inconsistent, won't it?
> 
> Apart from the above case, I'm not seeing any other problems with a 3-node cluster. The
> above data-loss case can be avoided by making the assumption that more than a tolerated number
> of server failures may affect cluster consistency and result in data loss. But I feel
> this optimization would have more failure cases if we scale the cluster beyond 3 servers.
> For now, I'm not thinking in that direction, as your case is limited to a 3-node cluster.
> 
> Regards,
> Rakesh
> 
> 
> On Tue, Sep 29, 2015 at 2:28 PM, Ibrahim El-sanosi (PGR) <i.s.el-sanosi@newcastle.ac.uk> wrote:
> Yes Alex, in my post I mentioned that this (small) optimization can only work with a 3-server
> cluster.
> 
> Could anyone confirm whether the optimization works?
> 
> Ibrahim
> 
> -----Original Message-----
> From: Alexander Shraer [mailto:shralex@gmail.com]
> Sent: Tuesday, September 29, 2015 12:11 AM
> To: user@zookeeper.apache.org
> Subject: Re: 3-server Zab cluster
> 
> I'm not 100% sure whether operations that were pending on the leader are sent out during
> sync when this leader loses quorum and is re-elected. If so, then maybe you're right. But in
> any case, this would not work for 5 or more servers...
> 
> On Mon, Sep 28, 2015 at 3:51 PM, Ibrahim El-sanosi (PGR) <i.s.el-sanosi@newcastle.ac.uk> wrote:
> 
>> Thank you, Alex, for replying.
>> 
>> When you said "the leader gets re-elected and the operation is
>> truncated from logs at other servers", I thought the new leader would
>> sync its logs with the other followers (synchronization phase),
>> so the operation would be committed by the new quorum. Let me lay out
>> the scenario as steps:
>> 
>> 1. The leader (L) sends a proposal p with zxid=10 to F1 and F2.
>> 2. F1 logs, sends an ACK, commits, replies to clients and crashes. F2
>>    crashes before receiving P10. L has not received any ACKs.
>> 
>> Possible solution (1)
>> The leader will move to the LOOKING state, as there is no quorum
>> supporting its leadership. Now assume F2 wakes up. F2 forms a quorum
>> with L (the previous leader), and L becomes the leader again, as it has
>> the latest zxid (10) in its log. L syncs its state with F2; as a result,
>> L, F1 (before crashing) and F2 commit P10. Is that correct?
>> 
>> Possible solution (2)
>> The leader will move to the LOOKING state, as there is no quorum
>> supporting its leadership. Now assume F1 (with zxid=10 committed)
>> wakes up. I am not sure who should be the leader (F1, with zxid=10
>> committed, or L, the previous leader, with zxid=10 logged); I think F1
>> becomes the new leader, as it has zxid=10 committed. F1 forms a quorum
>> with L (the previous leader), and F1 becomes the new leader, as it has
>> the latest zxid (10). F1 (the new leader) syncs its state with L (the
>> previous leader, now a follower); as a result, zxid=10 is committed by
>> the new quorum. Is that correct?
>> 
>> What do you think?
>> 
>> Ibrahim
>> 
>> -----Original Message-----
>> From: Alexander Shraer [mailto:shralex@gmail.com]
>> Sent: Monday, September 28, 2015 07:27 PM
>> To: user@zookeeper.apache.org
>> Cc: dev@zookeeper.apache.org
>> Subject: Re: 3-server Zab cluster
>> 
>> Committing locally when sending an ACK at a server would lead to loss
>> of consistency: it is possible that this is the only server that
>> acks, e.g., this server is temporarily disconnected from the leader,
>> the leader gets re-elected and the operation is truncated from logs
>> at other servers. It's OK to ACK it, but it's not OK to commit, since
>> this exposes it to users as a committed operation that they can see.
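The objection above can be replayed in a toy model: a txn that a client has already seen as committed is lost whenever the new epoch's initial history lacks it. Assumptions: the recovering quorum is L and F2, the member with the highest last zxid becomes leader (ties broken arbitrarily by name), zxids 8 and 9 are hypothetical earlier txns, and case B assumes, hypothetically, that L's own copy of zxid 10 never became durable. `recover` and `lost_commits` are illustrative helpers, not ZooKeeper APIs.

```python
# Toy replay (hypothetical model): F1 commits zxid 10 on ACK and answers a
# client, then is cut off; L and F2 recover without it.

def recover(quorum_logs: dict[str, list[int]]) -> list[int]:
    """Pick the member with the highest last zxid as leader (ties broken by
    name, arbitrarily) and return its log as the new initial history."""
    def last_zxid(server: str) -> int:
        log = quorum_logs[server]
        return log[-1] if log else 0
    leader = max(quorum_logs, key=lambda s: (last_zxid(s), s))
    return list(quorum_logs[leader])

def lost_commits(exposed: set[int], history: list[int]) -> set[int]:
    """Txns a client has already seen as committed but the new history lacks."""
    return exposed - set(history)

exposed_by_f1 = {10}  # F1 replied to a client for zxid 10

# Case A: L's log holds zxid 10, so it is part of the initial history.
print(lost_commits(exposed_by_f1, recover({"L": [8, 9, 10], "F2": [8, 9]})))  # set()

# Case B (assumption): L's copy of zxid 10 never became durable.
print(lost_commits(exposed_by_f1, recover({"L": [8, 9], "F2": [8, 9]})))  # {10}
```

In this model, committing on ACK is safe only if some member of every future quorum is guaranteed to hold the txn, which the follower's ACK alone does not guarantee.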
>> 
>> On Mon, Sep 28, 2015 at 4:19 AM, Ibrahim El-sanosi (PGR) <i.s.el-sanosi@newcastle.ac.uk> wrote:
>> 
>>> In Zab, assume we have a cluster consisting of 3 servers. To deliver a
>>> write request, it must run 3 communication steps: proposal,
>>> acknowledgement and commit.
>>> As Zab uses reliable FIFO channels, it is possible to remove the commit
>>> round. As soon as a follower receives a proposal, it logs it, sends an
>>> ACK and commits locally. Upon receiving an ACK from any follower, the
>>> leader commits the proposal locally; no COMMIT message needs to be sent
>>> to followers. In this case, all servers commit a proposal in two
>>> communication steps, reducing latency, particularly at followers.
>>> 
>>> Note that this optimization can only work in a 3-server cluster
>>> (a follower reaches a majority as soon as it acks).
>>> Does anyone see any problems with such a (small) optimization?
>>> Ibrahim
