hadoop-zookeeper-user mailing list archives

From Mahadev Konar <maha...@yahoo-inc.com>
Subject Re: Q about ZK internal: how commit is being remembered
Date Fri, 29 Jan 2010 02:06:49 GMT
Qian,

  ZooKeeper guarantees that if a client sees the response to a transaction,
that transaction will persist. Transactions whose response the client never
sees might be either discarded or committed. So if a transaction was not
logged by a quorum, a ZooKeeper server that does not have the transaction in
its log may become the leader (because the machines that logged it are
down). In that case the transaction is discarded. If instead a machine that
has logged the transaction becomes the leader, the transaction will be
committed.
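
The two outcomes above can be sketched in a few lines of Python. This is only
an illustration, not ZooKeeper code: the server names, the zxid values, and
the helpers `elect_leader` and `outcome_per_quorum` are made up, and the
election is reduced to "the live server with the highest logged zxid wins".

```python
from itertools import combinations


def elect_leader(last_logged_zxid, live_servers):
    """Pick the live server with the highest logged zxid.

    Ties are broken by server name, an assumption made only to keep the
    sketch deterministic.
    """
    return max(live_servers, key=lambda s: (last_logged_zxid[s], s))


def outcome_per_quorum(last_logged_zxid):
    """Map each minimal live quorum to the zxid its elected leader ends on."""
    quorum_size = len(last_logged_zxid) // 2 + 1
    outcomes = {}
    for live in combinations(sorted(last_logged_zxid), quorum_size):
        leader = elect_leader(last_logged_zxid, live)
        outcomes[live] = last_logged_zxid[leader]
    return outcomes


# Hypothetical ensemble: zxid 7 was logged everywhere (and acknowledged to the
# client); zxid 8 was logged by s1 only, so the client never saw a response.
logs = {"s1": 8, "s2": 7, "s3": 7, "s4": 7, "s5": 7}

for live, zxid in outcome_per_quorum(logs).items():
    fate = "committed" if zxid == 8 else "discarded"
    print(",".join(live), "-> zxid 8 is", fate)
```

Quorums that include s1 keep zxid 8, quorums without it drop zxid 8, and in
every case zxid 7 (the one the client saw a response for) survives.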

Hope that clears up your doubt.

mahadev


On 1/28/10 6:02 PM, "Qian Ye" <yeqian.zju@gmail.com> wrote:

> Thanks Henry and Ben. I have actually read the paper Henry mentioned in this
> mail, but I'm still not clear on some of the details. Maybe more study of
> the source code will help my understanding. Ben said that "if less than a
> quorum of servers have accepted a transaction, we can commit or discard".
> Could this feature cause any unexpected problems? Can you give some hints
> about this issue?
> 
> 
> 
> On Fri, Jan 29, 2010 at 1:09 AM, Benjamin Reed <breed@yahoo-inc.com> wrote:
> 
>> henry is correct. just to state another way, Zab guarantees that if a
>> quorum of servers have accepted a transaction, the transaction will commit.
>> this means that if less than a quorum of servers have accepted a
>> transaction, we can commit or discard. the only constraint we have in
>> choosing is ordering. we have to decide which partially accepted
>> transactions are going to be committed and which discarded before we propose
>> any new messages so that ordering is preserved.
>> 
>> ben
>> 
>> 
>> Henry Robinson wrote:
>> 
>>> Hi -
>>> 
>>> Note that a machine that has the highest received zxid will necessarily
>>> have seen the most recent transaction that was logged by a quorum of
>>> followers (the FIFO property of TCP again ensures that all previous
>>> messages will have been seen). This is the property that ZAB needs to
>>> preserve. The idea is to avoid missing a commit that went to a node that
>>> has since failed.
>>> 
>>> I was therefore slightly imprecise in my previous mail - it's possible for
>>> only partially-proposed proposals to be committed if the leader that is
>>> elected next has seen them. Only when another proposal is committed
>>> instead must the original proposal be discarded.
>>> 
>>> I highly recommend Ben Reed's and Flavio Junqueira's LADIS paper on the
>>> subject, for those with portal.acm.org access:
>>> http://portal.acm.org/citation.cfm?id=1529978
>>> 
>>> Henry
>>> 
>>> On 27 January 2010 21:52, Qian Ye <yeqian.zju@gmail.com> wrote:
>>> 
>>> 
>>> 
>>>> Hi Henry:
>>>> 
>>>> According to your explanation, "*ZAB makes the guarantee that a proposal
>>>> which has been logged by a quorum of followers will eventually be
>>>> committed*". However, the ZooKeeper source code, in
>>>> FastLeaderElection.java, shows that during the election the candidates
>>>> only provide their zxid in the votes, and the one with the max zxid wins
>>>> the election. I mean, it seems that no check is made to ensure that the
>>>> latest proposal has been logged by a quorum of servers.
>>>> 
>>>> In this situation, ZooKeeper could deliver a proposal that the client
>>>> believes has failed. Imagine this scenario: a ZooKeeper cluster with 5
>>>> servers, where the leader receives only 1 ack for proposal A and, after a
>>>> timeout, the client is told that the proposal failed. At this point, all
>>>> servers restart due to a power failure. The server that has proposal A in
>>>> its log would become the leader, even though the client was told that
>>>> proposal A failed.
>>>> 
>>>> Am I misunderstanding this?
>>>> 
>>>> 
>>>> On Wed, Jan 27, 2010 at 10:37 AM, Henry Robinson <henry@cloudera.com>
>>>> wrote:
>>>> 
>>>> 
>>>> 
>>>>> Qing -
>>>>> 
>>>>> That part of the documentation is slightly confusing. The elected leader
>>>>> must have the highest zxid that has been written to disk by a quorum of
>>>>> followers. ZAB makes the guarantee that a proposal which has been logged
>>>>> by a quorum of followers will eventually be committed. Conversely, any
>>>>> proposals that *don't* get logged by a quorum before the leader sending
>>>>> them dies will not be committed. One of the ZAB papers covers both these
>>>>> situations - making sure proposals are committed or skipped at the right
>>>>> moments.
>>>>> 
>>>>> So you get the neat property that leader election can be live in exactly
>>>>> the case where the ZK cluster is live. If a quorum of peers aren't
>>>>> available to elect the leader, the resulting cluster won't be live
>>>>> anyhow, so it's ok for leader election to fail.
>>>>> 
>>>>> FLP impossibility isn't actually strictly relevant for ZAB, because FLP
>>>>> requires that message reordering is possible (see all the stuff in that
>>>>> paper about non-deterministically drawing messages from a potentially
>>>>> deliverable set). TCP FIFO channels don't reorder, so provide the extra
>>>>> signalling that ZAB requires.
>>>>> 
>>>>> cheers,
>>>>> Henry
>>>>> 
>>>>> 2010/1/26 Qing Yan <qingyan@gmail.com>
>>>>> 
>>>>> 
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> I have a question about how ZooKeeper *remembers* a commit operation.
>>>>>> 
>>>>>> According to
>>>>>> 
>>>>>> http://hadoop.apache.org/zookeeper/docs/r3.2.2/zookeeperInternals.html#sc_summary
>>>>>> 
>>>>>> <quote>
>>>>>> The leader will issue a COMMIT to all followers as soon as a quorum of
>>>>>> followers have ACKed a message. Since messages are ACKed in order,
>>>>>> COMMITs will be sent by the leader as received by the followers in order.
>>>>>> 
>>>>>> COMMITs are processed in order. Followers deliver a proposals message
>>>>>> when that proposal is committed.
>>>>>> </quote>
>>>>>> 
>>>>>> My question is: will the leader wait for a COMMIT to be processed by a
>>>>>> quorum of followers before considering the COMMIT a success? From the
>>>>>> documentation it seems that the leader handles COMMIT asynchronously and
>>>>>> doesn't expect confirmation from followers. In the extreme case, what
>>>>>> happens if the leader issues a COMMIT to all followers and crashes
>>>>>> immediately, before the COMMIT message can go out on the network? How
>>>>>> does the system remember that the COMMIT ever happened?
>>>>>> 
>>>>>> Actually this is related to the leader election process:
>>>>>> 
>>>>>> <quote>
>>>>>> ZooKeeper messaging doesn't care about the exact method of electing a
>>>>>> leader as long as the following holds:
>>>>>> 
>>>>>>  - The leader has seen the highest zxid of all the followers.
>>>>>>  - A quorum of servers have committed to following the leader.
>>>>>> 
>>>>>> Of these two requirements only the first, the highest zxid among the
>>>>>> followers needs to hold for correct operation.
>>>>>> </quote>
>>>>>> 
>>>>>> Is there a liveness issue in trying to find a leader that "has seen the
>>>>>> highest zxid of all the followers"? What if some of the followers (which
>>>>>> happen to hold the highest zxid) cannot be contacted (the FLP
>>>>>> impossibility result?) It would be more straightforward if COMMIT
>>>>>> required confirmation from a quorum of the followers. But I guess things
>>>>>> get optimized according to Zab's FIFO nature... I just want to hear some
>>>>>> clarification about it.
>>>>>> 
>>>>>> Thanks a lot!
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>> --
>>>> With Regards!
>>>> 
>>>> Ye, Qian
>>>> Made in Zhejiang University
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> 
> 
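
On the quorum case quoted above from Henry and Ben (a transaction logged by a
quorum will eventually commit): as I understand it, this rests on the fact
that any two majorities of the ensemble share at least one server, so
whichever quorum elects the next leader contains a server that logged the
transaction. A small brute-force check of that overlap, where
`majority_quorums` and `quorums_always_overlap` are hypothetical helper names
and the server names are made up:

```python
from itertools import combinations


def majority_quorums(servers):
    """Return every minimal majority of the ensemble as a set."""
    size = len(servers) // 2 + 1
    return [set(q) for q in combinations(sorted(servers), size)]


def quorums_always_overlap(servers):
    """True if every pair of majorities shares at least one server."""
    quorums = majority_quorums(servers)
    return all(logged & electing for logged in quorums for electing in quorums)


# Hypothetical 5-server ensemble.
print(quorums_always_overlap(["s1", "s2", "s3", "s4", "s5"]))
```

This only covers the overlap argument; it says nothing about ordering or
about how the new leader syncs its followers.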

