hadoop-zookeeper-user mailing list archives

From Qian Ye <yeqian....@gmail.com>
Subject Re: Q about ZK internal: how commit is being remembered
Date Fri, 29 Jan 2010 02:45:02 GMT
Thanks Mahadev, I see what you mean.


On Fri, Jan 29, 2010 at 10:06 AM, Mahadev Konar <mahadev@yahoo-inc.com> wrote:

> Qian,
>
>  ZooKeeper guarantees that if a client sees some transaction response, then
> it will persist, but the ones that a client does not see might be discarded
> or committed. So in case a quorum does not log the transaction, there might
> be a case wherein a ZooKeeper server which does not have the logged
> transaction becomes the leader (because the machines with the logged
> transaction are down). In that case the transaction is discarded. In the case
> where a machine which has the logged transaction becomes the leader, that
> transaction will be committed.
>
> Hope that clears up your doubt.
>
> mahadev
>
>
> On 1/28/10 6:02 PM, "Qian Ye" <yeqian.zju@gmail.com> wrote:
>
> > Thanks Henry and Ben. Actually I have read the paper Henry mentioned in this
> > mail, but I'm still not so clear on some of the details. Anyway, maybe
> > more study of the source code can help my understanding. Ben said that
> > "if less than a quorum of servers have accepted a transaction, we can
> > commit or discard". Would this feature cause any unexpected problems? Can you
> > give some hints about this issue?
> >
> >
> >
> > On Fri, Jan 29, 2010 at 1:09 AM, Benjamin Reed <breed@yahoo-inc.com> wrote:
> >
> >> henry is correct. just to state another way, Zab guarantees that if a
> >> quorum of servers have accepted a transaction, the transaction will commit.
> >> this means that if less than a quorum of servers have accepted a
> >> transaction, we can commit or discard. the only constraint we have in
> >> choosing is ordering. we have to decide which partially accepted
> >> transactions are going to be committed and which discarded before we propose
> >> any new messages so that ordering is preserved.
> >>
> >> ben
> >>
> >>
> >> Henry Robinson wrote:
> >>
> >>> Hi -
> >>>
> >>> Note that a machine that has the highest received zxid will necessarily have
> >>> seen the most recent transaction that was logged by a quorum of followers
> >>> (the FIFO property of TCP again ensures that all previous messages will have
> >>> been seen). This is the property that ZAB needs to preserve. The idea is to
> >>> avoid missing a commit that went to a node that has since failed.
> >>>
> >>> I was therefore slightly imprecise in my previous mail - it's possible for
> >>> only partially-proposed proposals to be committed if the leader that is
> >>> elected next has seen them. Only when another proposal is committed instead
> >>> must the original proposal be discarded.
> >>>
> >>> I highly recommend Ben Reed's and Flavio Junqueira's LADIS paper on the
> >>> subject, for those with portal.acm.org access:
> >>> http://portal.acm.org/citation.cfm?id=1529978
> >>>
> >>> Henry
> >>>
> >>> On 27 January 2010 21:52, Qian Ye <yeqian.zju@gmail.com> wrote:
> >>>
> >>>
> >>>
> >>>> Hi Henry:
> >>>>
> >>>> According to your explanation, "*ZAB makes the guarantee that a proposal
> >>>> which has been logged by a quorum of followers will eventually be
> >>>> committed*". However, the source code of ZooKeeper, the
> >>>> FastLeaderElection.java file, shows that, in the election, the candidates
> >>>> only provide their zxid in the votes, and the one with the max zxid wins
> >>>> the election. I mean, it seems that no check is made to make sure that the
> >>>> latest proposal has been logged by a quorum of servers.
> >>>>
> >>>> In this situation, ZooKeeper would deliver a proposal which the client
> >>>> knows as a failed one. Imagine this scenario: a ZooKeeper cluster with
> >>>> 5 servers, where the leader receives only 1 ack for proposal A, and after a
> >>>> timeout the client is told that the proposal failed. At this time, all
> >>>> servers restart due to a power failure. The server that has the log of
> >>>> proposal A would become the leader; however, the client was told that
> >>>> proposal A failed.
> >>>>
> >>>> Do I misunderstand this?
> >>>>
> >>>>
> >>>> On Wed, Jan 27, 2010 at 10:37 AM, Henry Robinson <henry@cloudera.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>>> Qing -
> >>>>>
> >>>>> That part of the documentation is slightly confusing. The elected leader
> >>>>> must have the highest zxid that has been written to disk by a quorum of
> >>>>> followers. ZAB makes the guarantee that a proposal which has been logged by
> >>>>> a quorum of followers will eventually be committed. Conversely, any
> >>>>> proposals that *don't* get logged by a quorum before the leader sending them
> >>>>> dies will not be committed. One of the ZAB papers covers both these
> >>>>> situations - making sure proposals are committed or skipped at the right
> >>>>> moments.
> >>>>>
> >>>>> So you get the neat property that leader election can be live in exactly the
> >>>>> case where the ZK cluster is live. If a quorum of peers aren't available to
> >>>>> elect the leader, the resulting cluster won't be live anyhow, so it's ok for
> >>>>> leader election to fail.
> >>>>>
> >>>>> FLP impossibility isn't actually strictly relevant for ZAB, because FLP
> >>>>> requires that message reordering is possible (see all the stuff in that
> >>>>> paper about non-deterministically drawing messages from a potentially
> >>>>> deliverable set). TCP FIFO channels don't reorder, so provide the extra
> >>>>> signalling that ZAB requires.
> >>>>>
> >>>>> cheers,
> >>>>> Henry
> >>>>>
> >>>>> 2010/1/26 Qing Yan <qingyan@gmail.com>
> >>>>>
> >>>>>
> >>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> I have a question about how ZooKeeper *remembers* a commit operation.
> >>>>>>
> >>>>>> According to
> >>>>>> http://hadoop.apache.org/zookeeper/docs/r3.2.2/zookeeperInternals.html#sc_summary
> >>>>>>
> >>>>>> <quote>
> >>>>>> The leader will issue a COMMIT to all followers as soon as a quorum of
> >>>>>> followers have ACKed a message. Since messages are ACKed in order, COMMITs
> >>>>>> will be sent by the leader as received by the followers in order.
> >>>>>>
> >>>>>> COMMITs are processed in order. Followers deliver a proposals message when
> >>>>>> that proposal is committed.
> >>>>>> </quote>
> >>>>>>
> >>>>>> My question is: will the leader wait for the COMMIT to be processed by a
> >>>>>> quorum of followers before considering the COMMIT a success? From the
> >>>>>> documentation it seems that the leader handles COMMIT asynchronously and
> >>>>>> doesn't expect confirmation from followers. In the extreme case, what
> >>>>>> happens if the leader issues a COMMIT to all followers and crashes
> >>>>>> immediately, before the COMMIT message can go out of the network? How does
> >>>>>> the system remember that the COMMIT ever happened?
> >>>>>>
> >>>>>> Actually this is related to the leader election process:
> >>>>>>
> >>>>>> <quote>
> >>>>>> ZooKeeper messaging doesn't care about the exact method of electing a
> >>>>>> leader as long as the following holds:
> >>>>>>
> >>>>>>  - The leader has seen the highest zxid of all the followers.
> >>>>>>  - A quorum of servers have committed to following the leader.
> >>>>>>
> >>>>>>  Of these two requirements only the first, the highest zxid among the
> >>>>>> followers needs to hold for correct operation.
> >>>>>>
> >>>>>> </quote>
> >>>>>>
> >>>>>> Is there a liveness issue in trying to ensure "The leader has seen the
> >>>>>> highest zxid of all the followers"? What if some of the followers (which
> >>>>>> happen to be holding the highest zxid) cannot be contacted (FLP
> >>>>>> impossibility result?) It would be more straightforward if COMMIT required
> >>>>>> confirmation from a quorum of the followers. But I guess things get
> >>>>>> optimized according to Zab's FIFO nature... just want to hear some
> >>>>>> clarification about it.
> >>>>>>
> >>>>>> Thanks a lot!
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>> --
> >>>> With Regards!
> >>>>
> >>>> Ye, Qian
> >>>> Made in Zhejiang University
> >>>>
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
> >>
> >
>
>
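
The rule discussed in the thread above (a quorum-logged transaction always
commits, a minority-logged one may go either way) can be illustrated with a
toy model. This is only a sketch, not ZooKeeper code: the names
elect_leader and outcome, the server names s1..s5, and the tie-break by
server name are all assumptions made for illustration, and a server's log is
reduced to a plain list of integer zxids.

```python
from itertools import combinations

def elect_leader(logs, live):
    # Model assumption: the live server with the highest last-logged zxid
    # wins; ties are broken by server name (hypothetical tie-break).
    return max(live, key=lambda s: (logs[s][-1] if logs[s] else 0, s))

def outcome(logs, live, zxid):
    # A leader can only be elected when a quorum of servers is up.
    quorum = len(logs) // 2 + 1
    if len(live) < quorum:
        raise ValueError("fewer than a quorum of servers are up")
    leader = elect_leader(logs, live)
    # The new leader's log decides the fate of the transaction.
    return "committed" if zxid in logs[leader] else "discarded"

# Proposal 3 was logged by the old leader s1 and acked by one follower, s2.
minority = {"s1": [1, 2, 3], "s2": [1, 2, 3],
            "s3": [1, 2], "s4": [1, 2], "s5": [1, 2]}
print(outcome(minority, {"s3", "s4", "s5"}, 3))        # discarded
print(outcome(minority, {"s1", "s3", "s4", "s5"}, 3))  # committed

# Once a quorum (3 of 5) has logged proposal 3, every possible live quorum
# contains at least one server that has it, so it always survives.
majority = dict(minority, s3=[1, 2, 3])
results = {outcome(majority, set(q), 3) for q in combinations(majority, 3)}
print(results)  # {'committed'}
```

In this model the 5-server scenario with a single ack ends either way
depending on which servers come back, while the quorum-logged case has only
one possible result, because any two quorums of the same ensemble overlap.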


-- 
With Regards!

Ye, Qian
Made in Zhejiang University
