zookeeper-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: what would happen with this case ? (ZAB protocol question)
Date Thu, 21 Jul 2011 22:09:58 GMT
I think the message ordering constraints combined with the quorum deal with
this situation.

On Thu, Jul 21, 2011 at 1:42 PM, Alexander Shraer <shralex@yahoo-inc.com> wrote:

> Hi Ted,
>
> In your scenario there is no problem that I can see. The problem is in
> another scenario, which I described in the JIRA: there, C has seen more
> proposals than B, but B has seen more commits than C. When leader election
> happens (and assuming they don't restart beforehand), B will be elected as
> leader and not C. That is a problem because C's suffix of transactions,
> which were acked by both A and C, will be truncated.
>
> Alex
>
> > -----Original Message-----
> > From: Ted Dunning [mailto:ted.dunning@gmail.com]
> > Sent: Thursday, July 21, 2011 1:25 PM
> > To: user@zookeeper.apache.org
> > Cc: Yang
> > Subject: Re: what would happen with this case ? (ZAB protocol question)
> >
> > Alex,
> >
> > Are you sure that this is a bug?
> >
> > Take the case of three servers A, B and C with A being leader.
> >
> > If transactions 1, 2 and 3 are committed, then a majority of the nodes,
> > including at least A, must have seen these transactions.  Moreover,
> > transactions cannot be committed on a node unless all previous
> > transactions have been seen on that node as well.  Thus, by symmetry, we
> > can consider cases where B alone committed these transactions or where B
> > and C committed them.  Only the first case is problematic.
> >
> > Now, assume further that transaction 4 has arrived at B and been
> > forwarded to A but neither B nor C has committed it.
> >
> > The situation now is that in this first epoch, A has seen 1-4, B has seen
> > 1-3 and C has seen nothing.  At least two nodes know the current epoch
> > because we obviously have a quorum and we know that B knows the current
> > epoch because it has seen transactions from this epoch.  Thus the
> > collection of machines that know the current epoch can be A+B or A+B+C.
> >
> > If all three nodes now die simultaneously and B and C come back up, the
> > question is what will happen.  We know that the two nodes will agree on
> > the epoch because at least B has the last epoch.  Node B will be elected
> > leader because it has seen later transactions than C.  C will now get the
> > transactions and we have a quorum in a new epoch.
> >
> > If A returns at this point, it will know about transactions 1, 2, 3 and
> > 4.  Further, it will know that 1, 2, and 3 have been committed in the
> > first epoch and that 4 was proposed, but never committed.  As it joins, it
> > will find that a new epoch has started and will recognize B as leader.  B
> > will tell it to truncate the log by deleting 4, but 4 was never committed
> > anyway.
> >
> > Where is the problem?
> >
> > On Thu, Jul 21, 2011 at 1:11 PM, Alexander Shraer
> > <shralex@yahoo-inc.com> wrote:
> >
> > > The problem is in leader election - if the server doesn't reboot before
> > > running leader election (the usual case), then only the transactions
> > > for which it received a commit count, so it might not be elected leader
> > > even if it has seen more transactions than the others. This may lead to
> > > transactions being dropped.
> > >
> > > I opened a JIRA for this.
> > >
>
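
The scenario Alex describes above (C has seen more proposals than B, but B has seen more commits than C) can be sketched as a small simulation. This is only an illustration under stated assumptions, not ZooKeeper's election code: the `Server` class, the `elect` and `truncated` helpers, the tie-break by name, and the specific transaction numbers are all hypothetical, and epochs are ignored entirely.

```python
from dataclasses import dataclass


@dataclass
class Server:
    # Hypothetical model of a follower's state; not a ZooKeeper type.
    name: str
    last_logged: int     # highest transaction id the server has seen and acked
    last_committed: int  # highest transaction id for which it received a commit


def elect(servers, key):
    """Pick the server with the largest value under `key`.

    Ties are broken by name, which is an assumption made for this sketch.
    """
    return max(servers, key=lambda s: (key(s), s.name))


def truncated(leader, follower):
    """Transactions the follower logged beyond the leader's log and must drop."""
    return list(range(leader.last_logged + 1, follower.last_logged + 1))


# Assumed state after leader A fails: A proposed 1-4, transaction 4 was acked
# by A and C (a quorum of three), but B never saw it. B received commits for
# 1-3, while C only received commits for 1-2.
b = Server("B", last_logged=3, last_committed=3)
c = Server("C", last_logged=4, last_committed=2)

# Election that counts only committed transactions: B wins, C must drop 4.
leader_by_commit = elect([b, c], key=lambda s: s.last_committed)
lost_by_commit = truncated(leader_by_commit, c)

# Election that counts every logged transaction: C wins, nothing is dropped.
leader_by_log = elect([b, c], key=lambda s: s.last_logged)
lost_by_log = truncated(leader_by_log, b)

print(leader_by_commit.name, lost_by_commit)
print(leader_by_log.name, lost_by_log)
```

In Ted's scenario, by contrast, the transaction that gets truncated (4) was seen only by A, so no quorum ever acked it; that difference is what the two messages above are disagreeing about.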
