zookeeper-user mailing list archives

From Gaurav Saxena <gsaxen...@gmail.com>
Subject Re: Data loss scenario
Date Wed, 20 Aug 2014 22:17:37 GMT
Thanks a lot Alexander. That's a great starting point. I will look into the
code.

On Wednesday, August 20, 2014, Alexander Shraer <shralex@gmail.com> wrote:

> I think it's:
>
> src/java/main/org/apache/zookeeper/server/quorum/Leader.java,
> waitForEpochAck throws exception if the follower is ahead of the leader in
> terms of data, like in your example
>
> src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java, run()
> throws exception if follower has a more up-to-date configuration than
> leader.
>
> Since a leader needs support from a quorum, when trying to become leader
> one of the servers who knows about d3 will need to connect to it (since d3
> was committed and every two majorities intersect). So C will not be able to
> gather the required support without triggering the checks above.
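To make the intersection argument concrete, here is a small illustrative sketch (not ZooKeeper code). It enumerates every majority of a three-server ensemble and checks that any two majorities share a server, and that every majority includes A or B, the two servers that hold d3. The class and method names are made up for this example, and encoding servers as bits of an int is only a convenience.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; not taken from the ZooKeeper source.
// Each server is one bit in an int mask (A = 1, B = 2, C = 4).
class QuorumIntersection {
    // All subsets of n servers that form a strict majority.
    static List<Integer> majorities(int n) {
        List<Integer> result = new ArrayList<>();
        for (int mask = 0; mask < (1 << n); mask++) {
            if (Integer.bitCount(mask) > n / 2) {
                result.add(mask);
            }
        }
        return result;
    }

    // True if every pair of majorities shares at least one server.
    static boolean everyPairIntersects(int n) {
        List<Integer> ms = majorities(n);
        for (int a : ms) {
            for (int b : ms) {
                if ((a & b) == 0) {
                    return false;
                }
            }
        }
        return true;
    }

    // True if every majority contains at least one server from 'holders'.
    static boolean everyMajorityTouches(int n, int holders) {
        for (int m : majorities(n)) {
            if ((m & holders) == 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int a = 1, b = 2, c = 4;
        // Any two majorities of {A, B, C} overlap.
        System.out.println(everyPairIntersects(3));
        // Every majority contains A or B, so it contains a server that has d3.
        System.out.println(everyMajorityTouches(3, a | b));
        // The majority {A, B} does not contain C.
        System.out.println(everyMajorityTouches(3, c));
    }
}
```

In the scenario from the thread, d3 was acknowledged by the majority {A, B}, so any quorum C tries to assemble ({A, C}, {B, C} or {A, B, C}) includes a server that already has d3.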
>
> In fact C is very unlikely to get as far as trying to become the leader:
> as Henry mentioned, ZooKeeper runs a preliminary protocol, implemented in
> FastLeaderElection.java, which tries to make sure that the candidate leader
> has the most up-to-date data and support from a quorum. This is how the
> candidate is chosen; the other servers then establish connections to
> this candidate. The checks above cover the case where, by the time
> connections are established to the candidate leader, some server it did not
> hear from during FastLeaderElection connects and the candidate leader
> discovers that it shouldn't really be the leader. It then gives up and
> returns to FastLeaderElection.
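As a rough illustration of how an election can favour the most up-to-date server, here is a simplified sketch. It assumes votes are compared by epoch first, then by last transaction id (zxid), then by server id; the real comparison lives in FastLeaderElection.java and should be checked there, since it may take more into account than this. The class name, method names, server ids and zxid values below are hypothetical and chosen to mirror the A/B/C scenario.

```java
// Simplified, hypothetical sketch of vote ordering; not the ZooKeeper source.
class VoteOrder {
    // True if the "new" vote should replace the "current" one.
    // Assumed order: higher epoch, then higher zxid, then higher server id.
    static boolean prefers(long newEpoch, long newZxid, long newId,
                           long curEpoch, long curZxid, long curId) {
        if (newEpoch != curEpoch) {
            return newEpoch > curEpoch;
        }
        if (newZxid != curZxid) {
            return newZxid > curZxid;
        }
        return newId > curId;
    }

    // Each row is {epoch, zxid, serverId}; returns the id of the preferred vote.
    static long winner(long[][] votes) {
        long[] best = votes[0];
        for (long[] v : votes) {
            if (prefers(v[0], v[1], v[2], best[0], best[1], best[2])) {
                best = v;
            }
        }
        return best[2];
    }

    public static void main(String[] args) {
        // Hypothetical values: A (id 1) and B (id 2) have applied d3 (zxid 3),
        // C (id 3) stopped at d2 (zxid 2). All are in the same epoch.
        long[][] votes = {
            {1, 3, 1}, // A
            {1, 3, 2}, // B
            {1, 2, 3}, // C
        };
        System.out.println(winner(votes)); // prints 2
    }
}
```

Under these assumptions C loses to both A and B despite having the highest server id, because its zxid is lower; the id only breaks ties between servers with equal epoch and zxid.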
>
>
>
>
>
> On Wed, Aug 20, 2014 at 10:42 AM, Gaurav Saxena <gsaxena81@gmail.com> wrote:
>
> > Thanks! That's great... If someone can point me to the code where this is
> > decided, it will be a great help... as I have to present evidence that this
> > scenario will not happen
> >
> >
> > On Wed, Aug 20, 2014 at 10:33 AM, Henry Robinson <henry@cloudera.com> wrote:
> >
> > > IIRC, C cannot become the master because it does not have all the changes
> > > that A and B have seen. The leader election protocol can take care of
> > > ensuring the invariant that the elected master must be the most up-to-date
> > > of all peers. (Alternatively, the new master can request the missing log
> > > suffix from the peers during election, but I believe, although it's a while
> > > since I checked, that ZK does the former. Someone can fill in the details /
> > > correct me).
> > >
> > > Henry
> > >
> > >
> > > On 20 August 2014 10:24, Gaurav Saxena <gsaxena81@gmail.com> wrote:
> > >
> > > > I am curious about an apparent data loss scenario. I describe it below.
> > > >
> > > > There are three zookeeper servers A, B, and C.
> > > > 1. At one point in time t1 the state of the system is as follows:
> > > > A is up and contains data d1, d2. A is master
> > > > B is up and contains data d1, d2
> > > > C is up and contains data d1, d2
> > > >
> > > > 2. At time t2 C goes down. The state of the system at t2 is
> > > > A is up and contains data d1, d2. A is master
> > > > B is up and contains data d1, d2
> > > > C is down and its log contains data d1, d2
> > > >
> > > > 3. At time t3 the state of the system changes
> > > > A is up and contains data d1, d2, d3. A is master
> > > > B is up and contains data d1, d2, d3
> > > > C is down and its log contains data d1, d2
> > > >
> > > > 4. At time t4, C comes up and also becomes the master, while A and B are
> > > > also up
> > > >
> > > > Question: Because C is master, will the logs of A and B be truncated to
> > > > contain only d1 and d2? Is this considered a data loss scenario? If yes, is
> > > > there an issue around it?
> > > >
> > > > --
> > > > Regards
> > > > Gaurav Saxena
> > > >
> > >
> > >
> > >
> > > --
> > > Henry Robinson
> > > Software Engineer
> > > Cloudera
> > > 415-994-6679
> > >
> >
> >
> >
> > --
> > Regards
> > Gaurav Saxena
> >
>


-- 
Regards
Gaurav Saxena
