zookeeper-dev mailing list archives

From Fangmin Lv <lvfang...@gmail.com>
Subject Re: [VOTE] Apache ZooKeeper release 3.4.12 candidate 1
Date Fri, 06 Apr 2018 17:04:53 GMT
Hi Alex,

Can you give more details about the data loss scenario in Jira
ZOOKEEPER-2959 <https://issues.apache.org/jira/browse/ZOOKEEPER-2959>? As
far as I know, the leader ignores the observers' ACKs in
waitForNewLeaderAck, so it will not start serving traffic until it has
received ACKs from an actual quorum. If it doesn't have the support of enough
followers before the timeout, it will quit leading and its learners will
re-sync with the new leader.
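
To make the behaviour I'm describing concrete, here is a rough sketch. It is
not the actual Leader code: the class QuorumAckSketch and its methods are
hypothetical names, and it assumes a simple majority quorum over the voting
members only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, NOT ZooKeeper's Leader implementation.
// Assumption: quorum = strict majority of the voting members.
class QuorumAckSketch {
    private final Set<Long> votingMembers;
    private final Set<Long> acked = new HashSet<>();

    QuorumAckSketch(Set<Long> votingMembers) {
        this.votingMembers = new HashSet<>(votingMembers);
    }

    // Record an ACK from server `sid`. ACKs from servers that are not
    // voting members (i.e. observers) are ignored. Returns true if counted.
    boolean recordAck(long sid) {
        if (!votingMembers.contains(sid)) {
            return false; // observer: does not count toward the quorum
        }
        acked.add(sid);
        return true;
    }

    // True once a strict majority of voting members has ACKed.
    boolean hasQuorum() {
        return acked.size() > votingMembers.size() / 2;
    }

    public static void main(String[] args) {
        // Voting members 1, 2, 3; servers 4, 5, 6 are observers.
        QuorumAckSketch q =
            new QuorumAckSketch(new HashSet<>(Arrays.asList(1L, 2L, 3L)));
        q.recordAck(1L); // the leader's own ACK
        q.recordAck(4L);
        q.recordAck(5L);
        q.recordAck(6L);
        System.out.println("quorum with observers only: " + q.hasQuorum());
        q.recordAck(2L); // a real follower
        System.out.println("quorum after follower 2: " + q.hasQuorum());
    }
}
```

With three voting members and three observers, the leader's own ACK plus all
three observer ACKs is still not a quorum in this sketch; it only becomes one
when a second voting member ACKs.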

Thanks,
Fangmin

On Thu, Apr 5, 2018 at 12:57 PM, Alexander Shraer <shralex@gmail.com> wrote:

> Btw we actually observed the described issue (data loss), thankfully in a
> test environment. So I thought this is important to share with the
> community.
>
> Unfortunately I don’t have time to run a new ZK release for this, so I’m
> not going to -1 your candidate, but we are actively working on a fix (i.e. a
> test at this point) and I can commit it as soon as we have it.
>
> It may be worthwhile to delay the release by a few more days, but it’s
> totally up to you since you’re running it.
>
> Cheers
> Alex
> On Thu, Apr 5, 2018 at 12:47 PM Andor Molnar <andor@cloudera.com> wrote:
>
> > Got that. I still believe it's a completely valid issue which has to be
> > addressed, but it's not a showstopper. I'm afraid we're not going to
> > convince each other, so it's probably Abe's call if he wants to create
> > another release candidate for the fix.
> >
> > I reviewed the code on GitHub and I think it just needs to be covered with
> > a unit test to be complete.
> >
> > Regards,
> > Andor
> >
> >
> >
> > On Thu, Apr 5, 2018 at 9:05 PM, Alexander Shraer <shralex@gmail.com>
> > wrote:
> >
> > > Yes, sort of: FLE finishes, then enough observers' messages reach the
> > > leader before the participants' messages do.
> > > Whether it's rare depends on the number of observers and participants. For
> > > example, with very few participants and many observers
> > > your chances of hitting this are quite high.
> > >
> > > Alex
> > >
> > > On Thu, Apr 5, 2018 at 11:44 AM, Andor Molnar <andor@cloudera.com>
> > > wrote:
> > >
> > > > Maybe I'm missing something here, but this looks like a rare edge case to
> > > > me. Participants must finish the leader election successfully, and right
> > > > after that enough followers must fail to send their epoch to the leader, so
> > > > observers can take it over.
> > > >
> > > > Is that description accurate?
> > > >
> > > > Andor
> > > >
> > > >
> > > > On Thu, Apr 5, 2018 at 7:35 PM, Alexander Shraer <shralex@gmail.com>
> > > > wrote:
> > > >
> > > > > To clarify - in a deployment with observers this bug can potentially cause
> > > > > data loss. A server could be elected leader based just on the support of
> > > > > observers, even if this server's data is stale wrt other followers.
> > > > >
> > > > > It is certainly a blocker, just not sure if for 3.4.11 or 3.4.12.
> > > > >
> > > > >
> > > > > Alex
> > > > > On Thu, Apr 5, 2018 at 10:29 AM Andor Molnar <andor@cloudera.com>
> > > > > wrote:
> > > > >
> > > > > > I don't think it's a blocker.
> > > > > > The jira and PR have been open since last December and 3.4.11 was released
> > > > > > without it.
> > > > > >
> > > > > > Although this bug is also important to fix, I believe it's more important
> > > > > > to release a fix for the regression we've found in 3.4.11 asap.
> > > > > >
> > > > > > Abe, any thoughts?
> > > > > >
> > > > > > Regards,
> > > > > > Andor
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Thu, Apr 5, 2018 at 7:00 PM, Alexander Shraer <shralex@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Sorry for coming in at the last moment. I'm not sure when the next 3.4
> > > > > > > release is scheduled, so just wanted to mention this bug,
> > > > > > > which I believe is a blocker for either this or next release:
> > > > > > > https://issues.apache.org/jira/browse/ZOOKEEPER-2959
> > > > > > >
> > > > > > > Best,
> > > > > > > Alex
> > > > > > >
> > > > > > > On Thu, Apr 5, 2018 at 9:09 AM, Ted Yu <yuzhihong@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Can the vote be closed?
> > > > > > > >
> > > > > > > > It seems we have enough +1's
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
