zookeeper-dev mailing list archives

From Alexander Shraer <shra...@gmail.com>
Subject Re: [VOTE] Apache ZooKeeper release 3.4.12 candidate 1
Date Thu, 05 Apr 2018 19:57:08 GMT
By the way, we actually observed the described issue (data loss), thankfully in a
test environment, so I thought it was important to share it with the
community.

Unfortunately I don’t have time to run a new ZK release for this, so I’m
not going to -1 your candidate, but we are actively working on a fix (i.e., a
test at this point) and I can commit it as soon as we have it.

It may be worthwhile to delay the release by a few more days, but it’s
totally up to you since you’re running it.

Cheers
Alex
On Thu, Apr 5, 2018 at 12:47 PM Andor Molnar <andor@cloudera.com> wrote:

> Got that. I still believe it's a completely valid issue that has to be
> addressed, but it's not a showstopper. I'm afraid we're not going to
> convince each other, so it's probably Abe's call whether he wants to create
> another release candidate for the fix.
>
> I reviewed the code on GitHub and I think it just needs to be covered with
> a unit test to be complete.
>
> Regards,
> Andor
>
>
>
> On Thu, Apr 5, 2018 at 9:05 PM, Alexander Shraer <shralex@gmail.com>
> wrote:
>
> > Yes, sort of: FLE finishes, and then enough observers' messages reach the
> > leader before the participants' messages do.
> > Whether it's rare depends on the number of observers and participants. For
> > example, with very few participants and many observers, your chance of
> > hitting this is quite high.
> >
> > Alex
> >
> > On Thu, Apr 5, 2018 at 11:44 AM, Andor Molnar <andor@cloudera.com>
> > wrote:
> >
> > > Maybe I'm missing something here, but this looks like a rare edge case to
> > > me. Participants must finish the leader election successfully, and right
> > > after that enough followers must fail to send their epoch to the leader,
> > > so that observers can take over.
> > >
> > > Is that description accurate?
> > >
> > > Andor
> > >
> > >
> > > On Thu, Apr 5, 2018 at 7:35 PM, Alexander Shraer <shralex@gmail.com>
> > > wrote:
> > >
> > > > To clarify: in a deployment with observers, this bug can potentially
> > > > cause data loss. A server could be elected leader based just on the
> > > > support of observers, even if this server's data is stale with respect
> > > > to other followers.
> > > >
> > > > It is certainly a blocker, just not sure if for 3.4.11 or 3.4.12.
> > > >
> > > >
> > > > Alex
> > > > On Thu, Apr 5, 2018 at 10:29 AM Andor Molnar <andor@cloudera.com>
> > > > wrote:
> > > >
> > > > > I don't think it's a blocker.
> > > > > The JIRA and PR have been open since last December, and 3.4.11 was
> > > > > released without it.
> > > > >
> > > > > Although this bug is also important to fix, I believe it's more
> > > > > important to release a fix for the regression we've found in 3.4.11
> > > > > ASAP.
> > > > >
> > > > > Abe, any thoughts?
> > > > >
> > > > > Regards,
> > > > > Andor
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Apr 5, 2018 at 7:00 PM, Alexander Shraer <shralex@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Sorry for coming in at the last moment. I'm not sure when the next
> > > > > > 3.4 release is scheduled, so I just wanted to mention this bug,
> > > > > > which I believe is a blocker for either this or the next release:
> > > > > > https://issues.apache.org/jira/browse/ZOOKEEPER-2959
> > > > > >
> > > > > > Best,
> > > > > > Alex
> > > > > >
> > > > > > On Thu, Apr 5, 2018 at 9:09 AM, Ted Yu <yuzhihong@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Can the vote be closed?
> > > > > > >
> > > > > > > It seems we have enough +1s.
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
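
The failure mode discussed in this thread (ZOOKEEPER-2959) can be illustrated with a simplified, hypothetical model. This is not ZooKeeper's actual code: the function `has_quorum` and its parameters are invented for illustration, and the model assumes the newly elected leader counts itself plus every server whose epoch reply has arrived. If observer replies are counted, a leader whose data may be stale can reach a "quorum" without hearing from any other participant; counting only voting members prevents that.

```python
def has_quorum(leader_sid, reply_sids, voting_members, count_observers):
    """Hypothetical quorum check a new leader runs on epoch replies.

    leader_sid:      id of the newly elected leader (a voting member)
    reply_sids:      ids of servers whose epoch replies have arrived so far
    voting_members:  set of ids of the voting participants (no observers)
    count_observers: True models the buggy check, False the intended one
    """
    supporters = {leader_sid} | set(reply_sids)
    if not count_observers:
        # Only voting participants may contribute to the quorum.
        supporters &= voting_members
    # A strict majority of the voting members is required.
    return len(supporters) > len(voting_members) // 2


voting = {1, 2, 3}  # three participants; 4 and 5 are observers
# Observers 4 and 5 reply before participants 2 and 3 do.
print(has_quorum(1, [4, 5], voting, count_observers=True))      # True: leader proceeds alone
print(has_quorum(1, [4, 5], voting, count_observers=False))     # False: keeps waiting
print(has_quorum(1, [2, 4, 5], voting, count_observers=False))  # True: real majority
```

With only three participants, two observer replies arriving first are enough for the buggy check to pass, which is consistent with the point that deployments with few participants and many observers are the most exposed.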
