zookeeper-dev mailing list archives

From Patrick Hunt <ph...@apache.org>
Subject Re: [VOTE] Apache ZooKeeper release 3.4.12 candidate 1
Date Fri, 13 Apr 2018 21:26:01 GMT
Hey folks. I've been on vacation. My $0.02: given that the release candidate
is well underway and has sufficient votes/time to finalize, that this is not a
regression in 3.4.12, and that it's not yet committed, I would think we
finalize/push 3.4.12 and then quickly follow up with a 3.4.13 that addresses
this. Alex could be the RM given his interest/advocacy.

Regards,

Patrick

On Fri, Apr 13, 2018 at 11:55 AM, Abraham Fine <afine@apache.org> wrote:

> Given that the primary driver of this release is to fix an issue with the
> misuse of dataDir and dataLogDir I would rather see this release make it
> out the door with minimal additional changes to core functionality so
> people can more confidently upgrade.
>
> What do you think, Pat?
>
> Abe
>
> On Fri, Apr 13, 2018, at 11:37, Alexander Shraer wrote:
> > Now that we have the fix, why delay it to next release?
> >
> > On Fri, Apr 13, 2018 at 11:09 AM Abraham Fine <afine@apache.org> wrote:
> >
> > > Let's wait until the next release to include this fix.
> > >
> > > On Mon, Apr 9, 2018, at 15:14, Alexander Shraer wrote:
> > > > Hi,
> > > >
> > > > Please take a look at the new PR for ZK-2959:
> > > > https://github.com/apache/zookeeper/pull/500
> > > > If there are no further comments, I can commit it.
> > > >
> > > > Thanks,
> > > > Alex
> > > >
> > > > On Fri, Apr 6, 2018 at 11:33 AM, Alexander Shraer <shralex@gmail.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > The bug described in ZOOKEEPER-2959
> > > > > <https://issues.apache.org/jira/browse/ZOOKEEPER-2959> is that
> > > > > getEpochToPropose and waitForEpochAck do not distinguish between
> > > > > followers and observers.
> > > > > This can cause a candidate leader's acceptedEpoch to be updated with
> > > > > support from observers only. The same goes for waitForEpochAck:
> > > > > passing this method allows the candidate leader to update its
> > > > > currentEpoch. The latter helps this server win FLE elections
> > > > > continuously, and the former (acceptedEpoch) causes anyone trying to
> > > > > connect to the server to think that it has more up-to-date data and
> > > > > truncate their logs to match.
> > > > >
> > > > >
> > > > > Alex
> > > > >
> > > > > On Fri, Apr 6, 2018 at 10:04 AM, Fangmin Lv <lvfangmin@gmail.com> wrote:
> > > > >
> > > > >> Hi Alex,
> > > > >>
> > > > >> Can you give more details about the data loss scenario in Jira
> > > > >> ZOOKEEPER-2959 <https://issues.apache.org/jira/browse/ZOOKEEPER-2959>?
> > > > >> As far as I know, the leader will ignore the observers' ACKs in
> > > > >> waitForNewLeaderAck, so it will not start serving traffic until it
> > > > >> receives an actual quorum of ACKs. If it doesn't have enough support
> > > > >> from followers before the timeout, it will quit leading and its
> > > > >> learners will re-sync with the new leader.
> > > > >>
> > > > >> Thanks,
> > > > >> Fangmin
> > > > >>
> > > > >> On Thu, Apr 5, 2018 at 12:57 PM, Alexander Shraer <shralex@gmail.com> wrote:
> > > > >>
> > > > >>> Btw, we actually observed the described issue (data loss),
> > > > >>> thankfully in a test environment. So I thought this was important to
> > > > >>> share with the community.
> > > > >>>
> > > > >>> Unfortunately I don’t have time to run a new ZK release for this, so
> > > > >>> I’m not going to -1 your candidate, but we are actively working on a
> > > > >>> fix (i.e., a test at this point) and I can commit it as soon as we
> > > > >>> have that.
> > > > >>>
> > > > >>> It may be worthwhile to delay the release by a few more days, but
> > > > >>> it’s totally up to you since you’re running it.
> > > > >>>
> > > > >>> Cheers
> > > > >>> Alex
> > > > >>> On Thu, Apr 5, 2018 at 12:47 PM Andor Molnar <andor@cloudera.com> wrote:
> > > > >>>
> > > > >>> > Got that. I still believe it's a completely valid issue which has
> > > > >>> > to be addressed, but it's not a showstopper. I'm afraid we're not
> > > > >>> > going to convince each other, so it's probably Abe's call whether
> > > > >>> > he wants to create another release candidate for the fix.
> > > > >>> >
> > > > >>> > I reviewed the code on GitHub and I think it just needs to be
> > > > >>> > covered with a unit test to be complete.
> > > > >>> >
> > > > >>> > Regards,
> > > > >>> > Andor
> > > > >>> >
> > > > >>> >
> > > > >>> >
> > > > >>> > On Thu, Apr 5, 2018 at 9:05 PM, Alexander Shraer <shralex@gmail.com> wrote:
> > > > >>> >
> > > > >>> > > Yes, sort of: FLE is finished, then enough observers' messages
> > > > >>> > > reach the leader before the participants' messages do.
> > > > >>> > > Whether it's rare depends on the number of observers and
> > > > >>> > > participants. For example, with very few participants and many
> > > > >>> > > observers, your chance of hitting this is quite high.
> > > > >>> > >
> > > > >>> > > Alex
> > > > >>> > >
> > > > >>> > > On Thu, Apr 5, 2018 at 11:44 AM, Andor Molnar <andor@cloudera.com> wrote:
> > > > >>> > >
> > > > >>> > > > Maybe I'm missing something here, but this looks like a rare
> > > > >>> > > > edge case to me. Participants must finish the leader election
> > > > >>> > > > successfully, and right after that enough followers must fail
> > > > >>> > > > to send their epoch to the leader, so that observers can take
> > > > >>> > > > it over.
> > > > >>> > > >
> > > > >>> > > > Is that description accurate?
> > > > >>> > > >
> > > > >>> > > > Andor
> > > > >>> > > >
> > > > >>> > > >
> > > > >>> > > > On Thu, Apr 5, 2018 at 7:35 PM, Alexander Shraer <shralex@gmail.com> wrote:
> > > > >>> > > >
> > > > >>> > > > > To clarify: in a deployment with observers, this bug can
> > > > >>> > > > > potentially cause data loss. A server could be elected leader
> > > > >>> > > > > based just on the support of observers, even if this server's
> > > > >>> > > > > data is stale with respect to other followers.
> > > > >>> > > > >
> > > > >>> > > > > It is certainly a blocker, just not sure if for 3.4.11 or
> > > > >>> > > > > 3.4.12.
> > > > >>> > > > >
> > > > >>> > > > >
> > > > >>> > > > > Alex
> > > > >>> > > > > On Thu, Apr 5, 2018 at 10:29 AM Andor Molnar <andor@cloudera.com> wrote:
> > > > >>> > > > >
> > > > >>> > > > > > I don't think it's a blocker.
> > > > >>> > > > > > The JIRA and PR have been open since last December, and
> > > > >>> > > > > > 3.4.11 was released without it.
> > > > >>> > > > > >
> > > > >>> > > > > > Although this bug is also important to fix, I believe it's
> > > > >>> > > > > > more important to release a fix for the regression we've
> > > > >>> > > > > > found in 3.4.11 ASAP.
> > > > >>> > > > > >
> > > > >>> > > > > > Abe, any thoughts?
> > > > >>> > > > > >
> > > > >>> > > > > > Regards,
> > > > >>> > > > > > Andor
> > > > >>> > > > > >
> > > > >>> > > > > >
> > > > >>> > > > > >
> > > > >>> > > > > > On Thu, Apr 5, 2018 at 7:00 PM, Alexander Shraer <shralex@gmail.com> wrote:
> > > > >>> > > > > >
> > > > >>> > > > > > > Sorry for coming in at the last moment. I'm not sure when
> > > > >>> > > > > > > the next 3.4 release is scheduled, so I just wanted to
> > > > >>> > > > > > > mention this bug, which I believe is a blocker for either
> > > > >>> > > > > > > this or the next release:
> > > > >>> > > > > > > https://issues.apache.org/jira/browse/ZOOKEEPER-2959
> > > > >>> > > > > > >
> > > > >>> > > > > > > Best,
> > > > >>> > > > > > > Alex
> > > > >>> > > > > > >
> > > > >>> > > > > > > On Thu, Apr 5, 2018 at 9:09 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > >>> > > > > > >
> > > > >>> > > > > > > > Can the vote be closed?
> > > > >>> > > > > > > >
> > > > >>> > > > > > > > It seems we have enough +1's.
> > > > >>> > > > > > > >
> > > > >>> > > > > > > > Thanks
> > > > >>> > > > > > > >
> > > > >>> > > > > > >
> > > > >>> > > > > >
> > > > >>> > > > >
> > > > >>> > > >
> > > > >>> > >
> > > > >>> >
> > > > >>>
> > > > >>
> > > > >>
> > > > >
> > >
>
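Alex's description of ZOOKEEPER-2959 comes down to which replies count toward the majority a candidate leader needs before it moves its acceptedEpoch/currentEpoch forward. The Java sketch below is a simplified model under stated assumptions, not ZooKeeper's code: the class EpochQuorumSketch, the Role enum, and the hasQuorum method are hypothetical, and the sketch does not reproduce getEpochToPropose, waitForEpochAck, or the change proposed in PR #500. It assumes a three-member voting ensemble plus two observers and compares counting every learner's reply against counting only voting participants.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the epoch-quorum check discussed in ZOOKEEPER-2959.
// All names here are illustrative; they are NOT ZooKeeper's API.
public class EpochQuorumSketch {

    // Role a connecting learner reports to the candidate leader (assumed model).
    enum Role { PARTICIPANT, OBSERVER }

    /**
     * Returns true if the candidate leader has a strict majority of the
     * voting ensemble behind it.
     *
     * @param leaderSid      server id of the candidate leader (supports itself)
     * @param ackers         learners that have replied so far, keyed by server id
     * @param voters         server ids of the voting participants, leader included
     * @param countObservers true models the reported bug (observer replies are
     *                       counted); false counts only voting participants
     */
    static boolean hasQuorum(long leaderSid, Map<Long, Role> ackers,
                             Set<Long> voters, boolean countObservers) {
        Set<Long> support = new HashSet<>();
        support.add(leaderSid);
        for (Map.Entry<Long, Role> e : ackers.entrySet()) {
            boolean isVoter = e.getValue() == Role.PARTICIPANT
                    && voters.contains(e.getKey());
            if (isVoter || countObservers) {
                support.add(e.getKey());
            }
        }
        // The majority threshold is computed over voting members only.
        return support.size() > voters.size() / 2;
    }

    public static void main(String[] args) {
        Set<Long> voters = Set.of(1L, 2L, 3L); // three voting participants
        // Two observers (sids 4 and 5) reply before any follower does.
        Map<Long, Role> observersFirst =
                Map.of(4L, Role.OBSERVER, 5L, Role.OBSERVER);
        System.out.println(hasQuorum(1L, observersFirst, voters, true));  // true
        System.out.println(hasQuorum(1L, observersFirst, voters, false)); // false

        // Once one follower (sid 2) replies, the participant-only count passes.
        Map<Long, Role> oneFollower =
                Map.of(4L, Role.OBSERVER, 2L, Role.PARTICIPANT);
        System.out.println(hasQuorum(1L, oneFollower, voters, false));    // true
    }
}
```

With observer replies counted, the candidate (sid 1) clears the majority check before any follower has answered, which matches Alex's point that few participants and many observers make this window easy to hit; counting only participants requires at least one follower (sid 2) to reply. The sketch requires Java 9 or later for Set.of and Map.of.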
