zookeeper-dev mailing list archives

From Alexander Shraer <shra...@gmail.com>
Subject Re: [VOTE] Apache ZooKeeper release 3.4.12 candidate 1
Date Fri, 13 Apr 2018 21:48:02 GMT
We discussed with Pat offline and agreed to go without this patch,
especially since we need to patch 3 branches: 3.4, 3.5 and master.
We'll prepare 3.5 and master and then commit all 3 together in time for the
next release. So Abe, please go ahead with your release.

Alex

On Fri, Apr 13, 2018 at 2:26 PM, Patrick Hunt <phunt@apache.org> wrote:

> Hey folks. I've been on vacation. My 0.02: given that the release candidate
> is well underway and has sufficient votes/time to finalize, that this is not
> a regression in 3.4.12, and that the fix is not yet committed, I would think
> we finalize/push 3.4.12 and then quickly follow up with a 3.4.13 that
> addresses this. Alex could be the RM given his interest/advocacy.
>
> Regards,
>
> Patrick
>
> On Fri, Apr 13, 2018 at 11:55 AM, Abraham Fine <afine@apache.org> wrote:
>
> > Given that the primary driver of this release is to fix an issue with the
> > misuse of dataDir and dataLogDir I would rather see this release make it
> > out the door with minimal additional changes to core functionality so
> > people can more confidently upgrade.
> >
> > What do you think Pat?
> >
> > Abe
> >
> > On Fri, Apr 13, 2018, at 11:37, Alexander Shraer wrote:
> > > Now that we have the fix, why delay it to next release?
> > >
> > > On Fri, Apr 13, 2018 at 11:09 AM Abraham Fine <afine@apache.org> wrote:
> > >
> > > > Let's wait until the next release to include this fix.
> > > >
> > > > On Mon, Apr 9, 2018, at 15:14, Alexander Shraer wrote:
> > > > > Hi,
> > > > >
> > > > > Please take a look at the new PR for ZK-2959:
> > > > > https://github.com/apache/zookeeper/pull/500
> > > > > If there are no further comments, I can commit it.
> > > > >
> > > > > Thanks,
> > > > > Alex
> > > > >
> > > > > On Fri, Apr 6, 2018 at 11:33 AM, Alexander Shraer <shralex@gmail.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > The bug described in ZOOKEEPER-2959
> > > > > > <https://issues.apache.org/jira/browse/ZOOKEEPER-2959> is that
> > > > > > getEpochToPropose and waitForEpochAck do not distinguish between
> > > > > > followers and observers.
> > > > > > This can cause a candidate leader's acceptedEpoch to be updated with
> > > > > > support only from observers. The same holds for waitForEpochAck:
> > > > > > passing this method allows the candidate leader to update its
> > > > > > currentEpoch. The latter helps this server win FLE elections
> > > > > > continuously, and the former (acceptedEpoch) causes anyone trying to
> > > > > > connect to the server to think that it has more up-to-date data and
> > > > > > truncate their logs to match.
> > > > > >
> > > > > >
> > > > > > Alex
> > > > > >
> > > > > > On Fri, Apr 6, 2018 at 10:04 AM, Fangmin Lv <lvfangmin@gmail.com> wrote:
> > > > > >
> > > > > >> Hi Alex,
> > > > > >>
> > > > > > >> Can you give more details about the data loss scenario in Jira
> > > > > > >> ZOOKEEPER-2959
> > > > > > >> <https://issues.apache.org/jira/browse/ZOOKEEPER-2959>?
> > > > > > >> As far as I know, the leader will ignore the observers' ACK in
> > > > > > >> waitForNewLeaderAck, so it will not start serving traffic until it
> > > > > > >> has received the actual quorum ACK. If it doesn't have enough
> > > > > > >> follower support before the timeout, it will quit leading and its
> > > > > > >> learners will re-sync with the new leader.
> > > > > >>
> > > > > >> Thanks,
> > > > > >> Fangmin
> > > > > >>
> > > > > > >> On Thu, Apr 5, 2018 at 12:57 PM, Alexander Shraer <shralex@gmail.com>
> > > > > > >> wrote:
> > > > > >>
> > > > > > >>> Btw we actually observed the described issue (data loss), thankfully
> > > > > > >>> in a test environment. So I thought this is important to share with
> > > > > > >>> the community.
> > > > > > >>>
> > > > > > >>> Unfortunately I don’t have time to run a new ZK release for this, so
> > > > > > >>> I’m not going to -1 your candidate, but we are actively working on a
> > > > > > >>> fix (i.e., a test at this point) and I can commit that as soon as we
> > > > > > >>> have that.
> > > > > > >>>
> > > > > > >>> It may be worthwhile to delay the release by a few more days, but
> > > > > > >>> it’s totally up to you since you’re running it.
> > > > > >>>
> > > > > >>> Cheers
> > > > > >>> Alex
> > > > > > >>> On Thu, Apr 5, 2018 at 12:47 PM Andor Molnar <andor@cloudera.com> wrote:
> > > > > >>>
> > > > > > >>> > Got that. I still believe it's a completely valid issue which has
> > > > > > >>> > to be addressed, but it's not a showstopper. I'm afraid we're not
> > > > > > >>> > going to convince each other, so it's probably Abe's call if he
> > > > > > >>> > wants to create another release candidate for the fix.
> > > > > > >>> >
> > > > > > >>> > I reviewed the code on github and I think it just needs to be
> > > > > > >>> > covered with a unit test to be complete.
> > > > > >>> >
> > > > > >>> > Regards,
> > > > > >>> > Andor
> > > > > >>> >
> > > > > >>> >
> > > > > >>> >
> > > > > > >>> > On Thu, Apr 5, 2018 at 9:05 PM, Alexander Shraer <shralex@gmail.com>
> > > > > > >>> > wrote:
> > > > > >>> >
> > > > > > >>> > > Yes, sort of: FLE is finished, then enough observers' messages
> > > > > > >>> > > reach the leader before participants' messages do.
> > > > > > >>> > > Whether it's rare depends on the number of observers and
> > > > > > >>> > > participants. For example, with very few participants and many
> > > > > > >>> > > observers your chances of hitting this are quite high.
> > > > > >>> > >
> > > > > >>> > > Alex
> > > > > >>> > >
> > > > > > >>> > > On Thu, Apr 5, 2018 at 11:44 AM, Andor Molnar <andor@cloudera.com>
> > > > > > >>> > > wrote:
> > > > > >>> > >
> > > > > > >>> > > > Maybe I'm missing something here, but this looks like a rare
> > > > > > >>> > > > edge case to me. Participants must finish the leader election
> > > > > > >>> > > > successfully, and right after that enough followers must fail
> > > > > > >>> > > > to send their epoch to the leader so that observers can take
> > > > > > >>> > > > over.
> > > > > >>> > > >
> > > > > >>> > > > Is that description accurate?
> > > > > >>> > > >
> > > > > >>> > > > Andor
> > > > > >>> > > >
> > > > > >>> > > >
> > > > > > >>> > > > On Thu, Apr 5, 2018 at 7:35 PM, Alexander Shraer <shralex@gmail.com>
> > > > > > >>> > > > wrote:
> > > > > >>> > > >
> > > > > > >>> > > > > To clarify - in a deployment with observers this bug can
> > > > > > >>> > > > > potentially cause data loss. A server could be elected leader
> > > > > > >>> > > > > based just on the support of observers, even if this server's
> > > > > > >>> > > > > data is stale with respect to other followers.
> > > > > > >>> > > > >
> > > > > > >>> > > > > It is certainly a blocker, just not sure if for 3.4.11 or
> > > > > > >>> > > > > 3.4.12.
> > > > > >>> > > > >
> > > > > >>> > > > >
> > > > > >>> > > > > Alex
> > > > > > >>> > > > > On Thu, Apr 5, 2018 at 10:29 AM Andor Molnar <andor@cloudera.com>
> > > > > > >>> > > > > wrote:
> > > > > >>> > > > >
> > > > > > >>> > > > > > I don't think it's a blocker.
> > > > > > >>> > > > > > The Jira and PR have been open since last December, and
> > > > > > >>> > > > > > 3.4.11 was released without it.
> > > > > > >>> > > > > >
> > > > > > >>> > > > > > Although this bug is also important to fix, I believe it's
> > > > > > >>> > > > > > more important to release a fix for the regression we've
> > > > > > >>> > > > > > found in 3.4.11 asap.
> > > > > >>> > > > > >
> > > > > >>> > > > > > Abe, any thoughts?
> > > > > >>> > > > > >
> > > > > >>> > > > > > Regards,
> > > > > >>> > > > > > Andor
> > > > > >>> > > > > >
> > > > > >>> > > > > >
> > > > > >>> > > > > >
> > > > > > >>> > > > > > On Thu, Apr 5, 2018 at 7:00 PM, Alexander Shraer <shralex@gmail.com>
> > > > > > >>> > > > > > wrote:
> > > > > >>> > > > > >
> > > > > > >>> > > > > > > Sorry for coming in at the last moment. I'm not sure when
> > > > > > >>> > > > > > > the next 3.4 release is scheduled, so just wanted to
> > > > > > >>> > > > > > > mention this bug, which I believe is a blocker for either
> > > > > > >>> > > > > > > this or the next release:
> > > > > > >>> > > > > > > https://issues.apache.org/jira/browse/ZOOKEEPER-2959
> > > > > >>> > > > > > >
> > > > > >>> > > > > > > Best,
> > > > > >>> > > > > > > Alex
> > > > > >>> > > > > > >
> > > > > > >>> > > > > > > On Thu, Apr 5, 2018 at 9:09 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > > >>> > > > > > >
> > > > > > >>> > > > > > > > Can the vote be closed?
> > > > > > >>> > > > > > > >
> > > > > > >>> > > > > > > > It seems we have enough +1's
> > > > > >>> > > > > > > >
> > > > > >>> > > > > > > > Thanks
> > > > > >>> > > > > > > >
> > > > > >>> > > > > > >
> > > > > >>> > > > > >
> > > > > >>> > > > >
> > > > > >>> > > >
> > > > > >>> > >
> > > > > >>> >
> > > > > >>>
> > > > > >>
> > > > > >>
> > > > > >
> > > >
> >
>
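
For readers of the thread, the bug described in the quoted ZOOKEEPER-2959 message (epoch-quorum checks that do not distinguish between followers and observers) can be illustrated with a minimal Java sketch. This is not ZooKeeper's code and not necessarily what PR 500 does: the class EpochQuorumSketch, its methods, and the server ids are all hypothetical, and the sketch only models the counting logic under the assumption that a quorum is a strict majority of the voting members.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of a candidate leader collecting epoch messages.
// All names here are illustrative assumptions, not ZooKeeper APIs.
final class EpochQuorumSketch {
    private final Set<Long> votingMembers;              // participants only
    private final Set<Long> connected = new HashSet<>(); // every learner heard from

    EpochQuorumSketch(Set<Long> votingMembers) {
        this.votingMembers = new HashSet<>(votingMembers);
    }

    // Record that a server (follower OR observer) has sent its epoch.
    void recordConnection(long sid) {
        connected.add(sid);
    }

    // Buggy variant: counts every connected learner, so observers alone
    // can push the count past a majority of the voting members.
    boolean naiveQuorum() {
        return connected.size() > votingMembers.size() / 2;
    }

    // Checked variant: only ids that belong to the voting set are counted.
    boolean participantQuorum() {
        long voters = connected.stream().filter(votingMembers::contains).count();
        return voters > votingMembers.size() / 2;
    }

    public static void main(String[] args) {
        // Three participants (1, 2, 3); ids 10 and 11 stand for observers.
        EpochQuorumSketch s =
                new EpochQuorumSketch(new HashSet<>(Arrays.asList(1L, 2L, 3L)));
        s.recordConnection(1L);  // the candidate leader itself
        s.recordConnection(10L); // observer
        s.recordConnection(11L); // observer
        System.out.println("naive quorum:       " + s.naiveQuorum());       // true
        System.out.println("participant quorum: " + s.participantQuorum()); // false
    }
}
```

With one participant and two observers connected, the naive count reports a quorum while the participant-only count does not, which matches the scenario in the thread where observers' messages reach the leader before the participants' messages do.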
