zookeeper-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Patrick Hunt <ph...@apache.org>
Subject Re: ZooKeeper 3.5 blocker issues
Date Mon, 17 Dec 2018 23:16:48 GMT
Are folks OK to wait on that OWASP issue I documented over the weekend?
afaict we are not affected but it would be good to get another pair of eyes
on it.

Patrick

On Mon, Dec 17, 2018 at 2:55 PM Andor Molnár <andor@apache.org> wrote:

> Hi team,
>
>
> I'm proudly announce that thanks to the joint effort from the community,
> the 3.5 blockers list has become empty:
>
> "project = ZooKeeper AND resolution = Unresolved AND fixVersion = 3.5.5
> AND priority in (blocker, critical) ORDER BY priority DESC, key ASC"
>
>
> Well... almost. All the blocker issues have gone, but we still have the
> Maven migration to complete before the stable release. If you have some
> free cycles, please join us testing the Maven build on this PR:
>
> https://github.com/apache/zookeeper/pull/708
>
> I hope we can merge it pretty soon.
>
>
> In terms of the builds, the weather at 3.5 branch is quite sunny nowadays:
>
> https://builds.apache.org/view/S-Z/view/ZooKeeper/
>
> The Java 11 build is still having some difficulties, which hopefully I
> can address before the holidays:
>
> https://issues.apache.org/jira/browse/ZOOKEEPER-3204
>
>
> If you happen to know about something which is important from 3.5's
> perspective and missing from the above, please don't hesitate to share.
>
>
> Happy ZooKeeping!
>
> Andor
>
>
>
> On 11/2/18 21:12, Fangmin Lv wrote:
> > Andor,
> >
> > Here is the PR to port ZK-3104 from master to 3.4:
> > https://github.com/apache/zookeeper/pull/685.
> >
> > Fangmin
> >
> > On Fri, Nov 2, 2018 at 11:46 AM Fangmin Lv <lvfangmin@gmail.com> wrote:
> >
> >> Hi Andor,
> >>
> >> Is anyone working on ZK-2778? I can pick it up if there is no one
> working
> >> on it yet.
> >>
> >> I'll open a 3.5 PR for ZK-3104 today.
> >>
> >> Fangmin
> >>
> >> On Fri, Oct 26, 2018 at 3:33 AM Andor Molnar <andor@apache.org> wrote:
> >>
> >>> Hi folks,
> >>>
> >>> You’ve probably realised lots of update emails coming from Jira. Please
> >>> be aware that we’ve updated a bunch of open blocker/critical 3.5
> tickets to
> >>> reflect to what we discussed in this email.
> >>>
> >>> If you open up the following jira filter:
> >>>
> >>> project = ZooKeeper and resolution = Unresolved and fixVersion = 3.5.5
> >>> AND priority in (blocker, critical) ORDER BY priority DESC, key ASC
> >>>
> >>> You’ll see the most up-to-date list of tickets which need to be
> addressed
> >>> before the stable 3.5 release.
> >>>
> >>> Thank you for your efforts to get this done.
> >>>
> >>> Fangmin, ZK-3104 is waiting for backport, but ticket has already been
> >>> resolved. Have you created a separate ticket for the backport or shall
> I
> >>> just reopen it with the right fix versions?
> >>>
> >>> Thanks,
> >>> Andor
> >>>
> >>>
> >>>
> >>>> On 2018. Oct 8., at 12:34, Andor Molnar <andor@apache.org> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> Let me summarize and give a quick update on the outstanding issues for
> >>> 3.5 GA:
> >>>> - ZOOKEEPER-1818 (Fix don't care for trunk)
> >>>> - ZOOKEEPER-2778 (Potential server deadlock between follower sync with
> >>> leader and follower receiving external connection requests.)
> >>>> - ZOOKEEPER-3021 Migrate project structure to Maven (ongoing)
> >>>> - ZOOKEEPER-925 Docs generation to Maven
> >>>> - ZOOKEEPER-3104 (waiting for backport)
> >>>> - ZOOKEEPER-3125 (waiting for backport PR #647)
> >>>>
> >>>> The 2 Maven related tickets are no-brainers as well as the backports.
> >>> ZK-2778 has been picked up by Maoling (thanks!) as far as I can see,
> >>> ZK-1818 is the only one waiting for a volunteer.
> >>>> Please correct me if I’ve missed something.
> >>>>
> >>>> Regards,
> >>>> Andor
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>> On 2018. Sep 28., at 18:32, Tamas Penzes <tamaas@cloudera.com.INVALID
> >
> >>> wrote:
> >>>>> Hi All,
> >>>>>
> >>>>> I would add ZOOKEEPER-3021
> >>>>> <https://issues.apache.org/jira/browse/ZOOKEEPER-3021> Migrate
> project
> >>>>> structure to Maven build as a blocker too. Since the migration has
> >>> started
> >>>>> it would be good to finish before releasing ZK 3.5.x GA.
> >>>>>
> >>>>> ZOOKEEPER-925 <https://issues.apache.org/jira/browse/ZOOKEEPER-925>
> >>> replace
> >>>>> our forrest site and documentation generation might also be a good
> >>> idea,
> >>>>> since then we could deliver the new MarkDown based documentation.
> >>>>>
> >>>>> Regards, Tamaas
> >>>>>
> >>>>> On Fri, Sep 14, 2018 at 10:09 AM Fangmin Lv <lvfangmin@gmail.com>
> >>> wrote:
> >>>>>> Oh, sorry for the confusion, I should provide more context.
> >>>>>>
> >>>>>> Leader will use on disk txn sync with followers to if the peer
zxid
> >>> is not
> >>>>>> in it's in memory commit logs, the code is here: Leader on disk
txn
> >>> sync
> >>>>>> <
> >>>>>>
> >>>
> https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java#L774
> >>>>>>> .
> >>>>>> There is bug that potentially there will be gap in the txn files,
> like
> >>>>>> after snap sync, etc, so it's possible the peer will miss txns
due
> to
> >>> this.
> >>>>>> The option to disable it is snapshotSizeFactor
> >>>>>> <
> >>>>>>
> >>>
> https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/ZKDatabase.java#L81
> >>>>>>> ,
> >>>>>> set it to -1 will disable this feature. On 3.5, it's better
to have
> a
> >>> PR to
> >>>>>> set this to -1 by default. It might have more SNAP sync, but
from
> our
> >>> prod
> >>>>>> it doesn't seem to be a big problem to me.
> >>>>>>
> >>>>>> I can send out the diff to disable it by default on 3.5 if you
guys
> >>> think
> >>>>>> this is the right way to do.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Fangmin
> >>>>>>
> >>>>>> On Thu, Sep 13, 2018 at 1:58 AM Andor Molnar <andor@apache.org>
> >>> wrote:
> >>>>>>> What’s needed to turn it off?
> >>>>>>> Do we need a PR or it’s just a config option?
> >>>>>>> Shall we implement a feature switch for that and turn it
off by
> >>> default?
> >>>>>>> Sorry I don’t have too much insight on disk txn sync.
> >>>>>>>
> >>>>>>> Andor
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>> On 2018. Sep 13., at 9:16, Fangmin Lv <lvfangmin@gmail.com>
> wrote:
> >>>>>>>>
> >>>>>>>> And to be clear, ZOOKEEPER-2418 is actually just one
case of
> >>>>>>> inconsistency
> >>>>>>>> which could caused by on disk txn sync, as I mentioned
in a newer
> >>> JIRA
> >>>>>>>> ZOOKEEPER-2846 <
> >>> https://issues.apache.org/jira/browse/ZOOKEEPER-2846>,
> >>>>>>> the
> >>>>>>>> snap sync or txn sync could also leave txns gap in the
txn file,
> >>> which
> >>>>>>> is a
> >>>>>>>> more common case could trigger this issue.
> >>>>>>>>
> >>>>>>>> I would suggest to turn off the on disk txn sync by
default for
> now
> >>> to
> >>>>>>>> avoid this issue, after we finished ZOOKEEPER-3114,
we can use
> that
> >>> to
> >>>>>>>> validate the on disk txns during syncing.
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Fangmin
> >>>>>>>>
> >>>>>>>> On Wed, Sep 12, 2018 at 9:55 AM Fangmin Lv <lvfangmin@gmail.com>
> >>>>>> wrote:
> >>>>>>>>> Andor,
> >>>>>>>>>
> >>>>>>>>> ZOOKEEPER-3114 is about adding real time digest
checking to help
> >>>>>>> detecting
> >>>>>>>>> inconsistency, it's a new feature with amounts of
code change.
> I'll
> >>>>>>> start
> >>>>>>>>> upstream it part by part, but I don't expect it's
being merged in
> >>> the
> >>>>>>> next
> >>>>>>>>> few weeks. So yes, it's a nice to have, but definitely
not a
> block
> >>> for
> >>>>>>> 3.5.
> >>>>>>>>> Thanks,
> >>>>>>>>> Fangmin
> >>>>>>>>>
> >>>>>>>>> On Wed, Sep 12, 2018 at 2:55 AM Andor Molnar <andor@apache.org>
> >>>>>> wrote:
> >>>>>>>>>> Fangmin,
> >>>>>>>>>>
> >>>>>>>>>> Sorry, I just noticed that you want to include
the consistency
> >>> fixes
> >>>>>> in
> >>>>>>>>>> the stable version which is fine. Let’s finish
the backports and
> >>>>>> we’ll
> >>>>>>> be
> >>>>>>>>>> done with them.
> >>>>>>>>>>
> >>>>>>>>>> ZOOKEEPER-3114 is essentially a new feature,
I wouldn’t block
> 3.5
> >>>>>> with
> >>>>>>>>>> that. What do you think?
> >>>>>>>>>>
> >>>>>>>>>> Andor
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> On 2018. Sep 12., at 11:52, Andor Molnar
<andor@apache.org>
> >>> wrote:
> >>>>>>>>>>> Cool, thanks for the clarification.
> >>>>>>>>>>>
> >>>>>>>>>>> The updated list is as follows:
> >>>>>>>>>>>
> >>>>>>>>>>> - ZOOKEEPER-236 (SSL/TLS support for Atomic
Broadcast protocol)
> >>>>>>>>>>> - ZOOKEEPER-1818 (Fix don't care for trunk)
> >>>>>>>>>>> - ZOOKEEPER-2778 (Potential server deadlock
between follower
> sync
> >>>>>> with
> >>>>>>>>>> leader and follower receiving external connection
requests.)
> >>>>>>>>>>> The following are not critical and no blockers
for the stable
> >>>>>> release:
> >>>>>>>>>>> Waiting for to be ported to 3.5:
> >>>>>>>>>>> - ZOOKEEPER-3104
> >>>>>>>>>>> - ZOOKEEPER-3125
> >>>>>>>>>>> - ZOOKEEPER-3127
> >>>>>>>>>>>
> >>>>>>>>>>> New feature:
> >>>>>>>>>>> - ZOOKEEPER-3114 (fixes ZOOKEEPER-2184 too)
> >>>>>>>>>>>
> >>>>>>>>>>> Regards,
> >>>>>>>>>>> Andor
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>> On 2018. Sep 12., at 0:42, Fangmin Lv
<lvfangmin@gmail.com>
> >>> wrote:
> >>>>>>>>>>>> Hi Andor,
> >>>>>>>>>>>>
> >>>>>>>>>>>> That's the on disk txn feature, which
was disabled internally
> >>> after
> >>>>>>> we
> >>>>>>>>>>>> found the potentially inconsistent issue.
The only solution we
> >>> have
> >>>>>>>>>> for now
> >>>>>>>>>>>> is waiting for the new digest checking
feature I mentioned in
> >>>>>>>>>>>> ZOOKEEPER-3114.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I think there are some other critical
consistent issues we
> just
> >>>>>> fixed
> >>>>>>>>>> on
> >>>>>>>>>>>> master recently: ZOOKEEPER-3104, ZOOKEEPER-3125,
> >>> ZOOKEEPER-3127, I
> >>>>>>>>>> think we
> >>>>>>>>>>>> should include that in the official
3.5 release as well.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks,
> >>>>>>>>>>>> Fangmin
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Tue, Sep 11, 2018 at 11:58 AM Andor
Molnár <
> andor@apache.org
> >>>>>>>>>> wrote:
> >>>>>>>>>>>>> Hi Jeelani,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks for letting me know. I'm
happy to remove it from the
> >>> list
> >>>>>> to
> >>>>>>>>>> get
> >>>>>>>>>>>>> closer to a stable release. :)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> What's the feature which can be
disabled to avoid data
> >>>>>>> inconsistency?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Andor
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On 09/10/2018 11:33 PM, Mohamed
Jeelani wrote:
> >>>>>>>>>>>>>> Thanks Andor for compiling this.
Should we be ignoring
> >>>>>>>>>> ZOOKEEPER-2418 as
> >>>>>>>>>>>>> well? This exists in 3.4 as well
and the feature can be
> >>> disabled.
> >>>>>> We
> >>>>>>>>>> are
> >>>>>>>>>>>>> working on a longer term fix for
it in 3.6.
> >>>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Jeelani
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 9/10/18, 5:19 AM, "Andor
Molnar"
> >>> <andor@cloudera.com.INVALID
> >>>>>>>>>> wrote:
> >>>>>>>>>>>>>> Fine.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I'm happy to ignore 1549, 2846
and 2930. Still we have the
> >>> list
> >>>>>>> of:
> >>>>>>>>>>>>>> - ZOOKEEPER-236 (SSL/TLS support
for Atomic Broadcast
> >>> protocol)
> >>>>>>>>>>>>>> - ZOOKEEPER-1818 (Fix don't
care for trunk)
> >>>>>>>>>>>>>> - ZOOKEEPER-2418 (txnlog diff
sync can skip sending some
> >>>>>>>>>>>>> transactions to
> >>>>>>>>>>>>>> followers)
> >>>>>>>>>>>>>> - ZOOKEEPER-2778 (Potential
server deadlock between follower
> >>>>>> sync
> >>>>>>>>>>>>> with
> >>>>>>>>>>>>>> leader and follower receiving
external connection requests.)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> SSL (ZK-236) is a feature which
essential for the 3.5
> release,
> >>>>>>>>>> hence
> >>>>>>>>>>>>> I
> >>>>>>>>>>>>>> wouldn't leave it out or postpone
it for the next stable
> >>>>>> release.
> >>>>>>>>>> PR
> >>>>>>>>>>>>> has
> >>>>>>>>>>>>>> been out for a long time, get
on reviewing please.
> >>>>>>>>>>>>>> The rest are also long outstanding
issues which have been
> >>> found
> >>>>>> in
> >>>>>>>>>>>>> the 3.5
> >>>>>>>>>>>>>> branch.
> >>>>>>>>>>>>>> ZK-1818 is something which was
found in 3.4 and fixed in
> 3.4,
> >>>>>> but
> >>>>>>>>>>>>> never has
> >>>>>>>>>>>>>> been fixed in 3.5. Quite a serious
issue if still present.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I think we should at least run
some manual testing and see
> if
> >>> we
> >>>>>>>>>>>>> could
> >>>>>>>>>>>>>> repro any of these issues before
going ahead with a stable
> >>>>>>> release.
> >>>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>>> Andor
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Fri, Sep 7, 2018 at 3:24
AM, Michael Han <
> hanm@apache.org>
> >>>>>>>>>> wrote:
> >>>>>>>>>>>>>>> I haven't went through the
entire list, but looks like lots
> >>> of
> >>>>>> the
> >>>>>>>>>>>>> JIRA
> >>>>>>>>>>>>>>> issues listed in this thread,
such as ZOOKEEPER-1549, 2846,
> >>> also
> >>>>>>>>>>>>> affects
> >>>>>>>>>>>>>>> 3.4 releases. Should we
scope these issues out?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I think historically the
single outstanding blocking issue
> >>> for a
> >>>>>>>>>>>>> stable 3.5
> >>>>>>>>>>>>>>> release is the reconfig
feature and security concerns
> around
> >>> it
> >>>>>>>>>>>>> (somehow
> >>>>>>>>>>>>>>> addressed in ZOOKEEPER-2014),
and the alpha and beta
> releases
> >>>>>> were
> >>>>>>>>>>>>> created
> >>>>>>>>>>>>>>> to stabilize that feature.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>
> https://urldefense.proofpoint.com/v2/url?u=http-3A__zookeeper-2Duser.578899.n2.nabble.com_Zookeeper-2Dwith-2D&d=DwIBaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=Vl4oKanLQehvaulUvoKg8A&m=wqlhnot9c-pQLdkGkccSGNpELUNUnB-wy_h0iA3PRqI&s=_tGtL3nMWtuPrXKXDx27AIWOzyyT7W-CjIVLDFZwT0E&e=
> >>>>>>>>>>>>>>> SSL-release-date-tt7581744.html
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> So it looks like we are
in good shape to release. Something
> >>>>>> might
> >>>>>>>>>>>>> worth
> >>>>>>>>>>>>>>> doing to claim the quality
of 3.5 is on par with 3.4
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> * Run Jepsen on 3.5 - 3.4
passed the test for the record
> >>>>>>>>>>>>>>>
> >>>
> https://urldefense.proofpoint.com/v2/url?u=https-3A__aphyr.com_posts_291-2Djepsen-2Dzookeeper&d=DwIBaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=Vl4oKanLQehvaulUvoKg8A&m=wqlhnot9c-pQLdkGkccSGNpELUNUnB-wy_h0iA3PRqI&s=VjORkX5s7hrJyl8mW9Q4cfeSWF4qfTdyRjcuAiBt0y4&e=
> >>>>>>>>>>>>>>> * Fix all flaky tests on
3.5 - 3.4 has little or no flaky
> >>> tests
> >>>>>> at
> >>>>>>>>>>>>> all.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Tue, Sep 4, 2018 at 1:48
AM, Andor Molnar
> >>>>>>>>>>>>> <andor@cloudera.com.invalid>
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Thanks Maoling! That
would be huge help, I appreciate it.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Andor
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>
> >>>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message