zookeeper-dev mailing list archives

From Andor Molnar <an...@cloudera.com.INVALID>
Subject Re: ZooKeeper 3.5 blocker issues
Date Tue, 18 Dec 2018 08:52:39 GMT
Sure, good point. Let's put it on the list.

Andor


On Tue, Dec 18, 2018 at 12:17 AM Patrick Hunt <phunt@apache.org> wrote:

> Are folks OK to wait on that OWASP issue I documented over the weekend?
> afaict we are not affected but it would be good to get another pair of eyes
> on it.
>
> Patrick
>
> On Mon, Dec 17, 2018 at 2:55 PM Andor Molnár <andor@apache.org> wrote:
>
> > Hi team,
> >
> >
> > I'm proud to announce that, thanks to the joint effort from the community,
> > the 3.5 blockers list has become empty:
> >
> > "project = ZooKeeper AND resolution = Unresolved AND fixVersion = 3.5.5
> > AND priority in (blocker, critical) ORDER BY priority DESC, key ASC"
> >
> >
> > Well... almost. All the blocker issues are gone, but we still have the
> > Maven migration to complete before the stable release. If you have some
> > free cycles, please join us in testing the Maven build on this PR:
> >
> > https://github.com/apache/zookeeper/pull/708
> >
> > I hope we can merge it pretty soon.
> >
> >
> > In terms of the builds, the weather on the 3.5 branch is quite sunny
> > nowadays:
> >
> > https://builds.apache.org/view/S-Z/view/ZooKeeper/
> >
> > The Java 11 build is still having some difficulties, which hopefully I
> > can address before the holidays:
> >
> > https://issues.apache.org/jira/browse/ZOOKEEPER-3204
> >
> >
> > If you happen to know about something which is important from 3.5's
> > perspective and missing from the above, please don't hesitate to share.
> >
> >
> > Happy ZooKeeping!
> >
> > Andor
> >
> >
> >
> > On 11/2/18 21:12, Fangmin Lv wrote:
> > > Andor,
> > >
> > > Here is the PR to port ZK-3104 from master to 3.4:
> > > https://github.com/apache/zookeeper/pull/685.
> > >
> > > Fangmin
> > >
> > > On Fri, Nov 2, 2018 at 11:46 AM Fangmin Lv <lvfangmin@gmail.com> wrote:
> > >
> > >> Hi Andor,
> > >>
> > >> Is anyone working on ZK-2778? I can pick it up if there is no one working
> > >> on it yet.
> > >>
> > >> I'll open a 3.5 PR for ZK-3104 today.
> > >>
> > >> Fangmin
> > >>
> > >> On Fri, Oct 26, 2018 at 3:33 AM Andor Molnar <andor@apache.org> wrote:
> > >>
> > >>> Hi folks,
> > >>>
> > >>> You’ve probably noticed lots of update emails coming from Jira. Please
> > >>> be aware that we’ve updated a bunch of open blocker/critical 3.5
> > >>> tickets to reflect what we discussed in this email.
> > >>>
> > >>> If you open up the following jira filter:
> > >>>
> > >>> project = ZooKeeper and resolution = Unresolved and fixVersion = 3.5.5
> > >>> AND priority in (blocker, critical) ORDER BY priority DESC, key ASC
> > >>>
> > >>> You’ll see the most up-to-date list of tickets which need to be addressed
> > >>> before the stable 3.5 release.
> > >>>
> > >>> Thank you for your efforts to get this done.
> > >>>
> > >>> Fangmin, ZK-3104 is waiting for backport, but the ticket has already been
> > >>> resolved. Have you created a separate ticket for the backport or shall I
> > >>> just reopen it with the right fix versions?
> > >>>
> > >>> Thanks,
> > >>> Andor
> > >>>
> > >>>
> > >>>
> > >>>> On 2018. Oct 8., at 12:34, Andor Molnar <andor@apache.org> wrote:
> > >>>>
> > >>>> Hi,
> > >>>>
> > >>>> Let me summarize and give a quick update on the outstanding issues for
> > >>>> 3.5 GA:
> > >>>> - ZOOKEEPER-1818 (Fix don't care for trunk)
> > >>>> - ZOOKEEPER-2778 (Potential server deadlock between follower sync with
> > >>>>   leader and follower receiving external connection requests.)
> > >>>> - ZOOKEEPER-3021 Migrate project structure to Maven (ongoing)
> > >>>> - ZOOKEEPER-925 Docs generation to Maven
> > >>>> - ZOOKEEPER-3104 (waiting for backport)
> > >>>> - ZOOKEEPER-3125 (waiting for backport PR #647)
> > >>>>
> > >>>> The 2 Maven related tickets are no-brainers as well as the backports.
> > >>>> ZK-2778 has been picked up by Maoling (thanks!) as far as I can see,
> > >>>> ZK-1818 is the only one waiting for a volunteer.
> > >>>> Please correct me if I’ve missed something.
> > >>>>
> > >>>> Regards,
> > >>>> Andor
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>> On 2018. Sep 28., at 18:32, Tamas Penzes <tamaas@cloudera.com.INVALID> wrote:
> > >>>>> Hi All,
> > >>>>>
> > >>>>> I would add ZOOKEEPER-3021
> > >>>>> <https://issues.apache.org/jira/browse/ZOOKEEPER-3021> (Migrate project
> > >>>>> structure to Maven build) as a blocker too. Since the migration has started,
> > >>>>> it would be good to finish it before releasing ZK 3.5.x GA.
> > >>>>>
> > >>>>> ZOOKEEPER-925 <https://issues.apache.org/jira/browse/ZOOKEEPER-925>
> > >>>>> (replace our Forrest site and documentation generation) might also be a good
> > >>>>> idea, since then we could deliver the new Markdown based documentation.
> > >>>>>
> > >>>>> Regards, Tamaas
> > >>>>>
> > >>>>> On Fri, Sep 14, 2018 at 10:09 AM Fangmin Lv <lvfangmin@gmail.com> wrote:
> > >>>>>> Oh, sorry for the confusion, I should have provided more context.
> > >>>>>>
> > >>>>>> The leader will use on-disk txn sync with followers if the peer zxid is not
> > >>>>>> in its in-memory commit log. The code is here: Leader on disk txn sync
> > >>>>>> <https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java#L774>.
> > >>>>>> There is a bug where gaps can potentially appear in the txn files, e.g.
> > >>>>>> after a snap sync, so it's possible the peer will miss txns due to this.
> > >>>>>> The option to disable it is snapshotSizeFactor
> > >>>>>> <https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/ZKDatabase.java#L81>;
> > >>>>>> setting it to -1 will disable this feature. On 3.5, it's better to have a PR
> > >>>>>> to set this to -1 by default. There might be more SNAP syncs, but from our
> > >>>>>> prod experience that doesn't seem to be a big problem to me.
> > >>>>>>
> > >>>>>> I can send out the diff to disable it by default on 3.5 if you guys think
> > >>>>>> this is the right way to go.
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>> Fangmin
> > >>>>>>
> > >>>>>> On Thu, Sep 13, 2018 at 1:58 AM Andor Molnar <andor@apache.org> wrote:
> > >>>>>>> What’s needed to turn it off?
> > >>>>>>> Do we need a PR or is it just a config option?
> > >>>>>>> Shall we implement a feature switch for that and turn it off by default?
> > >>>>>>> Sorry, I don’t have too much insight on disk txn sync.
> > >>>>>>>
> > >>>>>>> Andor
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>> On 2018. Sep 13., at 9:16, Fangmin Lv <lvfangmin@gmail.com> wrote:
> > >>>>>>>>
> > >>>>>>>> And to be clear, ZOOKEEPER-2418 is actually just one case of inconsistency
> > >>>>>>>> which could be caused by on-disk txn sync. As I mentioned in a newer JIRA,
> > >>>>>>>> ZOOKEEPER-2846 <https://issues.apache.org/jira/browse/ZOOKEEPER-2846>, the
> > >>>>>>>> snap sync or txn sync could also leave txn gaps in the txn file, which is a
> > >>>>>>>> more common case that could trigger this issue.
> > >>>>>>>>
> > >>>>>>>> I would suggest turning off the on-disk txn sync by default for now to
> > >>>>>>>> avoid this issue. After we finish ZOOKEEPER-3114, we can use that to
> > >>>>>>>> validate the on-disk txns during syncing.
> > >>>>>>>>
> > >>>>>>>> Thanks,
> > >>>>>>>> Fangmin
> > >>>>>>>>
> > >>>>>>>> On Wed, Sep 12, 2018 at 9:55 AM Fangmin Lv <lvfangmin@gmail.com> wrote:
> > >>>>>>>>> Andor,
> > >>>>>>>>>
> > >>>>>>>>> ZOOKEEPER-3114 is about adding real-time digest checking to help detect
> > >>>>>>>>> inconsistency. It's a new feature with a considerable amount of code change.
> > >>>>>>>>> I'll start upstreaming it part by part, but I don't expect it to be merged in
> > >>>>>>>>> the next few weeks. So yes, it's a nice-to-have, but definitely not a blocker
> > >>>>>>>>> for 3.5.
> > >>>>>>>>> Thanks,
> > >>>>>>>>> Fangmin
> > >>>>>>>>>
> > >>>>>>>>> On Wed, Sep 12, 2018 at 2:55 AM Andor Molnar <andor@apache.org> wrote:
> > >>>>>>>>>> Fangmin,
> > >>>>>>>>>>
> > >>>>>>>>>> Sorry, I just noticed that you want to include the consistency fixes in
> > >>>>>>>>>> the stable version, which is fine. Let’s finish the backports and we’ll be
> > >>>>>>>>>> done with them.
> > >>>>>>>>>>
> > >>>>>>>>>> ZOOKEEPER-3114 is essentially a new feature, I wouldn’t block 3.5 with
> > >>>>>>>>>> that. What do you think?
> > >>>>>>>>>>
> > >>>>>>>>>> Andor
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>> On 2018. Sep 12., at 11:52, Andor Molnar <andor@apache.org> wrote:
> > >>>>>>>>>>> Cool, thanks for the clarification.
> > >>>>>>>>>>>
> > >>>>>>>>>>> The updated list is as follows:
> > >>>>>>>>>>>
> > >>>>>>>>>>> - ZOOKEEPER-236 (SSL/TLS support for Atomic Broadcast protocol)
> > >>>>>>>>>>> - ZOOKEEPER-1818 (Fix don't care for trunk)
> > >>>>>>>>>>> - ZOOKEEPER-2778 (Potential server deadlock between follower sync with
> > >>>>>>>>>>>   leader and follower receiving external connection requests.)
> > >>>>>>>>>>>
> > >>>>>>>>>>> The following are not critical and not blockers for the stable release:
> > >>>>>>>>>>>
> > >>>>>>>>>>> Waiting to be ported to 3.5:
> > >>>>>>>>>>> - ZOOKEEPER-3104
> > >>>>>>>>>>> - ZOOKEEPER-3125
> > >>>>>>>>>>> - ZOOKEEPER-3127
> > >>>>>>>>>>>
> > >>>>>>>>>>> New feature:
> > >>>>>>>>>>> - ZOOKEEPER-3114 (fixes ZOOKEEPER-2184 too)
> > >>>>>>>>>>>
> > >>>>>>>>>>> Regards,
> > >>>>>>>>>>> Andor
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>> On 2018. Sep 12., at 0:42, Fangmin Lv <lvfangmin@gmail.com> wrote:
> > >>>>>>>>>>>> Hi Andor,
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> That's the on-disk txn feature, which was disabled internally after we
> > >>>>>>>>>>>> found the potential inconsistency issue. The only solution we have for now
> > >>>>>>>>>>>> is waiting for the new digest checking feature I mentioned in
> > >>>>>>>>>>>> ZOOKEEPER-3114.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> I think there are some other critical consistency issues we just fixed on
> > >>>>>>>>>>>> master recently: ZOOKEEPER-3104, ZOOKEEPER-3125, ZOOKEEPER-3127. I think we
> > >>>>>>>>>>>> should include those in the official 3.5 release as well.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Thanks,
> > >>>>>>>>>>>> Fangmin
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Tue, Sep 11, 2018 at 11:58 AM Andor Molnár <andor@apache.org> wrote:
> > >>>>>>>>>>>>> Hi Jeelani,
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Thanks for letting me know. I'm happy to remove it from the list to get
> > >>>>>>>>>>>>> closer to a stable release. :)
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> What's the feature which can be disabled to avoid data inconsistency?
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Andor
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> On 09/10/2018 11:33 PM, Mohamed Jeelani wrote:
> > >>>>>>>>>>>>>> Thanks Andor for compiling this. Should we be ignoring ZOOKEEPER-2418 as
> > >>>>>>>>>>>>>> well? This exists in 3.4 as well and the feature can be disabled. We are
> > >>>>>>>>>>>>>> working on a longer term fix for it in 3.6.
> > >>>>>>>>>>>>>> Regards,
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Jeelani
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> On 9/10/18, 5:19 AM, "Andor Molnar" <andor@cloudera.com.INVALID> wrote:
> > >>>>>>>>>>>>>> Fine.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> I'm happy to ignore 1549, 2846 and 2930. Still we have the list of:
> > >>>>>>>>>>>>>> - ZOOKEEPER-236 (SSL/TLS support for Atomic Broadcast protocol)
> > >>>>>>>>>>>>>> - ZOOKEEPER-1818 (Fix don't care for trunk)
> > >>>>>>>>>>>>>> - ZOOKEEPER-2418 (txnlog diff sync can skip sending some transactions to
> > >>>>>>>>>>>>>>   followers)
> > >>>>>>>>>>>>>> - ZOOKEEPER-2778 (Potential server deadlock between follower sync with
> > >>>>>>>>>>>>>>   leader and follower receiving external connection requests.)
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> SSL (ZK-236) is a feature which is essential for the 3.5 release, hence I
> > >>>>>>>>>>>>>> wouldn't leave it out or postpone it for the next stable release. The PR has
> > >>>>>>>>>>>>>> been out for a long time, get on reviewing please.
> > >>>>>>>>>>>>>> The rest are also long outstanding issues which have been found in the 3.5
> > >>>>>>>>>>>>>> branch.
> > >>>>>>>>>>>>>> ZK-1818 is something which was found in 3.4 and fixed in 3.4, but has never
> > >>>>>>>>>>>>>> been fixed in 3.5. Quite a serious issue if still present.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> I think we should at least run some manual testing and see if we could
> > >>>>>>>>>>>>>> repro any of these issues before going ahead with a stable release.
> > >>>>>>>>>>>>>> Regards,
> > >>>>>>>>>>>>>> Andor
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> On Fri, Sep 7, 2018 at 3:24 AM, Michael Han <hanm@apache.org> wrote:
> > >>>>>>>>>>>>>>> I haven't gone through the entire list, but it looks like lots of the JIRA
> > >>>>>>>>>>>>>>> issues listed in this thread, such as ZOOKEEPER-1549, 2846, also affect
> > >>>>>>>>>>>>>>> 3.4 releases. Should we scope these issues out?
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> I think historically the single outstanding blocking issue for a stable 3.5
> > >>>>>>>>>>>>>>> release is the reconfig feature and security concerns around it (somehow
> > >>>>>>>>>>>>>>> addressed in ZOOKEEPER-2014), and the alpha and beta releases were created
> > >>>>>>>>>>>>>>> to stabilize that feature.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> http://zookeeper-user.578899.n2.nabble.com/Zookeeper-with-SSL-release-date-tt7581744.html
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> So it looks like we are in good shape to release. Something that might be
> > >>>>>>>>>>>>>>> worth doing to claim the quality of 3.5 is on par with 3.4:
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> * Run Jepsen on 3.5 - 3.4 passed the test, for the record:
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> https://aphyr.com/posts/291-jepsen-zookeeper
> > >>>>>>>>>>>>>>> * Fix all flaky tests on 3.5 - 3.4 has few or no flaky tests at all.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> On Tue, Sep 4, 2018 at 1:48 AM, Andor Molnar <andor@cloudera.com.invalid> wrote:
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Thanks Maoling! That would be a huge help, I appreciate it.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Andor
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>
> > >>>
> >
>
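
For reference, the snapshotSizeFactor switch discussed in the quoted thread can be sketched roughly as follows. This is a simplified model, not ZooKeeper's actual code: the class and method names are hypothetical, and the system property name zookeeper.snapshotSizeFactor and the 0.33 default are assumptions that should be checked against ZKDatabase.java in the version you run. The only behaviour taken from the thread is that a value of -1 disables the leader's on-disk txn sync, which leaves SNAP sync as the fallback.

```java
public class SnapshotSizeFactorSketch {
    // Assumed property name and default; verify against ZKDatabase.java.
    static final String PROPERTY = "zookeeper.snapshotSizeFactor";
    static final double ASSUMED_DEFAULT = 0.33;

    // Read the factor from a JVM system property, falling back to the
    // assumed default when it is unset or not a number.
    public static double readFactor() {
        String raw = System.getProperty(PROPERTY);
        if (raw == null) {
            return ASSUMED_DEFAULT;
        }
        try {
            return Double.parseDouble(raw);
        } catch (NumberFormatException e) {
            return ASSUMED_DEFAULT;
        }
    }

    // Simplified model: the amount of txn log the leader may read for a sync
    // is capped at a fraction of the latest snapshot size.
    public static long txnLogSizeLimit(long snapshotSizeBytes, double factor) {
        return (long) (snapshotSizeBytes * factor);
    }

    // Simplified model: a negative factor means "never sync from on-disk txn
    // logs", so the leader would fall back to a SNAP sync instead.
    public static boolean onDiskTxnSyncAllowed(long snapshotSizeBytes, double factor) {
        return factor >= 0 && txnLogSizeLimit(snapshotSizeBytes, factor) >= 0;
    }

    public static void main(String[] args) {
        // Equivalent to starting the JVM with -Dzookeeper.snapshotSizeFactor=-1
        System.setProperty(PROPERTY, "-1");
        double factor = readFactor();
        System.out.println("factor = " + factor);
        System.out.println("on-disk txn sync allowed: "
                + onDiskTxnSyncAllowed(1_000_000L, factor));
    }
}
```

If the property name holds for your version, the same effect would come from passing -Dzookeeper.snapshotSizeFactor=-1 in the server's JVM flags, without needing the default changed in code.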
