mxnet-dev mailing list archives

From Anirudh <anirudh2...@gmail.com>
Subject Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2
Date Thu, 03 May 2018 19:44:20 GMT
Hi Naveen,

You raise a good point, and I agree that MKLDNN should be switched off by
default.
Because of a bug in CMakeLists.txt, which was fixed in #10731 and merged to
master (but not to the release branch), users won't have MKLDNN enabled
even though MKLDNN is set to ON by default.
Since the cmake install instructions have not been published on mxnet.io,
and cmake is used less than the pip package and make from what I have seen,
would it be acceptable to you if this is added as a known issue and a
workaround is provided in the release notes?
The impacted users (who will have to use the workaround) would be those who
are interested in MKLDNN and build with cmake; users who aren't interested
in the MKLDNN feature won't be impacted.

Anirudh

On Thu, May 3, 2018 at 12:16 PM, Marco de Abreu <
marco.g.abreu@googlemail.com> wrote:

> The MKLDNN tests are not really less stable than the other tests. It's
> pretty much the same across all tests we have. So I wouldn't say there's a
> need to fix them in a separate branch.
>
> On Thu, May 3, 2018 at 9:00 PM, Naveen Swamy <mnnaveen@gmail.com> wrote:
>
> > I also meant (but forgot to send) that we stabilize it on a separate
> > branch and then bring in the changes instead of blocking the PRs.
> >
> > On Thu, May 3, 2018 at 11:57 AM, Marco de Abreu <
> > marco.g.abreu@googlemail.com> wrote:
> >
> > > I think the failing tests are really becoming an issue. We now have
> > > roughly 50 test-failure-related issues [1], leading to an average
> > > failure rate of 50%. Considering the costs in terms of money and time
> > > per run, this is adding up quite significantly.
> > >
> > > Didn't we just remove MKLML from our codebase to replace it with
> > > MKLDNN? I think removing something and marking the replacement as
> > > experimental could be difficult from a user perspective. Personally,
> > > I don't really feel comfortable solving the problem of known issues by
> > > marking something as experimental. We're basically shifting the
> > > responsibility to our users that way.
> > >
> > > I don't think we should stop testing MKLDNN in our CI. We already had
> > > the situation a few months ago where the solution to failed tests was
> > > to disable them. We shouldn't go back to that.
> > >
> > > -Marco
> > >
> > > [1]:
> > > https://github.com/apache/incubator-mxnet/issues?q=is%3Aopen+is%3Aissue+label%3ATest
> > >
> > > On Thu, May 3, 2018 at 8:46 PM, Naveen Swamy <mnnaveen@gmail.com> wrote:
> > >
> > > > USE_MKLDNN is set to ON in the cmake file by default. Since it's
> > > > experimental, can we turn it OFF so there is some determinism when
> > > > users build and test?
> > > >
> > > > https://github.com/apache/incubator-mxnet/blob/60641ef1183bb4584c9356e84b6ca6d5fce58d6d/CMakeLists.txt#L23
> > > >
> > > > On a separate note, since MKLDNN is experimental, can we stop
> > > > building it on master so that it doesn't cause PRs to queue up?
> > > >
> > > >
> > > > On Thu, May 3, 2018 at 11:33 AM, Anirudh <anirudh2290@gmail.com> wrote:
> > > >
> > > > > Correction: I was able to reproduce the issue with MKLDNN enabled
> > > > > on master, but not on 1.2 branch.
> > > > >
> > > > > On Thu, May 3, 2018 at 11:33 AM, Anirudh <anirudh2290@gmail.com> wrote:
> > > > >
> > > > > > Hi Pedro and Naveen,
> > > > > >
> > > > > > I am unable to reproduce this issue with MKLDNN on the master but
> > > > > > not on the 1.2.RC2 branch.
> > > > > >
> > > > > > Did the following on 1.2.RC2 branch:
> > > > > >
> > > > > > make -j $(nproc) USE_OPENCV=1 USE_BLAS=openblas USE_DIST_KVSTORE=0 USE_CUDA=0 USE_CUDNN=0 USE_MKLDNN=1
> > > > > > export MXNET_STORAGE_FALLBACK_LOG_VERBOSE=0
> > > > > > export MXNET_TEST_SEED=11
> > > > > > export MXNET_MODULE_SEED=812478194
> > > > > > export MXNET_TEST_COUNT=10000
> > > > > > nosetests-2.7 -v tests/python/unittest/test_module.py:test_forward_reshape
> > > > > >
> > > > > > Was able to do the 10k runs successfully.
> > > > > >
> > > > > > Anirudh
> > > > > >
> > > > > > On Thu, May 3, 2018 at 8:46 AM, Anirudh <anirudh2290@gmail.com> wrote:
> > > > > >
> > > > > >> Hi Pedro and Naveen,
> > > > > >>
> > > > > >> Is this issue reproducible when MXNet is built with USE_MKLDNN=0?
> > > > > >> Also, there are a bunch of MKLDNN fixes that didn't go into the
> > > > > >> release branch. Is this issue reproducible on the release branch?
> > > > > >> In my opinion, since we have marked MKLDNN as an experimental
> > > > > >> feature for the release, if it is confirmed to be an MKLDNN issue
> > > > > >> we don't need to block the release on it.
> > > > > >>
> > > > > >> Anirudh
> > > > > >>
> > > > > >> On Thu, May 3, 2018 at 6:58 AM, Naveen Swamy <mnnaveen@gmail.com> wrote:
> > > > > >>
> > > > > >>> Thanks for raising this issue Pedro.
> > > > > >>>
> > > > > >>> -1(binding)
> > > > > >>>
> > > > > >>> We were in a similar state for a while a year ago; a lot of effort
> > > > > >>> went into stabilizing the tests and the CI. I have seen that the PR
> > > > > >>> builds are non-deterministic and you have to retry over and over
> > > > > >>> (wasting resources and time) and hope you get lucky.
> > > > > >>>
> > > > > >>> Look at the dashboard for master build
> > > > > >>> http://jenkins.mxnet-ci.amazon-ml.com/job/incubator-mxnet/job/master/
> > > > > >>>
> > > > > >>> -Naveen
> > > > > >>>
> > > > > >>> On Thu, May 3, 2018 at 5:11 AM, Pedro Larroy <
> > > > > >>> pedro.larroy.lists@gmail.com>
> > > > > >>> wrote:
> > > > > >>>
> > > > > >>> > -1 nondeterministic failures on CI master:
> > > > > >>> > https://issues.apache.org/jira/browse/MXNET-396
> > > > > >>> >
> > > > > >>> > Was able to reproduce once in a fresh p3 instance with DLAMI;
> > > > > >>> > can't reproduce consistently.
> > > > > >>> >
> > > > > >>> > On Wed, May 2, 2018 at 9:51 PM, Anirudh <anirudh2290@gmail.com> wrote:
> > > > > >>> >
> > > > > >>> > > Hi all,
> > > > > >>> > >
> > > > > >>> > > As part of RC2 release, we have addressed bugs and some concerns
> > > > > >>> > > that were raised.
> > > > > >>> > >
> > > > > >>> > > I would like to propose a vote to release Apache MXNet
> > > > > >>> > > (incubating) version 1.2.0.RC2. Voting will start now (Wednesday,
> > > > > >>> > > May 2nd) and end at 12:50 PM PDT, Sunday, May 6th.
> > > > > >>> > >
> > > > > >>> > > Link to release notes:
> > > > > >>> > > https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.2.0+Release+Notes
> > > > > >>> > >
> > > > > >>> > > Link to release candidate 1.2.0.rc2:
> > > > > >>> > > https://github.com/apache/incubator-mxnet/releases/tag/1.2.0.rc2
> > > > > >>> > >
> > > > > >>> > > Voting results for 1.2.0.rc2:
> > > > > >>> > > https://lists.apache.org/thread.html/ebe561c609a8e32351dfe4aafc8876199560336472726b58c3455e85@%3Cdev.mxnet.apache.org%3E
> > > > > >>> > >
> > > > > >>> > > View this page, click on "Build from Source", and use the source
> > > > > >>> > > code obtained from 1.2.0.rc2 tag:
> > > > > >>> > > https://mxnet.incubator.apache.org/install/index.html
> > > > > >>> > >
> > > > > >>> > > (Note: The README.md points to the 1.2.0 tag and does not work at
> > > > > >>> > > the moment.)
> > > > > >>> > >
> > > > > >>> > > Please remember to test first before voting accordingly:
> > > > > >>> > >
> > > > > >>> > > +1 = approve
> > > > > >>> > > +0 = no opinion
> > > > > >>> > > -1 = disapprove (provide reason)
> > > > > >>> > >
> > > > > >>> > > Anirudh
> > > > > >>> > >
