mxnet-dev mailing list archives

From Steffen Rochel <steffenroc...@gmail.com>
Subject Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2
Date Mon, 07 May 2018 05:08:21 GMT
+1 (non-binding). Tested with selected notebooks from The Straight Dope.
So many important enhancements that everybody contributed and that our
users are waiting for. I hope we will see more votes.
Steffen
On Mon, May 7, 2018 at 1:07 AM Anirudh <anirudh2290@gmail.com> wrote:

> Hi all,
>
> Since we don't have enough binding votes yet, I am extending the vote till
> tomorrow (Monday May 7th), 12:50 PM PDT.
>
> Anirudh
>
> On Sun, May 6, 2018 at 4:05 PM, Anirudh <anirudh2290@gmail.com> wrote:
>
> > Hi Pedro,
> >
> > Thanks for the clarification. I was able to reproduce the issue with
> > USE_OPENMP=OFF. I wasn't able to reproduce the issue with Make. Since the
> > issue is not reproducible with Make and the number of customers using
> > USE_OPENMP=OFF with CMake should be small, I agree with you that this
> > should not be a blocker. I have added the issue to the known issues in the
> > release notes:
> > https://github.com/apache/incubator-mxnet/releases/tag/1.2.0.rc2
> >
> > Anirudh
> >
> > On Sun, May 6, 2018 at 9:03 AM, Pedro Larroy <pedro.larroy.lists@gmail.com>
> > wrote:
> >
> >> Agreed, I was not aware that the problems were not present in the
> >> release branch.
> >>
> >> On Fri, May 4, 2018 at 8:32 PM, Haibin Lin <haibin.lin.aws@gmail.com>
> >> wrote:
> >>
> >> > I agree with Anirudh that the focus of the discussion should be
> >> > limited to the release branch, not the master branch. Anything that
> >> > breaks on master but works on the release branch should not block the
> >> > release itself.
> >> >
> >> >
> >> > Best,
> >> >
> >> > Haibin
> >> >
> >> > On Fri, May 4, 2018 at 10:58 AM, Pedro Larroy <
> >> > pedro.larroy.lists@gmail.com>
> >> > wrote:
> >> >
> >> > > I see your point.
> >> > >
> >> > > I checked the failures on the v1.2.0 branch and I don't see segfaults,
> >> > > just minor failures due to flaky tests.
> >> > >
> >> > > I will trigger it repeatedly a few times until Sunday to have a and
> >> > > change my vote accordingly.
> >> > >
> >> > >
> >> > > http://jenkins.mxnet-ci.amazon-ml.com/job/incubator-mxnet/job/v1.2.0/
> >> > > http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/v1.2.0/17/pipeline
> >> > > http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/v1.2.0/15/pipeline/
> >> > >
> >> > >
> >> > > Pedro.
> >> > >
> >> > > On Fri, May 4, 2018 at 7:16 PM, Anirudh <anirudh2290@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > Hi Pedro,
> >> > > >
> >> > > > Thank you for the suggestions. I will try to reproduce this without
> >> > > > fixed seeds and also run it for a longer duration.
> >> > > > Having said that, running unit tests over and over for a couple of
> >> > > > days will likely cause problems, because there are around 42 open
> >> > > > issues for flaky tests:
> >> > > > https://github.com/apache/incubator-mxnet/issues?q=is%3Aopen+is%3Aissue+label%3AFlaky
> >> > > > Also, the release branch diverged from master around 3 weeks ago,
> >> > > > and it doesn't have many of the changes merged to master.
> >> > > > So, my question essentially is: what will be your benchmark to
> >> > > > accept the release?
> >> > > > Is it that we run the test you provided on 1.2 without fixed seeds
> >> > > > and for a longer duration without failures?
> >> > > > Or is it that all unit tests should pass over a period of 2 days
> >> > > > without issues? This may require fixing all of the flaky tests,
> >> > > > which would delay the release by a considerable amount of time.
> >> > > > Or is it something else?
> >> > > >
> >> > > > Anirudh
> >> > > >
> >> > > >
> >> > > > On Fri, May 4, 2018 at 4:49 AM, Pedro Larroy <
> >> > > > pedro.larroy.lists@gmail.com>
> >> > > > wrote:
> >> > > >
> >> > > > > Could you remove the fixed seeds and run it for a couple of hours
> >> > > > > with an additional loop? Also, I would suggest running the unit
> >> > > > > tests over and over for a couple of days if possible.
> >> > > > >
> >> > > > >
> >> > > > > Pedro.
> >> > > > >
> >> > > > > On Thu, May 3, 2018 at 8:33 PM, Anirudh <anirudh2290@gmail.com>
> >> > > > > wrote:
> >> > > > >
> >> > > > > > Hi Pedro and Naveen,
> >> > > > > >
> >> > > > > > I am able to reproduce this issue with MKLDNN on master, but not
> >> > > > > > on the 1.2.RC2 branch.
> >> > > > > >
> >> > > > > > Did the following on 1.2.RC2 branch:
> >> > > > > >
> >> > > > > > make -j $(nproc) USE_OPENCV=1 USE_BLAS=openblas USE_DIST_KVSTORE=0 \
> >> > > > > >     USE_CUDA=0 USE_CUDNN=0 USE_MKLDNN=1
> >> > > > > > export MXNET_STORAGE_FALLBACK_LOG_VERBOSE=0
> >> > > > > > export MXNET_TEST_SEED=11
> >> > > > > > export MXNET_MODULE_SEED=812478194
> >> > > > > > export MXNET_TEST_COUNT=10000
> >> > > > > > nosetests-2.7 -v tests/python/unittest/test_module.py:test_forward_reshape
> >> > > > > >
> >> > > > > > Was able to do the 10k runs successfully.
> >> > > > > >
> >> > > > > > Anirudh
> >> > > > > >
> >> > > > > > On Thu, May 3, 2018 at 8:46 AM, Anirudh <anirudh2290@gmail.com>
> >> > > > > > wrote:
> >> > > > > >
> >> > > > > > > Hi Pedro and Naveen,
> >> > > > > > >
> >> > > > > > > Is this issue reproducible when MXNet is built with USE_MKLDNN=0?
> >> > > > > > > Also, there are a bunch of MKLDNN fixes that didn't go into the
> >> > > > > > > release branch. Is this issue reproducible on the release branch?
> >> > > > > > > In my opinion, since we have marked MKLDNN as an experimental
> >> > > > > > > feature for the release, if it is confirmed to be an MKLDNN issue,
> >> > > > > > > we don't need to block the release on it.
> >> > > > > > >
> >> > > > > > > Anirudh
> >> > > > > > >
> >> > > > > > > On Thu, May 3, 2018 at 6:58 AM, Naveen Swamy <mnnaveen@gmail.com>
> >> > > > > > > wrote:
> >> > > > > > >
> >> > > > > > >> Thanks for raising this issue Pedro.
> >> > > > > > >>
> >> > > > > > >> -1 (binding)
> >> > > > > > >>
> >> > > > > > >> We were in a similar state for a while a year ago; a lot of
> >> > > > > > >> effort went into stabilizing the tests and the CI. I have seen
> >> > > > > > >> that the PR builds are non-deterministic and you have to retry
> >> > > > > > >> over and over (wasting resources and time) and hope you get lucky.
> >> > > > > > >>
> >> > > > > > >> Look at the dashboard for the master build:
> >> > > > > > >> http://jenkins.mxnet-ci.amazon-ml.com/job/incubator-mxnet/job/master/
> >> > > > > > >>
> >> > > > > > >> -Naveen
> >> > > > > > >>
> >> > > > > > >> On Thu, May 3, 2018 at 5:11 AM, Pedro Larroy <
> >> > > > > > >> pedro.larroy.lists@gmail.com>
> >> > > > > > >> wrote:
> >> > > > > > >>
> >> > > > > > >> > -1: nondeterministic failures on CI master:
> >> > > > > > >> > https://issues.apache.org/jira/browse/MXNET-396
> >> > > > > > >> >
> >> > > > > > >> > Was able to reproduce it once in a fresh p3 instance with
> >> > > > > > >> > DLAMI; can't reproduce it consistently.
> >> > > > > > >> >
> >> > > > > > >> > On Wed, May 2, 2018 at 9:51 PM, Anirudh <anirudh2290@gmail.com>
> >> > > > > > >> > wrote:
> >> > > > > > >> >
> >> > > > > > >> > > Hi all,
> >> > > > > > >> > >
> >> > > > > > >> > > As part of the RC2 release, we have addressed bugs and some of
> >> > > > > > >> > > the concerns that were raised.
> >> > > > > > >> > >
> >> > > > > > >> > > I would like to propose a vote to release Apache MXNet
> >> > > > > > >> > > (incubating) version 1.2.0.RC2. Voting will start now
> >> > > > > > >> > > (Wednesday, May 2nd) and end at 12:50 PM PDT, Sunday, May 6th.
> >> > > > > > >> > >
> >> > > > > > >> > > Link to release notes:
> >> > > > > > >> > > https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.2.0+Release+Notes
> >> > > > > > >> > >
> >> > > > > > >> > > Link to release candidate 1.2.0.rc2:
> >> > > > > > >> > > https://github.com/apache/incubator-mxnet/releases/tag/1.2.0.rc2
> >> > > > > > >> > >
> >> > > > > > >> > > Voting results for 1.2.0.rc2:
> >> > > > > > >> > > https://lists.apache.org/thread.html/ebe561c609a8e32351dfe4aafc8876199560336472726b58c3455e85@%3Cdev.mxnet.apache.org%3E
> >> > > > > > >> > >
> >> > > > > > >> > > View this page, click on "Build from Source", and use the
> >> > > > > > >> > > source code obtained from the 1.2.0.rc2 tag:
> >> > > > > > >> > > https://mxnet.incubator.apache.org/install/index.html
> >> > > > > > >> > >
> >> > > > > > >> > > (Note: The README.md points to the 1.2.0 tag and does not
> >> > > > > > >> > > work at the moment.)
> >> > > > > > >> > >
> >> > > > > > >> > > Please remember to test first before voting accordingly:
> >> > > > > > >> > >
> >> > > > > > >> > > +1 = approve
> >> > > > > > >> > > +0 = no opinion
> >> > > > > > >> > > -1 = disapprove (provide reason)
> >> > > > > > >> > >
> >> > > > > > >> > > Anirudh
> >> > > > > > >> > >
> >> > > > > > >> >
> >> > > > > > >>
> >> > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>
