mxnet-dev mailing list archives

From Pedro Larroy <pedro.larroy.li...@gmail.com>
Subject Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2
Date Fri, 04 May 2018 17:58:11 GMT
I see your point.

I checked the failures on the v1.2.0 branch and I don't see segfaults, just
minor failures due to flaky tests.

I will trigger it a few more times until Sunday and change my vote
accordingly.

http://jenkins.mxnet-ci.amazon-ml.com/job/incubator-mxnet/job/v1.2.0/
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/v1.2.0/17/pipeline
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/v1.2.0/15/pipeline/


Pedro.

On Fri, May 4, 2018 at 7:16 PM, Anirudh <anirudh2290@gmail.com> wrote:

> Hi Pedro,
>
> Thank you for the suggestions. I will try to reproduce this without fixed
> seeds and also run it for a longer time duration.
> Having said that, running unit tests over and over for a couple of days
> will likely cause problems, because there are around 42 open issues for
> flaky tests:
> https://github.com/apache/incubator-mxnet/issues?q=is%3Aopen+is%3Aissue+label%3AFlaky
> Also, the release branch diverged from master around 3 weeks ago and
> it doesn't have many of the changes merged to master.
> So my question essentially is: what will be your benchmark to accept the
> release?
> Is it that we run the test which you provided on 1.2 without fixed seeds
> and for a longer duration without failures?
> Or is it that all unit tests should pass over a period of 2 days without
> issues? This may require fixing all of the flaky tests, which would delay
> the release by a considerable amount of time.
> Or is it something else?
>
> Anirudh
>
>
> On Fri, May 4, 2018 at 4:49 AM, Pedro Larroy <pedro.larroy.lists@gmail.com> wrote:
>
> > Could you remove the fixed seeds and run it for a couple of hours with an
> > additional loop?  Also I would suggest running the unit tests over and over
> > for a couple of days if possible.
> >
> >
> > Pedro.
> >
> > On Thu, May 3, 2018 at 8:33 PM, Anirudh <anirudh2290@gmail.com> wrote:
> >
> > > Hi Pedro and Naveen,
> > >
> > > I am able to reproduce this issue with MKLDNN on master, but not on
> > > the 1.2.RC2 branch.
> > >
> > > Did the following on 1.2.RC2 branch:
> > >
> > > make -j $(nproc) USE_OPENCV=1 USE_BLAS=openblas USE_DIST_KVSTORE=0 \
> > >     USE_CUDA=0 USE_CUDNN=0 USE_MKLDNN=1
> > > export MXNET_STORAGE_FALLBACK_LOG_VERBOSE=0
> > > export MXNET_TEST_SEED=11
> > > export MXNET_MODULE_SEED=812478194
> > > export MXNET_TEST_COUNT=10000
> > > nosetests-2.7 -v tests/python/unittest/test_module.py:test_forward_reshape
> > >
> > > Was able to do the 10k runs successfully.
> > >
> > > Anirudh
> > >
> > > On Thu, May 3, 2018 at 8:46 AM, Anirudh <anirudh2290@gmail.com> wrote:
> > >
> > > > Hi Pedro and Naveen,
> > > >
> > > > Is this issue reproducible when MXNet is built with USE_MKLDNN=0?
> > > > Also, there are a bunch of MKLDNN fixes that didn't go into the release
> > > > branch. Is this issue reproducible on the release branch?
> > > > In my opinion, since we have marked MKLDNN as an experimental feature for
> > > > the release, if it is confirmed to be an MKLDNN issue
> > > > we don't need to block the release on it.
> > > >
> > > > Anirudh
> > > >
> > > > On Thu, May 3, 2018 at 6:58 AM, Naveen Swamy <mnnaveen@gmail.com> wrote:
> > > >
> > > >> Thanks for raising this issue Pedro.
> > > >>
> > > >> -1(binding)
> > > >>
> > > >> We were in a similar state for a while a year ago, and a lot of effort went
> > > >> into stabilizing the tests and the CI. I have seen that the PR builds are
> > > >> non-deterministic and you have to retry over and over (wasting resources
> > > >> and time) and hope you get lucky.
> > > >>
> > > >> Look at the dashboard for the master build:
> > > >> http://jenkins.mxnet-ci.amazon-ml.com/job/incubator-mxnet/job/master/
> > > >>
> > > >> -Naveen
> > > >>
> > > >> On Thu, May 3, 2018 at 5:11 AM, Pedro Larroy <pedro.larroy.lists@gmail.com> wrote:
> > > >>
> > > >> > -1  nondeterministic failures on CI master:
> > > >> > https://issues.apache.org/jira/browse/MXNET-396
> > > >> >
> > > >> > Was able to reproduce once in a fresh p3 instance with DLAMI, but can't
> > > >> > reproduce consistently.
> > > >> >
> > > >> > On Wed, May 2, 2018 at 9:51 PM, Anirudh <anirudh2290@gmail.com> wrote:
> > > >> >
> > > >> > > Hi all,
> > > >> > >
> > > >> > > As part of the RC2 release, we have addressed bugs and some concerns that
> > > >> > > were raised.
> > > >> > >
> > > >> > > I would like to propose a vote to release Apache MXNet (incubating) version
> > > >> > > 1.2.0.RC2. Voting will start now (Wednesday, May 2nd) and end at 12:50 PM
> > > >> > > PDT, Sunday, May 6th.
> > > >> > >
> > > >> > > Link to release notes:
> > > >> > > https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.2.0+Release+Notes
> > > >> > >
> > > >> > > Link to release candidate 1.2.0.rc2:
> > > >> > > https://github.com/apache/incubator-mxnet/releases/tag/1.2.0.rc2
> > > >> > >
> > > >> > > Voting results for 1.2.0.rc2:
> > > >> > > https://lists.apache.org/thread.html/ebe561c609a8e32351dfe4aafc8876199560336472726b58c3455e85@%3Cdev.mxnet.apache.org%3E
> > > >> > >
> > > >> > > View this page, click on "Build from Source", and use the source code
> > > >> > > obtained from the 1.2.0.rc2 tag:
> > > >> > > https://mxnet.incubator.apache.org/install/index.html
> > > >> > >
> > > >> > > (Note: The README.md points to the 1.2.0 tag and does not work at the
> > > >> > > moment.)
> > > >> > >
> > > >> > > Please remember to test first before voting accordingly:
> > > >> > >
> > > >> > > +1 = approve
> > > >> > > +0 = no opinion
> > > >> > > -1 = disapprove (provide reason)
> > > >> > >
> > > >> > > Anirudh
> > > >> > >
> > > >> >
> > > >>
> > > >
> > > >
> > >
> >
>
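
The repeated-run approach discussed in this thread (rerun one test many times, without the fixed seeds, for a set time budget) could be sketched roughly as follows. This is only an illustrative sketch: the helper name run_repeatedly and its parameters are hypothetical and not part of MXNet or its test suite, and it assumes that leaving MXNET_TEST_SEED and MXNET_MODULE_SEED unset lets the tests choose their own seeds.

```python
import os
import subprocess
import sys
import time


def run_repeatedly(cmd, budget_seconds, max_runs=None,
                   drop_vars=("MXNET_TEST_SEED", "MXNET_MODULE_SEED")):
    """Rerun `cmd` until the time budget or run limit is reached.

    Hypothetical helper, not part of MXNet. Returns (runs, failures),
    where failures is a list of (run_number, return_code) tuples.
    """
    # Copy the environment without the fixed-seed variables
    # (assumption: unset seeds mean the tests seed themselves).
    env = {k: v for k, v in os.environ.items() if k not in drop_vars}
    deadline = time.monotonic() + budget_seconds
    runs = 0
    failures = []
    while time.monotonic() < deadline and (max_runs is None or runs < max_runs):
        runs += 1
        result = subprocess.run(cmd, env=env,
                                stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT)
        # Any non-zero return code counts as a failure. On POSIX a
        # negative code means the process was killed by a signal.
        if result.returncode != 0:
            failures.append((runs, result.returncode))
    return runs, failures
```

Under these assumptions, the nosetests command quoted above could be passed as cmd (as a list of arguments) with budget_seconds set to a couple of hours. Signal-terminated runs such as segfaults would then show up with negative return codes, separately from ordinary test failures.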
