mxnet-dev mailing list archives

From Anirudh <anirudh2...@gmail.com>
Subject Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2
Date Sun, 06 May 2018 23:07:41 GMT
Hi all,

Since we don't have enough binding votes yet, I am extending the vote till
tomorrow (Monday May 7th), 12:50 PM PDT.

Anirudh

On Sun, May 6, 2018 at 4:05 PM, Anirudh <anirudh2290@gmail.com> wrote:

> Hi Pedro,
>
> Thanks for the clarification. I was able to reproduce the issue with a
> cmake build using USE_OPENMP=OFF, but not with make. Since the issue is
> not reproducible with make and the number of customers using
> USE_OPENMP=OFF with cmake should be small, I agree with you that this
> should not be a blocker. I have added the issue to the known issues in
> the release notes:
> https://github.com/apache/incubator-mxnet/releases/tag/1.2.0.rc2
>
> Anirudh
>
> On Sun, May 6, 2018 at 9:03 AM, Pedro Larroy <pedro.larroy.lists@gmail.com>
> wrote:
>
>> Agreed, I was not aware that the problems were not present in the release
>> branch.
>>
>> On Fri, May 4, 2018 at 8:32 PM, Haibin Lin <haibin.lin.aws@gmail.com>
>> wrote:
>>
>> > I agree with Anirudh that the focus of the discussion should be limited
>> > to the release branch, not the master branch. Anything that breaks on
>> > master but works on release branch should not block the release itself.
>> >
>> >
>> > Best,
>> >
>> > Haibin
>> >
>> > On Fri, May 4, 2018 at 10:58 AM, Pedro Larroy <pedro.larroy.lists@gmail.com>
>> > wrote:
>> >
>> > > I see your point.
>> > >
>> > > I checked the failures on the v1.2.0 branch and I don't see segfaults,
>> > > just minor failures due to flaky tests.
>> > >
>> > > I will trigger it repeatedly a few times until Sunday and change my
>> > > vote accordingly.
>> > >
>> > > http://jenkins.mxnet-ci.amazon-ml.com/job/incubator-mxnet/job/v1.2.0/
>> > > http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/v1.2.0/17/pipeline
>> > > http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/v1.2.0/15/pipeline/
>> > >
>> > >
>> > > Pedro.
>> > >
>> > > On Fri, May 4, 2018 at 7:16 PM, Anirudh <anirudh2290@gmail.com>
>> > > wrote:
>> > >
>> > > > Hi Pedro,
>> > > >
>> > > > Thank you for the suggestions. I will try to reproduce this without
>> > > > fixed seeds and also run it for a longer duration.
>> > > > Having said that, running the unit tests over and over for a couple of
>> > > > days will likely cause problems, because there are around 42 open
>> > > > issues for flaky tests:
>> > > > https://github.com/apache/incubator-mxnet/issues?q=is%3Aopen+is%3Aissue+label%3AFlaky
>> > > > Also, the release branch diverged from master around 3 weeks ago, and
>> > > > it doesn't have many of the changes merged to master.
>> > > > So my question, essentially, is: what will be your benchmark for
>> > > > accepting the release?
>> > > > Is it that we run the test you provided on 1.2 without fixed seeds
>> > > > and for a longer duration without failures?
>> > > > Or is it that all unit tests should pass over a period of 2 days
>> > > > without issues? This may require fixing all of the flaky tests, which
>> > > > would delay the release by a considerable amount of time.
>> > > > Or is it something else?
>> > > >
>> > > > Anirudh
>> > > >
>> > > >
>> > > > On Fri, May 4, 2018 at 4:49 AM, Pedro Larroy <pedro.larroy.lists@gmail.com>
>> > > > wrote:
>> > > >
>> > > > > Could you remove the fixed seeds and run it for a couple of hours
>> > > > > with an additional loop? Also, I would suggest running the unit tests
>> > > > > over and over for a couple of days if possible.
>> > > > >
>> > > > >
>> > > > > Pedro.
>> > > > >
>> > > > > On Thu, May 3, 2018 at 8:33 PM, Anirudh <anirudh2290@gmail.com>
>> > > > > wrote:
>> > > > >
>> > > > > > Hi Pedro and Naveen,
>> > > > > >
>> > > > > > I am able to reproduce this issue with MKLDNN on master, but not
>> > > > > > on the 1.2.RC2 branch.
>> > > > > >
>> > > > > > Did the following on 1.2.RC2 branch:
>> > > > > >
>> > > > > > make -j $(nproc) USE_OPENCV=1 USE_BLAS=openblas USE_DIST_KVSTORE=0 \
>> > > > > >     USE_CUDA=0 USE_CUDNN=0 USE_MKLDNN=1
>> > > > > > export MXNET_STORAGE_FALLBACK_LOG_VERBOSE=0
>> > > > > > export MXNET_TEST_SEED=11
>> > > > > > export MXNET_MODULE_SEED=812478194
>> > > > > > export MXNET_TEST_COUNT=10000
>> > > > > > nosetests-2.7 -v tests/python/unittest/test_module.py:test_forward_reshape
>> > > > > >
>> > > > > > Was able to do the 10k runs successfully.
>> > > > > >
>> > > > > > Anirudh
>> > > > > >
>> > > > > > On Thu, May 3, 2018 at 8:46 AM, Anirudh <anirudh2290@gmail.com>
>> > > > > > wrote:
>> > > > > >
>> > > > > > > Hi Pedro and Naveen,
>> > > > > > >
>> > > > > > > Is this issue reproducible when MXNet is built with USE_MKLDNN=0?
>> > > > > > > Also, there are a bunch of MKLDNN fixes that didn't go into the
>> > > > > > > release branch. Is this issue reproducible on the release branch?
>> > > > > > > In my opinion, since we have marked MKLDNN as an experimental
>> > > > > > > feature for the release, if it is confirmed to be an MKLDNN issue,
>> > > > > > > we don't need to block the release on it.
>> > > > > > >
>> > > > > > > Anirudh
>> > > > > > >
>> > > > > > > On Thu, May 3, 2018 at 6:58 AM, Naveen Swamy <mnnaveen@gmail.com>
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > >> Thanks for raising this issue Pedro.
>> > > > > > >>
>> > > > > > >> -1(binding)
>> > > > > > >>
>> > > > > > >> We were in a similar state for a while a year ago; a lot of effort
>> > > > > > >> went into stabilizing the tests and the CI. I have seen that the PR
>> > > > > > >> builds are non-deterministic and you have to retry over and over
>> > > > > > >> (wasting resources and time) and hope you get lucky.
>> > > > > > >>
>> > > > > > >> Look at the dashboard for the master build:
>> > > > > > >> http://jenkins.mxnet-ci.amazon-ml.com/job/incubator-mxnet/job/master/
>> > > > > > >>
>> > > > > > >> -Naveen
>> > > > > > >>
>> > > > > > >> On Thu, May 3, 2018 at 5:11 AM, Pedro Larroy <pedro.larroy.lists@gmail.com>
>> > > > > > >> wrote:
>> > > > > > >>
>> > > > > > >> > -1  nondeterministic failures on CI master:
>> > > > > > >> > https://issues.apache.org/jira/browse/MXNET-396
>> > > > > > >> >
>> > > > > > >> > Was able to reproduce once in a fresh p3 instance with DLAMI;
>> > > > > > >> > can't reproduce consistently.
>> > > > > > >> >
>> > > > > > >> > On Wed, May 2, 2018 at 9:51 PM, Anirudh <anirudh2290@gmail.com>
>> > > > > > >> > wrote:
>> > > > > > >> >
>> > > > > > >> > > Hi all,
>> > > > > > >> > >
>> > > > > > >> > > As part of RC2 release, we have addressed bugs and some concerns
>> > > > > > >> > > that were raised.
>> > > > > > >> > >
>> > > > > > >> > > I would like to propose a vote to release Apache MXNet (incubating)
>> > > > > > >> > > version 1.2.0.RC2. Voting will start now (Wednesday, May 2nd) and
>> > > > > > >> > > end at 12:50 PM PDT, Sunday, May 6th.
>> > > > > > >> > >
>> > > > > > >> > > Link to release notes:
>> > > > > > >> > > https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.2.0+Release+Notes
>> > > > > > >> > >
>> > > > > > >> > > Link to release candidate 1.2.0.rc2:
>> > > > > > >> > > https://github.com/apache/incubator-mxnet/releases/tag/1.2.0.rc2
>> > > > > > >> > >
>> > > > > > >> > > Voting results for 1.2.0.rc2:
>> > > > > > >> > > https://lists.apache.org/thread.html/ebe561c609a8e32351dfe4aafc8876199560336472726b58c3455e85@%3Cdev.mxnet.apache.org%3E
>> > > > > > >> > >
>> > > > > > >> > > View this page, click on "Build from Source", and use the source
>> > > > > > >> > > code obtained from 1.2.0.rc2 tag:
>> > > > > > >> > > https://mxnet.incubator.apache.org/install/index.html
>> > > > > > >> > >
>> > > > > > >> > > (Note: The README.md points to the 1.2.0 tag and does not work at
>> > > > > > >> > > the moment.)
>> > > > > > >> > >
>> > > > > > >> > > Please remember to test first before voting accordingly:
>> > > > > > >> > >
>> > > > > > >> > > +1 = approve
>> > > > > > >> > > +0 = no opinion
>> > > > > > >> > > -1 = disapprove (provide reason)
>> > > > > > >> > >
>> > > > > > >> > > Anirudh
>> > > > > > >> > >
>> > > > > > >> >
>> > > > > > >>
>> > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>
