mxnet-dev mailing list archives

From Marco de Abreu <marco.g.ab...@googlemail.com>
Subject Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2
Date Tue, 08 May 2018 09:59:10 GMT
Small update regarding the ARM64 builds: I have created two pull requests
[1][2] that switch the repository to a mirror I created. This mirror was
created from a cached version of the working Docker image, effectively
reverting to a known working state. At the same time, this pins the
container to prevent any further problems.
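
To illustrate what "pinned" means here, the following is a minimal sketch,
not the actual CI code. It assumes an image reference counts as pinned when
it carries a digest or an explicit tag other than "latest". The function name
and the image names are hypothetical. Note that a tag can still be re-pushed,
so only a digest is a strict pin.

```python
def is_pinned(image_ref: str) -> bool:
    """Return True if image_ref carries a digest or an explicit non-'latest' tag.

    Hypothetical helper for illustration; not part of the MXNet CI scripts.
    """
    # A digest (name@sha256:...) identifies exactly one image.
    if "@" in image_ref:
        return True
    # A tag follows a ':' in the last path segment; a ':' earlier in the
    # reference belongs to a registry port such as localhost:5000.
    last_segment = image_ref.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return False  # no tag means the implicit, moving 'latest'
    tag = last_segment.rsplit(":", 1)[1]
    return tag not in ("", "latest")


if __name__ == "__main__":
    # Hypothetical image names, used only to exercise the helper.
    for ref in ("example/cross-arm64",
                "example/cross-arm64:latest",
                "example/cross-arm64:2018-05-08"):
        print(ref, is_pinned(ref))
```

A check like this could run over the image names used by the build scripts
and flag any that would silently follow upstream changes.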

I would prefer to use the public repository instead of our own mirror, but
for now this is unavoidable. If anybody would like to be added to the
Docker Hub organization "mxnetci", feel free to let me know! To prevent
problems like these in the future, I created a feature request at [3] to
ensure that future releases of that Docker image are properly tagged.
Additionally, the creator of the failing PR is aware and actively involved
in creating a permanent solution [4].

Best regards,
Marco

[1]: https://github.com/apache/incubator-mxnet/pull/10850
[2]: https://github.com/apache/incubator-mxnet/pull/10849
[3]: https://github.com/dockcross/dockcross/issues/223
[4]: https://github.com/dockcross/dockcross/pull/221

On Tue, May 8, 2018 at 2:39 AM, Lai Wei <royweilai@gmail.com> wrote:

> Hi Anirudh,
>
> Update: I did an install on a fresh instance with USE_MKLDNN=1, and it works
> fine now. Pip install with --pre is also working fine.
> The problem was the mkl-dnn I had installed on the old instance.
> Closing the issue <https://github.com/awslabs/keras-apache-mxnet/issues/75>.
>
> Thanks!
>
> Best Regards
>
> Lai Wei
>
> https://www.linkedin.com/pub/lai-wei/2b/731/52b
>
> On Mon, May 7, 2018 at 2:48 PM, Lai Wei <royweilai@gmail.com> wrote:
>
> > Hi Anirudh,
> >
> > Yes, I also tried that, but it didn't resolve the issue. I am looking into
> > the root cause and will update.
> >
> > Best Regards
> >
> > Lai Wei
> >
> > https://www.linkedin.com/pub/lai-wei/2b/731/52b
> >
> > On Mon, May 7, 2018 at 2:15 PM, Anirudh <anirudh2290@gmail.com> wrote:
> >
> >> Hi Lai,
> >>
> >> I see that you used USE_MKL2017_EXPERIMENTAL=1, I am not sure if this is
> >> the right flag. Did you try USE_MKLDNN=1 ?
> >>
> >> Anirudh
> >>
> >> On Mon, May 7, 2018 at 1:22 PM, Lai Wei <royweilai@gmail.com> wrote:
> >>
> >> > Hi,
> >> >
> >> > I would like to raise an issue with mxnet-mkl. The keras-mxnet package was
> >> > working fine with mxnet-mkl 1.1.0 for training on CPU. However, weights are
> >> > not updated when I use mxnet-mkl 1.2.0b20180507. I tried both 'pip install
> >> > mxnet-mkl --pre' and built from source from release branch (v1.2.0) with
> >> > mkl flag.
> >> >
> >> > Please refer to this issue for more details:
> >> > https://github.com/awslabs/keras-apache-mxnet/issues/75
> >> >
> >> > There is no code change on keras-mxnet side, so I guess some API broke when
> >> > using latest mxnet-mkl. Still working on finding the root cause.
> >> >
> >> > Thanks
> >> >
> >> >
> >> > Best Regards
> >> >
> >> > Lai Wei
> >> >
> >> > https://www.linkedin.com/pub/lai-wei/2b/731/52b
> >> >
> >> > On Mon, May 7, 2018 at 10:38 AM, Haibin Lin <haibin.lin.aws@gmail.com>
> >> > wrote:
> >> >
> >> > > +1 binding. Built from source with CUDA, ran the linear classification
> >> > > example, and it works fine.
> >> > >
> >> > > Best.
> >> > > Haibin
> >> > >
> >> > >
> >> > > On Sun, May 6, 2018 at 10:08 PM, Steffen Rochel <steffenrochel@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > +1 (non-binding). Tested with selected notebooks from The Straight Dope.
> >> > > > So many important enhancements everybody contributed and our users are
> >> > > > waiting for. Hope we will see more votes.
> >> > > > Steffen
> >> > > > On Mon, May 7, 2018 at 1:07 AM Anirudh <anirudh2290@gmail.com> wrote:
> >> > > >
> >> > > > > Hi all,
> >> > > > >
> >> > > > > Since we don't have enough binding votes yet, I am extending the vote
> >> > > > > till tomorrow (Monday May 7th), 12:50 PM PDT.
> >> > > > >
> >> > > > > Anirudh
> >> > > > >
> >> > > > > On Sun, May 6, 2018 at 4:05 PM, Anirudh <anirudh2290@gmail.com> wrote:
> >> > > > >
> >> > > > > > Hi Pedro,
> >> > > > > >
> >> > > > > > Thanks for the clarification. I was able to reproduce the issue with
> >> > > > > > USE_OPENMP=OFF. I wasn't able to reproduce the issue with Make. Since
> >> > > > > > the issue is not reproducible with make and the number of customers
> >> > > > > > using USE_OPENMP=OFF with cmake should be small, I agree with you that
> >> > > > > > this should not be a blocker. I have added the issue to known issues
> >> > > > > > in release notes:
> >> > > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.2.0.rc2
> >> > > > > >
> >> > > > > > Anirudh
> >> > > > > >
> >> > > > > > On Sun, May 6, 2018 at 9:03 AM, Pedro Larroy
> >> > > > > > <pedro.larroy.lists@gmail.com> wrote:
> >> > > > > >
> >> > > > > >> Agreed, I was not aware that the problems were not present in the
> >> > > > > >> release branch.
> >> > > > > >>
> >> > > > > >> On Fri, May 4, 2018 at 8:32 PM, Haibin Lin <haibin.lin.aws@gmail.com>
> >> > > > > >> wrote:
> >> > > > > >>
> >> > > > > >> > I agree with Anirudh that the focus of the discussion should be
> >> > > > > >> > limited to the release branch, not the master branch. Anything that
> >> > > > > >> > breaks on master but works on release branch should not block the
> >> > > > > >> > release itself.
> >> > > > > >> >
> >> > > > > >> >
> >> > > > > >> > Best,
> >> > > > > >> >
> >> > > > > >> > Haibin
> >> > > > > >> >
> >> > > > > >> > On Fri, May 4, 2018 at 10:58 AM, Pedro Larroy
> >> > > > > >> > <pedro.larroy.lists@gmail.com> wrote:
> >> > > > > >> >
> >> > > > > >> > > I see your point.
> >> > > > > >> > >
> >> > > > > >> > > I checked the failures on the v1.2.0 branch and I don't see
> >> > > > > >> > > segfaults, just minor failures due to flaky tests.
> >> > > > > >> > >
> >> > > > > >> > > I will trigger it repeatedly a few times until Sunday to have a and
> >> > > > > >> > > change my vote accordingly.
> >> > > > > >> > >
> >> > > > > >> > > http://jenkins.mxnet-ci.amazon-ml.com/job/incubator-mxnet/job/v1.2.0/
> >> > > > > >> > > http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/v1.2.0/17/pipeline
> >> > > > > >> > > http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/v1.2.0/15/pipeline/
> >> > > > > >> > >
> >> > > > > >> > >
> >> > > > > >> > > Pedro.
> >> > > > > >> > >
> >> > > > > >> > > On Fri, May 4, 2018 at 7:16 PM, Anirudh <anirudh2290@gmail.com> wrote:
> >> > > > > >> > >
> >> > > > > >> > > > Hi Pedro,
> >> > > > > >> > > >
> >> > > > > >> > > > Thank you for the suggestions. I will try to reproduce this without
> >> > > > > >> > > > fixed seeds and also run it for a longer time duration.
> >> > > > > >> > > > Having said that, running unit tests over and over for a couple of
> >> > > > > >> > > > days will likely cause problems because there are around 42 open
> >> > > > > >> > > > issues for flaky tests:
> >> > > > > >> > > > https://github.com/apache/incubator-mxnet/issues?q=is%3Aopen+is%3Aissue+label%3AFlaky
> >> > > > > >> > > > Also, the release branch has diverged from master around 3 weeks
> >> > > > > >> > > > back and it doesn't have many of the changes merged to the master.
> >> > > > > >> > > > So, my question essentially is, what will be your benchmark to
> >> > > > > >> > > > accept the release ?
> >> > > > > >> > > > Is it that we run the test which you provided on 1.2 without fixed
> >> > > > > >> > > > seeds and for a longer duration without failures ?
> >> > > > > >> > > > Or is it that all unit tests should pass over a period of 2 days
> >> > > > > >> > > > without issues. This may require fixing all of the flaky tests
> >> > > > > >> > > > which would delay the release by considerable amount of time.
> >> > > > > >> > > > Or is it something else ?
> >> > > > > >> > > >
> >> > > > > >> > > > Anirudh
> >> > > > > >> > > >
> >> > > > > >> > > >
> >> > > > > >> > > > On Fri, May 4, 2018 at 4:49 AM, Pedro Larroy
> >> > > > > >> > > > <pedro.larroy.lists@gmail.com> wrote:
> >> > > > > >> > > >
> >> > > > > >> > > > > Could you remove the fixed seeds and run it for a couple of hours
> >> > > > > >> > > > > with an additional loop?  Also I would suggest running the unit
> >> > > > > >> > > > > tests over and over for a couple of days if possible.
> >> > > > > >> > > > >
> >> > > > > >> > > > >
> >> > > > > >> > > > > Pedro.
> >> > > > > >> > > > >
> >> > > > > >> > > > > On Thu, May 3, 2018 at 8:33 PM, Anirudh <anirudh2290@gmail.com>
> >> > > > > >> > > > > wrote:
> >> > > > > >> > > > >
> >> > > > > >> > > > > > Hi Pedro and Naveen,
> >> > > > > >> > > > > >
> >> > > > > >> > > > > > I am able to reproduce this issue with MKLDNN on the master but
> >> > > > > >> > > > > > not on the 1.2.RC2 branch.
> >> > > > > >> > > > > >
> >> > > > > >> > > > > > Did the following on 1.2.RC2 branch:
> >> > > > > >> > > > > >
> >> > > > > >> > > > > > make -j $(nproc) USE_OPENCV=1 USE_BLAS=openblas USE_DIST_KVSTORE=0 USE_CUDA=0 USE_CUDNN=0 USE_MKLDNN=1
> >> > > > > >> > > > > > export MXNET_STORAGE_FALLBACK_LOG_VERBOSE=0
> >> > > > > >> > > > > > export MXNET_TEST_SEED=11
> >> > > > > >> > > > > > export MXNET_MODULE_SEED=812478194
> >> > > > > >> > > > > > export MXNET_TEST_COUNT=10000
> >> > > > > >> > > > > > nosetests-2.7 -v tests/python/unittest/test_module.py:test_forward_reshape
> >> > > > > >> > > > > >
> >> > > > > >> > > > > > Was able to do the 10k runs successfully.
> >> > > > > >> > > > > >
> >> > > > > >> > > > > > Anirudh
> >> > > > > >> > > > > >
> >> > > > > >> > > > > > On Thu, May 3, 2018 at 8:46 AM, Anirudh <anirudh2290@gmail.com>
> >> > > > > >> > > > > > wrote:
> >> > > > > >> > > > > >
> >> > > > > >> > > > > > > Hi Pedro and Naveen,
> >> > > > > >> > > > > > >
> >> > > > > >> > > > > > > Is this issue reproducible when MXNet is built with
> >> > > > > >> > > > > > > USE_MKLDNN=0?
> >> > > > > >> > > > > > > Also, there are a bunch of MKLDNN fixes that didn't go into the
> >> > > > > >> > > > > > > release branch. Is this issue reproducible on the release
> >> > > > > >> > > > > > > branch ?
> >> > > > > >> > > > > > > In my opinion, since we have marked MKLDNN as experimental
> >> > > > > >> > > > > > > feature for the release, if it is confirmed to be a MKLDNN
> >> > > > > >> > > > > > > issue we don't need to block the release on it.
> >> > > > > >> > > > > > >
> >> > > > > >> > > > > > > Anirudh
> >> > > > > >> > > > > > >
> >> > > > > >> > > > > > > On Thu, May 3, 2018 at 6:58 AM, Naveen Swamy
> >> > > > > >> > > > > > > <mnnaveen@gmail.com> wrote:
> >> > > > > >> > > > > > >
> >> > > > > >> > > > > > >> Thanks for raising this issue Pedro.
> >> > > > > >> > > > > > >>
> >> > > > > >> > > > > > >> -1(binding)
> >> > > > > >> > > > > > >>
> >> > > > > >> > > > > > >> We were in a similar state for a while a year ago, a lot of
> >> > > > > >> > > > > > >> effort went to stabilize the tests and the CI. I have seen the
> >> > > > > >> > > > > > >> PR builds are non-deterministic and you have to retry over and
> >> > > > > >> > > > > > >> over (wasting resources and time) and hope you get lucky.
> >> > > > > >> > > > > > >>
> >> > > > > >> > > > > > >> Look at the dashboard for master build
> >> > > > > >> > > > > > >> http://jenkins.mxnet-ci.amazon-ml.com/job/incubator-mxnet/job/master/
> >> > > > > >> > > > > > >>
> >> > > > > >> > > > > > >> -Naveen
> >> > > > > >> > > > > > >>
> >> > > > > >> > > > > > >> On Thu, May 3, 2018 at 5:11 AM, Pedro Larroy
> >> > > > > >> > > > > > >> <pedro.larroy.lists@gmail.com> wrote:
> >> > > > > >> > > > > > >>
> >> > > > > >> > > > > > >> > -1  nondeterministic failures on CI master:
> >> > > > > >> > > > > > >> > https://issues.apache.org/jira/browse/MXNET-396
> >> > > > > >> > > > > > >> >
> >> > > > > >> > > > > > >> > Was able to reproduce once in a fresh p3 instance with DLAMI,
> >> > > > > >> > > > > > >> > but can't reproduce consistently.
> >> > > > > >> > > > > > >> >
> >> > > > > >> > > > > > >> > On Wed, May 2, 2018 at 9:51 PM, Anirudh
> >> > > > > >> > > > > > >> > <anirudh2290@gmail.com> wrote:
> >> > > > > >> > > > > > >> >
> >> > > > > >> > > > > > >> > > Hi all,
> >> > > > > >> > > > > > >> > >
> >> > > > > >> > > > > > >> > > As part of RC2 release, we have addressed bugs and some
> >> > > > > >> > > > > > >> > > concerns that were raised.
> >> > > > > >> > > > > > >> > >
> >> > > > > >> > > > > > >> > > I would like to propose a vote to release Apache MXNet
> >> > > > > >> > > > > > >> > > (incubating) version 1.2.0.RC2. Voting will start now
> >> > > > > >> > > > > > >> > > (Wednesday, May 2nd) and end at 12:50 PM PDT, Sunday,
> >> > > > > >> > > > > > >> > > May 6th.
> >> > > > > >> > > > > > >> > >
> >> > > > > >> > > > > > >> > > Link to release notes:
> >> > > > > >> > > > > > >> > > https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.2.0+Release+Notes
> >> > > > > >> > > > > > >> > >
> >> > > > > >> > > > > > >> > > Link to release candidate 1.2.0.rc2:
> >> > > > > >> > > > > > >> > > https://github.com/apache/incubator-mxnet/releases/tag/1.2.0.rc2
> >> > > > > >> > > > > > >> > >
> >> > > > > >> > > > > > >> > > Voting results for 1.2.0.rc2:
> >> > > > > >> > > > > > >> > > https://lists.apache.org/thread.html/ebe561c609a8e32351dfe4aafc8876199560336472726b58c3455e85@%3Cdev.mxnet.apache.org%3E
> >> > > > > >> > > > > > >> > >
> >> > > > > >> > > > > > >> > > View this page, click on "Build from Source", and use the
> >> > > > > >> > > > > > >> > > source code obtained from 1.2.0.rc2 tag:
> >> > > > > >> > > > > > >> > > https://mxnet.incubator.apache.org/install/index.html
> >> > > > > >> > > > > > >> > >
> >> > > > > >> > > > > > >> > > (Note: The README.md points to the 1.2.0 tag and does not
> >> > > > > >> > > > > > >> > > work at the moment.)
> >> > > > > >> > > > > > >> > >
> >> > > > > >> > > > > > >> > > Please remember to test first before voting accordingly:
> >> > > > > >> > > > > > >> > >
> >> > > > > >> > > > > > >> > > +1 = approve
> >> > > > > >> > > > > > >> > > +0 = no opinion
> >> > > > > >> > > > > > >> > > -1 = disapprove (provide reason)
> >> > > > > >> > > > > > >> > >
> >> > > > > >> > > > > > >> > > Anirudh
> >> > > > > >> > > > > > >> > >
> >> > > > > >> > > > > > >> >
> >> > > > > >> > > > > > >>
> >> > > > > >> > > > > > >
> >> > > > > >> > > > > > >
> >> > > > > >> > > > > >
> >> > > > > >> > > > >
> >> > > > > >> > > >
> >> > > > > >> > >
> >> > > > > >> >
> >> > > > > >>
> >> > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>
