mxnet-dev mailing list archives

From Aaron Markham <aaron.s.mark...@gmail.com>
Subject Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc0
Date Tue, 11 Jun 2019 22:05:47 GMT
-1
There's an autogenerated file in the scala-package folder that doesn't
get cleaned up when you run make clean, and it causes the scaladoc
step to fail. I'm adding a workaround note to the error message, and
that will go into master, but anyone who specifically wants to build
the scaladocs for 1.5.x is going to have a hard time: the current
error message is not helpful at all. You can get around it by cloning
fresh, so that no previously generated files are present, but that
isn't ideal for someone who has already been using the repo and has
scripts and other utilities all dialed in.
Zack's already working on a fix for this issue. If we're putting out
another RC anyway, then I'd vote to cherrypick Zack's fix so that docs
building works well.
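
For anyone who wants to see what make clean leaves behind before
re-cloning, here is a minimal sketch. It assumes a git checkout and uses
git status --porcelain --ignored to list untracked ("??") and ignored
("!!") paths under scala-package. The function names are hypothetical,
and it does not identify which autogenerated file breaks the scaladoc
step; it only narrows down the candidates.

```python
import subprocess


def leftover_files(porcelain_output):
    """Return paths that git reports as untracked (??) or ignored (!!)."""
    leftovers = []
    for line in porcelain_output.splitlines():
        # Porcelain v1 lines look like "XY <path>", where XY is a
        # two-character status code followed by a single space.
        status, path = line[:2], line[3:]
        if status in ("??", "!!"):
            leftovers.append(path)
    return leftovers


def scan(repo_dir, subdir="scala-package"):
    """Run git in repo_dir and list leftovers under subdir (assumed layout)."""
    result = subprocess.run(
        ["git", "status", "--porcelain", "--ignored", "--", subdir],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return leftover_files(result.stdout)
```

Calling scan("/path/to/incubator-mxnet") after make clean (the path is a
placeholder) should list whatever generated files survived the clean.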

Cheers,
Aaron

On Tue, Jun 11, 2019 at 2:31 PM Lai Wei <royweilai@gmail.com> wrote:
>
> Hi guys,
>
> Thanks for the updates. Currently, we are able to confirm Lin's issue with
> Horovod, and there is a fix pending. [1]
> Will update later today to see if we need to cancel this vote for the fix.
>
> As for the performance regression in hybridize with static alloc, IMO it
> does not need to be a blocker if we have the following speed order:
> 1.5.0 w/o static > 1.5.0 w/ static  > 1.4.1 w/ static > 1.4.1 w/o static
> It would be great to know the following to better decide whether this
> should block the release:
> 1) whether this is a model-specific or a general regression.
> 2) whether this is platform-specific or general (w/ or w/o CUDA, w/ or w/o
> MKLDNN)
>
>
> [1]https://github.com/apache/incubator-mxnet/pull/15213
>
>
> Thanks
>
> Best Regards
>
> Lai
>
>
> On Tue, Jun 11, 2019 at 1:46 PM Zhi Zhang <zhreshold@apache.org> wrote:
>
> >
> >
> > On 2019/06/11 18:53:56, Pedro Larroy <pedro.larroy.lists@gmail.com>
> > wrote:
> > > The stack trace doesn't seem to come from MXNet, do you have more info?
> > >
> > > On Tue, Jun 11, 2019 at 11:46 AM Zhi Zhang <zhreshold@apache.org> wrote:
> > > >
> > > >
> > > >
> > > > On 2019/06/11 17:36:09, Pedro Larroy <pedro.larroy.lists@gmail.com> wrote:
> > > > > A bit more background into this:
> > > > >
> > > > > While tuning a model using LSTM and convolutions we find that using
> > > > > hybridize with static_alloc and static_shape is 15% slower in the
> > > > > latest revision vs in version 1.4.1 in which using hybridize with
> > > > > static_alloc and static_shape is 10% faster than without.
> > > > >
> > > > > Overall we are still 33% faster when comparing master to 1.5.
> > > > >
> > > > > Let me know if you think this is a release blocker or not.
> > > > >
> > > > > Pedro.
> > > > >
> > > > > On Mon, Jun 10, 2019 at 4:51 PM Pedro Larroy
> > > > > <pedro.larroy.lists@gmail.com> wrote:
> > > > > >
> > > > > > -1
> > > > > >
> > > > > > > We found a performance regression vs 1.4 related to CachedOp which
> > > > > > > affects Hybrid forward, which we are looking into.
> > > > > >
> > > > > > Pedro.
> > > > > >
> > > > > > > On Mon, Jun 10, 2019 at 4:33 PM Lin Yuan <apeforest@gmail.com> wrote:
> > > > > > >
> > > > > > > -1 (Tentatively until resolved)
> > > > > > >
> > > > > > > > I tried to build MXNet 1.5.0 from source and pip install horovod but got
> > > > > > > > the following error:
> > > > > > >
> > > > > > > Reproduce:
> > > > > > > 1) cp make/config.mk .
> > > > > > > 2) turn on USE_CUDA, USE_CUDNN, USE_NCCL
> > > > > > > 3) make -j
> > > > > > >
> > > > > > > MXNet can build successfully.
> > > > > > >
> > > > > > > 4) pip install horovod
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > /home/ubuntu/src/incubator-mxnet/python/mxnet/../../include/mkldnn/mkldnn.h:55:28:
> > > > > > > fatal error: mkldnn_version.h: No such file or directory
> > > > > > >     compilation terminated.
> > > > > > >     INFO: Unable to build MXNet plugin, will skip it.
> > > > > > >
> > > > > > > > I did not change any setting of MKLDNN in my config.mk. I am building on
> > > > > > > > DLAMI base 18.0 which is Ubuntu 16.04 and CUDA 10.0
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Lin
> > > > > > >
> > > > > > >
> > > > > > > > On Sat, Jun 8, 2019 at 5:39 PM shiwen hu <yajiedesign@gmail.com> wrote:
> > > > > > >
> > > > > > > > +1
> > > > > > > >
> > > > > > > > > Lai Wei <royweilai@gmail.com> wrote on Sun, Jun 9, 2019 at 4:12 AM:
> > > > > > > >
> > > > > > > > > Dear MXNet community,
> > > > > > > > >
> > > > > > > > > > This is the 3-day vote to release Apache MXNet (incubating) version 1.5.0.
> > > > > > > > > > Voting on dev@ will start June 8, 23:59:59 (PST) and close on June 11,
> > > > > > > > > > 23:59:59.
> > > > > > > > >
> > > > > > > > > > 1) Link to release notes:
> > > > > > > > > >
> > > > > > > > > > https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Notes
> > > > > > > > > >
> > > > > > > > > > 2) Link to release candidate:
> > > > > > > > > >
> > > > > > > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.5.0.rc0
> > > > > > > > > >
> > > > > > > > > > 3) Link to source and signatures on apache dist server:
> > > > > > > > > >
> > > > > > > > > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5.0.rc0/
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Please remember to TEST first before voting accordingly:
> > > > > > > > > +1 = approve
> > > > > > > > > +0 = no opinion
> > > > > > > > > -1 = disapprove (provide reason)
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Best Regards
> > > > > > > > >
> > > > > > > > > Lai
> > > > > > > > >
> > > > > > > >
> > > > >
> > > >
> > > > -1. Built from source; import mxnet in Python causes a segfault.
> > > >
> > > > back trace:
> > > >
> > > > Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
> > > > 0x00007fff3e8a9f20 in ?? ()
> > > > (gdb) bt
> > > > #0  0x00007fff3e8a9f20 in ?? ()
> > > > #1  0x00007fffebbf440c in ReadConfigFile(Configuration&,
> > > > std::__cxx11::basic_string<char, std::char_traits<char>,
> > > > std::allocator<char> > const&, bool const&, unsigned int const&) () from
> > > > /usr/lib/x86_64-linux-gnu/libapt-pkg.so.5.0
> > > > #2  0x00007fffebbf3d97 in ReadConfigDir(Configuration&,
> > > > std::__cxx11::basic_string<char, std::char_traits<char>,
> > > > std::allocator<char> > const&, bool const&, unsigned int const&) () from
> > > > /usr/lib/x86_64-linux-gnu/libapt-pkg.so.5.0
> > > > #3  0x00007fffebc5e9aa in pkgInitConfig(Configuration&) () from
> > > > /usr/lib/x86_64-linux-gnu/libapt-pkg.so.5.0
> > > > #4  0x00007ffff29d5c48 in ?? () from
> > > > /usr/lib/python3/dist-packages/apt_pkg.cpython-35m-x86_64-linux-gnu.so
> > > > #5  0x00000000004ea10f in PyCFunction_Call ()
> > > > #6  0x0000000000536d94 in PyEval_EvalFrameEx ()
> > > > #7  0x000000000053fc97 in ?? ()
> > > > #8  0x00000000005409bf in PyEval_EvalCode ()
> > > > #9  0x000000000054a328 in ?? ()
> > > > #10 0x00000000004ea1c6 in PyCFunction_Call ()
> > > > #11 0x000000000053d353 in PyEval_EvalFrameEx ()
> > > > #12 0x000000000053fc97 in ?? ()
> > > > #13 0x000000000053bc93 in PyEval_EvalFrameEx ()
> > > > #14 0x000000000053b294 in PyEval_EvalFrameEx ()
> > > > #15 0x000000000053b294 in PyEval_EvalFrameEx ()
> > > > #16 0x000000000053b294 in PyEval_EvalFrameEx ()
> > > > #17 0x0000000000540b0b in PyEval_EvalCodeEx ()
> > > > #18 0x00000000004ec2e3 in ?? ()
> > > > #19 0x00000000005c20e7 in PyObject_Call ()
> > > >
> > > > I was using fresh DLAMI ubuntu 18.0 and CUDA 10.0, built with USE_CUDA=1,
> > > > USE_CUDNN=1, the rest are default values.
> > > >
> > > > -Zhi
> > >
> >
> > Changing to +1. I figured out that it was due to the dependencies. I still
> > have an issue using the DL base AMI with python3, but I will not regard it
> > as a blocker for the 1.5 release.
> > Tested Gluon-CV training and it works fine.
> >
> > -Zhi
> >
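The speed order Lai proposes in the quoted thread (1.5.0 w/o static >
1.5.0 w/ static > 1.4.1 w/ static > 1.4.1 w/o static) can be checked
mechanically once throughput numbers are available. The sketch below is
only an illustration: the function name is hypothetical, and the numbers
are placeholders, not measurements reported in this thread.

```python
# Order proposed in the thread, fastest configuration first.
EXPECTED_ORDER = [
    "1.5.0 w/o static",
    "1.5.0 w/ static",
    "1.4.1 w/ static",
    "1.4.1 w/o static",
]


def matches_speed_order(throughput, order=EXPECTED_ORDER):
    """Return True if throughput strictly decreases along `order`.

    `throughput` maps a configuration name to a higher-is-faster number
    (for example samples per second).
    """
    values = [throughput[name] for name in order]
    return all(a > b for a, b in zip(values, values[1:]))


# Placeholder numbers, NOT measurements from this thread.
example = {
    "1.5.0 w/o static": 130.0,
    "1.5.0 w/ static": 115.0,
    "1.4.1 w/ static": 110.0,
    "1.4.1 w/o static": 100.0,
}
print(matches_speed_order(example))
```

Running the same check per model and per platform (w/ or w/o CUDA, w/ or
w/o MKLDNN) would speak to the two open questions in Lai's message.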
