From: Zhi Zhang
Reply-To: dev@mxnet.incubator.apache.org
Subject: Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc0
Date: Tue, 11 Jun 2019 20:46:16 -0000

On 2019/06/11 18:53:56, Pedro Larroy wrote:
> The stack trace doesn't seem to come from MXNet, do you have more info?
>
> On Tue, Jun 11, 2019 at 11:46 AM Zhi Zhang wrote:
> >
> >
> >
> > On 2019/06/11 17:36:09, Pedro Larroy wrote:
> > > A bit more background into this:
> > >
> > > While tuning a model using LSTM and convolutions we find that using
> > > hybridize with static_alloc and static_shape is 15% slower in the
> > > latest revision vs in version 1.4.1 in which using hybridize with
> > > static_alloc and static_shape is 10% faster than without.
> > >
> > > Overall we are still 33% faster when comparing master to 1.5.
> > >
> > > Let me know if you think this is a release blocker or not.
> > >
> > > Pedro.
> > >
> > > On Mon, Jun 10, 2019 at 4:51 PM Pedro Larroy wrote:
> > > >
> > > > -1
> > > >
> > > > We found a performance regression vs 1.4 related to CachedOp which
> > > > affects Hybrid forward, which we are looking into.
> > > >
> > > > Pedro.
> > > >
> > > > On Mon, Jun 10, 2019 at 4:33 PM Lin Yuan wrote:
> > > > >
> > > > > -1 (Tentatively until resolved)
> > > > >
> > > > > I tried to build MXNet 1.5.0 from source and pip install horovod but got
> > > > > the following error:
> > > > >
> > > > > Reproduce:
> > > > > 1) cp make/config.mk .
> > > > > 2) turn on USE_CUDA, USE_CUDNN, USE_NCCL
> > > > > 3) make -j
> > > > >
> > > > > MXNet can build successfully.
> > > > >
> > > > > 4) pip install horovod
> > > > >
> > > > >
> > > > > /home/ubuntu/src/incubator-mxnet/python/mxnet/../../include/mkldnn/mkldnn.h:55:28:
> > > > > fatal error: mkldnn_version.h: No such file or directory
> > > > > compilation terminated.
> > > > > INFO: Unable to build MXNet plugin, will skip it.
> > > > >
> > > > > I did not change any setting of MKLDNN in my config.mk. I am building on
> > > > > DLAMI base 18.0 which is Ubuntu 16.04 and CUDA 10.0
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Lin
> > > > >
> > > > >
> > > > > On Sat, Jun 8, 2019 at 5:39 PM shiwen hu wrote:
> > > > >
> > > > > > +1
> > > > > >
> > > > > > On Sun, Jun 9, 2019 at 4:12 AM Lai Wei wrote:
> > > > > >
> > > > > > > Dear MXNet community,
> > > > > > >
> > > > > > > This is the 3-day vote to release Apache MXNet (incubating) version 1.5.0.
> > > > > > > Voting on dev@ will start June 8, 23:59:59(PST) and close on June 11,
> > > > > > > 23:59:59.
> > > > > > >
> > > > > > > 1) Link to release notes:
> > > > > > > https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Notes
> > > > > > >
> > > > > > > 2) Link to release candidate:
> > > > > > >
> > > > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.5.0.rc0
> > > > > > >
> > > > > > > 3) Link to source and signatures on apache dist server:
> > > > > > >
> > > > > > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5.0.rc0/
> > > > > > >
> > > > > > >
> > > > > > > Please remember to TEST first before voting accordingly:
> > > > > > > +1 = approve
> > > > > > > +0 = no opinion
> > > > > > > -1 = disapprove (provide reason)
> > > > > > >
> > > > > > >
> > > > > > > Best Regards
> > > > > > >
> > > > > > > Lai
> > > > > > >
> > > > > >
> > > > >
> > -1. Built from source, import mxnet in python cause Segfault.
> >
> > back trace:
> > Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
> > 0x00007fff3e8a9f20 in ?? ()
> > (gdb) bt
> > #0 0x00007fff3e8a9f20 in ?? ()
> > #1 0x00007fffebbf440c in ReadConfigFile(Configuration&,
> > std::__cxx11::basic_string,
> > std::allocator > const&, bool const&, unsigned int const&) () from
> > /usr/lib/x86_64-linux-gnu/libapt-pkg.so.5.0
> > #2 0x00007fffebbf3d97 in ReadConfigDir(Configuration&,
> > std::__cxx11::basic_string,
> > std::allocator > const&, bool const&, unsigned int const&) () from
> > /usr/lib/x86_64-linux-gnu/libapt-pkg.so.5.0
> > #3 0x00007fffebc5e9aa in pkgInitConfig(Configuration&) () from
> > /usr/lib/x86_64-linux-gnu/libapt-pkg.so.5.0
> > #4 0x00007ffff29d5c48 in ?? () from
> > /usr/lib/python3/dist-packages/apt_pkg.cpython-35m-x86_64-linux-gnu.so
> > #5 0x00000000004ea10f in PyCFunction_Call ()
> > #6 0x0000000000536d94 in PyEval_EvalFrameEx ()
> > #7 0x000000000053fc97 in ?? ()
> > #8 0x00000000005409bf in PyEval_EvalCode ()
> > #9 0x000000000054a328 in ?? ()
> > #10 0x00000000004ea1c6 in PyCFunction_Call ()
> > #11 0x000000000053d353 in PyEval_EvalFrameEx ()
> > #12 0x000000000053fc97 in ?? ()
> > #13 0x000000000053bc93 in PyEval_EvalFrameEx ()
> > #14 0x000000000053b294 in PyEval_EvalFrameEx ()
> > #15 0x000000000053b294 in PyEval_EvalFrameEx ()
> > #16 0x000000000053b294 in PyEval_EvalFrameEx ()
> > #17 0x0000000000540b0b in PyEval_EvalCodeEx ()
> > #18 0x00000000004ec2e3 in ?? ()
> > #19 0x00000000005c20e7 in PyObject_Call ()
> >
> > I was using fresh DLAMI ubuntu 18.0 and CUDA 10.0, built with USE_CUDA=1,
> > USE_CUDNN=1, the rest are default values.
> >
> > -Zhi
>

Changing my vote to +1. I figured out that the segfault was due to the dependencies. I still have an issue using the DL Base AMI with python3, but I will not regard it as a blocker for the 1.5 release. Tested Gluon-CV training and it works fine.

-Zhi