mxnet-dev mailing list archives

From "Lausen, Leonard" <lau...@amazon.com.INVALID>
Subject Re: [VOTE] Release Apache MXNet (incubating) version 1.6.0.rc2
Date Wed, 05 Feb 2020 19:24:35 GMT
Hi Markus,

you point out a critical flaw of the current MXNet website. We don't have any
versioning and the website is always built from the master branch.

Thus, while recent improvements to the build system are backwards compatible
(i.e., old instructions continue to work), there is no way to find the old
instructions to build "old" releases.

https://github.com/apache/incubator-mxnet/issues/17497 tracks the issue.

Including the package build instructions with the source release makes sense.
To make sure they don't get out of date, including the html pages built from the
source release is another option.
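
As an illustration only (not part of the release or the website), here is a
minimal shell sketch of how a user could check whether an unpacked source tree
matches the current website instructions, which begin with
`cp config/config.cmake config.cmake`. The function name `check_build_layout`
and the labels `current` and `legacy` are hypothetical; the only detail taken
from this thread is the path `config/config.cmake`.

```shell
#!/bin/sh
# Sketch under stated assumptions: a tree that contains config/config.cmake
# is treated as matching the website's current CMake instructions; any other
# tree is treated as an older layout that needs the instructions of its own
# release. check_build_layout is a hypothetical helper, not part of MXNet.
check_build_layout() {
    src_dir="$1"
    if [ -f "$src_dir/config/config.cmake" ]; then
        echo "current"   # the `cp config/config.cmake config.cmake` step applies
    else
        echo "legacy"    # no config/ folder, as reported for the 1.6.0.rc2 tarball
    fi
}
```

Calling `check_build_layout .` from the root of the unpacked tarball would
print `legacy` when the `config` folder is absent, which is the situation
Markus describes below.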

Best regards
Leonard

On Wed, 2020-02-05 at 11:06 -0800, Markus Weimer wrote:
> Hi,
> 
> I was trying to follow the build instructions[0] on Ubuntu 18.04.
> However, I am stumped at step 2:
> 
> `cp config/config.cmake config.cmake`
> 
> The file `cmake.conf` does not seem to exist in the tarball on the
> dist site. `find . -name "cmake.conf" -print` finds nothing. In fact,
> the `config` folder doesn't seem to exist in the tarball either.
> However, the file and folder do exist on GitHub[1]. Are the build
> instructions for a release different from the build from the
> repository?
> 
> On a related note: It might make sense to package build instructions
> with the source release. Websites get updated to reflect current use,
> and it might be difficult for future users of this version of mxnet to
> piece together the build instructions.
> 
> Thanks,
> 
> Markus
> 
> 
> [0]: https://mxnet.apache.org/get_started/ubuntu_setup
> [1]: https://github.com/apache/incubator-mxnet/tree/master/config
> 
> On Tue, Feb 4, 2020 at 3:05 PM Lausen, Leonard
> <lausen@amazon.com.invalid> wrote:
> > Using latest upstream jemalloc
> > https://github.com/leezu/mxnet/commit/fd4c78a635087f6164344da53a55ba2b67da2fd2
> > fixes the issue.
> > 
> > However, there were concerns that this commit relies on unreleased
> > development features of jemalloc (jemalloc cmake build system support) and
> > we'll not merge this commit until upstream releases cmake build system
> > support in a release.
> > 
> > In the meantime anyone is welcome to work on an equivalent patch based on
> > the custom build system in latest stable jemalloc.
> > 
> > On Tue, 2020-02-04 at 22:46 +0000, Lausen, Leonard wrote:
> > > Bisect identifies
> > > https://github.com/apache/incubator-mxnet/commit/425319cb59904573bd3fe1b6fe0a7381eceb9bbd
> > > 
> > > Thus this is an issue with jemalloc + LLVM OpenMP.
> > > 
> > > The correct reproducer for latest master branch is
> > > 
> > > 
> > >   git clone --recursive https://github.com/apache/incubator-mxnet/ mxnet
> > >   cd mxnet
> > >   git checkout a726c406964b9cd17efa826738a662e09d973972 # workaround https://github.com/apache/incubator-mxnet/issues/17514
> > >   mkdir build; cd build;
> > >   cmake -DUSE_CPP_PACKAGE=1 -DCMAKE_BUILD_TYPE=RelWithDebInfo -GNinja -DUSE_CUDA=OFF -DUSE_JEMALLOC=ON ..
> > >   ninja
> > >   ./cpp-package/example/test_regress_label  # run 2-3 times to reproduce
> > > 
> > > Let's move the discussion about fixing the jemalloc/OpenMP
> > > incompatibility to
> > > https://github.com/apache/incubator-mxnet/issues/17043
> > > 
> > > 
> > > 
> > > @Chris, could you look into this issue as it only happens with LLVM
> > > OpenMP?
> > > 
> > > 
> > > 
> > > @Przemek: For the 1.6.0 release notes, I suggest including a
> > > recommendation to set USE_JEMALLOC=OFF when compiling from source.
> > > 
> > > This note should probably be added in any case, as building with
> > > USE_JEMALLOC=ON is broken on Ubuntu 18.10 and higher, as well as Debian
> > > Stable.
> > > 
> > > Given these release notes, +1 for the release.
> > > 
> > > 
> > > Best regards
> > > Leonard
> > > 
> > > On Tue, 2020-02-04 at 22:26 +0000, Lausen, Leonard wrote:
> > > > Actually, the reproducer below is wrong. The issue was apparently fixed
> > > > on master recently. I'm running an automated bisect and will report the
> > > > result later.
> > > > 
> > > > On Tue, 2020-02-04 at 21:44 +0000, Lausen, Leonard wrote:
> > > > > Hi Chris,
> > > > > 
> > > > > you previously found and fixed an OMP race condition during fork at
> > > > > https://github.com/apache/incubator-mxnet/pull/17039
> > > > > 
> > > > > This time no forks are involved. Could you run the following
> > > > > reproducer on the master branch:
> > > > > 
> > > > >   git clone --recursive https://github.com/apache/incubator-mxnet/ mxnet
> > > > >   cd mxnet
> > > > >   git checkout a726c406964b9cd17efa826738a662e09d973972 # workaround https://github.com/apache/incubator-mxnet/issues/17514
> > > > >   mkdir build; cd build;
> > > > >   cmake -DUSE_CPP_PACKAGE=1 -DCMAKE_BUILD_TYPE=RelWithDebInfo -GNinja -DUSE_CUDA=OFF ..
> > > > >   ninja
> > > > >   ./cpp-package/example/test_regress_label  # run 2-3 times to reproduce
> > > > > 
> > > > > 
> > > > > As you are an OpenMP expert, you may be able to identify the root cause
> > > > > with relative ease.
> > > > > 
> > > > > Thank you,
> > > > > 
> > > > > Leonard
> > > > > 
> > > > > On Tue, 2020-02-04 at 11:06 -0800, Chris Olivier wrote:
> > > > > > When "fixing", please "fix" through actual root-cause analysis (use
> > > > > > gdb, for instance) and not simply by guesswork and cutting out things
> > > > > > which probably aren't actually at fault (blaming an OMP library that's
> > > > > > in worldwide distribution in the billions should be treated with great
> > > > > > skepticism).
> > > > > > 
> > > > > > On Tue, Feb 4, 2020 at 10:44 AM Lin Yuan <apeforest@gmail.com>
> > > > > > wrote:
> > > > > > 
> > > > > > > Pedro,
> > > > > > > 
> > > > > > > While I agree with you that we need to fix this usability issue, I
> > > > > > > don't think this is a release blocker, as Przemek mentioned above.
> > > > > > > Could we fix this in the next minor release?
> > > > > > > 
> > > > > > > Thanks,
> > > > > > > 
> > > > > > > Lin
> > > > > > > 
> > > > > > > On Tue, Feb 4, 2020 at 10:38 AM Pedro Larroy
> > > > > > > <pedro.larroy.lists@gmail.com> wrote:
> > > > > > > 
> > > > > > > > Right. Would it be possible to have the CMake build also use
> > > > > > > > libgomp for consistency with the releases until these issues
> > > > > > > > are resolved? This can affect anyone compiling the distribution
> > > > > > > > with CMake and also happens randomly in CI, worsening the
> > > > > > > > contributor experience due to CI failures.
> > > > > > > > 
> > > > > > > > On Tue, Feb 4, 2020 at 9:33 AM Przemysław Trędak
> > > > > > > > <ptrendx@apache.org> wrote:
> > > > > > > > 
> > > > > > > > > Hi Pedro,
> > > > > > > > > 
> > > > > > > > > From the issue that you linked it seems that you are using
> > > > > > > > > the LLVM OpenMP, whereas I believe the actual release uses
> > > > > > > > > libgomp (at least that's what seems to be the conclusion
> > > > > > > > > from this issue:
> > > > > > > > > https://github.com/apache/incubator-mxnet/issues/16891)?
> > > > > > > > > 
> > > > > > > > > Przemek
> > > > > > > > > 
> > > > > > > > > On 2020/02/04 03:42:30, Pedro Larroy
> > > > > > > > > <pedro.larroy.lists@gmail.com> wrote:
> > > > > > > > > > -1
> > > > > > > > > > 
> > > > > > > > > > Unit tests passed in CPU build.
> > > > > > > > > > 
> > > > > > > > > > I observe crashes related to openmp using cpp unit tests:
> > > > > > > > > > 
> > > > > > > > > > https://github.com/apache/incubator-mxnet/issues/17043
> > > > > > > > > > 
> > > > > > > > > > Pedro.
> > > > > > > > > > 
> > > > > > > > > > On Mon, Feb 3, 2020 at 6:44 PM Chaitanya Bapat
> > > > > > > > > > <chai.bapat@gmail.com> wrote:
> > > > > > > > > > > +1
> > > > > > > > > > > Successfully built MXNet 1.6.0rc2 on Linux
> > > > > > > > > > > Tested for OpPerf utility
> > > > > > > > > > > For CPU -
> > > > > > > > > > > https://gist.github.com/ChaiBapchya/d5ecc3e971c5a3c558d672477b4b6b9c
> > > > > > > > > > > Works well!
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > On Mon, 3 Feb 2020 at 15:43, Lin Yuan <apeforest@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > > 
> > > > > > > > > > > > +1
> > > > > > > > > > > > 
> > > > > > > > > > > > Tested Horovod with mnist example. My compiler flags are
> > > > > > > > > > > > below:
> > > > > > > > > > > > 
> > > > > > > > > > > > [✔ CUDA, ✔ CUDNN, ✔ NCCL, ✔ CUDA_RTC, ✖ TENSORRT, ✔ CPU_SSE,
> > > > > > > > > > > > ✔ CPU_SSE2, ✔ CPU_SSE3, ✔ CPU_SSE4_1, ✔ CPU_SSE4_2, ✖ CPU_SSE4A,
> > > > > > > > > > > > ✔ CPU_AVX, ✖ CPU_AVX2, ✔ OPENMP, ✖ SSE, ✔ F16C, ✖ JEMALLOC,
> > > > > > > > > > > > ✔ BLAS_OPEN, ✖ BLAS_ATLAS, ✖ BLAS_MKL, ✖ BLAS_APPLE, ✔ LAPACK,
> > > > > > > > > > > > ✖ MKLDNN, ✔ OPENCV, ✖ CAFFE, ✖ PROFILER, ✔ DIST_KVSTORE, ✖ CXX14,
> > > > > > > > > > > > ✖ INT64_TENSOR_SIZE, ✖ SIGNAL_HANDLER, ✖ DEBUG, ✖ TVM_OP]
> > > > > > > > > > > > 
> > > > > > > > > > > > Lin
> > > > > > > > > > > > 
> > > > > > > > > > > > On Sat, Feb 1, 2020 at 9:55 PM Tao Lv <taolv@apache.org>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > 
> > > > > > > > > > > > > +1
> > > > > > > > > > > > > 
> > > > > > > > > > > > > I tested below items:
> > > > > > > > > > > > > 1. download artifacts from Apache dist repo;
> > > > > > > > > > > > > 2. the signature looks good;
> > > > > > > > > > > > > 3. build from source code with MKL-DNN and MKL on centos;
> > > > > > > > > > > > > 4. run fp32 and int8 inference of ResNet50 under
> > > > > > > > > > > > >    /example/quantization/.
> > > > > > > > > > > > > thanks,
> > > > > > > > > > > > > -tao
> > > > > > > > > > > > > 
> > > > > > > > > > > > > On Sun, Feb 2, 2020 at 11:00 AM Tao Lv <taolv@apache.org>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > I see. I was looking at this page:
> > > > > > > > > > > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.6.0.rc2
> > > > > > > > > > > > > > On Sun, Feb 2, 2020 at 4:54 AM Przemysław Trędak
> > > > > > > > > > > > > > <ptrendx@apache.org> wrote:
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Hi Tao,
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Could you tell me where you looked for it and did not
> > > > > > > > > > > > > > > find it? I just checked, and both
> > > > > > > > > > > > > > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.6.0.rc2/
> > > > > > > > > > > > > > > and the draft of the release on GitHub have them.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Thank you
> > > > > > > > > > > > > > > Przemek
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > On 2020/02/01 14:23:11, Tao Lv <taolv@apache.org>
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > It seems the src tar and signature are missing from
> > > > > > > > > > > > > > > > the tag.
> > > > > > > > > > > > > > > > On Fri, Jan 31, 2020 at 11:09 AM Przemysław Trędak
> > > > > > > > > > > > > > > > <ptrendx@apache.org> wrote:
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > Dear MXNet community,
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > This is the vote to release Apache MXNet (incubating)
> > > > > > > > > > > > > > > > > version 1.6.0.
> > > > > > > > > > > > > > > > > Voting starts today and will close on Monday 2/3/2020
> > > > > > > > > > > > > > > > > 23:59 PST.
> > > > > > > > > > > > > > > > > Link to release notes:
> > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/MXNET/1.6.0+Release+notes
> > > > > > > > > > > > > > > > > Link to release candidate:
> > > > > > > > > > > > > > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.6.0.rc2
> > > > > > > > > > > > > > > > > Link to source and signatures on apache dist server:
> > > > > > > > > > > > > > > > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.6.0.rc2/
> > > > > > > > > > > > > > > > > The differences comparing to previous release candidate
> > > > > > > > > > > > > > > > > 1.6.0.rc1:
> > > > > > > > > > > > > > > > >  * Fixes for license issues (#17361, #17375, #17370,
> > > > > > > > > > > > > > > > >    #17460)
> > > > > > > > > > > > > > > > >  * Bugfix for saving LSTM layer parameter (#17288)
> > > > > > > > > > > > > > > > >  * Bugfix for downloading the model from model zoo from
> > > > > > > > > > > > > > > > >    multiple processes (#17372)
> > > > > > > > > > > > > > > > >  * Fixed a symbol.py in AMP for GluonNLP (#17408)
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > Please remember to TEST first before voting accordingly:
> > > > > > > > > > > > > > > > > +1 = approve
> > > > > > > > > > > > > > > > > +0 = no opinion
> > > > > > > > > > > > > > > > > -1 = disapprove (provide reason)
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > Best regards,
> > > > > > > > > > > > > > > > > Przemyslaw Tredak
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > --
> > > > > > > > > > > *Chaitanya Prakash Bapat*
> > > > > > > > > > > *+1 (973) 953-6299*
> > > > > > > > > > > 
> > > > > > > > > > > 