mxnet-dev mailing list archives

From "Lausen, Leonard" <lau...@amazon.com.INVALID>
Subject Re: [VOTE] Release Apache MXNet (incubating) version 1.6.0.rc2
Date Tue, 04 Feb 2020 22:26:28 GMT
Actually, the reproducer below is wrong. The issue was apparently fixed on master
recently. I'm running an automated bisect and will report the result later.

On Tue, 2020-02-04 at 21:44 +0000, Lausen, Leonard wrote:
> Hi Chris,
> 
> you previously found and fixed an OMP race condition during fork at
> https://github.com/apache/incubator-mxnet/pull/17039
> 
> This time no forks are involved. Could you run the following reproducer on
> master branch:
> 
>   git clone --recursive https://github.com/apache/incubator-mxnet/ mxnet
>   cd mxnet
>   git checkout a726c406964b9cd17efa826738a662e09d973972 # workaround https://github.com/apache/incubator-mxnet/issues/17514
>   mkdir build; cd build;
>   cmake -DUSE_CPP_PACKAGE=1 -DCMAKE_BUILD_TYPE=RelWithDebInfo -GNinja -DUSE_CUDA=OFF ..
>   ninja
>   ./cpp-package/example/test_regress_label  # run 2-3 times to reproduce
> 
> 
> As you are an OpenMP expert, you may be able to identify the root cause with
> relative ease.
> 
> Thank you,
> 
> Leonard
> 
> On Tue, 2020-02-04 at 11:06 -0800, Chris Olivier wrote:
> > When "fixing", please "fix" through actual root-cause analysis (use gdb,
> > for instance) and not simply by guesswork and cutting out things which
> > probably aren't actually at fault (blaming an OMP library that's in
> > worldwide distribution in the billions should be treated with great
> > skepticism).
> > 
> > On Tue, Feb 4, 2020 at 10:44 AM Lin Yuan <apeforest@gmail.com> wrote:
> > 
> > > Pedro,
> > > 
> > > While I agree with you we need to fix this usability issue, I don't think
> > > this is a release blocker as Przemek mentioned above. Could we fix this in
> > > the next minor release?
> > > 
> > > Thanks,
> > > 
> > > Lin
> > > 
> > > > On Tue, Feb 4, 2020 at 10:38 AM Pedro Larroy <pedro.larroy.lists@gmail.com>
> > > > wrote:
> > > 
> > > > Right. Would it be possible to have the CMake build also use libgomp for
> > > > consistency with the releases until these issues are resolved?
> > > > This can affect anyone compiling the distribution with CMake and also
> > > > happens randomly in CI, worsening the contributor experience due to CI
> > > > failures.
> > > > 
> > > > On Tue, Feb 4, 2020 at 9:33 AM Przemysław Trędak <ptrendx@apache.org>
> > > > wrote:
> > > > 
> > > > > Hi Pedro,
> > > > > 
> > > > > From the issue that you linked it seems that you are using the LLVM
> > > > > > OpenMP, whereas I believe the actual release uses libgomp (at least that's
> > > > > > what seems to be the conclusion from this issue:
> > > > > https://github.com/apache/incubator-mxnet/issues/16891)?
> > > > > 
> > > > > Przemek
> > > > > 
> > > > > On 2020/02/04 03:42:30, Pedro Larroy <pedro.larroy.lists@gmail.com>
> > > > > wrote:
> > > > > > -1
> > > > > > 
> > > > > > Unit tests passed in CPU build.
> > > > > > 
> > > > > > I observe crashes related to openmp using cpp unit tests:
> > > > > > 
> > > > > > https://github.com/apache/incubator-mxnet/issues/17043
> > > > > > 
> > > > > > Pedro.
> > > > > > 
> > > > > > > On Mon, Feb 3, 2020 at 6:44 PM Chaitanya Bapat <chai.bapat@gmail.com>
> > > > > > > wrote:
> > > > > > > +1
> > > > > > > Successfully built MXNet 1.6.0rc2 on Linux
> > > > > > > Tested for OpPerf utility
> > > > > > > > For CPU -
> > > > > > > > https://gist.github.com/ChaiBapchya/d5ecc3e971c5a3c558d672477b4b6b9c
> > > > > > > Works well!
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > > On Mon, 3 Feb 2020 at 15:43, Lin Yuan <apeforest@gmail.com> wrote:
> > > > > > > 
> > > > > > > > +1
> > > > > > > > 
> > > > > > > > > Tested Horovod with mnist example. My compiler flags are below:
> > > > > > > > > 
> > > > > > > > > [✔ CUDA, ✔ CUDNN, ✔ NCCL, ✔ CUDA_RTC, ✖ TENSORRT, ✔ CPU_SSE,
> > > > > > > > > ✔ CPU_SSE2, ✔ CPU_SSE3, ✔ CPU_SSE4_1, ✔ CPU_SSE4_2, ✖ CPU_SSE4A,
> > > > > > > > > ✔ CPU_AVX, ✖ CPU_AVX2, ✔ OPENMP, ✖ SSE, ✔ F16C, ✖ JEMALLOC,
> > > > > > > > > ✔ BLAS_OPEN, ✖ BLAS_ATLAS, ✖ BLAS_MKL, ✖ BLAS_APPLE, ✔ LAPACK,
> > > > > > > > > ✖ MKLDNN, ✔ OPENCV, ✖ CAFFE, ✖ PROFILER, ✔ DIST_KVSTORE, ✖ CXX14,
> > > > > > > > > ✖ INT64_TENSOR_SIZE, ✖ SIGNAL_HANDLER, ✖ DEBUG, ✖ TVM_OP]
> > > > > > > > 
> > > > > > > > Lin
> > > > > > > > 
> > > > > > > > > On Sat, Feb 1, 2020 at 9:55 PM Tao Lv <taolv@apache.org> wrote:
> > > > > > > > 
> > > > > > > > > +1
> > > > > > > > > 
> > > > > > > > > I tested below items:
> > > > > > > > > 1. download artifacts from Apache dist repo;
> > > > > > > > > 2. the signature looks good;
> > > > > > > > > > 3. build from source code with MKL-DNN and MKL on centos;
> > > > > > > > > > 4. run fp32 and int8 inference of ResNet50 under /example/quantization/.
> > > > > > > > > thanks,
> > > > > > > > > -tao
> > > > > > > > > 
> > > > > > > > > > On Sun, Feb 2, 2020 at 11:00 AM Tao Lv <taolv@apache.org> wrote:
> > > > > > > > > > > I see. I was looking at this page:
> > > > > > > > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.6.0.rc2
> > > > > > > > > > > On Sun, Feb 2, 2020 at 4:54 AM Przemysław Trędak <ptrendx@apache.org>
> > > > > > > > > > > wrote:
> > > > > > > > > > 
> > > > > > > > > > > Hi Tao,
> > > > > > > > > > > 
> > > > > > > > > > > > Could you tell me where you looked for it and did not find it?
> > > > > > > > > > > > I just checked and both
> > > > > > > > > > > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.6.0.rc2/
> > > > > > > > > > > > and the draft of the release on GitHub have them.
> > > > > > > > > > > 
> > > > > > > > > > > Thank you
> > > > > > > > > > > Przemek
> > > > > > > > > > > 
> > > > > > > > > > > > On 2020/02/01 14:23:11, Tao Lv <taolv@apache.org> wrote:
> > > > > > > > > > > > > It seems the src tar and signature are missing from the tag.
> > > > > > > > > > > > > On Fri, Jan 31, 2020 at 11:09 AM Przemysław Trędak
> > > > > > > > > > > > > <ptrendx@apache.org> wrote:
> > > > > > > > > > > > 
> > > > > > > > > > > > > Dear MXNet community,
> > > > > > > > > > > > > 
> > > > > > > > > > > > > > This is the vote to release Apache MXNet (incubating) version 1.6.0.
> > > > > > > > > > > > > > Voting starts today and will close on Monday 2/3/2020 23:59 PST.
> > > > > > > > > > > > > > Link to release notes:
> > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/MXNET/1.6.0+Release+notes
> > > > > > > > > > > > > > Link to release candidate:
> > > > > > > > > > > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.6.0.rc2
> > > > > > > > > > > > > > Link to source and signatures on apache dist server:
> > > > > > > > > > > > > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.6.0.rc2/
> > > > > > > > > > > > > > The differences comparing to previous release candidate 1.6.0.rc1:
> > > > > > > > > > > > > >  * Fixes for license issues (#17361, #17375, #17370, #17460)
> > > > > > > > > > > > > >  * Bugfix for saving LSTM layer parameter (#17288)
> > > > > > > > > > > > > >  * Bugfix for downloading the model from model zoo from multiple
> > > > > > > > > > > > > >    processes (#17372)
> > > > > > > > > > > > > >  * Fixed a symbol.py in AMP for GluonNLP (#17408)
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Please remember to TEST first before voting accordingly:
> > > > > > > > > > > > > > +1 = approve
> > > > > > > > > > > > > > +0 = no opinion
> > > > > > > > > > > > > > -1 = disapprove (provide reason)
> > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Best regards,
> > > > > > > > > > > > > Przemyslaw Tredak
> > > > > > > > > > > > > 
> > > > > > > 
> > > > > > > --
> > > > > > > *Chaitanya Prakash Bapat*
> > > > > > > *+1 (973) 953-6299*
> > > > > > > 