mxnet-dev mailing list archives

From "Lausen, Leonard" <lau...@amazon.com.INVALID>
Subject Re: [VOTE] Release Apache MXNet (incubating) version 1.6.0.rc2
Date Tue, 04 Feb 2020 21:44:00 GMT
Hi Chris,

you previously found and fixed an OMP race condition during fork at
https://github.com/apache/incubator-mxnet/pull/17039

This time no forks are involved. Could you run the following reproducer on
master branch:

  git clone --recursive https://github.com/apache/incubator-mxnet/ mxnet
  cd mxnet
  git checkout a726c406964b9cd17efa826738a662e09d973972  # workaround https://github.com/apache/incubator-mxnet/issues/17514
  mkdir build; cd build;
  cmake -DUSE_CPP_PACKAGE=1 -DCMAKE_BUILD_TYPE=RelWithDebInfo -GNinja -DUSE_CUDA=OFF ..
  ninja
  ./cpp-package/example/test_regress_label  # run 2-3 times to reproduce


As you are an OpenMP expert, you may be able to identify the root cause with
relative ease.

Thank you,

Leonard

On Tue, 2020-02-04 at 11:06 -0800, Chris Olivier wrote:
> When "fixing", please "fix" through actual root-cause analysis (use gdb,
> for instance) and not simply by guesswork and cutting out things which
> probably aren't actually at fault (blaming an OMP library that's in
> worldwide distribution in the billions should be treated with great
> skepticism).
> 
> On Tue, Feb 4, 2020 at 10:44 AM Lin Yuan <apeforest@gmail.com> wrote:
> 
> > Pedro,
> > 
> > While I agree with you we need to fix this usability issue, I don't think
> > this is a release blocker as Przemek mentioned above. Could we fix this in
> > the next minor release?
> > 
> > Thanks,
> > 
> > Lin
> > 
> > On Tue, Feb 4, 2020 at 10:38 AM Pedro Larroy <pedro.larroy.lists@gmail.com>
> > wrote:
> > 
> > > Right. Would it be possible to have the CMake build also use libgomp for
> > > consistency with the releases until these issues are resolved?
> > > This can affect anyone compiling the distribution with CMake and also
> > > happens randomly in CI, worsening the contributor experience due to CI
> > > failures.
> > > 
> > > On Tue, Feb 4, 2020 at 9:33 AM Przemysław Trędak <ptrendx@apache.org>
> > > wrote:
> > > 
> > > > Hi Pedro,
> > > > 
> > > > From the issue that you linked it seems that you are using the LLVM
> > > > OpenMP, whereas I believe the actual release uses libgomp (at least that's
> > > > what seems to be the conclusion from this issue:
> > > > https://github.com/apache/incubator-mxnet/issues/16891)?
> > > > 
> > > > Przemek
> > > > 
> > > > On 2020/02/04 03:42:30, Pedro Larroy <pedro.larroy.lists@gmail.com>
> > > > wrote:
> > > > > -1
> > > > > 
> > > > > Unit tests passed in CPU build.
> > > > > 
> > > > > I observe crashes related to openmp using cpp unit tests:
> > > > > 
> > > > > https://github.com/apache/incubator-mxnet/issues/17043
> > > > > 
> > > > > Pedro.
> > > > > 
> > > > > On Mon, Feb 3, 2020 at 6:44 PM Chaitanya Bapat <chai.bapat@gmail.com>
> > > > > wrote:
> > > > > > +1
> > > > > > Successfully built MXNet 1.6.0rc2 on Linux
> > > > > > Tested for OpPerf utility
> > > > > > For CPU -
> > > > > > https://gist.github.com/ChaiBapchya/d5ecc3e971c5a3c558d672477b4b6b9c
> > > > > > Works well!
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > On Mon, 3 Feb 2020 at 15:43, Lin Yuan <apeforest@gmail.com> wrote:
> > > > > > 
> > > > > > > +1
> > > > > > > 
> > > > > > > Tested Horovod with mnist example. My compiler flags are below:
> > > > > > > 
> > > > > > > [✔ CUDA, ✔ CUDNN, ✔ NCCL, ✔ CUDA_RTC, ✖ TENSORRT, ✔ CPU_SSE, ✔ CPU_SSE2,
> > > > > > > ✔ CPU_SSE3, ✔ CPU_SSE4_1, ✔ CPU_SSE4_2, ✖ CPU_SSE4A, ✔ CPU_AVX, ✖ CPU_AVX2,
> > > > > > > ✔ OPENMP, ✖ SSE, ✔ F16C, ✖ JEMALLOC, ✔ BLAS_OPEN, ✖ BLAS_ATLAS, ✖ BLAS_MKL,
> > > > > > > ✖ BLAS_APPLE, ✔ LAPACK, ✖ MKLDNN, ✔ OPENCV, ✖ CAFFE, ✖ PROFILER,
> > > > > > > ✔ DIST_KVSTORE, ✖ CXX14, ✖ INT64_TENSOR_SIZE, ✖ SIGNAL_HANDLER, ✖ DEBUG,
> > > > > > > ✖ TVM_OP]
> > > > > > > 
> > > > > > > Lin
> > > > > > > 
> > > > > > > On Sat, Feb 1, 2020 at 9:55 PM Tao Lv <taolv@apache.org> wrote:
> > > > > > > 
> > > > > > > > +1
> > > > > > > > 
> > > > > > > > I tested below items:
> > > > > > > > 1. download artifacts from Apache dist repo;
> > > > > > > > 2. the signature looks good;
> > > > > > > > 3. build from source code with MKL-DNN and MKL on centos;
> > > > > > > > 4. run fp32 and int8 inference of ResNet50 under /example/quantization/.
> > > > > > > > thanks,
> > > > > > > > -tao
> > > > > > > > 
> > > > > > > > On Sun, Feb 2, 2020 at 11:00 AM Tao Lv <taolv@apache.org> wrote:
> > > > > > > > > I see. I was looking at this page:
> > > > > > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.6.0.rc2
> > > > > > > > > On Sun, Feb 2, 2020 at 4:54 AM Przemysław Trędak <ptrendx@apache.org>
> > > > > > > > > wrote:
> > > > > > > > > 
> > > > > > > > > > Hi Tao,
> > > > > > > > > > 
> > > > > > > > > > Could you tell me where did you look for it and did not find it? I just
> > > > > > > > > > checked and both
> > > > > > > > > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.6.0.rc2/ and
> > > > > > > > > > draft of the release on GitHub have them.
> > > > > > > > > > 
> > > > > > > > > > Thank you
> > > > > > > > > > Przemek
> > > > > > > > > > 
> > > > > > > > > > On 2020/02/01 14:23:11, Tao Lv <taolv@apache.org> wrote:
> > > > > > > > > > > It seems the src tar and signature are missing from the tag.
> > > > > > > > > > > On Fri, Jan 31, 2020 at 11:09 AM Przemysław Trędak <ptrendx@apache.org>
> > > > > > > > > > > wrote:
> > > > > > > > > > > 
> > > > > > > > > > > > Dear MXNet community,
> > > > > > > > > > > > 
> > > > > > > > > > > > This is the vote to release Apache MXNet (incubating) version 1.6.0.
> > > > > > > > > > > > Voting starts today and will close on Monday 2/3/2020 23:59 PST.
> > > > > > > > > > > > Link to release notes:
> > > > > > > > > > > > https://cwiki.apache.org/confluence/display/MXNET/1.6.0+Release+notes
> > > > > > > > > > > > Link to release candidate:
> > > > > > > > > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.6.0.rc2
> > > > > > > > > > > > Link to source and signatures on apache dist server:
> > > > > > > > > > > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.6.0.rc2/
> > > > > > > > > > > > The differences compared to the previous release candidate 1.6.0.rc1:
> > > > > > > > > > > >  * Fixes for license issues (#17361, #17375, #17370, #17460)
> > > > > > > > > > > >  * Bugfix for saving LSTM layer parameter (#17288)
> > > > > > > > > > > >  * Bugfix for downloading the model from model zoo from multiple processes (#17372)
> > > > > > > > > > > >  * Fixed a symbol.py in AMP for GluonNLP (#17408)
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > Please remember to TEST first before voting accordingly:
> > > > > > > > > > > > +1 = approve
> > > > > > > > > > > > +0 = no opinion
> > > > > > > > > > > > -1 = disapprove (provide reason)
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > Best regards,
> > > > > > > > > > > > Przemyslaw Tredak
> > > > > > > > > > > > 
> > > > > > 
> > > > > > --
> > > > > > *Chaitanya Prakash Bapat*
> > > > > > *+1 (973) 953-6299*
> > > > > > 