mxnet-dev mailing list archives

From Pedro Larroy <pedro.larroy.li...@gmail.com>
Subject Re: [VOTE] Release Apache MXNet (incubating) version 1.6.0.rc2
Date Tue, 04 Feb 2020 19:25:37 GMT
@Chris: If you read the issue that I linked above, you can see that I was
using gdb. Maybe you can have a look at the issue if you have an idea for a
fix. The backtrace points to a segfault in the omp library. While the root
cause could be undefined behaviour somewhere else, this does not happen
with libgomp, and other engineers believe that mixing OpenMP
implementations at runtime can cause UB, so it's reasonable to believe
there's a good chance the crash is related to this. I personally don't have
time to investigate this further, as I don't think introducing this
dependency is worth the trouble it is causing when the one provided by the
platform works well enough.

0x00007ffff43b284a in __kmp_fork_call () from
/home/piotr/mxnet/build/3rdparty/openmp/runtime/src/libomp.so
(gdb) bt
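
One way to check the "mixed OpenMP runtimes" hypothesis is to list which
OpenMP libraries are actually mapped into the crashing process. The sketch
below is only an illustration: the helper name `openmp_runtimes` and the
regular expression are my own assumptions, it covers just the common GNU
(libgomp), LLVM (libomp) and Intel (libiomp5) library names, and it relies on
the Linux `/proc/<pid>/maps` format. Finding two runtimes loaded would be
consistent with the theory but would not prove it is the cause of the
segfault.

```python
import re
from pathlib import Path

# Assumed naming patterns for common OpenMP runtime libraries
# (GNU libgomp, LLVM libomp, Intel libiomp5). Not an exhaustive list.
OPENMP_LIB = re.compile(r"/(libgomp|libomp|libiomp5)(-[0-9a-f]+)?\.so[^/]*$")


def openmp_runtimes(maps_text):
    """Return {runtime name: set of library paths} found in a maps listing.

    maps_text is the content of a Linux /proc/<pid>/maps file.
    """
    found = {}
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, device, inode, [pathname]
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no backing file
        path = fields[5].strip()
        match = OPENMP_LIB.search(path)
        if match:
            found.setdefault(match.group(1), set()).add(path)
    return found


if __name__ == "__main__":
    maps_file = Path("/proc/self/maps")
    if maps_file.exists():  # Linux only
        runtimes = openmp_runtimes(maps_file.read_text())
        for name, paths in sorted(runtimes.items()):
            print(name, sorted(paths))
        if len(runtimes) > 1:
            print("More than one OpenMP runtime is loaded in this process.")
```

To inspect the failing case, the same function could be called on
`/proc/<pid>/maps` of the test binary while it is stopped in gdb, or from a
Python session after the library under test has been imported.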


@Lin: I personally wouldn't be comfortable releasing a version that
segfaults; I don't think that meets the quality bar. But this is up to the
community to decide. I'm only reporting what I observe.

Releasing with indications of this kind of problem causes issues later in
downstream projects and running services.

On Tue, Feb 4, 2020 at 11:07 AM Chris Olivier <cjolivier01@gmail.com> wrote:

> When "fixing", please "fix" through actual root-cause analysis (use gdb,
> for instance) and not simply by guesswork and cutting out things which
> probably aren't actually at fault (blaming an OMP library that's in
> worldwide distribution in the billions should be treated with great
> skepticism).
>
> On Tue, Feb 4, 2020 at 10:44 AM Lin Yuan <apeforest@gmail.com> wrote:
>
> > Pedro,
> >
> > While I agree with you that we need to fix this usability issue, I don't
> > think this is a release blocker, as Przemek mentioned above. Could we fix
> > this in the next minor release?
> >
> > Thanks,
> >
> > Lin
> >
> > On Tue, Feb 4, 2020 at 10:38 AM Pedro Larroy <pedro.larroy.lists@gmail.com>
> > wrote:
> >
> > > Right. Would it be possible to have the CMake build also use libgomp for
> > > consistency with the releases until these issues are resolved?
> > > This can affect anyone compiling the distribution with CMake and also
> > > happens randomly in CI, worsening the contributor experience due to CI
> > > failures.
> > >
> > > On Tue, Feb 4, 2020 at 9:33 AM Przemysław Trędak <ptrendx@apache.org>
> > > wrote:
> > >
> > > > Hi Pedro,
> > > >
> > > > From the issue that you linked it seems that you are using the LLVM
> > > > OpenMP, whereas I believe the actual release uses libgomp (at least
> > > > that's what seems to be the conclusion from this issue:
> > > > https://github.com/apache/incubator-mxnet/issues/16891)?
> > > >
> > > > Przemek
> > > >
> > > > On 2020/02/04 03:42:30, Pedro Larroy <pedro.larroy.lists@gmail.com>
> > > > wrote:
> > > > > -1
> > > > >
> > > > > Unit tests passed in CPU build.
> > > > >
> > > > > I observe crashes related to openmp using cpp unit tests:
> > > > >
> > > > > https://github.com/apache/incubator-mxnet/issues/17043
> > > > >
> > > > > Pedro.
> > > > >
> > > > > > On Mon, Feb 3, 2020 at 6:44 PM Chaitanya Bapat <chai.bapat@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > +1
> > > > > > > Successfully built MXNet 1.6.0rc2 on Linux
> > > > > > > Tested for OpPerf utility
> > > > > > > For CPU -
> > > > > > > https://gist.github.com/ChaiBapchya/d5ecc3e971c5a3c558d672477b4b6b9c
> > > > > > >
> > > > > > > Works well!
> > > > > > >
> > > > > > > On Mon, 3 Feb 2020 at 15:43, Lin Yuan <apeforest@gmail.com> wrote:
> > > > > >
> > > > > > > +1
> > > > > > >
> > > > > > > > Tested Horovod with mnist example. My compiler flags are below:
> > > > > > > >
> > > > > > > > [✔ CUDA, ✔ CUDNN, ✔ NCCL, ✔ CUDA_RTC, ✖ TENSORRT, ✔ CPU_SSE,
> > > > > > > > ✔ CPU_SSE2, ✔ CPU_SSE3, ✔ CPU_SSE4_1, ✔ CPU_SSE4_2, ✖ CPU_SSE4A,
> > > > > > > > ✔ CPU_AVX, ✖ CPU_AVX2, ✔ OPENMP, ✖ SSE, ✔ F16C, ✖ JEMALLOC,
> > > > > > > > ✔ BLAS_OPEN, ✖ BLAS_ATLAS, ✖ BLAS_MKL, ✖ BLAS_APPLE, ✔ LAPACK,
> > > > > > > > ✖ MKLDNN, ✔ OPENCV, ✖ CAFFE, ✖ PROFILER, ✔ DIST_KVSTORE, ✖ CXX14,
> > > > > > > > ✖ INT64_TENSOR_SIZE, ✖ SIGNAL_HANDLER, ✖ DEBUG, ✖ TVM_OP]
> > > > > > >
> > > > > > > Lin
> > > > > > >
> > > > > > > > On Sat, Feb 1, 2020 at 9:55 PM Tao Lv <taolv@apache.org> wrote:
> > > > > > > >
> > > > > > > > > +1
> > > > > > > > >
> > > > > > > > > I tested below items:
> > > > > > > > > 1. download artifacts from Apache dist repo;
> > > > > > > > > 2. the signature looks good;
> > > > > > > > > 3. build from source code with MKL-DNN and MKL on centos;
> > > > > > > > > 4. run fp32 and int8 inference of ResNet50 under
> > > > > > > > > /example/quantization/.
> > > > > > > >
> > > > > > > > thanks,
> > > > > > > > -tao
> > > > > > > >
> > > > > > > > > On Sun, Feb 2, 2020 at 11:00 AM Tao Lv <taolv@apache.org> wrote:
> > > > > > > > >
> > > > > > > > > > I see. I was looking at this page:
> > > > > > > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.6.0.rc2
> > > > > > > > > >
> > > > > > > > > > On Sun, Feb 2, 2020 at 4:54 AM Przemysław Trędak <ptrendx@apache.org>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > >> Hi Tao,
> > > > > > > > > >>
> > > > > > > > > >> Could you tell me where did you look for it and did not find it? I just
> > > > > > > > > >> checked and both
> > > > > > > > > >> https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.6.0.rc2/ and
> > > > > > > > > >> draft of the release on GitHub have them.
> > > > > > > > > >>
> > > > > > > > > >> Thank you
> > > > > > > > > >> Przemek
> > > > > > > > > >>
> > > > > > > > > >> On 2020/02/01 14:23:11, Tao Lv <taolv@apache.org> wrote:
> > > > > > > > > >> > It seems the src tar and signature are missing from the tag.
> > > > > > > > > >> >
> > > > > > > > > >> > On Fri, Jan 31, 2020 at 11:09 AM Przemysław Trędak <ptrendx@apache.org>
> > > > > > > > > >> > wrote:
> > > > > > > > >> >
> > > > > > > > > >> > > Dear MXNet community,
> > > > > > > > > >> > >
> > > > > > > > > >> > > This is the vote to release Apache MXNet (incubating) version 1.6.0.
> > > > > > > > > >> > > Voting starts today and will close on Monday 2/3/2020 23:59 PST.
> > > > > > > > > >> > >
> > > > > > > > > >> > > Link to release notes:
> > > > > > > > > >> > > https://cwiki.apache.org/confluence/display/MXNET/1.6.0+Release+notes
> > > > > > > > > >> > >
> > > > > > > > > >> > > Link to release candidate:
> > > > > > > > > >> > > https://github.com/apache/incubator-mxnet/releases/tag/1.6.0.rc2
> > > > > > > > > >> > >
> > > > > > > > > >> > > Link to source and signatures on apache dist server:
> > > > > > > > > >> > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.6.0.rc2/
> > > > > > > > > >> > >
> > > > > > > > > >> > > The differences comparing to previous release candidate 1.6.0.rc1:
> > > > > > > > > >> > >  * Fixes for license issues (#17361, #17375, #17370, #17460)
> > > > > > > > > >> > >  * Bugfix for saving LSTM layer parameter (#17288)
> > > > > > > > > >> > >  * Bugfix for downloading the model from model zoo from multiple
> > > > > > > > > >> > >    processes (#17372)
> > > > > > > > > >> > >  * Fixed a symbol.py in AMP for GluonNLP (#17408)
> > > > > > > > > >> > >
> > > > > > > > > >> > >
> > > > > > > > > >> > > Please remember to TEST first before voting accordingly:
> > > > > > > > > >> > > +1 = approve
> > > > > > > > > >> > > +0 = no opinion
> > > > > > > > > >> > > -1 = disapprove (provide reason)
> > > > > > > > > >> > >
> > > > > > > > > >> > >
> > > > > > > > > >> > > Best regards,
> > > > > > > > > >> > > Przemyslaw Tredak
> > > > > > > > >> > >
> > > > > > > > >> >
> > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > *Chaitanya Prakash Bapat*
> > > > > > *+1 (973) 953-6299*
> > > > > >
> > > > >
> > > >
> > >
> >
>
