mxnet-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Pedro Larroy <pedro.larroy.li...@gmail.com>
Subject Re: [VOTE] Remove conflicting OpenMP from CMake builds
Date Mon, 17 Jun 2019 19:55:25 GMT
Thanks for your answer Chris. I'm not militant in regards to one
option or another nor belong of any club that will accept me as
member, but I think we should drive this to conclusion and understand
why we keep it or if we should remove it and use the platform provided
version. There are open issues linked on that PR that are causing
problems, that remain unaddressed it has been said that linking with
two openmp versions causes undefined behaviour, I had the experience
of having MXNet hang when using OpenMP running the tests in plain
ubuntu CPU, I know we are reentering OpenMP initialization when
creating threads in the engine and getting an assertion, etc, so I
have serious concerns when running MXNet with OpenMP enabled in
production.

I think you are the expert in OpenMP and HPC and it would be good to
have a documented and explainable outcome of one option or another
either in the mailing list or in the wiki.

I also feel bad for contributors spending time measuring and
benchmarking without conclusion, I feel they have given up working on
this issue since the PR keeps getting closed and is not moved forward.

Please help us understand whats the best way forward with this issue
and not just close the PR without explanations.

Pedro.

On Mon, Jun 17, 2019 at 11:15 AM Chris Olivier <cjolivier01@gmail.com> wrote:
>
> I am curious why you're being so militant troll about this.  libomp is used
> in every MKL build (download mxnet-mkl yourself and see).  I don't see any
> convincing reason to change it and so far as I can tell, no real issue has
> been proven to be related.  Anyway, I am reluctant to feed trolls any more
> than this, so I don't really have much else to add.
>
> ldd /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so
>         linux-vdso.so.1 (0x00007ffc989cf000)
>         libmklml_intel.so =>
> /usr/local/lib/python3.6/dist-packages/mxnet/libmklml_intel.so
> (0x00007f0afb7c1000)
>        * libiomp5.so =>
> /usr/local/lib/python3.6/dist-packages/mxnet/libiomp5.so
> (0x00007f0afb3e5000)*
>         librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f0afb1dd000)
>         libmkldnn.so.0 =>
> /usr/local/lib/python3.6/dist-packages/mxnet/libmkldnn.so.0
> (0x00007f0afa7ba000)
>         libgfortran.so.3 =>
> /usr/local/lib/python3.6/dist-packages/mxnet/libgfortran.so.3
> (0x00007f0afa493000)
>         libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f0afa28f000)
>         libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> (0x00007f0af9f06000)
>         libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f0af9b68000)
>         libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1
> (0x00007f0af9950000)
>         libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
> (0x00007f0af9731000)
>         libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0af9340000)
>         /lib64/ld-linux-x86-64.so.2 (0x00007f0b073f4000)
>         libquadmath.so.0 =>
> /usr/local/lib/python3.6/dist-packages/mxnet/libquadmath.so.0
> (0x00007f0af9100000)
>
>
> On Mon, Jun 17, 2019 at 10:58 AM Pedro Larroy <pedro.larroy.lists@gmail.com>
> wrote:
>
> > I had read the "Apache Voting Process" guide here:
> > https://www.apache.org/foundation/voting.html  and I thought code
> > changes could be discussed on the mailing list in cases where the PR
> > is stuck or there's no response for a long time, also I understood
> > that -1's have to be justified.  Could you, or someone more familiar
> > which the Apache way enlighten on how to move this issue forward in a
> > constructive way?
> >
> > Thanks a lot.
> >
> > Pedro.
> >
> > On Mon, Jun 17, 2019 at 10:46 AM Pedro Larroy
> > <pedro.larroy.lists@gmail.com> wrote:
> > >
> > > Thanks.
> > >
> > > How do we go on advancing this PR then? all the questions have been
> > > answered, performance numbers provided and more. Until how long can a
> > > veto stand? Also without replies to contributors.
> > >
> > > Pedro.
> > >
> > > On Fri, Jun 14, 2019 at 5:44 PM Sheng Zha <zhasheng@apache.org> wrote:
> > > >
> > > > This vote is invalid as the original PR has been vetoed by a
> > committer. A vote on dev@ won't help you circumvent a veto.
> > > >
> > > > -sz
> > > >
> > > > On 2019/06/14 23:59:33, Pedro Larroy <pedro.larroy.lists@gmail.com>
> > wrote:
> > > > > Hi all
> > > > >
> > > > > This is a 5-day vote to act and wrap up an outstanding PR that
> > removes
> > > > > linkage with multiple OpenMP from 3rdparty and uses the system
> > > > > provided one which might resolve a number of difficult to debug
> > issues
> > > > > and possible undefined behaviour.
> > > > >
> > > > > https://github.com/apache/incubator-mxnet/pull/12160
> > > > >
> > > > > See the comments in the thread for more details but in short, linking
> > > > > with multiple openmp versions seems to lead to undefined behaviour,
> > > > > plus it would simplify not having to deal with a custom openmp
> > version
> > > > > and rely on the platform provided one.
> > > > >
> > > > > This is expected to simplify builds and address a number of problems.
> > > > > Seems it doesn't cause any performance degradation, (the Gluon tests
> > > > > run almost 4x faster in my 64 core machine).
> > > > >
> > > > > There has been in depth study of performance implications by
> > > > > contributors like Stanislav Tsukrov and Anton Chernov.  All the
> > > > > concerns and comments from the reviewers have been addressed and
we
> > > > > can't keep asking open ended questions to block PRs. Reviewers are
> > > > > expected to be proactive and responsive to contributors so we keep
> > > > > encouraging active contributors.
> > > > >
> > > > > please vote to merge this PR accordingly:
> > > > >
> > > > > +1 = approve
> > > > > +0 = no opinion
> > > > > -1 = disapprove (provide reason)
> > > > >
> > > > > If we observe regressions reported by any internal performance
> > systems
> > > > > or by contributors the PR can be reverted easily. So it's not a one
> > > > > way door. But it will be useful to try this in master for a while.
> > > > >
> > > > > Thank you.
> > > > >
> > > > > Pedro.
> > > > >
> >

Mime
View raw message