mxnet-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Anton Chernov <mecher...@gmail.com>
Subject Re: [VOTE] Remove conflicting OpenMP from CMake builds
Date Tue, 18 Jun 2019 10:15:14 GMT
I would like to cite Apache Glossary [1]:

Veto
According to the Apache methodology, a change which has been made or
proposed may be made moot through the exercise of a veto by a committer to
the codebase in question. If the R-T-C commit policy is in effect, a veto
prevents the change from being made. In either the R-T-C or C-T-R
environments, a veto applied to a change that has already been made forces
it to be reverted. Vetos may not be overridden nor voted down, and only
cease to apply when the committer who issued the veto withdraws it. All
vetos must be accompanied by a valid technical justification; a veto
without such a justification is invalid; in case of doubt, deciding whether
a technical justification is valid is up to the PMC. Vetos force discussion
and, if supported, version control rollback or appropriate code changes.
Vetoed code commits are best reverted by the original committer, unless an
urgent solution is needed (e.g., build breakers). Vetos only apply to code
changes; they do not apply to procedural issues such as software releases.

Therefore the vote opened by Pedro has it's right to exist as a voting of
PMC members on whether the veto of Chris has technical justification. It
would be great if the committee could cast a vote on this to clarify the
situation.

Best
Anton

[1] https://www.apache.org/foundation/glossary.html


On Mon, 17 Jun 2019 at 22:01, Pedro Larroy <pedro.larroy.lists@gmail.com>
wrote:

> Regarding your paste of ldd, not sure what's your point. This is the
> output with the patch from PR applied, the openmp version used is the
> one provided by MKL:
>
>
> Version of the PR that removes openmp from 3rdparty comes from MKL:
>
>         linux-vdso.so.1 (0x00007ffc4c993000)
>         libmkldnn.so.0 =>
> /home/piotr/mxnet_openmp/build/3rdparty/mkldnn/src/libmkldnn.so.0
> (0x00007f7874519000)
>         libopenblas.so.0 => /usr/lib/x86_64-linux-gnu/libopenblas.so.0
> (0x00007f7872273000)
>         librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f787206b000)
>         libjemalloc.so.1 => /usr/lib/x86_64-linux-gnu/libjemalloc.so.1
> (0x00007f7871e35000)
>         libmklml_intel.so =>
>
> /home/piotr/mxnet_openmp/build/mklml/mklml_lnx_2019.0.5.20190502/lib/libmklml_intel.so
> (0x00007f786a3a3000)
>         libiomp5.so =>
>
> /home/piotr/mxnet_openmp/build/mklml/mklml_lnx_2019.0.5.20190502/lib/libiomp5.so
> (0x00007f7869fae000)
>
>
>
> In master, ldd shows that openmp is coming from 3rdparty:
>
>         libomp.so =>
> /home/piotr/mxnet_master/build/3rdparty/openmp/runtime/src/libomp.so
> (0x00007f1ea2353000)
>
>
>
> Could you please explain your argument on what's the recommended way
> and what's the divergence?  As explained in the PR there seems to be
> no performance advantage on adding our own version of openmp.
>
> Thanks.
>
> On Mon, Jun 17, 2019 at 12:55 PM Pedro Larroy
> <pedro.larroy.lists@gmail.com> wrote:
> >
> > Thanks for your answer Chris. I'm not militant in regards to one
> > option or another nor belong of any club that will accept me as
> > member, but I think we should drive this to conclusion and understand
> > why we keep it or if we should remove it and use the platform provided
> > version. There are open issues linked on that PR that are causing
> > problems, that remain unaddressed it has been said that linking with
> > two openmp versions causes undefined behaviour, I had the experience
> > of having MXNet hang when using OpenMP running the tests in plain
> > ubuntu CPU, I know we are reentering OpenMP initialization when
> > creating threads in the engine and getting an assertion, etc, so I
> > have serious concerns when running MXNet with OpenMP enabled in
> > production.
> >
> > I think you are the expert in OpenMP and HPC and it would be good to
> > have a documented and explainable outcome of one option or another
> > either in the mailing list or in the wiki.
> >
> > I also feel bad for contributors spending time measuring and
> > benchmarking without conclusion, I feel they have given up working on
> > this issue since the PR keeps getting closed and is not moved forward.
> >
> > Please help us understand whats the best way forward with this issue
> > and not just close the PR without explanations.
> >
> > Pedro.
> >
> > On Mon, Jun 17, 2019 at 11:15 AM Chris Olivier <cjolivier01@gmail.com>
> wrote:
> > >
> > > I am curious why you're being so militant troll about this.  libomp is
> used
> > > in every MKL build (download mxnet-mkl yourself and see).  I don't see
> any
> > > convincing reason to change it and so far as I can tell, no real issue
> has
> > > been proven to be related.  Anyway, I am reluctant to feed trolls any
> more
> > > than this, so I don't really have much else to add.
> > >
> > > ldd /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so
> > >         linux-vdso.so.1 (0x00007ffc989cf000)
> > >         libmklml_intel.so =>
> > > /usr/local/lib/python3.6/dist-packages/mxnet/libmklml_intel.so
> > > (0x00007f0afb7c1000)
> > >        * libiomp5.so =>
> > > /usr/local/lib/python3.6/dist-packages/mxnet/libiomp5.so
> > > (0x00007f0afb3e5000)*
> > >         librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1
> (0x00007f0afb1dd000)
> > >         libmkldnn.so.0 =>
> > > /usr/local/lib/python3.6/dist-packages/mxnet/libmkldnn.so.0
> > > (0x00007f0afa7ba000)
> > >         libgfortran.so.3 =>
> > > /usr/local/lib/python3.6/dist-packages/mxnet/libgfortran.so.3
> > > (0x00007f0afa493000)
> > >         libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2
> (0x00007f0afa28f000)
> > >         libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> > > (0x00007f0af9f06000)
> > >         libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6
> (0x00007f0af9b68000)
> > >         libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1
> > > (0x00007f0af9950000)
> > >         libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
> > > (0x00007f0af9731000)
> > >         libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
> (0x00007f0af9340000)
> > >         /lib64/ld-linux-x86-64.so.2 (0x00007f0b073f4000)
> > >         libquadmath.so.0 =>
> > > /usr/local/lib/python3.6/dist-packages/mxnet/libquadmath.so.0
> > > (0x00007f0af9100000)
> > >
> > >
> > > On Mon, Jun 17, 2019 at 10:58 AM Pedro Larroy <
> pedro.larroy.lists@gmail.com>
> > > wrote:
> > >
> > > > I had read the "Apache Voting Process" guide here:
> > > > https://www.apache.org/foundation/voting.html  and I thought code
> > > > changes could be discussed on the mailing list in cases where the PR
> > > > is stuck or there's no response for a long time, also I understood
> > > > that -1's have to be justified.  Could you, or someone more familiar
> > > > which the Apache way enlighten on how to move this issue forward in a
> > > > constructive way?
> > > >
> > > > Thanks a lot.
> > > >
> > > > Pedro.
> > > >
> > > > On Mon, Jun 17, 2019 at 10:46 AM Pedro Larroy
> > > > <pedro.larroy.lists@gmail.com> wrote:
> > > > >
> > > > > Thanks.
> > > > >
> > > > > How do we go on advancing this PR then? all the questions have been
> > > > > answered, performance numbers provided and more. Until how long
> can a
> > > > > veto stand? Also without replies to contributors.
> > > > >
> > > > > Pedro.
> > > > >
> > > > > On Fri, Jun 14, 2019 at 5:44 PM Sheng Zha <zhasheng@apache.org>
> wrote:
> > > > > >
> > > > > > This vote is invalid as the original PR has been vetoed by a
> > > > committer. A vote on dev@ won't help you circumvent a veto.
> > > > > >
> > > > > > -sz
> > > > > >
> > > > > > On 2019/06/14 23:59:33, Pedro Larroy <
> pedro.larroy.lists@gmail.com>
> > > > wrote:
> > > > > > > Hi all
> > > > > > >
> > > > > > > This is a 5-day vote to act and wrap up an outstanding
PR that
> > > > removes
> > > > > > > linkage with multiple OpenMP from 3rdparty and uses the
system
> > > > > > > provided one which might resolve a number of difficult
to debug
> > > > issues
> > > > > > > and possible undefined behaviour.
> > > > > > >
> > > > > > > https://github.com/apache/incubator-mxnet/pull/12160
> > > > > > >
> > > > > > > See the comments in the thread for more details but in
short,
> linking
> > > > > > > with multiple openmp versions seems to lead to undefined
> behaviour,
> > > > > > > plus it would simplify not having to deal with a custom
openmp
> > > > version
> > > > > > > and rely on the platform provided one.
> > > > > > >
> > > > > > > This is expected to simplify builds and address a number
of
> problems.
> > > > > > > Seems it doesn't cause any performance degradation, (the
Gluon
> tests
> > > > > > > run almost 4x faster in my 64 core machine).
> > > > > > >
> > > > > > > There has been in depth study of performance implications
by
> > > > > > > contributors like Stanislav Tsukrov and Anton Chernov.
 All the
> > > > > > > concerns and comments from the reviewers have been addressed
> and we
> > > > > > > can't keep asking open ended questions to block PRs. Reviewers
> are
> > > > > > > expected to be proactive and responsive to contributors
so we
> keep
> > > > > > > encouraging active contributors.
> > > > > > >
> > > > > > > please vote to merge this PR accordingly:
> > > > > > >
> > > > > > > +1 = approve
> > > > > > > +0 = no opinion
> > > > > > > -1 = disapprove (provide reason)
> > > > > > >
> > > > > > > If we observe regressions reported by any internal performance
> > > > systems
> > > > > > > or by contributors the PR can be reverted easily. So it's
not
> a one
> > > > > > > way door. But it will be useful to try this in master for
a
> while.
> > > > > > >
> > > > > > > Thank you.
> > > > > > >
> > > > > > > Pedro.
> > > > > > >
> > > >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message