mxnet-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Per da Silva <perdasi...@gmail.com>
Subject Re: [VOTE] Remove conflicting OpenMP from CMake builds
Date Tue, 18 Jun 2019 18:41:14 GMT
+1

On Tue., 18 Jun. 2019, 12:20 pm Anton Chernov, <mechernov@gmail.com> wrote:

> I would like to cite Apache Glossary [1]:
>
> Veto
> According to the Apache methodology, a change which has been made or
> proposed may be made moot through the exercise of a veto by a committer to
> the codebase in question. If the R-T-C commit policy is in effect, a veto
> prevents the change from being made. In either the R-T-C or C-T-R
> environments, a veto applied to a change that has already been made forces
> it to be reverted. Vetos may not be overridden nor voted down, and only
> cease to apply when the committer who issued the veto withdraws it. All
> vetos must be accompanied by a valid technical justification; a veto
> without such a justification is invalid; in case of doubt, deciding whether
> a technical justification is valid is up to the PMC. Vetos force discussion
> and, if supported, version control rollback or appropriate code changes.
> Vetoed code commits are best reverted by the original committer, unless an
> urgent solution is needed (e.g., build breakers). Vetos only apply to code
> changes; they do not apply to procedural issues such as software releases.
>
> Therefore the vote opened by Pedro has it's right to exist as a voting of
> PMC members on whether the veto of Chris has technical justification. It
> would be great if the committee could cast a vote on this to clarify the
> situation.
>
> Best
> Anton
>
> [1] https://www.apache.org/foundation/glossary.html
>
>
> On Mon, 17 Jun 2019 at 22:01, Pedro Larroy <pedro.larroy.lists@gmail.com>
> wrote:
>
> > Regarding your paste of ldd, not sure what's your point. This is the
> > output with the patch from PR applied, the openmp version used is the
> > one provided by MKL:
> >
> >
> > Version of the PR that removes openmp from 3rdparty comes from MKL:
> >
> >         linux-vdso.so.1 (0x00007ffc4c993000)
> >         libmkldnn.so.0 =>
> > /home/piotr/mxnet_openmp/build/3rdparty/mkldnn/src/libmkldnn.so.0
> > (0x00007f7874519000)
> >         libopenblas.so.0 => /usr/lib/x86_64-linux-gnu/libopenblas.so.0
> > (0x00007f7872273000)
> >         librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1
> (0x00007f787206b000)
> >         libjemalloc.so.1 => /usr/lib/x86_64-linux-gnu/libjemalloc.so.1
> > (0x00007f7871e35000)
> >         libmklml_intel.so =>
> >
> >
> /home/piotr/mxnet_openmp/build/mklml/mklml_lnx_2019.0.5.20190502/lib/libmklml_intel.so
> > (0x00007f786a3a3000)
> >         libiomp5.so =>
> >
> >
> /home/piotr/mxnet_openmp/build/mklml/mklml_lnx_2019.0.5.20190502/lib/libiomp5.so
> > (0x00007f7869fae000)
> >
> >
> >
> > In master, ldd shows that openmp is coming from 3rdparty:
> >
> >         libomp.so =>
> > /home/piotr/mxnet_master/build/3rdparty/openmp/runtime/src/libomp.so
> > (0x00007f1ea2353000)
> >
> >
> >
> > Could you please explain your argument on what's the recommended way
> > and what's the divergence?  As explained in the PR there seems to be
> > no performance advantage on adding our own version of openmp.
> >
> > Thanks.
> >
> > On Mon, Jun 17, 2019 at 12:55 PM Pedro Larroy
> > <pedro.larroy.lists@gmail.com> wrote:
> > >
> > > Thanks for your answer Chris. I'm not militant in regards to one
> > > option or another nor belong of any club that will accept me as
> > > member, but I think we should drive this to conclusion and understand
> > > why we keep it or if we should remove it and use the platform provided
> > > version. There are open issues linked on that PR that are causing
> > > problems, that remain unaddressed it has been said that linking with
> > > two openmp versions causes undefined behaviour, I had the experience
> > > of having MXNet hang when using OpenMP running the tests in plain
> > > ubuntu CPU, I know we are reentering OpenMP initialization when
> > > creating threads in the engine and getting an assertion, etc, so I
> > > have serious concerns when running MXNet with OpenMP enabled in
> > > production.
> > >
> > > I think you are the expert in OpenMP and HPC and it would be good to
> > > have a documented and explainable outcome of one option or another
> > > either in the mailing list or in the wiki.
> > >
> > > I also feel bad for contributors spending time measuring and
> > > benchmarking without conclusion, I feel they have given up working on
> > > this issue since the PR keeps getting closed and is not moved forward.
> > >
> > > Please help us understand whats the best way forward with this issue
> > > and not just close the PR without explanations.
> > >
> > > Pedro.
> > >
> > > On Mon, Jun 17, 2019 at 11:15 AM Chris Olivier <cjolivier01@gmail.com>
> > wrote:
> > > >
> > > > I am curious why you're being so militant troll about this.  libomp
> is
> > used
> > > > in every MKL build (download mxnet-mkl yourself and see).  I don't
> see
> > any
> > > > convincing reason to change it and so far as I can tell, no real
> issue
> > has
> > > > been proven to be related.  Anyway, I am reluctant to feed trolls any
> > more
> > > > than this, so I don't really have much else to add.
> > > >
> > > > ldd /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so
> > > >         linux-vdso.so.1 (0x00007ffc989cf000)
> > > >         libmklml_intel.so =>
> > > > /usr/local/lib/python3.6/dist-packages/mxnet/libmklml_intel.so
> > > > (0x00007f0afb7c1000)
> > > >        * libiomp5.so =>
> > > > /usr/local/lib/python3.6/dist-packages/mxnet/libiomp5.so
> > > > (0x00007f0afb3e5000)*
> > > >         librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1
> > (0x00007f0afb1dd000)
> > > >         libmkldnn.so.0 =>
> > > > /usr/local/lib/python3.6/dist-packages/mxnet/libmkldnn.so.0
> > > > (0x00007f0afa7ba000)
> > > >         libgfortran.so.3 =>
> > > > /usr/local/lib/python3.6/dist-packages/mxnet/libgfortran.so.3
> > > > (0x00007f0afa493000)
> > > >         libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2
> > (0x00007f0afa28f000)
> > > >         libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> > > > (0x00007f0af9f06000)
> > > >         libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6
> > (0x00007f0af9b68000)
> > > >         libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1
> > > > (0x00007f0af9950000)
> > > >         libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
> > > > (0x00007f0af9731000)
> > > >         libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
> > (0x00007f0af9340000)
> > > >         /lib64/ld-linux-x86-64.so.2 (0x00007f0b073f4000)
> > > >         libquadmath.so.0 =>
> > > > /usr/local/lib/python3.6/dist-packages/mxnet/libquadmath.so.0
> > > > (0x00007f0af9100000)
> > > >
> > > >
> > > > On Mon, Jun 17, 2019 at 10:58 AM Pedro Larroy <
> > pedro.larroy.lists@gmail.com>
> > > > wrote:
> > > >
> > > > > I had read the "Apache Voting Process" guide here:
> > > > > https://www.apache.org/foundation/voting.html  and I thought code
> > > > > changes could be discussed on the mailing list in cases where the
> PR
> > > > > is stuck or there's no response for a long time, also I understood
> > > > > that -1's have to be justified.  Could you, or someone more
> familiar
> > > > > which the Apache way enlighten on how to move this issue forward
> in a
> > > > > constructive way?
> > > > >
> > > > > Thanks a lot.
> > > > >
> > > > > Pedro.
> > > > >
> > > > > On Mon, Jun 17, 2019 at 10:46 AM Pedro Larroy
> > > > > <pedro.larroy.lists@gmail.com> wrote:
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > > How do we go on advancing this PR then? all the questions have
> been
> > > > > > answered, performance numbers provided and more. Until how long
> > can a
> > > > > > veto stand? Also without replies to contributors.
> > > > > >
> > > > > > Pedro.
> > > > > >
> > > > > > On Fri, Jun 14, 2019 at 5:44 PM Sheng Zha <zhasheng@apache.org>
> > wrote:
> > > > > > >
> > > > > > > This vote is invalid as the original PR has been vetoed
by a
> > > > > committer. A vote on dev@ won't help you circumvent a veto.
> > > > > > >
> > > > > > > -sz
> > > > > > >
> > > > > > > On 2019/06/14 23:59:33, Pedro Larroy <
> > pedro.larroy.lists@gmail.com>
> > > > > wrote:
> > > > > > > > Hi all
> > > > > > > >
> > > > > > > > This is a 5-day vote to act and wrap up an outstanding
PR
> that
> > > > > removes
> > > > > > > > linkage with multiple OpenMP from 3rdparty and uses
the
> system
> > > > > > > > provided one which might resolve a number of difficult
to
> debug
> > > > > issues
> > > > > > > > and possible undefined behaviour.
> > > > > > > >
> > > > > > > > https://github.com/apache/incubator-mxnet/pull/12160
> > > > > > > >
> > > > > > > > See the comments in the thread for more details but
in short,
> > linking
> > > > > > > > with multiple openmp versions seems to lead to undefined
> > behaviour,
> > > > > > > > plus it would simplify not having to deal with a custom
> openmp
> > > > > version
> > > > > > > > and rely on the platform provided one.
> > > > > > > >
> > > > > > > > This is expected to simplify builds and address a
number of
> > problems.
> > > > > > > > Seems it doesn't cause any performance degradation,
(the
> Gluon
> > tests
> > > > > > > > run almost 4x faster in my 64 core machine).
> > > > > > > >
> > > > > > > > There has been in depth study of performance implications
by
> > > > > > > > contributors like Stanislav Tsukrov and Anton Chernov.
 All
> the
> > > > > > > > concerns and comments from the reviewers have been
addressed
> > and we
> > > > > > > > can't keep asking open ended questions to block PRs.
> Reviewers
> > are
> > > > > > > > expected to be proactive and responsive to contributors
so we
> > keep
> > > > > > > > encouraging active contributors.
> > > > > > > >
> > > > > > > > please vote to merge this PR accordingly:
> > > > > > > >
> > > > > > > > +1 = approve
> > > > > > > > +0 = no opinion
> > > > > > > > -1 = disapprove (provide reason)
> > > > > > > >
> > > > > > > > If we observe regressions reported by any internal
> performance
> > > > > systems
> > > > > > > > or by contributors the PR can be reverted easily.
So it's not
> > a one
> > > > > > > > way door. But it will be useful to try this in master
for a
> > while.
> > > > > > > >
> > > > > > > > Thank you.
> > > > > > > >
> > > > > > > > Pedro.
> > > > > > > >
> > > > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message