mxnet-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Pedro Larroy <pedro.larroy.li...@gmail.com>
Subject Re: [VOTE] Remove conflicting OpenMP from CMake builds
Date Tue, 18 Jun 2019 18:46:51 GMT
I tried to have a vote to bring light and clarify a PR which was not
advancing and not handled well in my opinion. As I understand, and
indicated before here by Apache mentors, the mailing list is the main
communication medium and where votes and discussion happen, so my
understanding was that it should be always possible to bring a
discussion from Github to the mailing list so we can build consensus
and have things moving in the right direction we have many PRs open
and thousands of issues open, bringing something to dev@ gives the
opportunity to build consensus and community.

Sheng was quick to call the vote invalid. I think we should be also
quick to call out toxic behaviors and communications which are not
helping having a healthy community. This is also part of what I would
expect from the leaders of this project as an external contributor.

There should be possible to bring discussion from github to the
mailing list, for example in cases where there are hastily merged PRs
which might need revisiting or further discussion. Maybe someone else
with more Apache experience can chime in, but personally I  don't
think a request for changes or closing a PR should constitute a formal
veto. Having a vote in the mailing list doesn't prevent anyone from
casting their vote or their veto with appropriate justification if
they desire.

Pedro.

On Tue, Jun 18, 2019 at 3:20 AM Anton Chernov <mechernov@gmail.com> wrote:
>
> I would like to cite Apache Glossary [1]:
>
> Veto
> According to the Apache methodology, a change which has been made or
> proposed may be made moot through the exercise of a veto by a committer to
> the codebase in question. If the R-T-C commit policy is in effect, a veto
> prevents the change from being made. In either the R-T-C or C-T-R
> environments, a veto applied to a change that has already been made forces
> it to be reverted. Vetos may not be overridden nor voted down, and only
> cease to apply when the committer who issued the veto withdraws it. All
> vetos must be accompanied by a valid technical justification; a veto
> without such a justification is invalid; in case of doubt, deciding whether
> a technical justification is valid is up to the PMC. Vetos force discussion
> and, if supported, version control rollback or appropriate code changes.
> Vetoed code commits are best reverted by the original committer, unless an
> urgent solution is needed (e.g., build breakers). Vetos only apply to code
> changes; they do not apply to procedural issues such as software releases.
>
> Therefore the vote opened by Pedro has it's right to exist as a voting of
> PMC members on whether the veto of Chris has technical justification. It
> would be great if the committee could cast a vote on this to clarify the
> situation.
>
> Best
> Anton
>
> [1] https://www.apache.org/foundation/glossary.html
>
>
> On Mon, 17 Jun 2019 at 22:01, Pedro Larroy <pedro.larroy.lists@gmail.com>
> wrote:
>
> > Regarding your paste of ldd, not sure what's your point. This is the
> > output with the patch from PR applied, the openmp version used is the
> > one provided by MKL:
> >
> >
> > Version of the PR that removes openmp from 3rdparty comes from MKL:
> >
> >         linux-vdso.so.1 (0x00007ffc4c993000)
> >         libmkldnn.so.0 =>
> > /home/piotr/mxnet_openmp/build/3rdparty/mkldnn/src/libmkldnn.so.0
> > (0x00007f7874519000)
> >         libopenblas.so.0 => /usr/lib/x86_64-linux-gnu/libopenblas.so.0
> > (0x00007f7872273000)
> >         librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f787206b000)
> >         libjemalloc.so.1 => /usr/lib/x86_64-linux-gnu/libjemalloc.so.1
> > (0x00007f7871e35000)
> >         libmklml_intel.so =>
> >
> > /home/piotr/mxnet_openmp/build/mklml/mklml_lnx_2019.0.5.20190502/lib/libmklml_intel.so
> > (0x00007f786a3a3000)
> >         libiomp5.so =>
> >
> > /home/piotr/mxnet_openmp/build/mklml/mklml_lnx_2019.0.5.20190502/lib/libiomp5.so
> > (0x00007f7869fae000)
> >
> >
> >
> > In master, ldd shows that openmp is coming from 3rdparty:
> >
> >         libomp.so =>
> > /home/piotr/mxnet_master/build/3rdparty/openmp/runtime/src/libomp.so
> > (0x00007f1ea2353000)
> >
> >
> >
> > Could you please explain your argument on what's the recommended way
> > and what's the divergence?  As explained in the PR there seems to be
> > no performance advantage on adding our own version of openmp.
> >
> > Thanks.
> >
> > On Mon, Jun 17, 2019 at 12:55 PM Pedro Larroy
> > <pedro.larroy.lists@gmail.com> wrote:
> > >
> > > Thanks for your answer Chris. I'm not militant in regards to one
> > > option or another nor belong of any club that will accept me as
> > > member, but I think we should drive this to conclusion and understand
> > > why we keep it or if we should remove it and use the platform provided
> > > version. There are open issues linked on that PR that are causing
> > > problems, that remain unaddressed it has been said that linking with
> > > two openmp versions causes undefined behaviour, I had the experience
> > > of having MXNet hang when using OpenMP running the tests in plain
> > > ubuntu CPU, I know we are reentering OpenMP initialization when
> > > creating threads in the engine and getting an assertion, etc, so I
> > > have serious concerns when running MXNet with OpenMP enabled in
> > > production.
> > >
> > > I think you are the expert in OpenMP and HPC and it would be good to
> > > have a documented and explainable outcome of one option or another
> > > either in the mailing list or in the wiki.
> > >
> > > I also feel bad for contributors spending time measuring and
> > > benchmarking without conclusion, I feel they have given up working on
> > > this issue since the PR keeps getting closed and is not moved forward.
> > >
> > > Please help us understand whats the best way forward with this issue
> > > and not just close the PR without explanations.
> > >
> > > Pedro.
> > >
> > > On Mon, Jun 17, 2019 at 11:15 AM Chris Olivier <cjolivier01@gmail.com>
> > wrote:
> > > >
> > > > I am curious why you're being so militant troll about this.  libomp is
> > used
> > > > in every MKL build (download mxnet-mkl yourself and see).  I don't see
> > any
> > > > convincing reason to change it and so far as I can tell, no real issue
> > has
> > > > been proven to be related.  Anyway, I am reluctant to feed trolls any
> > more
> > > > than this, so I don't really have much else to add.
> > > >
> > > > ldd /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so
> > > >         linux-vdso.so.1 (0x00007ffc989cf000)
> > > >         libmklml_intel.so =>
> > > > /usr/local/lib/python3.6/dist-packages/mxnet/libmklml_intel.so
> > > > (0x00007f0afb7c1000)
> > > >        * libiomp5.so =>
> > > > /usr/local/lib/python3.6/dist-packages/mxnet/libiomp5.so
> > > > (0x00007f0afb3e5000)*
> > > >         librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1
> > (0x00007f0afb1dd000)
> > > >         libmkldnn.so.0 =>
> > > > /usr/local/lib/python3.6/dist-packages/mxnet/libmkldnn.so.0
> > > > (0x00007f0afa7ba000)
> > > >         libgfortran.so.3 =>
> > > > /usr/local/lib/python3.6/dist-packages/mxnet/libgfortran.so.3
> > > > (0x00007f0afa493000)
> > > >         libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2
> > (0x00007f0afa28f000)
> > > >         libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> > > > (0x00007f0af9f06000)
> > > >         libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6
> > (0x00007f0af9b68000)
> > > >         libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1
> > > > (0x00007f0af9950000)
> > > >         libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
> > > > (0x00007f0af9731000)
> > > >         libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
> > (0x00007f0af9340000)
> > > >         /lib64/ld-linux-x86-64.so.2 (0x00007f0b073f4000)
> > > >         libquadmath.so.0 =>
> > > > /usr/local/lib/python3.6/dist-packages/mxnet/libquadmath.so.0
> > > > (0x00007f0af9100000)
> > > >
> > > >
> > > > On Mon, Jun 17, 2019 at 10:58 AM Pedro Larroy <
> > pedro.larroy.lists@gmail.com>
> > > > wrote:
> > > >
> > > > > I had read the "Apache Voting Process" guide here:
> > > > > https://www.apache.org/foundation/voting.html  and I thought code
> > > > > changes could be discussed on the mailing list in cases where the
PR
> > > > > is stuck or there's no response for a long time, also I understood
> > > > > that -1's have to be justified.  Could you, or someone more familiar
> > > > > which the Apache way enlighten on how to move this issue forward
in a
> > > > > constructive way?
> > > > >
> > > > > Thanks a lot.
> > > > >
> > > > > Pedro.
> > > > >
> > > > > On Mon, Jun 17, 2019 at 10:46 AM Pedro Larroy
> > > > > <pedro.larroy.lists@gmail.com> wrote:
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > > How do we go on advancing this PR then? all the questions have
been
> > > > > > answered, performance numbers provided and more. Until how long
> > can a
> > > > > > veto stand? Also without replies to contributors.
> > > > > >
> > > > > > Pedro.
> > > > > >
> > > > > > On Fri, Jun 14, 2019 at 5:44 PM Sheng Zha <zhasheng@apache.org>
> > wrote:
> > > > > > >
> > > > > > > This vote is invalid as the original PR has been vetoed
by a
> > > > > committer. A vote on dev@ won't help you circumvent a veto.
> > > > > > >
> > > > > > > -sz
> > > > > > >
> > > > > > > On 2019/06/14 23:59:33, Pedro Larroy <
> > pedro.larroy.lists@gmail.com>
> > > > > wrote:
> > > > > > > > Hi all
> > > > > > > >
> > > > > > > > This is a 5-day vote to act and wrap up an outstanding
PR that
> > > > > removes
> > > > > > > > linkage with multiple OpenMP from 3rdparty and uses
the system
> > > > > > > > provided one which might resolve a number of difficult
to debug
> > > > > issues
> > > > > > > > and possible undefined behaviour.
> > > > > > > >
> > > > > > > > https://github.com/apache/incubator-mxnet/pull/12160
> > > > > > > >
> > > > > > > > See the comments in the thread for more details but
in short,
> > linking
> > > > > > > > with multiple openmp versions seems to lead to undefined
> > behaviour,
> > > > > > > > plus it would simplify not having to deal with a custom
openmp
> > > > > version
> > > > > > > > and rely on the platform provided one.
> > > > > > > >
> > > > > > > > This is expected to simplify builds and address a
number of
> > problems.
> > > > > > > > Seems it doesn't cause any performance degradation,
(the Gluon
> > tests
> > > > > > > > run almost 4x faster in my 64 core machine).
> > > > > > > >
> > > > > > > > There has been in depth study of performance implications
by
> > > > > > > > contributors like Stanislav Tsukrov and Anton Chernov.
 All the
> > > > > > > > concerns and comments from the reviewers have been
addressed
> > and we
> > > > > > > > can't keep asking open ended questions to block PRs.
Reviewers
> > are
> > > > > > > > expected to be proactive and responsive to contributors
so we
> > keep
> > > > > > > > encouraging active contributors.
> > > > > > > >
> > > > > > > > please vote to merge this PR accordingly:
> > > > > > > >
> > > > > > > > +1 = approve
> > > > > > > > +0 = no opinion
> > > > > > > > -1 = disapprove (provide reason)
> > > > > > > >
> > > > > > > > If we observe regressions reported by any internal
performance
> > > > > systems
> > > > > > > > or by contributors the PR can be reverted easily.
So it's not
> > a one
> > > > > > > > way door. But it will be useful to try this in master
for a
> > while.
> > > > > > > >
> > > > > > > > Thank you.
> > > > > > > >
> > > > > > > > Pedro.
> > > > > > > >
> > > > >
> >

Mime
View raw message