mxnet-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Anton Chernov <mecher...@gmail.com>
Subject Re: [Discussion] Remove bundled llvm OpenMP
Date Tue, 27 Nov 2018 16:15:17 GMT
Another thing to take into consideration:

All python artefacts that are created (PyPi) are built with make and are
not using the bundled OpenMP library.

One step for the switch to CMake to happen is the approval and merging of
the mentioned PR:

https://github.com/apache/incubator-mxnet/pull/12160

If there are no other objections I kindly ask Chris Olivier to remove his
'requesting changes' veto on it to unblock the CMake overhaul work.

Thank you.

Best
Anton

чт, 22 нояб. 2018 г. в 17:11, Anton Chernov <mechernov@gmail.com>:

>
> Thank you for you answer, Chris.
>
> > The whole “mixing omp libraries” is something that occurs in production
> every day and certainly in everything that uses mkl.
>
> I'm afraid this statement is wrong. Intel MKL-DNN strictly ensures that
> this mixture is not happening:
>
> "Intel MKL-DNN uses OpenMP* for parallelism and requires an OpenMP runtime
> library to work. As different OpenMP runtimes may not be binary compatible
> it's important to ensure that only one OpenMP runtime is used throughout
> the application. Having more than one OpenMP runtime initialized may lead
> to undefined behavior resulting in incorrect results or crashes." [1]
>
> That is why 2 different MKLML libraries are provided:
>
> lib/libmklml_gnu.so  | Intel MKL small library for GNU* OpenMP runtime
> lib/libmklml_intel.so | Intel MKL small library for Intel(R) OpenMP runtime
>
> > is the suggestion that libiomp be removed from mkl?
>
> That is certainly not my suggestion.
>
> > have you spoken with intel? have you consulted Intel at all?
>
> Yes, I have asked for comments on the issue.
>
> > “hard to debug random crash”. you’re seeing an assertion which is
> probably ...
>
> I'm seeing the result of undefined behaviour. And I want to put emphasis
> on the following statement:
>
> I disregards of whether there is a particular reason for the assert - it
> is a result of behaviour that should not happen. There are valid ways how
> to use llvm OpenMP in MXNet and the current way is not one of them.
>
> > The lack of root-causing the problem and knee-jerk solution here makes me
> uncomfortable.
>
> I hope that my efforts highlighting the problems reach you to mitigate
> your uncomfort.
>
> > if you want to see performance differences there’s an environment
> variable
> you can set in the mxnet omp tuning code that will print overhead and
> execution times for the current omp library.
>
> I don't want to see performance differences in the current OpenMP library.
> I want to remove the current OpenMP library and use the one provided by the
> compiler.
>
>
>
> Best
> Anton
>
> [1] https://github.com/intel/mkl-dnn/blame/master/README.md#L261-L265
>
> чт, 22 нояб. 2018 г. в 16:50, Chris Olivier <cjolivier01@apache.org>:
>
>> Do you not work on CI mostly? My apologies for thinking that was some sort
>> of team effort between you and a few others that were passionate about CI
>> keeping the CI system running smoothly.
>>
>> You have source code, you have the line the assertion is on. If you can’t
>> describe what’s going wrong that causes the assertion, then I don’t really
>> have anything more to add to this conversation beyond what’s below:
>>
>> The whole “mixing omp libraries” is something that occurs in production
>> every day and certainly in everything that uses mkl.  It may occasionally
>> cause problems for some edge cases when there is super-complex linking
>> strategies and dynamic loading.  But this is not one of those edge cases.
>> Mostly blaming this is a red herring for other thread-related problems and
>> people switch omp library and the timing of their code changes and they
>> stop seeing the problem. I’ve spent my entire career doing heavily
>> multiphreaded c++ development and i’ve seen that a million times.  is the
>> suggestion that libiomp be removed from mkl? have you spoken with intel?
>> have you consulted Intel at all?
>>
>> and what you are seeing isn’t some “hard to debug random crash”. you’re
>> seeing an assertion which is probably related to omp trying to create a
>> thread pool after a fork and something was done in the mxnet code to make
>> that sketchy to do. I’d suggest filing an issue with the llvm openmp just
>> like you’d file with any other not-well-understood behavior in mxnet.
>>
>> The lack of root-causing the problem and knee-jerk solution here makes me
>> uncomfortable.
>>
>> if you want to see performance differences there’s an environment variable
>> you can set in the mxnet omp tuning code that will print overhead and
>> execution times for the current omp library.
>>
>>
>>
>>
>>
>>
>>
>> On Thu, Nov 22, 2018 at 7:12 AM Anton Chernov <mechernov@gmail.com>
>> wrote:
>>
>> > Hi Chris,
>> >
>> > Thank you for your answer. If you have noticed the initial email comes
>> from
>> > me, Anton Chernov (@lebeg on Github) and thus the proposal is not from
>> any
>> > 'Ci' team that you've mentioned, but from me personally.
>> >
>> > You are writing:
>> >
>> > > someone is doing something unhealthy when they fork ...
>> >
>> > I'm missing any context to understand what you mean.
>> >
>> > > we get a lot of performance gain from OMP ...
>> >
>> > There is no data that would prove this statement and therefore it is a
>> > random guess.
>> >
>> > > in many months, no investigation has occurred as to WHY the assertion
>> is
>> > failing.
>> >
>> > The investigation has concluded that this is happening due to undefined
>> > behaviour which is, in my opinion, a suffient answer that does not
>> require
>> > to go any deeper.
>> >
>> > > the pr is vetoed until such a time that the actual root cause of the
>> > problem is known.
>> >
>> > And considering the statements above there is no valid reason to veto
>> the
>> > PR.
>> >
>> >
>> > Best
>> > Anton
>> >
>> > чт, 22 нояб. 2018 г. в 15:38, Chris Olivier <cjolivier01@gmail.com>:
>> >
>> > > 3x less overhead*
>> > >
>> > > On Thu, Nov 22, 2018 at 6:25 AM Chris Olivier <cjolivier01@gmail.com>
>> > > wrote:
>> > >
>> > > > someone is doing something unhealthy when they fork, which is
>> causing
>> > an
>> > > > assertion in the openmp library. the same assertion that would fire
>> in
>> > > mkl,
>> > > > which is linked to libiomp5 (exact same omp library). this is new
>> > > behavior
>> > > > and most likely due to an error or suboptimal approach in the
>> forking
>> > > logic
>> > > > in mxnet.
>> > > >
>> > > > in order to circumvent the assert, the Ci team is proposing to
>> remove
>> > the
>> > > > library completely which is equivalent to cutting off your leg to
>> make
>> > > the
>> > > > pain from stubbing your toe go away.
>> > > >
>> > > > we get a lot of performance gain from OMP. is has about a 1/3 less
>> > > > overhead for entering omp regions and also supports omp regions
>> after a
>> > > > fork, which libgomp does not.
>> > > >
>> > > > in many months, no investigation has occurred as to WHY the
>> assertion
>> > is
>> > > > failing.
>> > > >
>> > > > the pr is vetoed until such a time that the actual root cause of the
>> > > > problem is known.
>> > > >
>> > > >
>> > > > thanks,
>> > > >
>> > > > -Chris.
>> > > >
>> > > >
>> > > >
>> > > >
>> > > > On Thu, Nov 22, 2018 at 4:36 AM Anton Chernov <mechernov@gmail.com>
>> > > wrote:
>> > > >
>> > > >> Dear MXNet community,
>> > > >>
>> > > >> I would like to drive attention to an important issue that is
>> present
>> > in
>> > > >> the MXNet CMake build: usage of bundled llvm OpenMP library.
>> > > >>
>> > > >> I have opened a PR to remove it:
>> > > >> https://github.com/apache/incubator-mxnet/pull/12160
>> > > >>
>> > > >> The issue was closed, but I am strong in my oppinion that it's
the
>> > right
>> > > >> thing to do.
>> > > >>
>> > > >> *Background*
>> > > >> If you want to use OpenMP pragmas in your code for parallelization
>> you
>> > > >> would supply a special flag to the compiler:
>> > > >>
>> > > >> - Clang / -fopenmp
>> > > >> https://openmp.llvm.org/
>> > > >>
>> > > >> - GCC / -fopenmp
>> > > >> https://gcc.gnu.org/onlinedocs/libgomp/Enabling-OpenMP.html
>> > > >>
>> > > >> - Intel / [Q]openmp
>> > > >>
>> > > >>
>> > >
>> >
>> https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1
>> > > >>
>> > > >> - Visual Studio: /openmp (Enable OpenMP 2.0 Support)
>> > > >> https://msdn.microsoft.com/en-us/library/tt15eb9t.aspx
>> > > >>
>> > > >> Each of the compilers would enable the '#pragma omp' directive
>> during
>> > > >> C/C++
>> > > >> compilation and arrange for automatic linking of the OpenMP runtime
>> > > >> library
>> > > >> supplied by each complier separately.
>> > > >>
>> > > >> Thus, to use the advantages of an OpenMP implementation one has
to
>> > > compile
>> > > >> the code with the corresponding compiler.
>> > > >>
>> > > >> Currently, in MXNet CMake build scripts a bundled version of llvm
>> > OpenMP
>> > > >> is
>> > > >> used ([1] and [2]) to replace the OpenMP library supplied by the
>> > > compiler.
>> > > >>
>> > > >> I will quote here the README from the MKL-DNN (Intel(R) Math Kernel
>> > > >> Library
>> > > >> for Deep Neural Networks):
>> > > >>
>> > > >> "Intel MKL-DNN uses OpenMP* for parallelism and requires an OpenMP
>> > > runtime
>> > > >> library to work. As different OpenMP runtimes may not be binary
>> > > compatible
>> > > >> it's important to ensure that only one OpenMP runtime is used
>> > throughout
>> > > >> the application. Having more than one OpenMP runtime initialized
>> may
>> > > lead
>> > > >> to undefined behavior resulting in incorrect results or crashes."
>> [3]
>> > > >>
>> > > >> And:
>> > > >>
>> > > >> "Using GNU compiler with -fopenmp and -liomp5 options will link
the
>> > > >> application with both Intel and GNU OpenMP runtime libraries.
This
>> > will
>> > > >> lead to undefined behavior of the application." [4]
>> > > >>
>> > > >> As can be seen from ldd for MXNet:
>> > > >>
>> > > >> $ ldd build/tests/mxnet_unit_tests | grep omp
>> > > >>     libomp.so =>
>> > /.../mxnet/build/3rdparty/openmp/runtime/src/libomp.so
>> > > >> (0x00007f697bc55000)
>> > > >>     libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1
>> > > >> (0x00007f69660cd000)
>> > > >>
>> > > >> *Performance*
>> > > >>
>> > > >> The only performance data related to OpenMP in MXNet I was able
to
>> > find
>> > > is
>> > > >> here:
>> > > >>
>> > > >>
>> > >
>> >
>> https://github.com/apache/incubator-mxnet/issues/9744#issuecomment-367711172
>> > > >>
>> > > >> Which in my understanding is testing imact of different environment
>> > > >> variables for the same setup (using same bundled OpenMP library).
>> > > >>
>> > > >> The libraries may differ in implementation and the Thread Affinity
>> > > >> Interface [5] may have significant impact on performance.
>> > > >>
>> > > >> All compliers support it:
>> > > >>
>> > > >> - Clang / KMP_AFFINITY
>> > > >>
>> > > >>
>> > >
>> >
>> https://github.com/clang-ykt/openmp/blob/master/runtime/src/kmp_affinity.cpp
>> > > >>
>> > > >> - GCC / GOMP_CPU_AFFINITY
>> > > >>
>> > > >>
>> > >
>> >
>> https://gcc.gnu.org/onlinedocs/gcc-4.7.1/libgomp/GOMP_005fCPU_005fAFFINITY.html
>> > > >>
>> > > >> - Intel / KMP_AFFINITY
>> > > >>
>> > > >>
>> > >
>> >
>> https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1
>> > > >>
>> > > >> - Visual Studio / SetThreadAffinityMask
>> > > >>
>> > > >>
>> > >
>> >
>> https://docs.microsoft.com/en-us/windows/desktop/api/winbase/nf-winbase-setthreadaffinitymask
>> > > >>
>> > > >> *Issues*
>> > > >>
>> > > >> Failed OpenMP assertion when loading MXNet compiled with DEBUG=1
>> > > >> https://github.com/apache/incubator-mxnet/issues/10856
>> > > >>
>> > > >> libomp.so dependency (need REAL fix)
>> > > >> https://github.com/apache/incubator-mxnet/issues/11417
>> > > >>
>> > > >> mxnet-mkl (v0.12.0) crash when using (conda-installed) numpy with
>> MKL
>> > > >> https://github.com/apache/incubator-mxnet/issues/8532
>> > > >>
>> > > >> Performance regression when OMP_NUM_THREADS environment variable
is
>> > not
>> > > >> set
>> > > >> https://github.com/apache/incubator-mxnet/issues/9744
>> > > >>
>> > > >> Poor concat CPU performance on CUDA builds
>> > > >> https://github.com/apache/incubator-mxnet/issues/11905
>> > > >>
>> > > >> I would appreciate hearing your thoughts.
>> > > >>
>> > > >>
>> > > >> Best
>> > > >> Anton
>> > > >>
>> > > >> [1]
>> > > >>
>> > > >>
>> > >
>> >
>> https://github.com/apache/incubator-mxnet/blob/master/CMakeLists.txt#L400-L405
>> > > >> [2] https://github.com/apache/incubator-mxnet/tree/master/3rdparty
>> > > >> [3]
>> https://github.com/intel/mkl-dnn/blame/master/README.md#L261-L265
>> > > >> [4]
>> https://github.com/intel/mkl-dnn/blame/master/README.md#L278-L280
>> > > >> [5] https://software.intel.com/en-us/node/522691
>> > > >>
>> > > >
>> > >
>> >
>>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message