mxnet-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Olivier <cjolivie...@gmail.com>
Subject Re: Performance regression from removing libiomp5.so
Date Fri, 13 Dec 2019 15:13:41 GMT
the follow/up PR shouldn’t affect directly since the problem was taken care
of with the first PR., it was just something i did in the course that is
probably a good idea in general.

The dual linking probably can be addressed if it is found that the
performance returns. I would suggest that when doing a performance test, be
sure that build doesn’t have both linked in because it may be that the one
it uses happens to be the first one loaded, so the performance might look
the same in that case if a dependency uses is libgomp.


On Thu, Dec 12, 2019 at 9:48 PM Lausen, Leonard <lausen@amazon.com.invalid>
wrote:

> That error should be fixed by Chris's work at
> https://github.com/apache/incubator-mxnet/pull/17039
>
> It is currently expected that libmxnet.so transitively requires both
> libomp.so
> and libgomp.so. If this is an issue, we need to build OpenBLAS from source
> as
> part of our build scripts, because it introduces the libgomp.so
> requirement.
>
> Are there any other known issues currently?
>
> @Sam, do we have any performance comparison with the CMake build of 1.6 (+
> #
> 17039 fix backported)?
>
> Chris also mentioned he still has a follow-up PR
>
> > I actually also did change (removed from this PR in order to keep it
> > simple) that stops forcing the creation of the engine during static init
> and
> > the fork handlers, if it is desired. This has a couple of effects
> including
> > reducing the overhead of resources and threads created simply by the
> library
> > being loaded, which then, in turn, causes a lot of thread churn (Stop(),
> > Start()) when forking. Even if hey never use mxnet, we spin up a lot of
> stuff
> > unnecessarily. It actually gets lot faster on startp when this is done.
>
> Best regards
> Leonard
>
> On Fri, 2019-12-13 at 13:34 +0800, Tao Lv wrote:
> > Hi Chris,
> >
> > From the licensing standpoint, llvm omp is indeed a choice. But
> previously
> > we noticed that building mxnet with cmake and the llvm omp under 3rdparty
> > folder will cause two runtimes linked to libmxnet.so [1]. Do you think
> > that's still a problem?
> >
> > Also with the current two build systems, linking llvm omp means we need
> > move the binary release process from make to cmake, which i think need
> more
> > discussion in the community. It's not likely we can finish that within
> the
> > schedule of 1.6.0 release.
> >
> > [1]
> >
> https://github.com/apache/incubator-mxnet/issues/11417#issuecomment-555413002
> >
> > On Thu, Dec 12, 2019 at 10:28 PM Chris Olivier <cjolivier01@gmail.com>
> > wrote:
> >
> > > Hi Patric,
> > >
> > > The llvm openmp we compile (originally from same Intel source as we all
> > > know) seems to be Apache 2.0 licensed. Could we use that instead from a
> > > licensing standpoint?
> > >
> > > On Wed, Dec 11, 2019 at 10:36 PM Zhao, Patric <patric.zhao@intel.com>
> > > wrote:
> > >
> > > > Thanks, Sam.
> > > >
> > > > The root cause is from different OpenMP library. Intel OpenMP will
> > > provide
> > > > better performance as your data shown.
> > > >
> > > > Regarding release, since the license issue[1], we can't ship Intel
> OpenMP
> > > > in the binary, but the most of performance boost from MKLDNN is still
> > > > available.
> > > > I think it should be acceptable to release 1.6 with MKLDNN  + GNU
> OpenMP
> > > > for suboptimal performance.
> > > >
> > > > To achieve the best performance, user should build from source to
> enable
> > > > more advanced features like Intel MKL, Intel OpenMP, AVX512.
> > > >
> > > > Thanks,
> > > >
> > > > --Patric
> > > >
> > > > [1] https://www.apache.org/legal/resolved.html#category-x
> > > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Skalicky, Sam <sskalic@amazon.com.INVALID>
> > > > > Sent: Wednesday, December 11, 2019 1:36 PM
> > > > > To: dev@mxnet.incubator.apache.org
> > > > > Cc: Keshavan, Arjuna <arjunake@amazon.com>; Harish, Nihal
> > > > > <harnihal@amazon.com>
> > > > > Subject: Performance regression from removing libiomp5.so
> > > > >
> > > > > Hi MXNet community,
> > > > >
> > > > > I would like to bring your attention to the performance regression
> that
> > > > was
> > > > > found [1] between 1.5.1 and 1.6.0 due to removing the libiomp5.so
> > > library
> > > > > due to licensing issues. This change was made since this library
> has a
> > > > category
> > > > > x license [2] that is not compatible with the MXNet Apache
> > > > > license/distribution.
> > > > >
> > > > > We found that using OpenBLAS instead of MKL BLAS caused a
> regression
> > > > > from 1500 samples/sec to 1300 samples/sec a 13.3% regression in
> > > training
> > > > > speed for a resnet18 training benchmark on a C5.18xlarge EC2
> instance
> > > > (with
> > > > > 72 cores). Rebuilding with MKL BLAS showed an increase in
> performance
> > > to
> > > > > 1600 samples/sec in the 1.6.0 branch.
> > > > >
> > > > > Please provide your feedback on the licensing issue (are there any
> > > work-
> > > > > arounds) and the tradeoff in performance (is the benefit worth
> trying
> > > to
> > > > > include back into MXNet builds).
> > > > >
> > > > > Thanks to the efforts of the following folks for working on this
> issue
> > > > (in no
> > > > > particular order):
> > > > > Patric Zhao
> > > > > Amol Lele
> > > > > Tao Lv A
> > > > > Pedro Larroy
> > > > > Nihal Harish
> > > > > Chai Bapat
> > > > > Arjuna Keshavan
> > > > > Rong Zhang
> > > > >
> > > > > Thanks!
> > > > > Sam
> > > > >
> > > > > [1] https://github.com/apache/incubator-mxnet/issues/16891
> > > > > [2] https://www.apache.org/legal/resolved.html#category-x
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message