mxnet-dev mailing list archives

From Chris Olivier <cjolivie...@gmail.com>
Subject Re: Proposal to make MKLDNN as default CPU backend
Date Tue, 19 Nov 2019 16:08:27 GMT
Thanks, Patric. I was just trying to point out that there is currently no
guarantee of deterministic results without MKL, so there's not necessarily
an expectation of determinism with MKL (i.e., no existing requirement is
being relaxed).

On Mon, Nov 18, 2019 at 9:38 PM Zhao, Patric <patric.zhao@intel.com> wrote:

> It may be a concern, but a little noise can't affect the final results if
> the algorithm is numerically stable.
> The MKLDNN backend in mxnet-mkl has been used for 2 years, and we didn't
> see any convergence issue caused by multithreading.
> In other words, the GPU programming model works well for training even
> though non-determinism from multiple threads also exists there.
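As a minimal illustration (plain Python, not MXNet code) of where such thread-order "noise" comes from: floating-point addition is not associative, so a multithreaded reduction that combines partial sums in a different order than a sequential loop can produce slightly different results from identical inputs.

```python
# Floating-point addition is not associative: the grouping (evaluation
# order) changes the rounded result even though the inputs are identical.
a, b, c = 1e16, -1e16, 1.0

sequential = (a + b) + c  # left-to-right, as a serial loop would sum
regrouped = a + (b + c)   # a different combination order, as a threaded
                          # reduction might produce

print(sequential)  # 1.0
print(regrouped)   # 0.0: 1.0 is absorbed when added to -1e16 first
```

If the algorithm is numerically stable, this ulp-level noise does not change the converged result, which is the point being made above.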
>
> Some training accuracy results were posted in the first PR when MKLDNN
> was integrated:
> https://github.com/apache/incubator-mxnet/pull/8302#issuecomment-359674818
>
> In conclusion, such an issue may happen only with very low probability. I
> believe we can find a solution if it ever happens.
>
> Thanks,
>
> --Patric
>
>
> > -----Original Message-----
> > From: Chris Olivier <cjolivier01@gmail.com>
> > Sent: Tuesday, November 19, 2019 11:51 AM
> > To: dev@mxnet.incubator.apache.org
> > Cc: Tao Lv <mutouorz@gmail.com>
> > Subject: Re: Proposal to make MKLDNN as default CPU backend
> >
> > (for non-MKL dropout, for instance)
> >
> > On Mon, Nov 18, 2019 at 7:50 PM Chris Olivier <cjolivier01@gmail.com>
> > wrote:
> >
> > > To address the determinism item, I know for a fact that training
> > > will not be deterministic in some cases where the "parallel random"
> > > class is utilized in parallel threads, such as OMP, if the number of
> > > cores is different, even with the same seed, because threads are
> > > seeded independently and a different number of threads will end up
> > > generating different random number sequences. The Dropout operator is
> > > an example.
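The effect can be sketched in a few lines of Python. The function below simulates per-thread seeding; its name and chunking scheme are made up for illustration and are not MXNet's actual implementation.

```python
import random

def parallel_uniforms(n, seed, num_threads):
    """Simulate an OMP-style parallel RNG (illustrative only): each
    thread owns a stream seeded from (base seed, thread id) and fills
    its own contiguous chunk of the output."""
    out = [0.0] * n
    chunk = (n + num_threads - 1) // num_threads
    for tid in range(num_threads):
        rng = random.Random(seed * 1000 + tid)  # per-thread seeding
        for i in range(tid * chunk, min((tid + 1) * chunk, n)):
            out[i] = rng.random()
    return out

# Same seed and thread count: fully reproducible.
assert parallel_uniforms(16, 42, 2) == parallel_uniforms(16, 42, 2)

# Same seed, different thread count: a different sequence, so any
# dropout mask derived from it would differ as well.
assert parallel_uniforms(16, 42, 2) != parallel_uniforms(16, 42, 4)
```

This is why a fixed seed alone does not guarantee run-to-run identical results once the core count changes.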
> > >
> > > On Mon, Nov 18, 2019 at 6:39 PM Alfredo Luque
> > > <alfredo.luque@airbnb.com.invalid> wrote:
> > >
> > >> For AMD CPUs, you'd want to perform validation because now MKL-DNN
> > >> would be enabled by default. Historically, other Intel libraries
> > >> (along with the ICC compiler) have had performance issues on AMD
> > >> CPUs. It's just worth double-checking to make sure that's not the
> > >> case here. Perhaps some MKL-DNN authors can chime in, though. It's
> > >> not sufficient to double-check that an AVX2 package passes tests.
> > >>
> > >> Agreed in the case we’re not releasing ARM binaries.
> > >>
> > >> The reproducibility argument is about the results being numerically
> > >> reproducible. That is, e.g., if I train a model with some fixed set
> > >> of data, some random seed, etc., and then run inference on it, do I
> > >> get the exact same floating-point values for the weights and results?
> > >> Does MXNet already offer this without MKL-DNN?
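The property being asked about can be made concrete with a toy example (plain NumPy and a made-up model, not MXNet): a fully deterministic, single-threaded training loop yields bitwise-identical weights across runs given the same seed and data.

```python
import numpy as np

def train_toy(seed, steps=10, lr=1e-3):
    """Toy 'training': gradient descent on a fixed linear least-squares
    problem, with only the weight init drawn from the seeded RNG."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(3)
    X = np.arange(12, dtype=np.float64).reshape(4, 3)  # fixed data
    y = np.array([1.0, 2.0, 3.0, 4.0])
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)  # mean-squared-error gradient
        w = w - lr * grad
    return w

# Same seed, same data, sequential execution: bitwise-identical weights.
print(np.array_equal(train_toy(0), train_toy(0)))  # True
```

Whether a multithreaded backend preserves this bitwise guarantee is exactly the open question in this thread.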
> > >>
> > >> On November 18, 2019 at 6:32:07 PM, Tao Lv (mutouorz@gmail.com)
> > >> wrote:
> > >>
> > >> Regarding the cases listed by Marco:
> > >> - AMD CPU
> > >> From my architecture knowledge, what works on C4 instances (with
> > >> AVX2 support) should also work well on m5a, right? I think the
> > >> mxnet-mkl and mxnet-cuxxmkl packages have been fully validated on
> > >> AVX2 machines. Also, we didn't perform any validation on AMD CPUs
> > >> before; why do we need to do that this time?
> > >>
> > >> - ARM CPU
> > >> I don't think we're releasing any convenience binaries for ARM CPUs.
> > >> This proposal mainly targets those pypi packages.
> > >>
> > >> - Windows
> > >> Already validated by CI. We're also releasing mxnet-mkl packages
> > >> for Windows.
> > >>
> > >> - GPU and MKLDNN enabled
> > >> Already validated by CI, and mxnet-cuxxmkl packages have been
> > >> released for several versions.
> > >>
> > >> - Fully reproducible results (medical and financial sectors
> > >> requested that, and we have some flags for cuda)
> > >> Not sure I understand this case. We have already had the MKL-DNN
> > >> backend for a while, and its functionality and correctness have
> > >> been verified by MXNet users.
> > >>
> > >> -tao
> > >>
> > >> On Tue, Nov 19, 2019 at 4:41 AM Marco de Abreu
> > >> <marco.g.abreu@gmail.com>
> > >> wrote:
> > >>
> > >> > Sorry, my intent with the "non-standard" phrase was not about
> > >> > general MXNet but rather about MKLDNN's point of view: considering
> > >> > that it's being developed by Intel, I assumed that MKLDNN might
> > >> > consider non-Intel use cases non-standard.
> > >> >
> > >> > -Marco
> > >> >
> > >> > Skalicky, Sam <sskalic@amazon.com.invalid> wrote on Mon., Nov. 18,
> > >> > 2019, 21:34:
> > >> >
> > >> > > Thanks Alfredo, if you can create a GitHub issue with
> > >> > > notes/steps, we can add this to the todo list for integrating
> > >> > > with the MXNet CI to test on m5a instances too. Then we can
> > >> > > start tracking this on a regular basis. It would be great to
> > >> > > actually test on ARM instances now that AWS has A1 instances
> > >> > > too…..I'll add it to the wish list ;-D
> > >> > >
> > >> > > Sam
> > >> > >
> > >> > > > On Nov 18, 2019, at 12:32 PM, Alfredo Luque
> > >> > > > <alfredo.luque@airbnb.com.INVALID> wrote:
> > >> > > >
> > >> > > > Happy to run some benchmarks on an AWS m5a instance (Epyc) and
> > >> > > > a first-generation AMD Threadripper if someone has something
> > >> > > > easy to run and representative.
> > >> > > >
> > >> > > > On November 18, 2019 at 12:29:31 PM, Skalicky, Sam
> > >> > > > (sskalic@amazon.com.invalid) wrote:
> > >> > > >
> > >> > > > Thanks, that's a good idea Alfredo. Are you able to help test
> > >> > > > on AMD CPUs? Or is there someone else in the mxnet dev@
> > >> > > > community who can help?
> > >> > > >
> > >> > > > Sam
> > >> > > >
> > >> > > >> On Nov 18, 2019, at 12:27 PM, Alfredo Luque
> > >> > > >> <alfredo.luque@airbnb.com.INVALID> wrote:
> > >> > > >>
> > >> > > >> Verifying that there isn't a slowdown on AMD CPUs (e.g.,
> > >> > > >> Ryzen / Epyc) would definitely make sense as a requirement.
> > >> > > >> It seems odd to classify that as a "nonstandard" use case.
> > >> > > >>
> > >> > > >> On November 18, 2019 at 12:20:33 PM, Skalicky, Sam
> > >> > > >> (sskalic@amazon.com.invalid) wrote:
> > >> > > >>
> > >> > > >> Thanks Patric & team for your work over the years to make
> > >> > > >> MXNet fast with MKLDNN!
> > >> > > >>
> > >> > > >> I think it would be great to make MKLDNN enabled by default.
> > >> > > >> We will need to continue producing variants without MKLDNN
> > >> > > >> for those who don't want it (Marco enumerated some use
> > >> > > >> cases). How do you propose to identify the pip wheels
> > >> > > >> with/without MKLDNN? Previously we had mxnet-mkl and
> > >> > > >> mxnet-cu101mkl with MKLDNN. If the plain "mxnet" pip wheel
> > >> > > >> now contains MKLDNN, what do you propose we call the build
> > >> > > >> without MKLDNN? mxnet-nomkl?
> > >> > > >>
> > >> > > >> Thanks!
> > >> > > >> Sam
> > >> > > >>
> > >> > > >>> On Nov 18, 2019, at 11:08 AM, Marco de Abreu
> > >> > > >>> <marco.g.abreu@gmail.com> wrote:
> > >> > > >>>
> > >> > > >>> Hi Patric,
> > >> > > >>>
> > >> > > >>> First of all, thanks a lot to you and your team for all the
> > >> > > >>> effort on MXNet and mkldnn!
> > >> > > >>>
> > >> > > >>> Generally I'm inclined towards your proposal, but I'm
> > >> > > >>> thinking about the non-standard use cases:
> > >> > > >>> - AMD CPU
> > >> > > >>> - ARM CPU
> > >> > > >>> - Windows
> > >> > > >>> - GPU and MKLDNN enabled
> > >> > > >>> - Fully reproducible results (medical and financial sectors
> > >> > > >>> requested that, and we have some flags for cuda)
> > >> > > >>>
> > >> > > >>> Is mkldnn fully compatible with these use cases? If not,
> > >> > > >>> what would happen? If yes, do we have performance numbers?
> > >> > > >>>
> > >> > > >>> Best regards,
> > >> > > >>> Marco
> > >> > > >>>
> > >> > > >>> Zhao, Patric <patric.zhao@intel.com> wrote on Mon., Nov. 18,
> > >> > > >>> 2019, 14:00:
> > >> > > >>>
> > >> > > >>>> Hi MXNet community,
> > >> > > >>>>
> > >> > > >>>> Since the first MKLDNN backend was integrated in release
> > >> > > >>>> 1.2, the community has continuously improved the quality
> > >> > > >>>> and performance of the MKLDNN CPU backend. Nowadays, the
> > >> > > >>>> MKLDNN backend is widely used for inference, especially
> > >> > > >>>> INT8 inference, and we have received lots of very positive
> > >> > > >>>> feedback from MXNet users.
> > >> > > >>>>
> > >> > > >>>> Achieved milestones as below:
> > >> > > >>>>
> > >> > > >>>> - MKLDNN integrated into Apache MXNet from release 1.2,
> > >> > > >>>> Feb 2018 [1]
> > >> > > >>>> - MKLDNN backend as the default CPU backend for source
> > >> > > >>>> builds, Jan 2019 [2]
> > >> > > >>>> - MKLDNN subgraph optimization as default for inference,
> > >> > > >>>> Jul 2019 [3]
> > >> > > >>>> - MKLDNN major version upgrade in release 1.6, Oct 2019 [4]
> > >> > > >>>>
> > >> > > >>>> To build more success and technical leadership for Apache
> > >> > > >>>> MXNet in the industry, I propose to make MKLDNN the default
> > >> > > >>>> CPU backend in all binary distributions from the next
> > >> > > >>>> release.
> > >> > > >>>> The new milestone includes:
> > >> > > >>>>
> > >> > > >>>> - Statically link the MKLDNN library into the binary,
> > >> > > >>>> avoiding version mismatches at runtime [5]
> > >> > > >>>> - Make MKLDNN the default in nightly builds from master
> > >> > > >>>> before the 1.7 release
> > >> > > >>>> - Binary distribution with MKLDNN default from the 1.7
> > >> > > >>>> release.
> > >> > > >>>>
> > >> > > >>>> What will be changed:
> > >> > > >>>>
> > >> > > >>>> - mxnet and mxnet-cuXX binaries will be built with MKLDNN=1
> > >> > > >>>> - mxnet-mkl and mxnet-cuXXmkl will not be changed in minor
> > >> > > >>>> releases (1.x) and are planned for removal in the next
> > >> > > >>>> major release (2.0)
> > >> > > >>>>
> > >> > > >>>> Suggestions and comments are highly appreciated.
> > >> > > >>>>
> > >> > > >>>> Thanks,
> > >> > > >>>>
> > >> > > >>>> --Patric
> > >> > > >>>>
> > >> > > >>>>
> > >> > > >>>> [1] https://github.com/apache/incubator-mxnet/pull/9677
> > >> > > >>>> [2]
> > >> > > >>>> https://lists.apache.org/thread.html/bfeae6ee46374112eb4dff1470c262959101e4bffb19930926963535@%3Cdev.mxnet.apache.org%3E
> > >> > > >>>> [3] https://github.com/apache/incubator-mxnet/pull/15518
> > >> > > >>>> [4]
> > >> > > >>>> https://lists.apache.org/thread.html/f46ab920f18795496eafe713e6e9e561c684e06189085cec17b401dc@%3Cdev.mxnet.apache.org%3E
> > >> > > >>>> [5] https://github.com/apache/incubator-mxnet/pull/16731
> > >> > > >>>>
> > >> > > >>
> > >> > > >> —
> > >> > > >> Alfredo Luque
> > >> > > >> Software Engineer
> > >> > > >> Machine Learning Infrastructure, Airbnb, San Francisco, CA
> > >> > > >
> > >> > > > —
> > >> > > > Alfredo Luque
> > >> > > > Software Engineer
> > >> > > > Machine Learning Infrastructure, Airbnb, San Francisco, CA
> > >> > >
> > >> > >
> > >> >
> > >>
> > >> —
> > >> Alfredo Luque
> > >> Software Engineer
> > >> Machine Learning Infrastructure
> > >> Airbnb
> > >> San Francisco, CA
> > >>
> > >
>
