mxnet-dev mailing list archives

From "Zhao, Patric" <patric.z...@intel.com>
Subject RE: Proposal to make MKLDNN as default CPU backend
Date Wed, 20 Nov 2019 05:27:57 GMT
Thanks for all of the great suggestions. 

Regarding the binary release, including builds w/o MKLDNN, I have summarized the changes in a table (see attachment).

- Major changes in python packages: see attached table. 
- Switch on MKLDNN for the no-mkl-suffix binary in release 1.7 (red check mark) 
- Add a new mxnet-native build w/o MKLDNN and cuDNN (yellow background)
  Track the usage/downloads for 1-2 releases, then decide whether we need it long term
- Drop all mkl-suffix binaries in the next major release, v2.x.
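For readers without the attachment, the proposed matrix can be written out roughly as follows. This is a sketch reconstructed only from the text of this thread; the exact feature flags and the mxnet-cuXX naming are assumptions, and the attached table remains authoritative.

```python
# Hedged reconstruction of the proposed 1.7 pip package matrix, based only
# on this thread (the authoritative table is in the email attachment).
proposed_1_7_packages = {
    "mxnet":         {"mkldnn": True,  "cudnn": False},  # MKLDNN switched on
    "mxnet-cuXX":    {"mkldnn": True,  "cudnn": True},   # GPU build
    "mxnet-native":  {"mkldnn": False, "cudnn": False},  # new, usage tracked
    "mxnet-mkl":     {"mkldnn": True,  "cudnn": False},  # dropped in v2.x
    "mxnet-cuXXmkl": {"mkldnn": True,  "cudnn": True},   # dropped in v2.x
}

# Under this plan, every build except mxnet-native carries MKLDNN.
no_mkldnn = [name for name, f in proposed_1_7_packages.items()
             if not f["mkldnn"]]
print(no_mkldnn)  # -> ['mxnet-native']
```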

Thanks,

--Patric

> -----Original Message-----
> From: Lin Yuan <apeforest@gmail.com>
> Sent: Wednesday, November 20, 2019 5:40 AM
> To: dev@mxnet.incubator.apache.org
> Cc: Tao Lv <mutouorz@gmail.com>
> Subject: Re: Proposal to make MKLDNN as default CPU backend
> 
> Also per Sam's suggestion, we could still release a build without MKLDNN
> (name it mxnet-nomkldnn?) and track the usage/download for one or two
> releases. If there is no usage, we could drop that build in the future.
> 
> Best,
> 
> Lin
> 
> On Tue, Nov 19, 2019 at 1:23 PM Lin Yuan <apeforest@gmail.com> wrote:
> 
> > Just to summarize, based on the concerns Marco raised and discussed above:
> >
> > - AMD CPU (it should work with MKLDNN:
> > https://cwiki.apache.org/confluence/display/MXNET/MXNet+with+Intel+MKL-DNN+-+Performance+Benchmarking
> > )
> > - ARM CPU (we don't have it today w/o MKLDNN either)
> > - Windows (Windows support is there regardless of MKLDNN)
> > - GPU and MKLDNN enabled (already supported)
> > - Fully reproducible results (medical and financial sector requested
> >   that and we have some flags for cuda) (The nondeterminism exists even
> >   today w/o MKLDNN; we should address it regardless of MKLDNN)
> >
> > Marco, please let us know whether your concerns are properly addressed.
> >
> > Given that MKLDNN gives a significant performance speedup on CPU, I am
> > inclined to make it the default in the pip build.
> >
> > Best,
> >
> > Lin
> >
> > On Tue, Nov 19, 2019 at 8:08 AM Chris Olivier <cjolivier01@gmail.com>
> > wrote:
> >
> >> Thanks, Patric. I was just trying to point out that there was currently
> >> no guarantee of deterministic results without MKL, so there's not
> >> necessarily an expectation of determinism with MKL (i.e. the requirement
> >> isn't relaxed).
> >>
> >> On Mon, Nov 18, 2019 at 9:38 PM Zhao, Patric <patric.zhao@intel.com>
> >> wrote:
> >>
> >> > It may be a concern, but a little noise can't affect the final results
> >> > if the algorithm is numerically stable.
> >> > The MKLDNN backend in mxnet-mkl has been used for 2 years, and we
> >> > didn't see any convergence issue caused by multi-threading.
> >> > Similarly, the GPU programming model works well for training even
> >> > though the same non-determinism from multiple threads exists there.
> >> >
> >> > Some training accuracy results were posted in the first PR when MKLDNN
> >> > was integrated:
> >> > https://github.com/apache/incubator-mxnet/pull/8302#issuecomment-359674818
> >> >
> >> > In conclusion, it may happen with very low probability. I believe we
> >> > can find a solution in case it happens someday.
> >> >
> >> > Thanks,
> >> >
> >> > --Patric
> >> >
> >> >
> >> > > -----Original Message-----
> >> > > From: Chris Olivier <cjolivier01@gmail.com>
> >> > > Sent: Tuesday, November 19, 2019 11:51 AM
> >> > > To: dev@mxnet.incubator.apache.org
> >> > > Cc: Tao Lv <mutouorz@gmail.com>
> >> > > Subject: Re: Proposal to make MKLDNN as default CPU backend
> >> > >
> >> > > (for non-MKL dropout, for instance)
> >> > >
> >> > > On Mon, Nov 18, 2019 at 7:50 PM Chris Olivier
> >> > > <cjolivier01@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > To address the determinism item: I know for a fact that training
> >> > > > will not be deterministic in some cases where the "parallel random"
> >> > > > class is utilized in parallel threads, such as OMP, if the number
> >> > > > of cores is different. Even with the same seed, threads are seeded
> >> > > > independently, so a different number of threads will end up
> >> > > > generating different random number sequences. The Dropout operator
> >> > > > is an example.
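The thread-count effect described above can be illustrated with a small, self-contained Python sketch. This is illustrative only, not MXNet's actual RNG code; the seed-plus-thread-id scheme is an assumption about how OMP-style parallel RNG classes are typically seeded.

```python
# Illustrative sketch of why per-thread seeding ties results to thread count.
# Each "thread" seeds its own RNG as base_seed + thread id (an assumed, but
# common, scheme); work is split into contiguous chunks across threads.
import random

def parallel_draws(n_elems, n_threads, base_seed):
    """Split n_elems across n_threads; each thread draws from its own RNG."""
    chunk = (n_elems + n_threads - 1) // n_threads
    draws = []
    for tid in range(n_threads):
        rng = random.Random(base_seed + tid)  # independent per-thread seed
        for _ in range(min(chunk, n_elems - tid * chunk)):
            draws.append(rng.random())
    return draws

# Same base seed, different thread counts -> different overall sequences,
# which is exactly the non-determinism described for the Dropout operator.
m2 = parallel_draws(16, n_threads=2, base_seed=42)
m4 = parallel_draws(16, n_threads=4, base_seed=42)
print(m2 == m4)  # -> False
```

Note that the first chunk agrees (both start from the same per-thread seed); the divergence begins where the chunk boundaries differ.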
> >> > > >
> >> > > > On Mon, Nov 18, 2019 at 6:39 PM Alfredo Luque
> >> > > > <alfredo.luque@airbnb.com.invalid> wrote:
> >> > > >
> >> > > >> For AMD CPUs, you'd want to perform validation because MKL-DNN
> >> > > >> would now be enabled by default. Historically, other Intel
> >> > > >> libraries (along with the ICC compiler) have had performance
> >> > > >> issues on AMD CPUs. It's just worth double checking to make sure
> >> > > >> that's not the case here. Perhaps some MKL-DNN authors can chime
> >> > > >> in though. It's not sufficient to double check that an AVX2
> >> > > >> package passes tests.
> >> > > >>
> >> > > >> Agreed in the case we’re not releasing ARM binaries.
> >> > > >>
> >> > > >> The reproducibility argument is around the results being
> >> > > >> numerically reproducible. That is, e.g., if I train a model with
> >> > > >> some fixed set of data, some random seed, etc. and then run
> >> > > >> inference on it, do I get the exact same floating point values
> >> > > >> for the weights and results? Does MXNet already offer this
> >> > > >> without MKL-DNN?
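The kind of bitwise check being asked about can be sketched in plain Python with a toy stand-in for a training run. `toy_train` is a hypothetical placeholder, not an MXNet API; a real check would compare saved weights from two seeded MXNet runs.

```python
# Toy illustration of bitwise reproducibility: with a fixed seed and a
# single-threaded run, two "training" runs must produce identical floats.
# Multi-threaded backends are where this property can break down.
import random

def toy_train(seed, steps=100):
    """Hypothetical stand-in for a training run; returns final 'weights'."""
    rng = random.Random(seed)
    w = [0.0] * 4
    for _ in range(steps):
        grad = [rng.uniform(-1.0, 1.0) for _ in w]  # fake stochastic gradient
        w = [wi - 0.01 * gi for wi, gi in zip(w, grad)]
    return w

run_a = toy_train(seed=1234)
run_b = toy_train(seed=1234)
# Exact float equality, not approximate: this is what "fully reproducible"
# means for the medical/financial use cases mentioned in the thread.
print(run_a == run_b)  # -> True
```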
> >> > > >>
> >> > > >> On November 18, 2019 at 6:32:07 PM, Tao Lv (mutouorz@gmail.com) wrote:
> >> > > >>
> >> > > >> Regarding the cases listed by Marco:
> >> > > >> - AMD CPU
> >> > > >> From my architecture knowledge, what works on C4 instances (with
> >> > > >> AVX2 support) should also work well on m5a, right? I think the
> >> > > >> mxnet-mkl and mxnet-cuxxmkl packages have been fully validated on
> >> > > >> AVX2 machines.
> >> > > >> Also, we didn't perform any validation on AMD CPUs before; why do
> >> > > >> we need to do that this time?
> >> > > >>
> >> > > >> - ARM CPU
> >> > > >> I don't think we're releasing any convenience binaries for ARM CPUs.
> >> > > >> This proposal mainly targets the pypi packages.
> >> > > >>
> >> > > >> - Windows
> >> > > >> Already validated by CI. We're also releasing mxnet-mkl packages for Win.
> >> > > >>
> >> > > >> - GPU and MKLDNN enabled
> >> > > >> Already validated by CI, and mxnet-cuxxmkl packages have been
> >> > > >> released for several versions.
> >> > > >>
> >> > > >> - Fully reproducible results (medical and financial sector
> >> > > >> requested that and we have some flags for cuda)
> >> > > >> Not sure I understand this case. We have had the MKL-DNN backend
> >> > > >> for a while; its functionality and correctness have been verified
> >> > > >> by MXNet users.
> >> > > >>
> >> > > >> -tao
> >> > > >>
> >> > > >> On Tue, Nov 19, 2019 at 4:41 AM Marco de Abreu
> >> > > >> <marco.g.abreu@gmail.com>
> >> > > >> wrote:
> >> > > >>
> >> > > >> > Sorry, my intent with the "non-standard" phrase was not about
> >> > > >> > general MXNet but rather from MKLDNN's point of view: considering
> >> > > >> > that it's being developed by Intel, I assumed that MKLDNN might
> >> > > >> > consider non-Intel use-cases non-standard.
> >> > > >> >
> >> > > >> > -Marco
> >> > > >> >
> >> > > >> > Skalicky, Sam <sskalic@amazon.com.invalid> wrote on Mon., 18 Nov. 2019, 21:34:
> >> > > >> >
> >> > > >> > > Thanks Alfredo, if you can create a GitHub issue with
> >> > > >> > > notes/steps, we can add this to the todo list for integrating
> >> > > >> > > with the MXNet CI to test on m5a instances too. Then we can
> >> > > >> > > start tracking this on a regular basis. It would be great to
> >> > > >> > > actually test on ARM instances now that AWS has A1 instances
> >> > > >> > > too…..I'll add it to the wish list ;-D
> >> > > >> > >
> >> > > >> > > Sam
> >> > > >> > >
> >> > > >> > > > On Nov 18, 2019, at 12:32 PM, Alfredo Luque
> >> > > >> > > > <alfredo.luque@airbnb.com.INVALID> wrote:
> >> > > >> > > >
> >> > > >> > > > Happy to run some benchmarks on an AWS m5a instance (Epyc)
> >> > > >> > > > and a first-generation AMD Threadripper if someone has
> >> > > >> > > > something easy to run and representative.
> >> > > >> > > >
> >> > > >> > > > On November 18, 2019 at 12:29:31 PM, Skalicky, Sam
> >> > > >> > > > (sskalic@amazon.com.invalid) wrote:
> >> > > >> > > >
> >> > > >> > > > Thanks, that's a good idea Alfredo. Are you able to help
> >> > > >> > > > test on AMD CPUs? Or is there someone else in the mxnet dev@
> >> > > >> > > > community who can help?
> >> > > >> > > >
> >> > > >> > > > Sam
> >> > > >> > > >
> >> > > >> > > >> On Nov 18, 2019, at 12:27 PM, Alfredo Luque
> >> > > >> > > >> <alfredo.luque@airbnb.com.INVALID> wrote:
> >> > > >> > > >>
> >> > > >> > > >> Verifying that there isn't a slowdown on AMD CPUs (e.g.
> >> > > >> > > >> Ryzen / Epyc) would definitely make sense as a requirement.
> >> > > >> > > >> It seems odd to classify that as a "nonstandard" use case.
> >> > > >> > > >>
> >> > > >> > > >> On November 18, 2019 at 12:20:33 PM, Skalicky, Sam
> >> > > >> > > >> (sskalic@amazon.com.invalid) wrote:
> >> > > >> > > >>
> >> > > >> > > >> Thanks Patric & team for your work over the years to make
> >> > > >> > > >> MXNet fast with MKLDNN!
> >> > > >> > > >>
> >> > > >> > > >> I think it would be great to make MKLDNN enabled by
> >> > > >> > > >> default. We will need to continue producing variants
> >> > > >> > > >> without MKLDNN for those who don't want it (Marco
> >> > > >> > > >> enumerated some use cases). How do you propose to identify
> >> > > >> > > >> the pip wheels with/without MKLDNN? Previously we had
> >> > > >> > > >> mxnet-mkl and mxnet-cu101mkl with MKLDNN. If the plain
> >> > > >> > > >> "mxnet" pip wheel now contains MKLDNN, what do you propose
> >> > > >> > > >> we call the build without MKLDNN? mxnet-nomkl?
> >> > > >> > > >>
> >> > > >> > > >> Thanks!
> >> > > >> > > >> Sam
> >> > > >> > > >>
> >> > > >> > > >>> On Nov 18, 2019, at 11:08 AM, Marco de Abreu
> >> > > >> > > >>> <marco.g.abreu@gmail.com> wrote:
> >> > > >> > > >>>
> >> > > >> > > >>> Hi Patric,
> >> > > >> > > >>>
> >> > > >> > > >>> First of all, thanks a lot to you and your team for all
> >> > > >> > > >>> the effort on MXNet and mkldnn!
> >> > > >> > > >>>
> >> > > >> > > >>> Generally I'm inclined towards your proposal, but I'm
> >> > > >> > > >>> thinking about the non-standard use cases:
> >> > > >> > > >>> - AMD CPU
> >> > > >> > > >>> - ARM CPU
> >> > > >> > > >>> - Windows
> >> > > >> > > >>> - GPU and MKLDNN enabled
> >> > > >> > > >>> - Fully reproducible results (medical and financial
> >> > > >> > > >>>   sector requested that and we have some flags for cuda)
> >> > > >> > > >>>
> >> > > >> > > >>> Is mkldnn fully compatible with these use cases? If not,
> >> > > >> > > >>> what would happen? If yes, do we have performance numbers?
> >> > > >> > > >>>
> >> > > >> > > >>> Best regards,
> >> > > >> > > >>> Marco
> >> > > >> > > >>>
> >> > > >> > > >>> Zhao, Patric <patric.zhao@intel.com> wrote on Mon., 18 Nov. 2019, 14:00:
> >> > > >> > > >>>
> >> > > >> > > >>>> Hi MXNet community,
> >> > > >> > > >>>>
> >> > > >> > > >>>> From the first MKLDNN backend integrated in release 1.2,
> >> > > >> > > >>>> the community has been continuously improving the quality
> >> > > >> > > >>>> and performance of the MKLDNN CPU backend.
> >> > > >> > > >>>> Nowadays, the MKLDNN backend is widely used for
> >> > > >> > > >>>> inference, especially INT8 inference, and we have
> >> > > >> > > >>>> received lots of very positive feedback from MXNet users.
> >> > > >> > > >>>>
> >> > > >> > > >>>> Achieved milestones as below:
> >> > > >> > > >>>>
> >> > > >> > > >>>> - MKLDNN integrated into Apache MXNet from release 1.2, Feb 2018 [1]
> >> > > >> > > >>>> - MKLDNN backend as default CPU backend for source builds, Jan 2019 [2]
> >> > > >> > > >>>> - MKLDNN subgraph optimization as default for inference, Jul 2019 [3]
> >> > > >> > > >>>> - MKLDNN major version upgrade in release 1.6, Oct 2019 [4]
> >> > > >> > > >>>>
> >> > > >> > > >>>> To make Apache MXNet more successful and strengthen its
> >> > > >> > > >>>> technical leadership in the industry, I propose to make
> >> > > >> > > >>>> MKLDNN the default CPU backend in all binary
> >> > > >> > > >>>> distributions from the next release.
> >> > > >> > > >>>> The new milestone includes:
> >> > > >> > > >>>>
> >> > > >> > > >>>> - Statically link the MKLDNN library into the binary,
> >> > > >> > > >>>>   avoiding version mismatches at runtime [5]
> >> > > >> > > >>>> - Make MKLDNN the default in nightly builds from master
> >> > > >> > > >>>>   before the 1.7 release
> >> > > >> > > >>>> - Binary distribution with MKLDNN default from the 1.7 release.
> >> > > >> > > >>>>
> >> > > >> > > >>>> What will be changed:
> >> > > >> > > >>>>
> >> > > >> > > >>>> - mxnet and mxnet-cuXX binaries will be built with MKLDNN=1
> >> > > >> > > >>>> - mxnet-mkl and mxnet-cuXXmkl will not change in the
> >> > > >> > > >>>>   minor releases (1.x) and are planned for removal in the
> >> > > >> > > >>>>   next major release (2.0)
> >> > > >> > > >>>>
> >> > > >> > > >>>> Suggestions and comments are highly appreciated.
> >> > > >> > > >>>>
> >> > > >> > > >>>> Thanks,
> >> > > >> > > >>>>
> >> > > >> > > >>>> --Patric
> >> > > >> > > >>>>
> >> > > >> > > >>>>
> >> > > >> > > >>>> [1] https://github.com/apache/incubator-mxnet/pull/9677
> >> > > >> > > >>>> [2] https://lists.apache.org/thread.html/bfeae6ee46374112eb4dff1470c262959101e4bffb19930926963535@%3Cdev.mxnet.apache.org%3E
> >> > > >> > > >>>> [3] https://github.com/apache/incubator-mxnet/pull/15518
> >> > > >> > > >>>> [4] https://lists.apache.org/thread.html/f46ab920f18795496eafe713e6e9e561c684e06189085cec17b401dc@%3Cdev.mxnet.apache.org%3E
> >> > > >> > > >>>> [5] https://github.com/apache/incubator-mxnet/pull/16731
> >> > > >> > > >>>>
> >> > > >> > > >>
> >> > > >> > > >> —
> >> > > >> > > >> Alfredo Luque
> >> > > >> > > >> Software Engineer
> >> > > >> > > >> Machine Learning Infrastructure, Airbnb
> >> > > >> > > >> San Francisco, CA
> >> > > >> > > >
> >> > > >> > > > —
> >> > > >> > > > Alfredo Luque
> >> > > >> > > > Software Engineer
> >> > > >> > > > Machine Learning Infrastructure, Airbnb
> >> > > >> > > > San Francisco, CA
> >> > > >> > >
> >> > > >> > >
> >> > > >> >
> >> > > >>
> >> > > >> —
> >> > > >> Alfredo Luque
> >> > > >> Software Engineer
> >> > > >> Machine Learning Infrastructure, Airbnb
> >> > > >> San Francisco, CA
> >> > > >>
> >> > > >
> >> >
> >>
> >