mxnet-dev mailing list archives

From Chris Olivier <cjolivie...@gmail.com>
Subject Re: Proposal to make MKLDNN as default CPU backend
Date Tue, 19 Nov 2019 03:51:12 GMT
(for non mkl dropout, for instance)

On Mon, Nov 18, 2019 at 7:50 PM Chris Olivier <cjolivier01@gmail.com> wrote:

> To address the determinism item: I know for a fact that training will
> not be deterministic in some cases where the “parallel random” class is
> used in parallel threads, such as OMP, if the number of cores is
> different, even with the same seed, because threads are seeded
> independently and a different number of threads will end up generating
> different random number sequences. The Dropout operator is one example.
>
> On Mon, Nov 18, 2019 at 6:39 PM Alfredo Luque
> <alfredo.luque@airbnb.com.invalid> wrote:
>
>> For AMD CPUs, you’d want to perform validation because MKL-DNN would now
>> be enabled by default. Historically, other Intel libraries (along with the
>> ICC compiler) have had performance issues on AMD CPUs. It’s worth double
>> checking that that’s not the case here. Perhaps some MKL-DNN authors can
>> chime in, though. It’s not sufficient to check that an AVX2 package
>> passes tests.
>>
>> Agreed in the case we’re not releasing ARM binaries.
>>
>> The reproducibility argument is about the results being numerically
>> reproducible. That is, if I train a model with a fixed set of data, a
>> fixed random seed, etc., and then run inference on it, do I get exactly
>> the same floating-point values for the weights and results? Does MXNet
>> already offer this without MKL-DNN?
>>
>> On November 18, 2019 at 6:32:07 PM, Tao Lv (mutouorz@gmail.com) wrote:
>>
>> Regarding the cases listed by Marco:
>> - AMD CPU
>> From my architecture knowledge, what works on C4 instances (with AVX2
>> support) should also work well on m5a, right? I think mxnet-mkl and
>> mxnet-cuxxmkl packages have been fully validated on AVX2 machines.
>> Also, we didn't perform any validation on AMD CPUs before, so why do we
>> need to do it this time?
>>
>> - ARM CPU
>> I'm not aware that we're releasing any convenience binaries for ARM CPUs.
>> This proposal mainly targets the pypi packages.
>>
>> - Windows
>> Already validated by CI. We're also releasing mxnet-mkl packages for Win.
>>
>> - GPU and MKLDNN enabled
>> Already validated by CI and mxnet-cuxxmkl packages have been released for
>> several versions.
>>
>> - Fully reproducible results (medical and financial sector requested that
>> and we have some flags for cuda)
>> Not sure I understand this case. We have already had the MKL-DNN backend
>> for a while. Its functionality and correctness have been verified by MXNet
>> users.
>>
>> -tao
>>
>> On Tue, Nov 19, 2019 at 4:41 AM Marco de Abreu <marco.g.abreu@gmail.com>
>> wrote:
>>
>> > Sorry, my intent with the "non-standard" phrase was not about MXNet in
>> > general but rather about MKLDNN's point of view: since it's being
>> > developed by Intel, I assumed that MKLDNN might consider non-Intel
>> > use cases non-standard.
>> >
>> > -Marco
>> >
>> > Skalicky, Sam <sskalic@amazon.com.invalid> wrote on Mon, Nov 18, 2019,
>> > 21:34:
>> >
>> > > Thanks Alfredo, if you can create a GitHub issue with notes/steps, we
>> > > can add this to the to-do list for integrating with the MXNet CI to
>> > > test on m5a instances too. Then we can start tracking this on a regular
>> > > basis. It would be great to actually test on ARM instances now that AWS
>> > > has A1 instances too… I’ll add it to the wish list ;-D
>> > >
>> > > Sam
>> > >
>> > > > On Nov 18, 2019, at 12:32 PM, Alfredo Luque
>> > > > <alfredo.luque@airbnb.com.INVALID> wrote:
>> > > >
>> > > > Happy to run some benchmarks on an AWS m5a instance (Epyc) and a
>> > > > first-generation AMD Threadripper if someone has something easy to
>> > > > run and representative.
>> > > >
>> > > > On November 18, 2019 at 12:29:31 PM, Skalicky, Sam (
>> > > > sskalic@amazon.com.invalid) wrote:
>> > > >
>> > > > That’s a good idea, Alfredo. Are you able to help test on AMD CPUs?
>> > > > Or is there someone else in the mxnet dev@ community who can help?
>> > > >
>> > > > Sam
>> > > >
>> > > >> On Nov 18, 2019, at 12:27 PM, Alfredo Luque
>> > > > <alfredo.luque@airbnb.com.INVALID> wrote:
>> > > >>
>> > > >> Verifying that there isn’t a slowdown on AMD CPUs (e.g., Ryzen /
>> > > >> Epyc) would definitely make sense as a requirement. It seems odd to
>> > > >> classify that as a “nonstandard” use case.
>> > > >>
>> > > >> On November 18, 2019 at 12:20:33 PM, Skalicky, Sam (
>> > > >> sskalic@amazon.com.invalid) wrote:
>> > > >>
>> > > >> Thanks Patric & team for your work over the years to make MXNet
>> > > >> fast with MKLDNN!
>> > > >>
>> > > >> I think it would be great to make MKLDNN enabled by default. We will
>> > > >> need to continue producing variants without MKLDNN for those who
>> > > >> don’t want it (Marco enumerated some use cases). How do you propose
>> > > >> to identify the pip wheels with/without MKLDNN? Previously we had
>> > > >> mxnet-mkl and mxnet-cu101mkl with MKLDNN. If the plain “mxnet” pip
>> > > >> wheel now contains MKLDNN, what do you propose we call the build
>> > > >> without MKLDNN? mxnet-nomkl?
>> > > >>
>> > > >> Thanks!
>> > > >> Sam
>> > > >>
>> > > >>> On Nov 18, 2019, at 11:08 AM, Marco de Abreu
>> > > >>> <marco.g.abreu@gmail.com> wrote:
>> > > >>>
>> > > >>> Hi Patric,
>> > > >>>
>> > > >>> First of all, thanks a lot to you and your team for all the effort
>> > > >>> on MXNet and mkldnn!
>> > > >>>
>> > > >>> Generally I'm inclined towards your proposal, but I'm thinking
>> > > >>> about the non-standard use cases:
>> > > >>> - AMD CPU
>> > > >>> - ARM CPU
>> > > >>> - Windows
>> > > >>> - GPU and MKLDNN enabled
>> > > >>> - Fully reproducible results (medical and financial sector
>> > > >>> requested that and we have some flags for cuda)
>> > > >>>
>> > > >>> Is mkldnn fully compatible with these use cases? If not, what would
>> > > >>> happen? If yes, do we have performance numbers?
>> > > >>>
>> > > >>> Best regards,
>> > > >>> Marco
>> > > >>>
>> > > >>> Zhao, Patric <patric.zhao@intel.com> wrote on Mon, Nov 18, 2019,
>> > > >>> 14:00:
>> > > >>>
>> > > >>>> Hi MXNet community,
>> > > >>>>
>> > > >>>> Since the first MKLDNN backend was integrated in release 1.2, the
>> > > >>>> community has been continuously improving the quality and
>> > > >>>> performance of the MKLDNN CPU backend. Nowadays, the MKLDNN backend
>> > > >>>> is widely used for inference, especially INT8 inference, and we
>> > > >>>> have received a lot of very positive feedback from MXNet users.
>> > > >>>>
>> > > >>>> Milestones achieved so far:
>> > > >>>>
>> > > >>>> - MKLDNN integrated into Apache MXNet from release 1.2, Feb 2018 [1]
>> > > >>>> - MKLDNN backend as the default CPU backend when building from
>> > > >>>>   source, Jan 2019 [2]
>> > > >>>> - MKLDNN subgraph optimization as the default for inference,
>> > > >>>>   Jul 2019 [3]
>> > > >>>> - MKLDNN major version upgrade in release 1.6, Oct 2019 [4]
>> > > >>>>
>> > > >>>> To strengthen Apache MXNet's success and technical leadership in
>> > > >>>> the industry, I propose making MKLDNN the default CPU backend in
>> > > >>>> all binary distributions from the next release.
>> > > >>>> The new milestones include:
>> > > >>>>
>> > > >>>> - Statically link the MKLDNN library into the binary to avoid
>> > > >>>>   version mismatches at runtime [5]
>> > > >>>> - Make the nightly builds from master use MKLDNN by default before
>> > > >>>>   the 1.7 release
>> > > >>>> - Binary distributions with MKLDNN by default from the 1.7 release
>> > > >>>>
>> > > >>>> What will be changed:
>> > > >>>>
>> > > >>>> - mxnet and mxnet-cuXX binaries will be built with MKLDNN=1
>> > > >>>> - mxnet-mkl and mxnet-cuXXmkl will not be changed in the minor
>> > > >>>>   releases (1.x); we plan to remove them in the next major release
>> > > >>>>   (2.0)
>> > > >>>>
>> > > >>>> Suggestions and comments are highly appreciated.
>> > > >>>>
>> > > >>>> Thanks,
>> > > >>>>
>> > > >>>> --Patric
>> > > >>>>
>> > > >>>>
>> > > >>>> [1] https://github.com/apache/incubator-mxnet/pull/9677
>> > > >>>> [2] https://lists.apache.org/thread.html/bfeae6ee46374112eb4dff1470c262959101e4bffb19930926963535@%3Cdev.mxnet.apache.org%3E
>> > > >>>> [3] https://github.com/apache/incubator-mxnet/pull/15518
>> > > >>>> [4] https://lists.apache.org/thread.html/f46ab920f18795496eafe713e6e9e561c684e06189085cec17b401dc@%3Cdev.mxnet.apache.org%3E
>> > > >>>> [5] https://github.com/apache/incubator-mxnet/pull/16731
>> > > >>>>
>> > > >>
>> > > >> —
>> > > >> Alfredo Luque
>> > > >> Software Engineer
>> > > >> Machine Learning Infrastructure
>> > > >> Airbnb
>> > > >> San Francisco, CA
>> > > >
>> > > > —
>> > > > Alfredo Luque
>> > > > Software Engineer
>> > > > Machine Learning Infrastructure
>> > > > Airbnb
>> > > > San Francisco, CA
>> > >
>> > >
>> >
>>
>> —
>> Alfredo Luque
>> Software Engineer
>> Machine Learning Infrastructure
>> Airbnb
>> San Francisco, CA
>>
>
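
The per-thread seeding point raised at the top of this thread can be illustrated with a small Python sketch. This is not MXNet's "parallel random" code: the function name `parallel_dropout_mask`, the `seed + thread id` seeding rule, and the contiguous chunking of the output are all assumptions made for illustration. It only shows why the same seed with a different thread count can yield a different dropout mask.

```python
import random


def parallel_dropout_mask(n, seed, num_threads, p=0.5):
    """Simulate filling a dropout mask with one RNG per worker thread.

    Hypothetical model: thread `tid` owns a generator seeded with
    `seed + tid` and fills one contiguous chunk of the mask, roughly as
    a static OMP schedule would. Threads are simulated sequentially so
    the result depends only on (n, seed, num_threads, p).
    """
    mask = [0] * n
    chunk = (n + num_threads - 1) // num_threads  # ceil(n / num_threads)
    for tid in range(num_threads):
        rng = random.Random(seed + tid)  # assumed per-thread seeding rule
        start = tid * chunk
        stop = min(n, start + chunk)
        for i in range(start, stop):
            # Keep the element with probability 1 - p.
            mask[i] = 1 if rng.random() >= p else 0
    return mask


if __name__ == "__main__":
    same_a = parallel_dropout_mask(64, seed=42, num_threads=4)
    same_b = parallel_dropout_mask(64, seed=42, num_threads=4)
    other = parallel_dropout_mask(64, seed=42, num_threads=2)
    print("same seed, same thread count, identical:", same_a == same_b)
    print("same seed, different thread count, identical:", same_a == other)
```

Under these assumptions, a run is repeatable only when both the seed and the thread count are fixed, which is why a reproducibility check would need to pin the number of threads as well as the seed.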
