mxnet-dev mailing list archives

From "Lv, Tao A" <tao.a...@intel.com>
Subject RE: Include MKLDNN into default mxnet pip package
Date Wed, 28 Nov 2018 14:04:29 GMT
Hi Hagay, thank you for bringing these questions together. I have also summarized my opinions here
so they are easy for you to check.

- Make MKL-DNN default in MXNet pip package
[Tao]: My suggestion is to make MKL-DNN the default on the master branch first, after the 1.4.0
release branch is cut. That will help the MKL-DNN backend to be widely used and tested by MXNet
users who build MXNet from source, and it will also help to expose issues of the MKL-DNN backend
in the next release cycle. We can then decide whether to make it the default in the pip package
for the 1.5.0 release according to feedback from the community. For the 1.4.0 release, we can
still ship MKL-DNN in the mxnet-mkl package.

- What is the story like when there are no AVX instructions present on CPUs? Do we get an illegal
instruction error, or does it fall back gracefully?
[Tao]: MKL-DNN has optimizations for every ISA starting with SSE4.2, and there is a list of the
platforms officially supported by MKL-DNN: https://github.com/intel/mkl-dnn#system-requirements.
It should fall back if AVX is not supported. Most of the computation-intensive kernels in MKL-DNN
are JITed, so they generate code according to the platform at runtime and should not hit any
illegal instruction. Non-JIT code in MKL-DNN, like other code in MXNet, generates instructions
according to the compiler options/flags. We can set -DARCH_OPT_FLAGS when building MKL-DNN to
avoid optimizing for the compiling machine, which is exactly what we do for the MKL-DNN build in
MXNet. Even without MKL-DNN, I noticed there were issues about illegal instructions in MXNet when
users imported the pip package on a lower-end machine which probably only supports SSE.
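To check ahead of time which ISA extensions a machine exposes (and so what the JITed kernels can
target), here is a minimal sketch in Python, assuming a Linux machine where /proc/cpuinfo is
available:

    # Minimal sketch: list the ISA extensions reported by the kernel.
    # Flag names follow /proc/cpuinfo conventions (Linux only).
    ISA_FLAGS = ["sse4_2", "avx", "avx2", "avx512f"]

    def cpu_flags():
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
        return set()

    if __name__ == "__main__":
        flags = cpu_flags()
        for isa in ISA_FLAGS:
            print("%-10s %s" % (isa, "yes" if isa in flags else "no"))

A machine that reports none of these flags is the kind of lower-end, SSE-only box where the
illegal-instruction reports mentioned above came from.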

- Are there any outstanding issues when MKLDNN is enabled?
[Tao]: I don't know of any at this time, except the LSTM regression, which hopefully will be
fixed soon. I notice the fix has been pushed to the MKL-DNN master branch, but if we decide to
depend on release versions only, we need to wait for the MKL-DNN release process to finish.
If anyone knows of other issues with the MKL-DNN backend, feel free to let me know. :)

- MKLDNN is a submodule dependency; are we pulling the latest commit or releases? If not, we
should move to releases before we make it a default.
[Tao]: I don't have strong resistance to a release version. But if you want to make a rule for
MXNet that a submodule should depend on a release version, please take all the submodules
into consideration. For MKL-DNN, my concern is: if the master (development) branch of MXNet
relies on a bleeding-edge commit from the MKL-DNN master branch, then when MXNet comes to release,
we would need to revert many changes in MXNet if MKL-DNN does not have a new release at that time,
since we would need to fall back to a previous release version of the dependency. That might mess
up or slow down the development and release of MXNet. To avoid that, we would always need to
negotiate the release pace with the MKL-DNN team before every release. Please propose a solution
for this situation and make a plan for how to apply it to all submodules.

- MKLDNN versioning mechanism
[Tao]: I copied the MKL-DNN manager's words here:
"That's valid request and I would expect that as the software matures more and more applications
will rely on stable versions. I would expect that for MXNet there is a stable branch that
would rely on stable MKL-DNN and development branch that would rely on master. 
MKL-DNN relies on semantic versioning. We do maintain a release branches in addition to master
that can be used to release patches. In particular we are planning v0.17.1 this week to deliver
a fix for reorders that you requested. This works in the following way:
* master contains the latest development (typically the next release)
* rls-v0.17 contains v0.17 and will be used to create minor releases (v0.17.1 and so on)"
I’m happy to see that MKL-DNN will have a patch release for the LSTM regression issue.
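To make the semantic-versioning rule above concrete, here is a toy sketch (plain Python; the
version strings are just examples, nothing MKL-DNN-specific): a patch release such as v0.17.1
changes only the last component, so picking it up from the release branch brings fixes without
API changes.

    # Toy sketch of the semver rule discussed above. Under semantic
    # versioning, v0.17 -> v0.17.1 is a patch release: same API, only fixes.
    def parse(version):
        # "v0.17.1" -> (0, 17, 1); missing components default to 0.
        parts = version.lstrip("v").split(".")
        return tuple(int(p) for p in parts) + (0,) * (3 - len(parts))

    def safe_patch_upgrade(current, candidate):
        cur, cand = parse(current), parse(candidate)
        # Same major.minor, and an equal or newer patch number.
        return cur[:2] == cand[:2] and cand[2] >= cur[2]

    assert safe_patch_upgrade("v0.17", "v0.17.1")    # patch fix: safe
    assert not safe_patch_upgrade("v0.17", "v0.18")  # new minor: review first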

-tao


-----Original Message-----
From: Hagay Lupesko [mailto:lupesko@gmail.com] 
Sent: Wednesday, November 28, 2018 4:22 PM
To: dev@mxnet.incubator.apache.org
Subject: Re: Include MKLDNN into default mxnet pip package

Hey all,

I'm also supportive of making MKLDNN the default build for MXNet, but there were a few questions
asked in the thread that I am not sure were answered.
It would be great if Alex and others who worked on MKLDNN and are proposing it as the default
can answer them clearly:
- What is the story like when there are no AVX instructions present on CPUs?
Do we get an illegal instruction error, or does it fall back gracefully?
(asked by Kellen)
- Are there any outstanding issues when MKLDNN is enabled? (asked by Naveen)
- MKLDNN is a submodule dependency; are we pulling the latest commit or releases? If not, we
should move to releases before we make it a default.
(Naveen)
  There was a discussion about the MKLDNN version used by MXNet; it would be great if it could
be summarized.

Hagay



On Tue, Nov 27, 2018 at 6:21 PM Lv, Tao A <tao.a.lv@intel.com> wrote:

> Hi Anirudh, please find the statements from MKL-DNN manager for the 
> versioning mechanism of MKL-DNN library as below:
>
> "That's valid request and I would expect that as the software matures 
> more and more applications will rely on stable versions. I would 
> expect that for MXNet there is a stable branch that would rely on 
> stable MKL-DNN and development branch that would rely on master.
>
> MKL-DNN relies on semantic versioning. We do maintain a release 
> branches in addition to master that can be used to release patches. In 
> particular we are planning v0.17.1 this week to deliver a fix for 
> reorders that you requested. This works in the following way:
> * master contains the latest development (typically the next release)
> * rls-v0.17 contains v0.17 and will be used to create minor releases
> (v0.17.1 and so on)"
>
> I also restate my initial concern here: if the master (development)
> branch of MXNet relies on a bleeding-edge commit from the MKL-DNN master
> branch, then when MXNet comes to release, we would need to revert many
> changes in MXNet if MKL-DNN does not have a new release at that time,
> since we would need to fall back to a previous release version of the
> dependency. That might mess up or slow down the development and release
> of MXNet. To avoid that, we would always need to negotiate the release
> pace with the MKL-DNN team before every release.
>
> If you have any other questions about MKL-DNN's versioning, feel free 
> to let me know. If you want to change and re-define the dependency 
> behavior of MKL-DNN, please propose a solution for my concern and start a vote for that.
>
> Thanks,
> -tao
>
>
> -----Original Message-----
> From: Anirudh Subramanian [mailto:anirudh2290@gmail.com]
> Sent: Wednesday, November 28, 2018 8:26 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: Include MKLDNN into default mxnet pip package
>
> Hi Tao,
>
> I was suggesting we can start using a release tag from mkldnn for
> major and minor releases of mxnet starting with 1.4.0. But this would
> require a versioning mechanism similar to semver for MKLDNN, and for
> MKLDNN to do patch releases to backport bug fixes/regressions. I
> don't know if this is going to happen anytime soon (it would be nice if
> you can obtain some timeline from the MKLDNN team on this). As long as
> PIP still has two different packages, with mkl and without mkl, my vote
> is +1 for adding it as a default.
>
> Anirudh
>
>
> On Tue, Nov 27, 2018 at 5:04 AM Lv, Tao A <tao.a.lv@intel.com> wrote:
>
> > Hi Anirudh,
> >
> > Just to confirm: you're focusing on the 1.4.0 release of MXNet and
> > want to have a release version of MKL-DNN there, right? Or do you
> > mean all the development in the future should be based on release
> > versions of MKL-DNN?
> > For the former, I think the 0.17 release of MKL-DNN is a good choice.
> > But it will not have the fix for the LSTM regression mentioned in a
> > previous email.
> >
> > I'm talking about the versioning mechanism with the MKL-DNN maintainers
> > and will get back to you if I get any response. But from the release
> > history of MKL-DNN, I cannot find any evidence of a patch release.
> >
> > -tao
> >
> > -----Original Message-----
> > From: Anirudh Subramanian [mailto:anirudh2290@gmail.com]
> > Sent: Tuesday, November 27, 2018 6:16 AM
> > To: dev@mxnet.incubator.apache.org
> > Subject: Re: Include MKLDNN into default mxnet pip package
> >
> > Hi Tao,
> >
> > I agree with Steffen that we can start with a stable release of
> > MKLDNN for 1.4.0. Regarding your suggestion of using 0.17, can you
> > provide info on what versioning mechanism MKLDNN uses? Once a MKLDNN
> > release is out and some regressions are found, like the LSTM
> > regression, would it be possible to do a patch release for it or
> > maintain a release branch for it?
> >
> > Anirudh
> >
> > On Sun, Nov 25, 2018 at 5:03 PM Lv, Tao A <tao.a.lv@intel.com> wrote:
> >
> > > Hi Steffen,
> > >
> > > I think all the commits on the MKL-DNN master branch are well tested
> > > by the MKL-DNN development team. If we really want to have a release
> > > commit in the coming 1.4 mxnet release, my suggestion is the 0.17 MKL-DNN release.
> > >
> > > Thank you,
> > > Tao
> > >
> > > Sent from my iPhone
> > >
> > > > On Nov 26, 2018, at 8:09 AM, Steffen Rochel <steffenrochel@gmail.com> wrote:
> > > >
> > > > +1 to make MKL-DNN default.
> > > > I'm tracking https://github.com/apache/incubator-mxnet/issues/13369
> > > > as an open issue to be addressed for 1.4.0. I do agree that we
> > > > should move to a model of including released dependencies
> > > > instead of just taking bleeding-edge snapshots.
> > > > However, speed of development is important as well.
> > > > As a compromise for the 1.4.0 release with MKL-DNN: can the MKL-DNN
> > > > development team provide us with a well-tested tag/commit id to
> > > > include in the 1.4.0 release?
> > > > Steffen
> > > >
> > > >> On Wed, Nov 21, 2018 at 11:42 PM Lv, Tao A <tao.a.lv@intel.com> wrote:
> > > >>
> > > >> Thanks for the information, Kellen and Naveen.
> > > >>
> > > >> Better than onnx-tensorrt, MKL-DNN has already provided
> > > >> versioning and release tags. My concern is that, as MKL-DNN is
> > > >> still under intensive development, if it has a new feature or
> > > >> bug fix on its master branch, do we really want to wait for the
> > > >> next release to get it supported in MXNet?
> > > >>
> > > >> Take the LSTM regression as an example: probably MKL-DNN will
> > > >> give a fix or improvement on its master branch soon; do we need
> > > >> to wait for the 0.18 release to get it fixed for mxnet users? AFAIK,
> > > >> tensorflow is also using a normal commit id, not a release, as the
> > > >> dependency for MKL-DNN.
> > > >>
> > > >> Regarding the LSTM regression, we are using internal JIRA
> > > >> tickets rather than github issues to track the defects of
> > > >> MKL-DNN. But I agree with you, we need to update its progress
> > > >> in Alex's issue.
> > > >>
> > > >> Thanks,
> > > >> -tao
> > > >>
> > > >> -----Original Message-----
> > > >> From: kellen sunderland [mailto:kellen.sunderland@gmail.com]
> > > >> Sent: Thursday, November 22, 2018 10:55 AM
> > > >> To: dev@mxnet.incubator.apache.org
> > > >> Subject: Re: Include MKLDNN into default mxnet pip package
> > > >>
> > > >> Agree with your point about other repos also not being based on
> > > >> versioning, Tao. I would point out that I've given some projects
> > > >> that I've worked with similar feedback:
> > > >> https://github.com/onnx/onnx-tensorrt/issues/68
> > > >>
> > > >>> On Wed, Nov 21, 2018 at 6:48 PM Naveen Swamy <mnnaveen@gmail.com> wrote:
> > > >>>
> > > >>> Tao,
> > > >>>
> > > >>> You are right, there are many submodules in 3rd party. We have
> > > >>> to start somewhere, and I believe this one is a good candidate
> > > >>> to start with.
> > > >>> This is not to cater to releases of MXNet or to tie them to
> > > >>> the releases of the submodules, but instead to pick only stable
> > > >>> releases and not to pick up bleeding-edge commits from the tip
> > > >>> of master. This gives us confidence in the submodule that
> > > >>> MXNet users are depending on, especially if we make MKLDNN
> > > >>> the default.
> > > >>>
> > > >>> Good to know it is known already as a regression. Alex has
> > > >>> created this issue
> > > >>> https://github.com/apache/incubator-mxnet/issues/13369;
> > > >>> please add details and link the corresponding issue in
> > > >>> MKLDNN (I couldn't find it).
> > > >>>
> > > >>> -Naveen
> > > >>>
> > > >>>> On Wed, Nov 21, 2018 at 6:04 PM Lv, Tao A <tao.a.lv@intel.com> wrote:
> > > >>>>
> > > >>>> Here are my answers to the questions from Kellen and Naveen
> > > >>>> about MKL-DNN. It doesn't mean that I'm supportive of making
> > > >>>> MKL-DNN the default here.
> > > >>>>
> > > >>>> @Kellen,
> > > >>>>
> > > >>>> FYI, here is a list of the platforms which are officially
> > > >>>> supported by MKL-DNN:
> > > >>>> https://github.com/intel/mkl-dnn#system-requirements
> > > >>>>
> > > >>>> Most of the computation-intensive kernels in MKL-DNN are JITed,
> > > >>>> so they are supposed to generate code according to the
> > > >>>> platform at runtime. Non-JIT code in MKL-DNN, same as
> > > >>>> other code in MXNet, will generate instructions according
> > > >>>> to the options/flags of the compiler. We can set -DARCH_OPT_FLAGS
> > > >>>> when building MKL-DNN to avoid optimizing for the compiling
> > > >>>> machine. That's exactly what we are doing for the MKL-DNN build
> > > >>>> in MXNet. Even without MKL-DNN, I noticed there were issues
> > > >>>> about illegal instructions of MXNet when users import the pip
> > > >>>> package on a lower-end machine which probably only supports SSE.
> > > >>>>
> > > >>>> @Naveen,
> > > >>>>
> > > >>>> The LSTM issue has already been identified as a regression
> > > >>>> from the recent version of MKL-DNN. Hopefully it will be fixed
> > > >>>> soon with a new update of MKL-DNN.
> > > >>>>
> > > >>>> MXNet has many submodule dependencies under the 3rd party folder.
> > > >>>> It seems we don't require release versions for most of these
> > > >>>> dependencies. The release periods of MKL-DNN and MXNet are not
> > > >>>> matched very well. I think it would be a risk for the MXNet
> > > >>>> release if it depended strictly on the release of one submodule,
> > > >>>> to say nothing of depending on the releases of all submodules.
> > > >>>>
> > > >>>> -tao
> > > >>>>
> > > >>>> -----Original Message-----
> > > >>>> From: Naveen Swamy [mailto:mnnaveen@gmail.com]
> > > >>>> Sent: Thursday, November 22, 2018 9:08 AM
> > > >>>> To: dev@mxnet.incubator.apache.org
> > > >>>> Cc: dev@mxnet.apache.org
> > > >>>> Subject: Re: Include MKLDNN into default mxnet pip package
> > > >>>>
> > > >>>> Hi Alex,
> > > >>>>
> > > >>>> Thanks for promptly running the numbers on AMD and reporting here.
> > > >>>>
> > > >>>> Can you please update the AMD numbers here for posterity?
> > > >>>> https://cwiki.apache.org/confluence/display/MXNET/MXNet+with+Intel+MKL-DNN+-+Performance+Benchmarking
> > > >>>>
> > > >>>> Are there any outstanding issues when MKLDNN is enabled? From
> > > >>>> my offline conversation I am briefly aware of performance issues
> > > >>>> with LSTM; is there a GitHub issue for it?
> > > >>>>
> > > >>>> MKLDNN is a submodule dependency; are we pulling the latest
> > > >>>> commit or releases? If not, we should move to releases before
> > > >>>> we make it a default.
> > > >>>> Ideally we should use platform-specific distributions (-dev
> > > >>>> packages); at least we should rely on well-tested releases.
> > > >>>>
> > > >>>>
> > > >>>> Thanks, Naveen
> > > >>>>
> > > >>>> On Wed, Nov 21, 2018 at 4:55 PM Zai, Alexander <alexzai@amazon.com.invalid> wrote:
> > > >>>>
> > > >>>>> AMD benchmarks have been published. We are seeing a 15.8x
> > > >>>>> speedup with Resnet50 (batch size 32) on AWS's new m5a.24xlarge
> > > >>>>> machine. With a smaller network (Mobilenet - batch size 32) the
> > > >>>>> speedup is more significant at 38.7x. Let's have a vote to
> > > >>>>> see if the PR to have MKLDNN enabled by default
> > > >>>>> (https://github.com/apache/incubator-mxnet/pull/12591) can
> > > >>>>> be merged before the 1.4.0 release.
> > > >>>>>
> > > >>>>> On 10/19/18, 9:17 AM, "Pedro Larroy" <pedro.larroy.lists@gmail.com> wrote:
> > > >>>>>
> > > >>>>>    I did pip install mxnet-mkl==1.3.1b20181018 on an AMD
> > > >>>>>    Ryzen 1950X and unit tests are passing.
> > > >>>>>
> > > >>>>>    Is this build using AVX512? In /proc/cpuinfo I see only the
> > > >>>>>    "avx" flag. There's no "avx2" like on recent Intel CPUs.
> > > >>>>>
> > > >>>>>    Pedro.
> > > >>>>>
> > > >>>>>    On Fri, Oct 19, 2018 at 5:12 PM Hagay Lupesko <lupesko@gmail.com> wrote:
> > > >>>>>
> > > >>>>>> Awesome collaborative effort across many contributors and
> > > >>>>>> companies!
> > > >>>>>>
> > > >>>>>> The boost is impressive, and for MXNet users to get this boost
> > > >>>>>> "out of the box" is a great benefit and makes MXNet an even
> > > >>>>>> better choice.
> > > >>>>>>
> > > >>>>>> Alex - can you clarify whether there are any downsides with
> > > >>>>>> regards to non-AVX-512 architectures, AMD CPUs, etc? Will it
> > > >>>>>> gracefully fall back?
> > > >>>>>>
> > > >>>>>> Hagay
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> On Fri, Oct 19, 2018, 15:46 Sergio Fernández <wikier@apache.org> wrote:
> > > >>>>>>
> > > >>>>>>> If there is no downside on platforms not supporting AVX512
> > > >>>>>>> instructions, then +1
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> On Wed, Oct 17, 2018, 14:10 Alex Zai <azai91@gmail.com> wrote:
> > > >>>>>>>
> > > >>>>>>>> Hey all,
> > > >>>>>>>> We have been working hard these past few months to integrate
> > > >>>>>>>> and stabilize Intel's MKLDNN deep learning CPU accelerator
> > > >>>>>>>> into Mxnet and have made incredible progress. On CPUs with
> > > >>>>>>>> AVX512 instructions (such as c5.18x) we have seen performance
> > > >>>>>>>> increase up to 12x, and on other platforms (Macs, AVX2) we
> > > >>>>>>>> have seen a speedup of 1.5x+. The full list of benchmarks can
> > > >>>>>>>> be found here
> > > >>>>>>>> (https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95650764
> > > >>>>>>>> and https://github.com/apache/incubator-mxnet/pull/12591).
> > > >>>>>>>>
> > > >>>>>>>> Currently, using this accelerator requires the developer to
> > > >>>>>>>> either pip install the mxnet-mkl version of mxnet or to build
> > > >>>>>>>> it themselves from source. Given that we should try to provide
> > > >>>>>>>> the best performance "out of the box" with mxnet, we should
> > > >>>>>>>> include this in the default build. The mkldnn library is
> > > >>>>>>>> included within the pip package build, so it does not require
> > > >>>>>>>> an external dependency.
> > > >>>>>>>>
> > > >>>>>>>> There were concerns that MKLDNN could cause regressions on
> > > >>>>>>>> certain platforms (as it did with the tensorflow version a
> > > >>>>>>>> while back), but we added an env flag (MXNET_MKLDNN_ENABLED)
> > > >>>>>>>> that allows users to turn off this feature during runtime.
> > > >>>>>>>> Please bring up any other concerns you may have and your
> > > >>>>>>>> thoughts on including this accelerator in the default build.
> > > >>>>>>>>
> > > >>>>>>>> Best,
> > > >>>>>>>> Alex
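For reference, a minimal sketch of how a user would exercise the MXNET_MKLDNN_ENABLED escape
hatch Alex describes, assuming the flag is read when the library initializes (so it must be set
before mxnet is imported):

    import os

    # Assumption: the flag is read at import/initialization time, so set it
    # before importing mxnet. "0" disables the MKLDNN paths at runtime.
    os.environ["MXNET_MKLDNN_ENABLED"] = "0"

    import mxnet as mx

    # Operators now run through the non-MKLDNN CPU implementations.
    a = mx.nd.random.uniform(shape=(32, 128))
    b = mx.nd.random.uniform(shape=(128, 64))
    print(mx.nd.dot(a, b).shape)  # (32, 64)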