mxnet-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Lv, Tao A" <tao.a...@intel.com>
Subject RE: Include MKLDNN into default mxnet pip package
Date Tue, 27 Nov 2018 13:03:50 GMT
Hi Anirudh,

Just to confirm, you're focusing on the 1.4.0 release of MXNet and want to have a release
version of MKL-DNN there, right? Or do you mean all the development in the future should base
on the release version of MKL-DNN? For the former one, I think 0.17 release of MKL-DNN is
a good choice. But it will not have fix for the LSTM regression mentioned in previous email.

I'm talking about the versioning mechanism with MKL-DNN maintainers and will be back to you
if I get any response. But from the releasing history of MKL-DNN, I cannot find any evidence
about patch release.

-tao

-----Original Message-----
From: Anirudh Subramanian [mailto:anirudh2290@gmail.com] 
Sent: Tuesday, November 27, 2018 6:16 AM
To: dev@mxnet.incubator.apache.org
Subject: Re: Include MKLDNN into default mxnet pip package

Hi Tao,

I agree with Steffen that we can start with a stable release for MKLDNN for 1.4.0. For your
suggestion on using 0.17, can you provide info on what versioning mechanism MKLDNN uses. Once
a MKLDNN release is out and there are some regressions found like the LSTM regression, would
it be possible to do a patch release for it or maintain a release branch for it ?

Anirudh

On Sun, Nov 25, 2018 at 5:03 PM Lv, Tao A <tao.a.lv@intel.com> wrote:

> Hi Steffen,
>
> I think all the commits on MKL-DNN master branch are well tested for 
> MKL-DNN development team. If we really want to have a release commit 
> in the coming 1.4 mxnet release, my suggestion is 0.17 MKL-DNN release.
>
> Thank you,
> Tao
>
> Sent from my iPhone
>
> > On Nov 26, 2018, at 8:09 AM, Steffen Rochel 
> > <steffenrochel@gmail.com>
> wrote:
> >
> > +1 to make MKL-DNN default.
> > I'm tracking  https://github.com/apache/incubator-mxnet/issues/13369 
> > as open issue to be addressed for 1.4.0 I do agree that we should 
> > move to a model to include released
> dependencies
> > instead of just taking bleeding edge snapshots.
> > However, speed of development is important as well.
> > As a compromise for 1.4.0 release with MKL-DNN: can the MKL-DNN
> development
> > team provide us with a well tested tag/commit id to include in 1.4.0 
> > release?
> > Steffen
> >
> >> On Wed, Nov 21, 2018 at 11:42 PM Lv, Tao A <tao.a.lv@intel.com> wrote:
> >>
> >> Thanks for the information, Kellen and Naveen.
> >>
> >> Better than onnx-tensorrt, MKL-DNN has already provided versioning 
> >> and release tags. My concern is that as MKL-DNN is still under 
> >> intensive development, if it has a new feature or bug fix on its 
> >> master branch,
> do we
> >> really want to wait for next release to get it supported in MXNet?
> >>
> >> Take the LSTM regression as an example, probably MKL-DNN will give 
> >> a fix or improvement on its master branch soon, do we need to wait 
> >> for 0.18 release to get it fixed for mxnet user? AFAIK, tensorflow 
> >> is also using normal commit id, not release, as the dependency for MKL-DNN.
> >>
> >> Regarding the LSTM regression, we are using internal JIRA tickets 
> >> rather than github issues to track the defects of MKL-DNN. But I 
> >> agree with
> you,
> >> we need update the progress of it in Alex's issue.
> >>
> >> Thanks,
> >> -tao
> >>
> >> -----Original Message-----
> >> From: kellen sunderland [mailto:kellen.sunderland@gmail.com]
> >> Sent: Thursday, November 22, 2018 10:55 AM
> >> To: dev@mxnet.incubator.apache.org
> >> Subject: Re: Include MKLDNN into default mxnet pip package
> >>
> >> Agree with your point about other repos also not being based on
> versioning
> >> Tao.  I would point out that I've given some that I've worked with
> similar
> >> feedback: https://github.com/onnx/onnx-tensorrt/issues/68
> >>
> >>> On Wed, Nov 21, 2018 at 6:48 PM Naveen Swamy <mnnaveen@gmail.com>
> wrote:
> >>>
> >>> Tao,
> >>>
> >>> You are right there are many submodules in 3rd party. We have to 
> >>> start somewhere and I believe this one is a good candidate to start with.
> >>> This is not to cater to release of MXNet or to tie them with the 
> >>> releases of the submodules but instead to pick only stable 
> >>> releases and not to pick up bleeding edge commits from the tip of 
> >>> the master, this gives us confidence in the submodule that MXNet 
> >>> users are depending on that especially if we make MKLDNN the default.
> >>>
> >>> Good to know it is known already as a regression.Alex has created 
> >>> this issue https://github.com/apache/incubator-mxnet/issues/13369, 
> >>> please add details and link the corresponding issue in MKLDNN(I 
> >>> couldn't
> find).
> >>>
> >>> -Naveen
> >>>
> >>>> On Wed, Nov 21, 2018 at 6:04 PM Lv, Tao A <tao.a.lv@intel.com>
wrote:
> >>>>
> >>>> Here are my answers for the questions from Kellen and Naveen 
> >>>> about MKL-DNN. It doesn't mean that I'm supportive for making 
> >>>> MKL-DNN default here.
> >>>>
> >>>> @Kellen,
> >>>>
> >>>> FYI, here is a list for those platforms which are officially 
> >>>> supported by MKL-DNN.
> >>>> https://github.com/intel/mkl-dnn#system-requirements
> >>>>
> >>>> Most of computation intensive kernels in MKL-DNN are JITed. So 
> >>>> they are supposed to generate code according to the platform 
> >>>> during runtime. For non-JIT code in MKL-DNN, same as other code 
> >>>> in MXNet, it will generate instructions according to the 
> >>>> options/flags of compiler. We can set -DARCH_OPT_FLAGS when build 
> >>>> MKL-DNN to avoid optimization for compiling machine. That's 
> >>>> exactly what we are doing
> >> for MKL-DNN build in MXNet.
> >>> Even
> >>>> without MKL-DNN, I noticed there were issues about illegal 
> >>>> instructions
> >>> of
> >>>> MXNet when users import the pip package on a lower end machine 
> >>>> which probably only supports SSE.
> >>>>
> >>>> @Naveen,
> >>>>
> >>>> The LSTM issue has already been identified as a regression from 
> >>>> the
> >>> recent
> >>>> version of MKL-DNN. Hopefully it will be fixed soon with a new 
> >>>> update of MKL-DNN.
> >>>>
> >>>> MXNet has many submodule dependencies under the 3rd party folder.
> >>>> Seems
> >>> we
> >>>> don't require release versions for most of these dependencies. 
> >>>> The
> >>> release
> >>>> period of MKL-DNN and MXNet are not matched very well. I think it 
> >>>> would
> >>> be
> >>>> a risk for MXNet release if it hardly depends on the release of a 
> >>>> submodule, no need to say depends on the releases of all submodules.
> >>>>
> >>>> -tao
> >>>>
> >>>> -----Original Message-----
> >>>> From: Naveen Swamy [mailto:mnnaveen@gmail.com]
> >>>> Sent: Thursday, November 22, 2018 9:08 AM
> >>>> To: dev@mxnet.incubator.apache.org
> >>>> Cc: dev@mxnet.apache.org
> >>>> Subject: Re: Include MKLDNN into default mxnet pip package
> >>>>
> >>>> Hi Alex,
> >>>>
> >>>> Thanks for promptly running the numbers on AMD and reporting here.
> >>>>
> >>>> Can you please update the AMD numbers here for posterity
> >>>>
> >>> https://cwiki.apache.org/confluence/display/MXNET/MXNet+with+Intel
> >>> +MKL
> >>> -DNN+-+Performance+Benchmarking
> >>>> ?
> >>>>
> >>>> are there any outstanding issues when MKLDNN is enabled? from my 
> >>>> offline conversation I am briefly aware performance issues with 
> >>>> LSTM, is there an GitHub issue for it?
> >>>>
> >>>> MKLDNN is a submodule dependency, are we pulling the latest 
> >>>> commit or releases  ? If not we should move to releases before we 
> >>>> make it a
> >>> default.
> >>>> Ideally we should use platform specific distributions (-dev
> >>>> packages) at least we should rely on well tested releases.
> >>>>
> >>>>
> >>>> Thanks, Naveen
> >>>>
> >>>> On Wed, Nov 21, 2018 at 4:55 PM Zai, Alexander
> >>> <alexzai@amazon.com.invalid
> >>>>>
> >>>> wrote:
> >>>>
> >>>>> AMD benchmarks have been published. We are seeing a x15.8 
> >>>>> speedup with
> >>>>> Resnet50 (batch size 32) on AWS's new m5a.24xlarge machine. With

> >>>>> a smaller network (Mobilenet - batch size 32) the speedup is 
> >>>>> more significant at x38.7. Let's have a vote to see if the PR to

> >>>>> have MKLDNN enabled by default
> >>>>> (https://github.com/apache/incubator-mxnet/pull/12591) can be 
> >>>>> merged before 1.4.0 release.
> >>>>>
> >>>>> On 10/19/18, 9:17 AM, "Pedro Larroy"
> >>>>> <pedro.larroy.lists@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>    I did  pip install mxnet-mkl==1.3.1b20181018 on an AMD Ryzen

> >>>>> 1950X and unit
> >>>>>    tests are passing.
> >>>>>
> >>>>>    Is this build using AVX512?  in /proc/cpuinfo I see only "avx"
> >>> flag.
> >>>>>    There's no "avx2" like on recent intel cpus.
> >>>>>
> >>>>>    Pedro.
> >>>>>
> >>>>>    On Fri, Oct 19, 2018 at 5:12 PM Hagay Lupesko 
> >>>>> <lupesko@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Awesome collaborative effort across many contributors and
> >>>> companies!
> >>>>>>
> >>>>>> The boost is impressive and for MXNet users to get this
> >>>>> boost "out of the
> >>>>>> box" is a great benefit and makes MXNet an even better choice.
> >>>>>>
> >>>>>> Alex - can you clarify whether there are any down sides with
> >>>>> regards to
> >>>>>> noon AVX-512 architectures, AMD CPUs, etc? Will it
> >>>>> gracefully fallback?
> >>>>>>
> >>>>>> Hagay
> >>>>>>
> >>>>>>
> >>>>>> On Fri, Oct 19, 2018, 15:46 Sergio Fernández
> >>>>> <wikier@apache.org>
> >>>>> wrote:
> >>>>>>
> >>>>>>> If there is no downside on platforms not supporting AVX512
> >>>>> instructions,
> >>>>>>> then +1
> >>>>>>>
> >>>>>>>
> >>>>>>> On Wed, Oct 17, 2018, 14:10 Alex Zai <azai91@gmail.com>
> >> wrote:
> >>>>>>>
> >>>>>>>> Hey all,
> >>>>>>>> We have been working hard these past few months to
> >>>>> integrate
> >>>> and
> >>>>>>> stabilize
> >>>>>>>> Intel’s MKLDNN deep learning CPU accelerator into
Mxnet
> >>>>> and have made
> >>>>>>>> incredible progress. On CPUs with AVX512 instructions
> >>>>> (such as
> >>>>> c5.18x)
> >>>>>> we
> >>>>>>>> have seen performance increase up to 12x and on other
> >>>>> platforms (Macs,
> >>>>>>>> AVX2) we seen a speedup of 1.5+. Full list of benchmarks
> >>>>> can be found
> >>>>>>> here
> >>>>>>>> (
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=9
> >>> 5650
> >>> 764
> >>>>>>>> and https://github.com/apache/incubator-mxnet/pull/12591
> >> ).
> >>>>>>>>
> >>>>>>>> Currently, using this accelerator requires the developer
> >>>>> to either pip
> >>>>>>>> install the mxnet-mkl version of mxnet or to build it
> >>>>> themselves from
> >>>>>>>> source. Given that we should try to provide the best
> >>>>> performance "out
> >>>>>> of
> >>>>>>>> the box” with mxnet we should include this in the
> >>>>> default
> >>>> build.
> >>>>> The
> >>>>>>> mkldnn
> >>>>>>>> library is included with in the pip package build so
it
> >>>>> does
> >>>> not
> >>>>>> require
> >>>>>>> an
> >>>>>>>> external dependency.
> >>>>>>>>
> >>>>>>>> There were concerns that MKLDNN could cause regressions
> >>>>> on certain
> >>>>>>>> platforms (as it did with the tensorflow version a while
> >>>>> back); but we
> >>>>>>>> added a env flag (MXNET_MKLDNN_ENABLED) that allows
> >>>>> users to turn of
> >>>>>> this
> >>>>>>>> feature during runtime. Please bring up any other
> >>>>> concerns you may have
> >>>>>>> and
> >>>>>>>> your thoughts on including this accelerator in the
> >>>>> default
> >>>> build.
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Alex
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>
>
Mime
View raw message