mxnet-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Steffen Rochel <steffenroc...@gmail.com>
Subject Re: Include MKLDNN into default mxnet pip package
Date Mon, 26 Nov 2018 00:01:43 GMT
+1 to make MKL-DNN default.
I'm tracking  https://github.com/apache/incubator-mxnet/issues/13369 as
open issue to be addressed for 1.4.0
I do agree that we should move to a model to include released dependencies
instead of just taking bleeding edge snapshots.
However, speed of development is important as well.
As a compromise for 1.4.0 release with MKL-DNN: can the MKL-DNN development
team provide us with a well tested tag/commit id to include in 1.4.0
release?
Steffen

On Wed, Nov 21, 2018 at 11:42 PM Lv, Tao A <tao.a.lv@intel.com> wrote:

> Thanks for the information, Kellen and Naveen.
>
> Better than onnx-tensorrt, MKL-DNN has already provided versioning and
> release tags. My concern is that as MKL-DNN is still under intensive
> development, if it has a new feature or bug fix on its master branch, do we
> really want to wait for next release to get it supported in MXNet?
>
> Take the LSTM regression as an example, probably MKL-DNN will give a fix
> or improvement on its master branch soon, do we need to wait for 0.18
> release to get it fixed for mxnet user? AFAIK, tensorflow is also using
> normal commit id, not release, as the dependency for MKL-DNN.
>
> Regarding the LSTM regression, we are using internal JIRA tickets rather
> than github issues to track the defects of MKL-DNN. But I agree with you,
> we need update the progress of it in Alex's issue.
>
> Thanks,
> -tao
>
> -----Original Message-----
> From: kellen sunderland [mailto:kellen.sunderland@gmail.com]
> Sent: Thursday, November 22, 2018 10:55 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: Include MKLDNN into default mxnet pip package
>
> Agree with your point about other repos also not being based on versioning
> Tao.  I would point out that I've given some that I've worked with similar
> feedback: https://github.com/onnx/onnx-tensorrt/issues/68
>
> On Wed, Nov 21, 2018 at 6:48 PM Naveen Swamy <mnnaveen@gmail.com> wrote:
>
> > Tao,
> >
> > You are right there are many submodules in 3rd party. We have to start
> > somewhere and I believe this one is a good candidate to start with.
> > This is not to cater to release of MXNet or to tie them with the
> > releases of the submodules but instead to pick only stable releases
> > and not to pick up bleeding edge commits from the tip of the master,
> > this gives us confidence in the submodule that MXNet users are
> > depending on that especially if we make MKLDNN the default.
> >
> > Good to know it is known already as a regression.Alex has created this
> > issue https://github.com/apache/incubator-mxnet/issues/13369, please
> > add details and link the corresponding issue in MKLDNN(I couldn't find).
> >
> > -Naveen
> >
> > On Wed, Nov 21, 2018 at 6:04 PM Lv, Tao A <tao.a.lv@intel.com> wrote:
> >
> > > Here are my answers for the questions from Kellen and Naveen about
> > > MKL-DNN. It doesn't mean that I'm supportive for making MKL-DNN
> > > default here.
> > >
> > > @Kellen,
> > >
> > > FYI, here is a list for those platforms which are officially
> > > supported by MKL-DNN.
> > > https://github.com/intel/mkl-dnn#system-requirements
> > >
> > > Most of computation intensive kernels in MKL-DNN are JITed. So they
> > > are supposed to generate code according to the platform during
> > > runtime. For non-JIT code in MKL-DNN, same as other code in MXNet,
> > > it will generate instructions according to the options/flags of
> > > compiler. We can set -DARCH_OPT_FLAGS when build MKL-DNN to avoid
> > > optimization for compiling machine. That's exactly what we are doing
> for MKL-DNN build in MXNet.
> > Even
> > > without MKL-DNN, I noticed there were issues about illegal
> > > instructions
> > of
> > > MXNet when users import the pip package on a lower end machine which
> > > probably only supports SSE.
> > >
> > > @Naveen,
> > >
> > > The LSTM issue has already been identified as a regression from the
> > recent
> > > version of MKL-DNN. Hopefully it will be fixed soon with a new
> > > update of MKL-DNN.
> > >
> > > MXNet has many submodule dependencies under the 3rd party folder.
> > > Seems
> > we
> > > don't require release versions for most of these dependencies. The
> > release
> > > period of MKL-DNN and MXNet are not matched very well. I think it
> > > would
> > be
> > > a risk for MXNet release if it hardly depends on the release of a
> > > submodule, no need to say depends on the releases of all submodules.
> > >
> > > -tao
> > >
> > > -----Original Message-----
> > > From: Naveen Swamy [mailto:mnnaveen@gmail.com]
> > > Sent: Thursday, November 22, 2018 9:08 AM
> > > To: dev@mxnet.incubator.apache.org
> > > Cc: dev@mxnet.apache.org
> > > Subject: Re: Include MKLDNN into default mxnet pip package
> > >
> > > Hi Alex,
> > >
> > > Thanks for promptly running the numbers on AMD and reporting here.
> > >
> > > Can you please update the AMD numbers here for posterity
> > >
> > https://cwiki.apache.org/confluence/display/MXNET/MXNet+with+Intel+MKL
> > -DNN+-+Performance+Benchmarking
> > > ?
> > >
> > > are there any outstanding issues when MKLDNN is enabled? from my
> > > offline conversation I am briefly aware performance issues with
> > > LSTM, is there an GitHub issue for it?
> > >
> > > MKLDNN is a submodule dependency, are we pulling the latest commit
> > > or releases  ? If not we should move to releases before we make it a
> > default.
> > > Ideally we should use platform specific distributions (-dev
> > > packages) at least we should rely on well tested releases.
> > >
> > >
> > > Thanks, Naveen
> > >
> > > On Wed, Nov 21, 2018 at 4:55 PM Zai, Alexander
> > <alexzai@amazon.com.invalid
> > > >
> > > wrote:
> > >
> > > > AMD benchmarks have been published. We are seeing a x15.8 speedup
> > > > with
> > > > Resnet50 (batch size 32) on AWS's new m5a.24xlarge machine. With a
> > > > smaller network (Mobilenet - batch size 32) the speedup is more
> > > > significant at x38.7. Let's have a vote to see if the PR to have
> > > > MKLDNN enabled by default
> > > > (https://github.com/apache/incubator-mxnet/pull/12591) can be
> > > > merged before 1.4.0 release.
> > > >
> > > > On 10/19/18, 9:17 AM, "Pedro Larroy"
> > > > <pedro.larroy.lists@gmail.com>
> > > > wrote:
> > > >
> > > >     I did  pip install mxnet-mkl==1.3.1b20181018 on an AMD Ryzen
> > > > 1950X and unit
> > > >     tests are passing.
> > > >
> > > >     Is this build using AVX512?  in /proc/cpuinfo I see only "avx"
> > flag.
> > > >     There's no "avx2" like on recent intel cpus.
> > > >
> > > >     Pedro.
> > > >
> > > >     On Fri, Oct 19, 2018 at 5:12 PM Hagay Lupesko
> > > > <lupesko@gmail.com>
> > > > wrote:
> > > >
> > > >     > Awesome collaborative effort across many contributors and
> > > companies!
> > > >     >
> > > >     > The boost is impressive and for MXNet users to get this
> > > > boost "out of the
> > > >     > box" is a great benefit and makes MXNet an even better choice.
> > > >     >
> > > >     > Alex - can you clarify whether there are any down sides with
> > > > regards to
> > > >     > noon AVX-512 architectures, AMD CPUs, etc? Will it
> > > > gracefully fallback?
> > > >     >
> > > >     > Hagay
> > > >     >
> > > >     >
> > > >     > On Fri, Oct 19, 2018, 15:46 Sergio Fernández
> > > > <wikier@apache.org>
> > > > wrote:
> > > >     >
> > > >     > > If there is no downside on platforms not supporting AVX512
> > > > instructions,
> > > >     > > then +1
> > > >     > >
> > > >     > >
> > > >     > > On Wed, Oct 17, 2018, 14:10 Alex Zai <azai91@gmail.com>
> wrote:
> > > >     > >
> > > >     > > > Hey all,
> > > >     > > > We have been working hard these past few months to
> > > > integrate
> > > and
> > > >     > > stabilize
> > > >     > > > Intel’s MKLDNN deep learning CPU accelerator into
Mxnet
> > > > and have made
> > > >     > > > incredible progress. On CPUs with AVX512 instructions
> > > > (such as
> > > > c5.18x)
> > > >     > we
> > > >     > > > have seen performance increase up to 12x and on other
> > > > platforms (Macs,
> > > >     > > > AVX2) we seen a speedup of 1.5+. Full list of benchmarks
> > > > can be found
> > > >     > > here
> > > >     > > > (
> > > >     > > >
> > > >     > >
> > > >     >
> > > >
> > >
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95650
> > 764
> > > >     > > >  and https://github.com/apache/incubator-mxnet/pull/12591
> ).
> > > >     > > >
> > > >     > > > Currently, using this accelerator requires the developer
> > > > to either pip
> > > >     > > > install the mxnet-mkl version of mxnet or to build
it
> > > > themselves from
> > > >     > > > source. Given that we should try to provide the best
> > > > performance "out
> > > >     > of
> > > >     > > > the box” with mxnet we should include this in the
> > > > default
> > > build.
> > > > The
> > > >     > > mkldnn
> > > >     > > > library is included with in the pip package build so
it
> > > > does
> > > not
> > > >     > require
> > > >     > > an
> > > >     > > > external dependency.
> > > >     > > >
> > > >     > > > There were concerns that MKLDNN could cause regressions
> > > > on certain
> > > >     > > > platforms (as it did with the tensorflow version a
while
> > > > back); but we
> > > >     > > > added a env flag (MXNET_MKLDNN_ENABLED) that allows
> > > > users to turn of
> > > >     > this
> > > >     > > > feature during runtime. Please bring up any other
> > > > concerns you may have
> > > >     > > and
> > > >     > > > your thoughts on including this accelerator in the
> > > > default
> > > build.
> > > >     > > >
> > > >     > > > Best,
> > > >     > > > Alex
> > > >     > > >
> > > >     > >
> > > >     >
> > > >
> > > >
> > > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message