mxnet-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Junru Shao <junrushao1...@gmail.com>
Subject Re: [MXNET 2.0 Wishlist] [DISCUSS] Refine the InferStorageType and memory planning pass
Date Wed, 10 Apr 2019 04:19:02 GMT
+1 for this proposal. Probably this is doable prior to 2.0?

While I totally agree with Tianqi that in the sense of a compiler, we
should make layout transformation a separate pass, I would love to mention
that it will be non-trivial engineering effort given the fact that our
current NNVM does not have a pass manager for optionally applying passes.
Moreover, I believe Tao's proposal is somehow equivalent to adding a new
pass in NNVM (but one with the same name).

By the way, making MKLDNN as an accelerator is a nice proposal, which I
guess could be a wish for MXNet 2.0.

On Tue, Apr 9, 2019 at 8:39 PM Zhao, Patric <patric.zhao@intel.com> wrote:

> BTW,  "maintainability, testability and readability"  is always our design
> goal from starting point of MKL-DNN integration :)
>
> > -----Original Message-----
> > From: Lv, Tao A [mailto:tao.a.lv@intel.com]
> > Sent: Wednesday, April 10, 2019 11:03 AM
> > To: dev@mxnet.incubator.apache.org
> > Subject: RE: [MXNET 2.0 Wishlist] [DISCUSS] Refine the InferStorageType
> and
> > memory planning pass
> >
> >
> > Thank you Tianqi and Sam for the kind suggestions.
> >
> > @Tianqi,
> >
> > Can you please point me to the code of this pass or do you think anyone
> > from TVM community can help to educate me on this? I'm very happy to
> > learn from that.
> >
> > Just one note, we are not only doing layout transformation but also want
> to
> > have more memory for layout transformation.
> > For example, (N=32, C=3, H=256, W=256) will be padded to (N=32, C=16,
> > H=256, W=256) on channel dimension then convert (N=32, C=16, H=256,
> > W=256) to nchw16c so we can leverage corresponding optimal computation
> > kernels.
> > That's why we also need changes to the memory planning pass.
> >
> >
> > @Sam,
> >
> > Yes, definitely we're treating MKL-DNN as an accelerator on CPU.
> Previously
> > we used it to accelerate certain critical operators in MXNet in certain
> > situations, eg. FP32 convolution/deconvolution/fullyConnected, etc. But
> > along with the evolving of both MXNet and MKL-DNN, we started to do more
> > which might not supported by MXNet in original CPU implementation, such
> > as quantization and graph fusion. So MKL-DNN backend is also changing
> from
> > a simple `accelerator` to a `default` backend on CPU. And I totally
> agree with
> > you that we need think more about the software architecture for
> > maintainability, testability and readability - that's why I sent out
> this proposal
> > to get more ideas from the community.
> >
> >
> > -tao
> >
> > -----Original Message-----
> > From: Skalicky, Sam [mailto:sskalic@amazon.com.INVALID]
> > Sent: Wednesday, April 10, 2019 2:24 AM
> > To: dev@mxnet.incubator.apache.org
> > Subject: Re: [MXNET 2.0 Wishlist] [DISCUSS] Refine the InferStorageType
> and
> > memory planning pass
> >
> > I agree with Tianqi. We should let MKLDNN partitipate in memory planning
> > by first having a separate NNVM pass and then using that info in the
> regular
> > memory planning phase.
> >
> > Its starting to sound like MKLDNN should be treated like an accelerator
> rather
> > than an operator library. As it has explicit needs and can provide
> acceleration
> > when given extra capabilities in MXNet like having input to the memory
> > planning NNVM pass. It also has special tensor formatting needs and
> > conversions that could be best architected in another way than they
> > currently are.
> >
> > We need to think about how we want to architect this for maintainability,
> > testability, and readability.
> >
> > Sam
> >
> >
> > > On Apr 9, 2019, at 11:11 AM, Tianqi Chen <tqchen@cs.washington.edu>
> > wrote:
> > >
> > > The layout transformation should really be a separate optimization
> > > pass rather than memory planning. As is done in the TVM stack. If we
> > > want to do a clean slate solution, I would recommend looking into that
> > instead.
> > >
> > > TIanqi
> > >
> > > On Tue, Apr 9, 2019 at 1:46 AM Lv, Tao A <tao.a.lv@intel.com> wrote:
> > >
> > >>
> > >>
> > >> Hi dev,
> > >>
> > >>
> > >>
> > >> As we're discussing the roadmap for MXNet 2.0, I would like to start
> > >> a thread about refining the InferStorageType and memory planning pass
> > >> in MXNet and hope it can happen as a part of the 2.0 release.
> > >>
> > >>
> > >>
> > >> Thanks to @eric-haibin-lin, part of the proposal has already been
> > >> discussed in issue #13598 [1].
> > >>
> > >>
> > >>
> > >> As mentioned in the description of issue #13598, there are several
> > >> drawbacks of the existing flow. Please allow me to quote them here:
> > >> *        the selection of MKL/CPU/GPU/CUDNN implementation happens
> > after
> > >> graph attribute inference and memory planning, memory planning is
> > >> thus not aware of the implementation that will be used for execution
> > >> in the future, which may result in sub-optimal result. For example,
> > >> the memory inplace option may vary depending on the accelerator
> > >> backend (the new version of CUDNN enables x/dx inplace for
> > _backward_conv).
> > >> *        some sparse operator need to access dtype/shape information
> to
> > >> decide which implementation to invoke for execution, and whether to
> > >> perform fallback. This information is not yet exposed in the existing
> > >> infer storage type interface.
> > >>
> > >>
> > >>
> > >> Besides, the existing memory planning pass calculates and afterwards
> > >> allocates memory strictly according to the input/output tensor shapes
> > >> (which can be got from operators' arithmetic formulas through
> > InferShape).
> > >> That's not true anymore when we come to accelerators like MKL-DNN on
> > >> CPU which wants to pad input/output tensor to optimal formats (eg.
> > >> nchw16c) according to hardware architecture. It also can be described
> > >> as shape + stride. As many of you know, MKL-DNN shows great
> > >> performance on these optimal formats which is blocked by the vector
> > length of AVX512 or AVX2.
> > >> It's very natural for us to pad on the channel dimension for those
> > >> inputs/outputs which IC or OC is not multiples of vector length and
> > >> leverage optimal kernels for blocked formats. Unfortunately this
> > >> cannot be implemented without changing the logic in the memory
> > planning pass.
> > >> Currently we always fallback to slow reference kernels for both
> > >> convolution [1] and deconvolution [2].
> > >>
> > >>
> > >>
> > >> AFAIK, the padding feature of MKL-DNN has already been used in
> > >> TensorFlow and other frameworks. We also found that, without
> > >> supporting this feature, many other new features from MKL-DNN cannot
> > >> be applied to MXNet,  such as the deconvolution primitive, winograd,
> etc.
> > >>
> > >>
> > >>
> > >> Changes for this proposal can be divided into following parts:
> > >> 1.      Following the proposal in issue #13598, we need add new
> > >> InferStorageTypeEx functions to operators which need to do dispatch
> > >> in a more fine-grained way. This also need the InfereStorage pass can
> > >> handle the new -Ex function as what we did for FCompute and
> > FComputeEx.
> > >> 2.      Attach more information to the computation graph/node, eg.
> > >> accelerator specific information. Currently we add `IsMKLDNN`
> > >> directly during operator registration if MXNET_USE_MKLDNN == 1. It
> > >> looks simple and rude to me.
> > >> 3.      Do memory planning according to more information: topology,
> > >> shapes, data types, in-place options and more accurate accelerator
> > >> information (accelerator path, memory size requirements,
> > >> accelerator-wise attributes).
> > >> 4.      Improve MKL-DNN operators so they can work on those well
> planned
> > >> memory which may be larger than the arithmetic requirements and work
> > >> with optimal kernels. Also, with more accurate dispatching in
> > >> InferStorageTypeEx, there is no need for us to write complicated
> > >> fallback logic in MKL-DNN operators.
> > >> 5.      If users feel uncomfortable with more memory usage, we can
> disable
> > >> this feature by environmental variables.
> > >>
> > >>
> > >>
> > >> Since the memory planning pass is implemented in NNVM, so we also
> > >> need support from TVM community.
> > >>
> > >>
> > >>
> > >> Please let me know what do you think. Thank you.
> > >>
> > >>
> > >>
> > >> -tao
> > >>
> > >>
> > >>
> > >> [1] https://github.com/apache/incubator-mxnet/issues/13598
> > >>
> > >> [2]
> > >> https://github.com/apache/incubator-
> > mxnet/blob/master/src/operator/nn
> > >> /mkldnn/mkldnn_convolution.cc#L194
> > >>
> > >> [3]
> > >> https://github.com/apache/incubator-
> > mxnet/blob/master/src/operator/nn
> > >> /mkldnn/mkldnn_deconvolution.cc#L55
> > >>
> > >>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message