mxnet-dev mailing list archives

From "Lv, Tao A" <>
Subject [MXNET 2.0 Wishlist] [DISCUSS] Refine the InferStorageType and memory planning pass
Date Tue, 09 Apr 2019 08:46:31 GMT

Hi dev,

As we're discussing the roadmap for MXNet 2.0, I would like to start a thread about refining
the InferStorageType and memory planning pass in MXNet and hope it can happen as a part of
the 2.0 release.

Thanks to @eric-haibin-lin, part of the proposal has already been discussed in issue #13598.

As mentioned in the description of issue #13598, there are several drawbacks of the existing
flow. Please allow me to quote them here:
* the selection of MKL/CPU/GPU/CUDNN implementation happens after graph attribute inference
and memory planning, so memory planning is not aware of the implementation that will be
used for execution, which may lead to a sub-optimal result. For example, the
memory inplace option may vary depending on the accelerator backend (the new version of CUDNN
enables x/dx inplace for _backward_conv).
* some sparse operators need to access dtype/shape information to decide which implementation
to invoke for execution, and whether to perform fallback. This information is not yet exposed
in the existing infer storage type interface.
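
To make the second point concrete, here is a minimal sketch of what a shape- and dtype-aware dispatch decision could look like. It is an illustration only, not the MXNet interface: the function name `infer_storage_type_ex`, the `DispatchMode` enum, and the rule "the sparse kernel handles only 2-D float32 inputs" are all assumptions made up for this example.

```python
from enum import Enum
from typing import Sequence, Tuple


class DispatchMode(Enum):
    # Hypothetical dispatch modes for this sketch.
    FCOMPUTE = "fcompute"        # plain dense kernel
    FCOMPUTE_EX = "fcompute_ex"  # storage-aware kernel
    FALLBACK = "fallback"        # convert inputs to dense, then run the dense kernel


def infer_storage_type_ex(
    stypes: Sequence[str],
    shapes: Sequence[Tuple[int, ...]],
    dtypes: Sequence[str],
) -> DispatchMode:
    """Pick a dispatch mode from storage types plus shape and dtype information."""
    # All-dense inputs never need the storage-aware path.
    if all(stype == "default" for stype in stypes):
        return DispatchMode.FCOMPUTE
    # Assumed rule: the sparse kernel only supports 2-D float32 inputs.
    supported = all(dtype == "float32" for dtype in dtypes) and all(
        len(shape) == 2 for shape in shapes
    )
    return DispatchMode.FCOMPUTE_EX if supported else DispatchMode.FALLBACK
```

With only storage types available, the last branch cannot be decided at inference time, so the fallback has to be handled later inside the operator.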

Besides, the existing memory planning pass calculates and then allocates memory strictly
according to the input/output tensor shapes (which are derived from the operators' arithmetic formulas
through InferShape). That no longer holds for accelerators like MKL-DNN on
CPU, which wants to pad input/output tensors to optimal formats (e.g. nchw16c) according to the hardware
architecture. Such a layout can also be described as shape + stride. As many of you know, MKL-DNN shows
great performance on these optimal formats, which are blocked by the vector length of AVX512
or AVX2. It is very natural for us to pad the channel dimension for those inputs/outputs
whose IC or OC is not a multiple of the vector length and to leverage the optimal kernels for blocked
formats. Unfortunately, this cannot be implemented without changing the logic in the memory
planning pass. Currently we always fall back to slow reference kernels for both convolution
[1] and deconvolution [2].
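
As a rough illustration of the size gap, the sketch below compares the bytes implied by the arithmetic NCHW shape with the bytes needed once the channel dimension is padded to a block size. The helper names are made up for this example, and the block sizes (16 float32 lanes for AVX512, 8 for AVX2) are the usual ones for blocked formats, not values read from MKL-DNN.

```python
def padded_channels(channels: int, block: int) -> int:
    """Round the channel count up to the next multiple of the block size."""
    return -(-channels // block) * block


def nchw_bytes(n: int, c: int, h: int, w: int, itemsize: int = 4, block: int = 1) -> int:
    """Bytes needed for an NCHW tensor whose channel dimension is padded to `block`."""
    return n * padded_channels(c, block) * h * w * itemsize


# A float32 input with 3 channels, e.g. the first convolution of an image model.
arithmetic = nchw_bytes(1, 3, 224, 224)          # size planned from InferShape
blocked = nchw_bytes(1, 3, 224, 224, block=16)   # size a 16-channel blocked layout needs
print(arithmetic, blocked)  # 602112 3211264
```

A buffer planned from the arithmetic shape is too small for the blocked layout here, which is why the planner has to know about the padded requirement.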

AFAIK, the padding feature of MKL-DNN is already used in TensorFlow and other frameworks.
We also found that, without support for this feature, many other new features from MKL-DNN
cannot be applied to MXNet, such as the deconvolution primitive, Winograd, etc.

Changes for this proposal can be divided into the following parts:
1. Following the proposal in issue #13598, we need to add new InferStorageTypeEx functions
to operators which need to dispatch in a more fine-grained way. This also requires that the InferStorageType
pass can handle the new -Ex function, as we did for FCompute and FComputeEx.
2. Attach more information to the computation graph/node, e.g. accelerator-specific information.
Currently we add `IsMKLDNN` directly during operator registration if MXNET_USE_MKLDNN == 1.
It looks simple and crude to me.
3. Do memory planning according to more information: topology, shapes, data types, in-place
options and more accurate accelerator information (accelerator path, memory size requirements,
accelerator-wise attributes).
4. Improve MKL-DNN operators so they can work on the well-planned memory, which may
be larger than the arithmetic requirements, and work with optimal kernels. Also, with more
accurate dispatching in InferStorageTypeEx, there is no need for us to write complicated fallback
logic in MKL-DNN operators.
5. If users feel uncomfortable with the increased memory usage, they can disable this feature through
environment variables.
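
Parts 3 and 5 together can be sketched as follows. This is only a toy model of the idea, not the NNVM planner: the `NodeEntry` class, the `planned_bytes` helper, and the environment variable name `HYPOTHETICAL_ALLOW_PADDED_MEMORY` are all invented for this example.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class NodeEntry:
    """One tensor in the graph, as seen by a toy memory planner."""
    name: str
    arithmetic_bytes: int                # size derived from shape and dtype
    backend_bytes: Optional[int] = None  # size requested by the accelerator, if any


def planned_bytes(entry: NodeEntry, env: Optional[Mapping[str, str]] = None) -> int:
    """Return the number of bytes the planner should reserve for this entry."""
    env = os.environ if env is None else env
    # Hypothetical switch: setting it to "0" restores shape-only planning.
    allow_padding = env.get("HYPOTHETICAL_ALLOW_PADDED_MEMORY", "1") != "0"
    if allow_padding and entry.backend_bytes is not None:
        return max(entry.arithmetic_bytes, entry.backend_bytes)
    return entry.arithmetic_bytes
```

When the switch is off, the operator would have to take the non-padded path for that tensor, which is the behavior we have today.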

Since the memory planning pass is implemented in NNVM, we also need support from the TVM community.

Please let me know what you think. Thank you.
