mxnet-dev mailing list archives

From "Zheng, Da" <dzz...@amazon.com>
Subject Re: A proposal for unified integration with external acceleration libraries
Date Mon, 04 Jun 2018 18:52:10 GMT
Hi Tao,

Thanks for your feedback.

For your questions:
1. This subgraph strategy is just a mechanism for integrating with external libraries; we can use it wherever it provides benefits. It seems to me that CuDNN doesn't benefit much from this strategy. Although NHWC might not be the default layout, it merely interprets the dimensions of an array differently, which is very different from MKLDNN formats. Only a few operators care about the meaning of the dimensions, so any operator that doesn't need to interpret them can run on the arrays without any modification. It doesn't seem necessary to me to isolate CuDNN operators from other MXNet operators.
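
To make the distinction concrete, here is a minimal sketch in plain NumPy (illustration only; the blocked array at the end is a hypothetical stand-in for an MKLDNN nChw8c-style blocked format, not the actual MKLDNN memory descriptor):

    import numpy as np

    x_nchw = np.random.rand(2, 8, 4, 4).astype(np.float32)   # interpreted as N, C, H, W
    x_nhwc = x_nchw.transpose(0, 2, 3, 1)                     # same data, dimensions reinterpreted

    # An elementwise operator doesn't care which interpretation is used:
    assert np.allclose(np.maximum(x_nchw, 0).transpose(0, 2, 3, 1),
                       np.maximum(x_nhwc, 0))

    # By contrast, a blocked format (channels split into blocks of 8, block innermost)
    # changes the number of dimensions and the physical element order, so an ordinary
    # operator cannot consume the buffer without an explicit conversion.
    x_blocked = x_nchw.reshape(2, 1, 8, 4, 4).transpose(0, 1, 3, 4, 2)   # N, C/8, H, W, 8c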

2. Imperative Gluon doesn't have subgraphs. We could potentially treat a single operator as a subgraph, so the strategy would still work for imperative Gluon. However, the question is whether we want to make it work there at all. Imperative Gluon is mainly used for debugging and isn't very performance-sensitive, while the majority of the acceleration libraries mentioned in the proposal are aimed at accelerating inference and model serving; MKLDNN is probably the only exception. In imperative Gluon mode, we can have MKLDNN operators always output arrays in the default format.

3. You are absolutely right: the subgraph strategy can't avoid data conversion when conversion is needed. Currently, if the operators can understand both default and MKLDNN NDArrays, things work fine, and we have spent a lot of time making this work well. However, the current MKLDNN backend can't handle the interaction between MKLDNN operators and non-MKLDNN operators well, and this isn't simply a matter of converting between default NDArrays and MKLDNN NDArrays. To make this work, our choices are to:
* make all operators (the ones that use FComputeEx) understand MKLDNN NDArrays. This isn't scalable: it requires a lot of modifications to the operators, and if we add more backends in the future, we have to do the same for each of them.
* have the executor recognize MKLDNN operators and perform the data conversion. This makes the executor complex, and it has to understand all backends.
* use the subgraph strategy to isolate MKLDNN operators (a rough sketch follows below). This is preferred for MKLDNN because the subgraph strategy is useful for many purposes (e.g., integration with acceleration libraries, dynamic shape inference, etc.), so we don't need to do much extra to make it work well with MKLDNN, and it keeps the executor simple and easy to maintain.
Another problem with the current implementation is that MKLDNN NDArrays are subject to MXNet's default memory planning (which means an MKLDNN NDArray can be reused within a computation graph). This has caused a few bugs in the past, and the fixes made the executor more complex. The subgraph strategy solves this problem in a cleaner way by using different memory planning inside the MKLDNN subgraph (e.g., disabling NDArray reuse inside the subgraph).
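
To sketch the third option concretely, here is roughly the flow I have in mind (pseudo-Python; to_mkldnn_layout, from_mkldnn_layout and run_mkldnn_node are hypothetical placeholders, not the proposed API):

    def run_mkldnn_subgraph(subgraph_nodes, inputs):
        """subgraph_nodes: nodes in topological order; inputs: name -> default-format NDArray."""
        # Boundary conversion: the executor only ever sees default-format NDArrays.
        values = {name: to_mkldnn_layout(arr) for name, arr in inputs.items()}

        # Inside the subgraph we can apply our own memory planning, e.g. never reuse
        # an MKLDNN NDArray, so MXNet's default memory planning never has to deal
        # with MKLDNN-formatted memory at all.
        for node in subgraph_nodes:
            args = [values[name] for name in node.input_names]
            values[node.output_name] = run_mkldnn_node(node, args)

        # Convert back to the default format before returning control to the executor.
        return from_mkldnn_layout(values[subgraph_nodes[-1].output_name])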

Best,
Da
 
On 6/3/18, 10:28 PM, "Lv, Tao A" <tao.a.lv@intel.com> wrote:

    
    Hi Da and other developers,
    
    It's a great idea to limit external acceleration libraries to a certain scope and subgraph. I am not very familiar with TVM's and TensorRT's designs, but from the MKL-DNN backend side, here are my concerns about this proposal:
    
    1. Is the subgraph approach for all third-party acceleration libraries, or just for those that have different data layouts? I guess cuDNN also uses a non-default data layout (say NHWC) for int8, so does the cuDNN path also need to follow this proposal? I notice that cuDNN is not mentioned in the proposal.
    2. Would the subgraph break the execution of the imperative Gluon interfaces? If we don't apply subgraphs to imperative Gluon, does that mean imperative Gluon models cannot benefit from any acceleration libraries?
    3. Currently, most issues of the MKL-DNN backend come from the interchange between MXNet default NDArrays and MKL-DNN memory. Even after the subgraph is applied to the MKL-DNN backend, there will still be fallback paths for inputs that are not supported by MKL-DNN or that are views of other tensors, so we still need to deal with the layout transformation between MKL-DNN-specific layouts and the MXNet default layout. We cannot avoid this with the current subgraph design.
    
    To push the MKL-DNN backend from 'experimental' to 'GA' in the 1.3 release, we are working intensively to add more unit tests and improve its stability. Hopefully, these fixes and tests will be upstreamed or merged soon. Meanwhile, we are also trying to figure out how to improve the subgraph solution to properly address the current issues and provide better extensibility in the future.
    
    Any comments and suggestions will be highly appreciated. Thanks.
    
    -tao
    
    -----Original Message-----
    From: Zheng, Da [mailto:dzzhen@amazon.com] 
    Sent: Saturday, June 2, 2018 4:38 AM
    To: dev@mxnet.incubator.apache.org
    Subject: A proposal for unified integration with external acceleration libraries
    
    Hello all,
    
    We would like to propose a new mechanism that unifies the integration with most of the external acceleration libraries, including TVM, MKLDNN, TensorRT and more. The main idea is to integrate with the external libraries at the level of subgraphs instead of operators.
    There are a few reasons in favor of this new integration:
    
      *   Integration at the level of operators mixes the external library operators, such as MKLDNN, with MXNet operators and makes the implementation of the executor overcomplicated. We now have to deal with a lot of unexpected issues (the executor needs to carefully handle data format conversion between different operators; the operators of external libraries are subject to the same memory planning as other MXNet operators; etc.).
      *   External libraries need to reconstruct the computation graph for better performance (e.g., operator fusion). Integration at the level of subgraphs allows external libraries to perform arbitrary graph transformation and computation (a toy sketch of the partitioning idea follows this list).
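
    A toy sketch of the partitioning idea (plain Python; the node structure and is_supported predicate are hypothetical placeholders, not the proposed API, and a real pass would also have to respect data dependencies rather than just group consecutive supported nodes):

    def partition(nodes, is_supported):
        """nodes: graph nodes in topological order; is_supported(node) -> bool."""
        subgraphs, current = [], []
        for node in nodes:
            if is_supported(node):
                current.append(node)            # keep growing the current subgraph
            elif current:
                subgraphs.append(current)       # close the subgraph at an unsupported node
                current = []
        if current:
            subgraphs.append(current)
        # Each subgraph is handed to the external library, which is free to fuse or
        # rewrite it (e.g., operator fusion) before execution.
        return subgraphs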
    
    The proposal below provides both the design and the API for constructing and executing subgraphs.
    https://cwiki.apache.org/confluence/display/MXNET/Unified+integration+with+external+acceleration+libraries
    
    Please let me know if you have any comments on this design and API.
    
    Thanks,
    Da
    
