mxnet-dev mailing list archives

From "Zhao, Patric" <patric.z...@intel.com>
Subject Intel Plan for the contribution to MXNET
Date Thu, 01 Feb 2018 04:08:16 GMT
Hi MXNET developers,

We are from the Intel Software and Service Group (SSG), working on performance optimization
for MXNET on Intel Architecture (IA).

Let me give a brief introduction to our ongoing projects.

Any suggestions and comments are highly appreciated.


1)      MKL-DNN integration with new NNVM interface

Together with Zheng-Da, we have designed a new MKL-DNN integration based on the NNVM interface.

The new implementation shows better performance and flexibility than the old MKL engine.



The PR is under review (https://github.com/apache/incubator-mxnet/pull/8302); many thanks
for your great comments in the thread :)

After the PR is merged, we will contribute more MKL-DNN related features and performance
optimizations, such as a fused conv + relu OP for inference.
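
For illustration, here is a minimal sketch (using the standard mx.sym Symbol API; the layer
sizes are placeholders) of the conv + relu pattern that such a fused OP would collapse into a
single MKL-DNN primitive at inference time:

    import mxnet as mx

    # Plain FP32 graph: today conv and relu run as two separate operators.
    data = mx.sym.Variable('data')
    conv = mx.sym.Convolution(data, num_filter=64, kernel=(3, 3), pad=(1, 1), name='conv1')
    act  = mx.sym.Activation(conv, act_type='relu', name='relu1')

    # A fused conv + relu OP would let the MKL-DNN backend execute these two
    # nodes as one primitive, avoiding an extra pass over memory.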



2)      INT8 inference

MKL-DNN also provides INT8 computation for operators such as conv, relu, and pooling, which can
improve inference performance a lot with only a very slight accuracy drop (e.g. <1%).

Currently, we have implemented quantization, de-quantization, and some compute Ops in a local
branch.
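
As a rough illustration of what the quantization and de-quantization Ops do, here is a small
NumPy sketch of symmetric INT8 quantization; this shows the general scheme, not the actual
code in our branch:

    import numpy as np

    def quantize_int8(x):
        # Symmetric quantization: map FP32 values to int8 with a single scale.
        scale = np.max(np.abs(x)) / 127.0
        q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize_int8(q, scale):
        # Recover an FP32 approximation of the original tensor.
        return q.astype(np.float32) * scale

    x = np.random.randn(4, 4).astype(np.float32)
    q, s = quantize_int8(x)
    x_hat = dequantize_int8(q, s)   # close to x, up to rounding error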

Our latest implementation is aligned with this PR (https://github.com/apache/incubator-mxnet/pull/9552)
and passes the unit tests.



For a simple network (conv + relu + flatten + FC + softmax) on the MNIST dataset, we got very
similar inference accuracy (FP32: 98.06% vs. INT8: 97.93%).
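
For reference, the FP32 baseline above can be written with the standard Symbol API roughly as
follows (the filter count and other hyper-parameters are placeholders, not the exact ones we
used):

    import mxnet as mx

    # conv + relu + flatten + FC + softmax on MNIST (1 x 28 x 28 input)
    data = mx.sym.Variable('data')
    conv = mx.sym.Convolution(data, num_filter=32, kernel=(3, 3), name='conv')
    relu = mx.sym.Activation(conv, act_type='relu', name='relu')
    flat = mx.sym.Flatten(relu, name='flatten')
    fc   = mx.sym.FullyConnected(flat, num_hidden=10, name='fc')
    net  = mx.sym.SoftmaxOutput(fc, name='softmax')

    # The INT8 path inserts quantize/de-quantize Ops around the conv and FC
    # layers so that the heavy computation runs in int8.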

We will post a summary of our solution in this PR soon.



I hope the CPU and GPU paths can be compatible and share a common code base, so I think we need
more discussion in the PR :)



3)      RNN implementations

Currently, there is no CPU implementation for mx.sym.rnn, and the Python implementation is
really slow.

We are working on resolving this issue from two directions:

-          Provide a C/C++ level implementation, registered via FCompute<cpu> (the GPU
code should be moved to NNVM as well).

We plan to submit the LSTM/GRU PR in March; our initial results are below, FYI.
            Size: N = 12, T = 1600, I = 161, H = 1760 (from the first layer of Deep Speech 2)

Forward pass time (s):

                           mx.sym.gru bound to Intel GRU C impl    Native mx.rnn.GRUCell
    SKX 6148, 2 sockets                    1.32                            72.7
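
For context, here is a minimal sketch of the two code paths compared above, using the existing
mx.sym.RNN fused operator and the mx.rnn.GRUCell Python cell (the benchmark harness itself is
not part of this message):

    import mxnet as mx

    N, T, I, H = 12, 1600, 161, 1760      # batch, time steps, input size, hidden size
    data = mx.sym.Variable('data')        # layout (T, N, I)

    # Path 1: fused symbol-level GRU (mx.sym.RNN with mode='gru'); this is the
    # operator that the planned FCompute<cpu> implementation would accelerate.
    fused = mx.sym.RNN(data=data, state_size=H, num_layers=1, mode='gru', name='gru')

    # Path 2: native Python cell unrolled step by step -- the slow baseline.
    cell = mx.rnn.GRUCell(num_hidden=H, prefix='gru_')
    outputs, states = cell.unroll(length=T, inputs=data, layout='TNC')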




-          Provide the MKL-DNN RNN interface (under development, https://github.com/intel/mkl-dnn/issues/46),
registered via FComputeEx<cpu>.

The higher-performance RNN is under development by the MKL-DNN team, and we will merge it when
it's ready.

I think CPU users can get a further performance boost from the MKL-DNN library.

     Thanks in advance!

     BR,

    -- Patric

