mxnet-dev mailing list archives

From Jun Wu <wujun....@gmail.com>
Subject Re: Intel Plan for the contribution to MXNET
Date Thu, 01 Feb 2018 04:40:29 GMT
Hi Patric,

Thanks for the contribution. It’s great to see progress on developing INT8
inference for CPU! I have a few questions and hope you can answer them.

1. When you said your work is aligned with PR9552
<https://github.com/apache/incubator-mxnet/pull/9552>, did you mean that you
used the quantization+calibration flow developed in that PR for benchmarking
inference?
2. In your MNIST benchmark, which operators are quantized?
3. Is the MNIST quantized model calibrated?
4. Is the INT8 inference accuracy produced by the *calibrated* quantized
model, or by the quantized model without calibration?
5. What are the inference throughputs of the FP32 model and the INT8 model,
respectively?
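
For context, my mental model of the calibrated flow in that PR is roughly the
sketch below; since the PR is still under review, the quantize_model name and
its arguments should be read as assumptions rather than a settled API:

    # Sketch of a calibrated quantization flow as proposed in PR 9552.
    # API names/arguments are assumptions until the PR is merged.
    import mxnet as mx
    from mxnet.contrib.quantization import quantize_model

    # Load a trained FP32 model (prefix/epoch are placeholders).
    sym, arg_params, aux_params = mx.model.load_checkpoint('mnist-cnn', 10)

    # A small calibration set drawn from the validation data.
    calib_data = mx.io.MNISTIter(image='t10k-images-idx3-ubyte',
                                 label='t10k-labels-idx1-ubyte',
                                 batch_size=32)

    # Calibration collects layer-output statistics on calib_data to pick INT8
    # thresholds, then returns a quantized symbol plus converted parameters.
    qsym, qarg_params, qaux_params = quantize_model(
        sym=sym, arg_params=arg_params, aux_params=aux_params,
        ctx=mx.cpu(), calib_mode='entropy',
        calib_data=calib_data, num_calib_examples=500)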

Thanks,
Jun


On Wed, Jan 31, 2018 at 8:08 PM, Zhao, Patric <patric.zhao@intel.com> wrote:

> Hi MXNET developers,
>
> We are from the Intel Software and Service Group (SSG) and are working on
> performance optimization for MXNET on Intel Architecture (IA).
>
> Let me give a brief introduction to our ongoing projects.
>
> Any suggestions and comments are highly appreciated.
>
>
> 1)      MKL-DNN integration with new NNVM interface
>
> We have designed a new NNVM-based interface for MKL-DNN together with Zheng-Da.
>
> The new implementation shows better performance and flexibility than the old
> MKL engine.
>
>
>
> The PR is under review
> (https://github.com/apache/incubator-mxnet/pull/8302); many thanks for your
> great comments in the thread :)
>
> After the PR is merged, we will push more MKL-DNN related features and
> performance optimizations, such as a fused conv + relu OP for inference.
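>
> To make the fusion target concrete, here is a minimal sketch of the conv +
> relu pattern in the symbol graph that a fused MKL-DNN operator could replace
> during inference graph optimization (layer names and sizes are illustrative
> only):
>
>     import mxnet as mx
>
>     # The conv + relu pattern targeted by the planned fusion; a fused op
>     # would compute both in one MKL-DNN primitive call at inference time.
>     data = mx.sym.Variable('data')
>     conv = mx.sym.Convolution(data=data, num_filter=64, kernel=(3, 3),
>                               pad=(1, 1), name='conv1')
>     relu = mx.sym.Activation(data=conv, act_type='relu', name='relu1')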
>
>
>
> 2)      INT8 inference
>
> MKL-DNN also provides INT8 computations for operators such as conv, relu, and
> pooling, which can improve inference performance significantly with only a
> slight accuracy drop (typically <1%).
>
> Currently, we have implemented quantization, de-quantization, and some compute
> Ops in a local branch.
>
> Our latest implementation is aligned with this PR
> (https://github.com/apache/incubator-mxnet/pull/9552) and passes the unit
> tests.
>
>
>
> For a simple network (conv+relu+flatten+FC+softmax) on the MNIST dataset, we
> got very similar inference accuracy (FP32: 98.06% vs. INT8: 97.93%).
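>
> For reference, a minimal sketch of that FP32 topology in MXNet symbols
> (filter counts and hidden sizes below are illustrative placeholders, not our
> exact configuration):
>
>     import mxnet as mx
>
>     # conv + relu + flatten + FC + softmax, the topology used for the MNIST
>     # test; layer sizes are placeholders.
>     data = mx.sym.Variable('data')
>     net = mx.sym.Convolution(data=data, num_filter=32, kernel=(5, 5),
>                              name='conv')
>     net = mx.sym.Activation(data=net, act_type='relu', name='relu')
>     net = mx.sym.Flatten(data=net, name='flatten')
>     net = mx.sym.FullyConnected(data=net, num_hidden=10, name='fc')
>     net = mx.sym.SoftmaxOutput(data=net, name='softmax')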
>
> We will update a summary of our solution in this PR soon.
>
>
>
> I hope the CPU and GPU implementations can be compatible and share a common
> code base. So, I think we need more discussion in the PR :)
>
>
>
> 3)      RNN implementations
>
> Currently, there is no CPU implementation for mx.sym.rnn, and the Python
> implementation is really slow.
>
> We are working on resolving this issue from two aspects:
>
> -          Provide the C/C++ level implementation, registered via
> FCompute<cpu> (GPU code should be moved to NNVM as well).
>
> We plan to submit a PR for the LSTM/GRU implementation in March; our initial
> results are below, FYI (a rough timing sketch follows at the end of this
> section).
>
> Size: N = 12, T = 1600, I = 161, H = 1760 (from the first layer of Deep
> Speech 2)
>
>     Forward time (s)      | mx.sym.gru bound to Intel GRU C | Native mx.rnn.GRUCell
>     SKX 6148, 2 sockets   | 1.32                            | 72.7
>
>
> -          Provide the MKL-DNN RNN interface (under development,
> https://github.com/intel/mkl-dnn/issues/46), registered via FComputeEx<cpu>.
>
> A higher-performance RNN primitive is under development by the MKL-DNN team,
> and we will merge it when it is ready.
>
> I think CPU users can get a further performance boost from the MKL-DNN
> library.
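>
> As mentioned above, here is a minimal sketch of how the native GRUCell
> baseline can be timed at the Deep Speech 2 layer size from the table; the
> fused path would go through mx.sym.RNN once the CPU kernel lands, and the
> bind/timing details here are simplified assumptions rather than our exact
> benchmark harness:
>
>     import time
>     import mxnet as mx
>
>     # Deep Speech 2 first-layer size from the table above.
>     N, T, I, H = 12, 1600, 161, 1760
>
>     # Unfused baseline: unroll mx.rnn.GRUCell over T time steps (layout NTC).
>     data = mx.sym.Variable('data')
>     cell = mx.rnn.GRUCell(num_hidden=H)
>     outputs, _ = cell.unroll(length=T, inputs=data, merge_outputs=True)
>
>     exe = outputs.simple_bind(ctx=mx.cpu(), data=(N, T, I), grad_req='null')
>     exe.arg_dict['data'][:] = 1.0  # dummy input; values do not affect timing
>
>     exe.forward(is_train=False)    # warm-up run
>     mx.nd.waitall()
>
>     start = time.time()
>     exe.forward(is_train=False)
>     mx.nd.waitall()
>     print('forward time (s): %.2f' % (time.time() - start))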
>
>      Thanks in advance!
>
>      BR,
>
>     -- Patric
>
>
