mxnet-dev mailing list archives

From Jun Wu <wujun....@gmail.com>
Subject Re: Intel Plan for the contribution to MXNET
Date Thu, 01 Feb 2018 06:14:12 GMT
Great. Let's coordinate to keep our efforts aligned.

On Wed, Jan 31, 2018 at 9:51 PM, Zhao, Patric <patric.zhao@intel.com> wrote:

> Thanks, Jun, please see my comments inline.
>
>
>
> Wenting and Jin will follow up on the tasks in the PR.
>
>
>
> *From:* Jun Wu [mailto:wujun.nju@gmail.com]
> *Sent:* Thursday, February 1, 2018 12:40 PM
> *To:* dev@mxnet.incubator.apache.org
> *Cc:* Ye, Jason Y <jason.y.ye@intel.com>; Lv, Tao A <tao.a.lv@intel.com>;
> Jiang, Wenting <wenting.jiang@intel.com>; Zhao, Patric <
> patric.zhao@intel.com>
> *Subject:* Re: Intel Plan for the contribution to MXNET
>
>
>
> Hi Patric,
>
>
>
> Thanks for the contribution. It’s great to see action on developing INT8
> inference for CPU! I have a few questions and hope to get your answers.
>
>
>
> 1.      When you said your work is aligned with PR9552
> <https://github.com/apache/incubator-mxnet/pull/9552>, did you mean you
> used the quantization+calibration flow developed in that PR for
> benchmarking inference?
>
> [Patric] The benchmark accuracy is based on MKL-DNN and ziheng’s old
> quantization branch.
>
> Now we have merged to master (based on #8302) together with the
> quantization+calibration PR for INT8 development, and will show you the
> accuracy and performance soon.
>
>
>
> 2.      In your MNIST benchmark, which operators are quantized?
>
> [Patric] Conv, relu, and flatten are quantized in our MNIST benchmark
> (conv+relu+flatten+FC+softmax); a sketch of this network and the
> calibrated quantization flow follows after these questions.
>
> Besides, MKL-DNN supports pooling, concat, and fused (conv with
> relu/elem/bn) INT8 ops.
>
>
>
> 3.      Is the MNIST quantized model calibrated?
>
> [Patric] Not yet; we did the experiment on ziheng’s old quantization
> branch, and now we are moving to the branch of the quantization+calibration
> PR.
>
>
>
> 4.      Is the INT8 inference accuracy produced by the *calibrated*
> quantized model, or just by the quantized model without calibration?
>
> [Patric] Without calibration.
>
>
>
> 5.      What are the inference throughputs of the FP32 model and the INT8
> model, respectively?
>
> [Patric] At this stage, we are mainly focused on the accuracy and the
> algorithm. The performance fine-tuning is on the way :)
>
>
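> For reference on questions 2–4: below is a minimal, illustrative sketch
> (not the actual benchmark code) of the conv+relu+flatten+FC+softmax network
> and a calibrated quantization pass, assuming the quantize_model API from
> the quantization+calibration PR; exact module paths, arguments, and CPU
> support may differ in that branch. The random parameters and calibration
> data here are placeholders for a trained MNIST model and a real MNIST
> iterator.
>
>     import mxnet as mx
>     from mxnet.contrib.quantization import quantize_model
>
>     # FP32 symbol matching the benchmark topology:
>     # conv + relu + flatten + FC + softmax
>     data = mx.sym.Variable('data')
>     conv = mx.sym.Convolution(data=data, kernel=(3, 3), num_filter=32, name='conv')
>     relu = mx.sym.Activation(data=conv, act_type='relu', name='relu')
>     flat = mx.sym.flatten(data=relu, name='flatten')
>     fc   = mx.sym.FullyConnected(data=flat, num_hidden=10, name='fc')
>     net  = mx.sym.SoftmaxOutput(data=fc, name='softmax')
>
>     # Randomly initialized parameters stand in for a trained MNIST model
>     mod = mx.mod.Module(net, context=mx.cpu())
>     mod.bind(data_shapes=[('data', (32, 1, 28, 28))],
>              label_shapes=[('softmax_label', (32,))])
>     mod.init_params()
>     arg_params, aux_params = mod.get_params()
>
>     # A random stand-in for a real MNIST calibration iterator
>     calib_data = mx.io.NDArrayIter(data=mx.nd.random.uniform(shape=(512, 1, 28, 28)),
>                                    label=mx.nd.zeros((512,)), batch_size=32)
>
>     # Calibrated quantization: run the FP32 model over calib_data to collect
>     # layer-output ranges (entropy calibration) and bake the resulting
>     # thresholds into the returned quantized graph
>     qsym, qarg_params, qaux_params = quantize_model(
>         sym=net, arg_params=arg_params, aux_params=aux_params,
>         ctx=mx.cpu(), calib_mode='entropy',
>         calib_data=calib_data, num_calib_examples=512)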
>
> Thanks,
>
> Jun
>
>
>
> On Wed, Jan 31, 2018 at 8:08 PM, Zhao, Patric <patric.zhao@intel.com>
> wrote:
>
> Hi MXNET developers,
>
> We are from the Intel Software and Services Group (SSG), working on
> performance optimization for MXNET on Intel Architecture (IA).
>
> Let me give a brief introduction to our ongoing projects.
>
> Any suggestions and comments are highly appreciated.
>
>
> 1)      MKL-DNN integration with the new NNVM interface
>
> Together with Zheng-Da, we have designed a new MKL-DNN integration based
> on the NNVM interface.
>
> The new implementation shows better performance and flexibility than the
> old MKL engine.
>
>
>
> The PR is under review (https://github.com/apache/incubator-mxnet/pull/8302),
> and many thanks for your great comments in the thread :)
>
> After the PR is merged, we will push more MKL-DNN related features and
> performance optimizations, such as a fused conv+relu OP for inference.
>
>
>
> 2)      INT8 inference
>
> MKL-DNN also provides INT8 computations for ops such as conv, relu, and
> pooling, which can improve inference performance a lot with only a very
> slight accuracy drop (e.g. <1%).
>
> Currently, we have implemented quantization, de-quantization, and some
> computation ops in a local branch (a small illustration of these ops
> appears at the end of this section).
>
> Our latest implementation is aligned with this PR (
> https://github.com/apache/incubator-mxnet/pull/9552) and passed the unit
> test.
>
>
>
> For a simple network (conv+relu+flatten+FC+softmax) on the MNIST dataset,
> we got very similar inference accuracy (FP32: 98.06% vs. INT8: 97.93%).
>
> We will update a summary of our solution in this PR soon.
>
>
>
> I hope the CPU and GPU paths can be compatible and share a common code
> base. So, I think we need more discussion in the PR :)
>
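> As a small illustration of the quantize/de-quantize ops mentioned above
> (not the exact code in our branch, and assuming the contrib operators from
> the quantization PR; op names, supported dtypes, and availability on
> master may differ):
>
>     import mxnet as mx
>
>     x = mx.nd.random.uniform(-1.0, 1.0, shape=(2, 3))
>
>     # Quantize FP32 -> INT8 given an observed value range; the op returns
>     # the quantized tensor plus the min/max thresholds it was scaled with
>     q, q_min, q_max = mx.nd.contrib.quantize(data=x,
>                                              min_range=mx.nd.array([-1.0]),
>                                              max_range=mx.nd.array([1.0]),
>                                              out_type='int8')
>
>     # De-quantize back to FP32 and compare with the original values
>     y = mx.nd.contrib.dequantize(q, q_min, q_max, out_type='float32')
>     print(mx.nd.abs(x - y).max())   # the quantization error should be small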
>
>
> 3)      RNN implementations
>
> Currently, there is no CPU implementation for mx.sym.rnn, and the Python
> implementation is really slow.
>
> We are working on resolving this issue from two aspects:
>
> -          Provide the C/C++ level implementation, registered via
> FCompute<cpu> (the GPU code should be moved to NNVM as well).
>
> We plan to PR the LSTM/GRU implementations in March; our initial results
> are below, FYI (a rough timing sketch for the GRUCell column appears at
> the end of this section).
>
>     Size: N = 12, T = 1600, I = 161, H = 1760
>     (from the first layer of Deep Speech 2)
>
>     Forward time (seconds):
>
>                               mx.sym.gru bound to      Native
>                               Intel GRU C              mx.rnn.GRUCell
>     SKX 6148, 2 sockets       1.32                     72.7
>
>
>
> -          Provide the MKL-DNN RNN interface (under development,
> https://github.com/intel/mkl-dnn/issues/46), registered via
> FComputeEx<cpu>.
>
> A higher-performance RNN is under development by the MKL-DNN team, and we
> will merge it when it's ready.
>
> I think CPU users can get a further performance boost from the MKL-DNN
> library.
>
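> For the GRUCell column in the table above, a rough timing sketch (for
> illustration only, not the benchmark code used above) could look like the
> following: unroll mx.rnn.GRUCell over T = 1600 steps with the Deep Speech 2
> first-layer sizes and time one forward pass on CPU. The fused mx.sym.rnn
> path is omitted here, since its CPU kernel is exactly what is being added.
>
>     import time
>     import mxnet as mx
>
>     N, T, I, H = 12, 1600, 161, 1760        # batch, seq length, input, hidden
>
>     data = mx.sym.Variable('data')          # shape (N, T, I) for layout 'NTC'
>     cell = mx.rnn.GRUCell(num_hidden=H, prefix='gru_')
>     outputs, _ = cell.unroll(length=T, inputs=data, layout='NTC',
>                              merge_outputs=True)
>
>     mod = mx.mod.Module(outputs, label_names=None, context=mx.cpu())
>     mod.bind(data_shapes=[('data', (N, T, I))], for_training=False)
>     mod.init_params()
>
>     batch = mx.io.DataBatch(data=[mx.nd.random.uniform(shape=(N, T, I))])
>     mod.forward(batch, is_train=False)
>     mod.get_outputs()[0].wait_to_read()     # warm-up pass
>
>     tic = time.time()
>     mod.forward(batch, is_train=False)
>     mod.get_outputs()[0].wait_to_read()     # block until forward finishes
>     print('unrolled GRUCell forward: %.2f s' % (time.time() - tic))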
>      Thanks in advance!
>
>      BR,
>
>     -- Patric
>
>
>
