mxnet-dev mailing list archives

From Marco de Abreu <marco.g.ab...@googlemail.com.INVALID>
Subject Re: MKLDNN performance in CI
Date Fri, 23 Nov 2018 02:44:01 GMT
Sure, good idea! https://github.com/apache/incubator-mxnet/pull/13379

-Marco

On Fri, Nov 23, 2018 at 3:38 AM Zhao, Patric <patric.zhao@intel.com> wrote:

> Thanks, that should be the most time-consuming part.
>
> @Marco, could you try to disable this env and see the performance again?
>
> > -----Original Message-----
> > From: Lv, Tao A [mailto:tao.a.lv@intel.com]
> > Sent: Friday, November 23, 2018 10:26 AM
> > To: dev@mxnet.incubator.apache.org
> > Subject: RE: MKLDNN performance in CI
> >
> > I think yes, except the cpp test.
> >
> > -----Original Message-----
> > From: Zhao, Patric [mailto:patric.zhao@intel.com]
> > Sent: Friday, November 23, 2018 10:06 AM
> > To: dev@mxnet.incubator.apache.org
> > Subject: RE: MKLDNN performance in CI
> >
> > Good point, Tao!
> > Is this env enabled in all MKL-DNN CI?
> >
> > > -----Original Message-----
> > > From: Lv, Tao A [mailto:tao.a.lv@intel.com]
> > > Sent: Friday, November 23, 2018 9:53 AM
> > > To: dev@mxnet.incubator.apache.org
> > > Subject: RE: MKLDNN performance in CI
> > >
> > > > Thanks for bringing this up, Marco. It's really weird since most of
> > > > the tests listed in "worth noting" are not related to the mkldnn backend.
> > > >
> > > > I can understand that some tests for mkldnn operators may be slower
> > > > because MXNET_MKLDNN_DEBUG is enabled in the CI:
> > > > https://github.com/apache/incubator-mxnet/blob/master/ci/docker/runtime_functions.sh#L713
> > >
> > > -----Original Message-----
> > > From: Marco de Abreu [mailto:marco.g.abreu@googlemail.com.INVALID]
> > > Sent: Friday, November 23, 2018 9:22 AM
> > > To: dev@mxnet.incubator.apache.org
> > > Subject: MKLDNN performance in CI
> > >
> > > Hello,
> > >
> > > > I have noticed that our Python tests have been increasing in duration
> > > > recently. In order to analyse this further, I created the PR [1], which
> > > > records test durations. Please note that I did not dive deep into these
> > > > numbers and that they have to be taken with a grain of salt since
> > > > slaves have varying resource utilizations.
> > >
> > > Please have a look at the two following logs:
> > > Python3 CPU MKLDNN:
> > > > http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/unix-cpu/branches/PR-13377/runs/2/nodes/155/steps/409/log/?start=0
> > > Python3 CPU Openblas:
> > > > http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/unix-cpu/branches/PR-13377/runs/2/nodes/152/steps/398/log/?start=0
> > >
> > > If you scroll to the end (note that there are multiple test stages and
> > > summaries being printed in these logs), you will find the following
> > > statements:
> > >
> > > Python3 CPU MKLDNN: "Ran 702 tests in 3042.102s"
> > > Python3 CPU Openblas: "Ran 702 tests in 2158.458s"
> > >
> > > > This shows that the MKLDNN backend is about 40% slower overall than
> > > > the Openblas backend. If we go into the details, we can see that some
> > > > tests are significantly slower:
> > >
> > > Python3 CPU MKLDNN:
> > >
> > > > > [success] 20.78% test_random.test_shuffle: 630.7165s
> > > > > [success] 17.79% test_sparse_operator.test_elemwise_binary_ops: 540.0487s
> > > > > [success] 10.91% test_gluon_model_zoo.test_models: 331.1503s
> > > > > [success] 2.62% test_operator.test_broadcast_binary_op: 79.4556s
> > > > > [success] 2.45% test_operator.test_pick: 74.4041s
> > > > > [success] 2.39% test_metric_perf.test_metric_performance: 72.5445s
> > > > > [success] 2.38% test_random.test_negative_binomial_generator: 72.1751s
> > > > > [success] 1.84% test_operator.test_psroipooling: 55.9432s
> > > > > [success] 1.78% test_random.test_poisson_generator: 54.0104s
> > > > > [success] 1.72% test_gluon.test_slice_pooling2d_slice_pooling2d: 52.3447s
> > > > > [success] 1.60% test_contrib_control_flow.test_cond: 48.6977s
> > > > > [success] 1.41% test_random.test_random: 42.8712s
> > > > > [success] 1.03% test_operator.test_layer_norm: 31.1242s
> > >
> > >
> > > Python3 CPU Openblas:
> > > > > [success] 26.20% test_gluon_model_zoo.test_models: 563.3366s
> > > > > [success] 4.34% test_random.test_shuffle: 93.3157s
> > > > > [success] 4.31% test_random.test_negative_binomial_generator: 92.6899s
> > > > > [success] 3.78% test_sparse_operator.test_elemwise_binary_ops: 81.2048s
> > > > > [success] 3.30% test_operator.test_psroipooling: 70.9090s
> > > > > [success] 3.20% test_random.test_poisson_generator: 68.7500s
> > > > > [success] 3.10% test_metric_perf.test_metric_performance: 66.6085s
> > > > > [success] 2.79% test_operator.test_layer_norm: 59.9566s
> > > > > [success] 2.66% test_gluon.test_slice_pooling2d_slice_pooling2d: 57.1887s
> > > > > [success] 2.62% test_operator.test_pick: 56.2312s
> > > > > [success] 2.60% test_random.test_random: 55.8920s
> > > > > [success] 2.19% test_operator.test_broadcast_binary_op: 47.1879s
> > > > > [success] 0.96% test_contrib_control_flow.test_cond: 20.6908s
> > >
> > > Tests worth noting:
> > > > - test_random.test_shuffle: about 6.8x the runtime (roughly 580%
> > > > increase) - but I don't know how this may be related to MKLDNN. Are we
> > > > doing random number generation in either of those backends?
> > > > - test_sparse_operator.test_elemwise_binary_ops: about 6.7x the runtime
> > > > (roughly 570% increase)
> > > > - test_gluon_model_zoo.test_models: 40% decrease - that's awesome and
> > > > to be expected :)
> > > > - test_operator.test_broadcast_binary_op: about 70% increase
> > > > - test_contrib_control_flow.test_cond: about 135% increase
> > > - test_operator.test_layer_norm: 50% decrease - nice!
> > >
> > > As I have stated previously, these numbers might not mean anything
> > > since the CI is not a benchmarking environment (sorry if these are
> > > false negatives), but I thought it might be worth mentioning so Intel
> > > could follow up and dive deeper.
> > >
> > > Does anybody here create 1:1 operator comparisons (e.g. running
> > > layer_norm in the different backends to compare the performance) who
> > > could provide us with those numbers?
> > >
> > > Best regards,
> > > Marco
> > >
> > > [1]: https://github.com/apache/incubator-mxnet/pull/13377
>
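
The per-test comparison in the original message can be recomputed from the two
timing summaries. What follows is a minimal Python sketch and not part of the
thread itself: the helper names (parse_durations, percent_change, compare) are
hypothetical, and it assumes timing lines of the form
"[success] <share>% <test name>: <seconds>s" as quoted from the CI logs. Only
the six tests listed under "Tests worth noting" are included.

```python
import re

# Matches lines such as "[success] 20.78% test_random.test_shuffle: 630.7165s".
LINE_RE = re.compile(r"\[success\]\s+[\d.]+%\s+(\S+):\s+([\d.]+)s")

# Excerpts copied from the two timing summaries quoted in the thread.
MKLDNN_LOG = """
[success] 20.78% test_random.test_shuffle: 630.7165s
[success] 17.79% test_sparse_operator.test_elemwise_binary_ops: 540.0487s
[success] 10.91% test_gluon_model_zoo.test_models: 331.1503s
[success] 2.62% test_operator.test_broadcast_binary_op: 79.4556s
[success] 1.60% test_contrib_control_flow.test_cond: 48.6977s
[success] 1.03% test_operator.test_layer_norm: 31.1242s
"""

OPENBLAS_LOG = """
[success] 26.20% test_gluon_model_zoo.test_models: 563.3366s
[success] 4.34% test_random.test_shuffle: 93.3157s
[success] 3.78% test_sparse_operator.test_elemwise_binary_ops: 81.2048s
[success] 2.79% test_operator.test_layer_norm: 59.9566s
[success] 2.19% test_operator.test_broadcast_binary_op: 47.1879s
[success] 0.96% test_contrib_control_flow.test_cond: 20.6908s
"""


def parse_durations(log_text):
    """Return {test_name: seconds} for every timing line found in log_text."""
    return {name: float(sec) for name, sec in LINE_RE.findall(log_text)}


def percent_change(baseline, candidate):
    """Relative change of candidate versus baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0


def compare(baseline_log, candidate_log):
    """Percent change per test, for tests present in both logs."""
    base = parse_durations(baseline_log)
    cand = parse_durations(candidate_log)
    return {name: percent_change(base[name], cand[name])
            for name in base if name in cand}


# Openblas is treated as the baseline, MKLDNN as the candidate.
changes = compare(OPENBLAS_LOG, MKLDNN_LOG)
for name, pct in sorted(changes.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {pct:+.1f}%")

# Overall suite duration, using the "Ran 702 tests in ..." totals.
total_change = percent_change(2158.458, 3042.102)
print(f"total: {total_change:+.1f}%")
```

The sketch reports the relative increase, (new - old) / old, so a test that
takes about 6.8 times as long shows up as an increase of roughly 580%, not 680%.
As the original message points out, CI hosts have varying load, so these figures
are indicative rather than benchmark results.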
