mxnet-dev mailing list archives

From "Lv, Tao A" <>
Subject RE: MKLDNN performance in CI
Date Fri, 23 Nov 2018 01:52:44 GMT
Thanks for bringing this up, Marco. It's really weird, since most of the tests listed under "Tests worth
noting" are not related to the MKLDNN backend.

I can understand that some tests for MKLDNN operators may be slower, because MXNET_MKLDNN_DEBUG
is enabled in the CI:

-----Original Message-----
From: Marco de Abreu [] 
Sent: Friday, November 23, 2018 9:22 AM
Subject: MKLDNN performance in CI


I have noticed that our Python tests have been taking longer recently. To analyse this further,
I created PR [1], which records the duration of each test. Please note that I did not dive deep
into these numbers and that they have to be taken with a grain of salt, since the CI slaves have
varying resource utilization.

Please have a look at the following two logs:
Python3 CPU MKLDNN:
Python3 CPU Openblas:

If you scroll to the end (note that multiple test stages and summaries are printed in these
logs), you will find the following:

Python3 CPU MKLDNN: "Ran 702 tests in 3042.102s"
Python3 CPU Openblas: "Ran 702 tests in 2158.458s"
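As a quick sanity check on the headline figure, the two totals can be compared directly. This is plain arithmetic on the numbers copied from the two summary lines, nothing backend-specific:

```python
# Wall-clock totals copied from the "Ran 702 tests in ..." summary lines
mkldnn_total_s = 3042.102
openblas_total_s = 2158.458

# Ratio > 1 means the MKLDNN run took longer than the Openblas run
ratio = mkldnn_total_s / openblas_total_s
print(f"MKLDNN/Openblas: {ratio:.2f}x ({(ratio - 1) * 100:.0f}% slower)")
```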

This shows that the MKLDNN backend is generally about 40% slower than the Openblas backend
(3042s vs. 2158s, a factor of about 1.41). If we go into the details, we can see that some tests are significantly slower:


Python3 CPU MKLDNN:
> [success] 20.78% test_random.test_shuffle: 630.7165s
> [success] 17.79% test_sparse_operator.test_elemwise_binary_ops: 540.0487s
> [success] 10.91% test_gluon_model_zoo.test_models: 331.1503s
> [success] 2.62% test_operator.test_broadcast_binary_op: 79.4556s
> [success] 2.45% test_operator.test_pick: 74.4041s
> [success] 2.39% test_metric_perf.test_metric_performance: 72.5445s
> [success] 2.38% test_random.test_negative_binomial_generator: 72.1751s
> [success] 1.84% test_operator.test_psroipooling: 55.9432s
> [success] 1.78% test_random.test_poisson_generator: 54.0104s
> [success] 1.72% test_gluon.test_slice_pooling2d_slice_pooling2d: 52.3447s
> [success] 1.60% test_contrib_control_flow.test_cond: 48.6977s
> [success] 1.41% test_random.test_random: 42.8712s
> [success] 1.03% test_operator.test_layer_norm: 31.1242s

Python3 CPU Openblas:
> [success] 26.20% test_gluon_model_zoo.test_models: 563.3366s
> [success] 4.34% test_random.test_shuffle: 93.3157s
> [success] 4.31% test_random.test_negative_binomial_generator: 92.6899s
> [success] 3.78% test_sparse_operator.test_elemwise_binary_ops: 81.2048s
> [success] 3.30% test_operator.test_psroipooling: 70.9090s
> [success] 3.20% test_random.test_poisson_generator: 68.7500s
> [success] 3.10% test_metric_perf.test_metric_performance: 66.6085s
> [success] 2.79% test_operator.test_layer_norm: 59.9566s
> [success] 2.66% test_gluon.test_slice_pooling2d_slice_pooling2d: 57.1887s
> [success] 2.62% test_operator.test_pick: 56.2312s
> [success] 2.60% test_random.test_random: 55.8920s
> [success] 2.19% test_operator.test_broadcast_binary_op: 47.1879s
> [success] 0.96% test_contrib_control_flow.test_cond: 20.6908s
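To repeat this comparison on other runs, the per-test entries can be pulled out of such summaries with a regular expression. This is a minimal sketch that assumes the entries look exactly like the `[status] percent% name: seconds` lines quoted above, possibly wrapped across lines and prefixed with `>` quote markers; `parse_timings` is a hypothetical helper, not something from the PR.

```python
import re

# One summary entry as quoted above, e.g.
#   "[success] 20.78% test_random.test_shuffle: 630.7165s"
# Fields may be separated by spaces, line breaks or ">" quote markers,
# so entries that were wrapped across lines are still matched.
ENTRY = re.compile(r"\[(\w+)\][\s>]+([\d.]+)%[\s>]+([\w.]+):[\s>]*([\d.]+)s")


def parse_timings(text):
    """Return {test_name: seconds} for every summary entry found in text."""
    return {name: float(secs) for _status, _pct, name, secs in ENTRY.findall(text)}


log = """
> [success] 26.20% test_gluon_model_zoo.test_models: 563.3366s [success]
> 4.34% test_random.test_shuffle: 93.3157s
"""
# Print the parsed entries, slowest first
for name, secs in sorted(parse_timings(log).items(), key=lambda kv: -kv[1]):
    print(f"{secs:10.2f}s  {name}")
```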

Tests worth noting:
- test_random.test_shuffle: ~576% increase (about 6.8x), but I don't know how this may be related to MKLDNN.
Are we doing random number generation in either of those backends?
- test_sparse_operator.test_elemwise_binary_ops: ~565% increase (about 6.7x)
- test_gluon_model_zoo.test_models: ~41% decrease. That's awesome and to be expected :)
- test_operator.test_broadcast_binary_op: ~68% increase
- test_contrib_control_flow.test_cond: ~135% increase (about 2.4x)
- test_operator.test_layer_norm: ~48% decrease. Nice!
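The ratios behind the list above can be recomputed from the logged durations. This sketch only does arithmetic on the six tests called out, with the numbers copied from the two summaries; `compare` is a hypothetical helper name.

```python
# Per-test durations in seconds, copied from the two summaries above
mkldnn = {
    "test_random.test_shuffle": 630.7165,
    "test_sparse_operator.test_elemwise_binary_ops": 540.0487,
    "test_gluon_model_zoo.test_models": 331.1503,
    "test_operator.test_broadcast_binary_op": 79.4556,
    "test_contrib_control_flow.test_cond": 48.6977,
    "test_operator.test_layer_norm": 31.1242,
}
openblas = {
    "test_random.test_shuffle": 93.3157,
    "test_sparse_operator.test_elemwise_binary_ops": 81.2048,
    "test_gluon_model_zoo.test_models": 563.3366,
    "test_operator.test_broadcast_binary_op": 47.1879,
    "test_contrib_control_flow.test_cond": 20.6908,
    "test_operator.test_layer_norm": 59.9566,
}


def compare(a, b):
    """Return {test: a_time / b_time} for tests present in both dicts."""
    return {name: a[name] / b[name] for name in a.keys() & b.keys()}


# Largest slowdown first; a negative change means MKLDNN was faster
for name, ratio in sorted(compare(mkldnn, openblas).items(), key=lambda kv: -kv[1]):
    change = (ratio - 1) * 100
    print(f"{name}: {ratio:.2f}x ({change:+.0f}%)")
```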

As I have stated previously, these numbers might not mean anything, since the CI is not a benchmarking
environment (sorry if these turn out to be false alarms), but I thought they were worth mentioning
so that Intel can follow up and dive deeper.

Is anybody here running 1:1 operator comparisons (e.g. running layer_norm on the different
backends and comparing the performance) who could provide us with those numbers?
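For such 1:1 comparisons, a small timing harness along these lines could be run once under each backend build, on the same machine and with the same input shapes. This is a minimal sketch under those assumptions: `time_op` is a hypothetical helper, and the callable passed to it would wrap the operator being compared (e.g. layer_norm). Because MXNet executes operators asynchronously, the callable should block until the result is ready (for example by calling `wait_to_read()` on the output NDArray); otherwise only the enqueue time is measured.

```python
import statistics
import time


def time_op(fn, *args, warmup=3, repeat=20):
    """Median wall-clock seconds for one call of fn(*args).

    Warm-up calls are discarded so one-time costs (lazy initialization,
    cold caches) don't skew the result; the median is less sensitive than
    the mean to interference from other jobs on a shared host.
    """
    for _ in range(warmup):
        fn(*args)
    samples = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn(*args)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Placeholder workload; replace with a (blocking) call to the operator under test
print(f"{time_op(sum, range(100_000)) * 1e3:.3f} ms")
```

Comparing medians from many repeats on a dedicated machine should give more trustworthy numbers than whole-test durations on shared CI slaves.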

Best regards,
