mxnet-dev mailing list archives

From Pedro Larroy <pedro.larroy.li...@gmail.com>
Subject Re: OMP
Date Wed, 19 Jun 2019 17:34:29 GMT
+1. It would be best to have a controlled environment so we can reason
about how MXNet is being built and which libraries are linked. I'm
happy to help here. I wouldn't expect Docker to have a big impact on
the measurements or to distort the results much.
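
Something along these lines, baked into a Dockerfile as Kellen suggests,
would be enough to make the comparison reproducible (the steps below are
only illustrative, not a finished recipe):

    # in a pinned base image (e.g. ubuntu:18.04), build each variant the way
    # its proponents intend, then record what actually got linked and measure:
    ldd libmxnet.so | grep -i omp                            # which OpenMP runtime(s) ended up in the binary
    python ../example/image-classification/train_mnist.py   # same workload Chris used for his numbers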


On Wed, Jun 19, 2019 at 10:28 AM kellen sunderland
<kellen.sunderland@gmail.com> wrote:
>
> I've also quite often seen two versions of OpenMP linked.  I think we can
> all agree we probably want to avoid linking in two libraries that do
> effectively the same thing.
>
> The performance questions should be fairly straightforward to demonstrate,
> right?  Could we just collaborate on a few minimal Dockerfiles that show
> (or don't show) Intel OpenMP performance speedups with the workloads Chris
> is referencing?
>
> On Wed, Jun 19, 2019 at 4:44 AM Tsukrov, Stanislav <
> stanislav.tsukrov@gmail.com> wrote:
>
> > Hi, Chris!
> >
> > Stas here - I'm the one who gathered that performance data.
> > Of course I could be wrong, so please elaborate a bit on what we are
> > missing.
> > Rest assured, there was never any intentional misdirection.
> >
> > Thanks a lot for being constructive.
> >
> > > Turning Intel OMP on and off (and MKL as well, since it tends to pull in
> > omp, depending which one is linked in).
> >
> > We never considered turning MKL off. We are on the same page here -
> > MKL is crucial for performance.
> > Why would we? There is a GOMP-linked version of MKL that we can use.
> >
> > What we did was measure whether using the compiler's default OpenMP
> > implementation instead of the OpenMP source distribution referenced in
> > 3rdparty makes anything slower.
> > We found the impact to be hardly measurable.
> > The difference between GOMP and iOMP is <5% on our benchmarks, and most
> > of the time it is less than that.
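> >
> > (For anyone double-checking which runtime a given libmxnet.so actually
> > pulls in, plain ldd is enough; the library names below are the standard
> > ones, nothing specific to our setup:)
> >
> > ldd libmxnet.so | grep -E 'omp|mkl'
> >     # libgomp.so.1            -> GNU OpenMP (GOMP)
> >     # libomp.so / libiomp5.so -> LLVM / Intel OpenMP
> >     # libmkl_gnu_thread.so    -> MKL threading layer driven by GOMP
> >     # libmkl_intel_thread.so  -> MKL threading layer driven by Intel OpenMP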
> >
> > We simply suggest simplifying the build of MXNet by removing the
> > unnecessary dependency.
> >
> > While doing that, we discovered, for example, the following amazing issue:
> > https://github.com/apache/incubator-mxnet/issues/14087
> >
> > Best Regards
> >
> > Stas
> >
> > On 18.06.19, 18:24, "Chris Olivier" <cjolivier01@gmail.com> wrote:
> >
> >     I am very reluctant to feed the trolls again, and this will be the last
> >     time I address Pedro or Anton on the subject, but since I think the
> >     numbers being presented are incorrect (either because the builders do not
> >     really understand what they are building, or possibly because of
> >     intentional misdirection):
> >
> >     I turned Intel OMP on and off (and MKL as well, since it tends to pull
> >     in OMP, depending on which one is linked in).
> >     There is a HUGE difference.  This is consistent with my experience from
> >     back when it was added.
> >
> >
> >     default mnist:
> >
> >     python ../example/image-classification/train_mnist.py
> >     INFO:root:start with arguments Namespace(add_stn=False, batch_size=64,
> >     disp_batches=100, dtype='float32', gc_threshold=0.5, gc_type='none',
> >     gpus=None, image_shape='1, 28, 28', initializer='default',
> >     kv_store='device', load_epoch=None, loss='', lr=0.05, lr_factor=0.1,
> >     lr_step_epochs='10', macrobatch_size=0, model_prefix=None, mom=0.9,
> >     monitor=0, network='mlp', num_classes=10, num_epochs=20,
> >     num_examples=60000, num_layers=None, optimizer='sgd',
> >     profile_server_suffix='', profile_worker_suffix='', save_period=1,
> >     test_io=0, top_k=0, warmup_epochs=5, warmup_strategy='linear',
> > wd=0.0001)
> >
> >     INTEL OMP:
> >
> >     ldd libmxnet.so | grep omp
> >             libomp.so =>
> >     /home/chris/src/mxnet/cmake_omp/3rdparty/openmp/runtime/src/libomp.so
> >     (0x00007f978fde7000)
> >
> >     INFO:root:Epoch[0] Batch [0-100]        Speed: 31548.09 samples/sec
> >     accuracy=0.780012
> >     INFO:root:Epoch[0] Batch [100-200]      Speed: 16073.21 samples/sec
> >     accuracy=0.920469
> >     INFO:root:Epoch[0] Batch [200-300]      Speed: 19075.91 samples/sec
> >     accuracy=0.928281
> >     INFO:root:Epoch[0] Batch [300-400]      Speed: 23211.36 samples/sec
> >     accuracy=0.942813
> >     INFO:root:Epoch[0] Batch [400-500]      Speed: 22139.79 samples/sec
> >     accuracy=0.938750
> >     INFO:root:Epoch[0] Batch [500-600]      Speed: 23225.52 samples/sec
> >     accuracy=0.946562
> >     INFO:root:Epoch[0] Batch [600-700]      Speed: 19547.41 samples/sec
> >     accuracy=0.953281
> >     INFO:root:Epoch[0] Batch [700-800]      Speed: 24111.73 samples/sec
> >     accuracy=0.951562
> >     INFO:root:Epoch[0] Batch [800-900]      Speed: 13959.88 samples/sec
> >     accuracy=0.957500
> >     INFO:root:Epoch[0] Train-accuracy=0.925423
> >     INFO:root:Epoch[0] Time cost=3.806
> >     INFO:root:Epoch[0] Validation-accuracy=0.962580
> >     INFO:root:Epoch[1] Batch [0-100]        Speed: 24560.21 samples/sec
> >     accuracy=0.968131
> >     INFO:root:Epoch[1] Batch [100-200]      Speed: 23457.03 samples/sec
> >     accuracy=0.966250
> >
> >
> >     LIBGOMP:
> >
> >     ldd libmxnet.so | grep omp
> >             libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1
> >     (0x00007f25c25dd000)
> >
> >     INFO:root:Epoch[0] Batch [0-100]        Speed: 1731.01 samples/sec
> >      accuracy=0.782488
> >     INFO:root:Epoch[0] Batch [100-200]      Speed: 3551.32 samples/sec
> >      accuracy=0.907813
> >     INFO:root:Epoch[0] Batch [200-300]      Speed: 1991.00 samples/sec
> >      accuracy=0.927188
> >     INFO:root:Epoch[0] Batch [300-400]      Speed: 2175.45 samples/sec
> >      accuracy=0.937969
> >     INFO:root:Epoch[0] Batch [400-500]      Speed: 1644.95 samples/sec
> >      accuracy=0.942187
> >     INFO:root:Epoch[0] Batch [500-600]      Speed: 6444.58 samples/sec
> >      accuracy=0.950156
> >     INFO:root:Epoch[0] Batch [600-700]      Speed: 7842.16 samples/sec
> >      accuracy=0.947969
> >     INFO:root:Epoch[0] Batch [700-800]      Speed: 9412.07 samples/sec
> >      accuracy=0.953750
> >     INFO:root:Epoch[0] Batch [800-900]      Speed: 12707.58 samples/sec
> >     accuracy=0.953125
> >
> >     That being said, there are other issues beyond speed.  The DEFAULT build
> >     from the Makefile (not CMake) uses Intel OMP MKL (I showed this before)
> >     and mysteriously it has no issues?  This seems highly suspicious.  All I
> >     see is a lot of hand-waving and conjecture and pointing to StackOverflow
> >     posts made by people who may be of questionable pedigree to begin with.
> >     This smells of a Pedro-ego-fight rather than one of purely technical
> >     merit.  Also, if one knows how OMP works, they would be very suspicious
> >     of the "intermittent hangs" claim -- that's probably just broken race
> >     conditions elsewhere until proven otherwise.  It would tend to freeze on
> >     the first use if something were wrong (try using libgomp after a fork and
> >     see), since the worker threads wouldn't be assigned/joined properly.
> >     Intel OMP is faster, but it also has other advantages, such as allowing
> >     OMP after a fork.
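> >
> >     (A minimal way to actually try that, sketched purely for illustration:
> >     gcc links libgomp, while clang -- assuming clang and its libomp are
> >     installed -- links LLVM's libomp, which shares its code base with
> >     Intel's runtime.  The file name and behaviour notes are mine, not a
> >     claim about any particular MXNet build.)
> >
> >     cat > fork_check.c <<'EOF'
> >     #include <stdio.h>
> >     #include <unistd.h>
> >     #include <sys/wait.h>
> >     int main(void) {
> >         int n = 0;
> >         #pragma omp parallel reduction(+:n)
> >         n += 1;                 /* spin up the OpenMP worker pool in the parent */
> >         printf("parent: %d threads\n", n);
> >         pid_t pid = fork();
> >         n = 0;
> >         #pragma omp parallel reduction(+:n)
> >         n += 1;                 /* with libgomp the child tends to stall here */
> >         printf("pid %d: %d threads\n", (int)getpid(), n);
> >         if (pid > 0) wait(NULL);
> >         return 0;
> >     }
> >     EOF
> >     gcc   -fopenmp fork_check.c -o fork_gomp && ./fork_gomp
> >     clang -fopenmp fork_check.c -o fork_omp  && ./fork_omp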
> >
> >     I actually addressed a lot of issues and asked for clarification in the
> >     original PRs way back when, but they were all just ignored.
> >
> >     -Chris
> >
> >
> >
> >
