mxnet-dev mailing list archives

From Chris Olivier <cjolivie...@gmail.com>
Subject OMP
Date Tue, 18 Jun 2019 16:24:11 GMT
I am very reluctant to feed the trolls again, and this will be the last
time I address Pedro or Anton on the subject, but I think the numbers
being presented are incorrect (either because the builders don't really
understand what they are building, or possibly through intentional
misdirection), so here is what I measured:

I compared builds with Intel OMP turned on and off (and MKL as well, since
it tends to pull in OMP, depending on which one is linked in).
There is a HUGE difference.  This is consistent with my experience before,
when it was added.
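
For anyone who wants to confirm which OpenMP runtime their own build
actually links, here is a minimal Python sketch of the same
`ldd libmxnet.so | grep omp` check used in this message. The function names
and the list of runtime names it recognizes (libomp, libiomp5, libgomp) are
assumptions for illustration, not part of MXNet.

```python
import re
import subprocess

# Assumed runtime names: libomp (LLVM/Intel-derived), libiomp5 (Intel), libgomp (GNU).
_OMP_LIB = re.compile(r"\b(lib(?:omp|iomp5|gomp))\.so(?:\.\d+)*\s*=>")


def linked_omp_runtimes(ldd_output):
    """Return the sorted, de-duplicated OpenMP runtime names in `ldd` output."""
    return sorted(set(_OMP_LIB.findall(ldd_output)))


def ldd(path):
    """Run `ldd` on a shared library and return its text output."""
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling `linked_omp_runtimes(ldd("libmxnet.so"))` on each build should make
it obvious which runtime a given set of numbers was measured with. If more
than one runtime shows up for the same library, that is probably worth
looking at before comparing anything.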


default mnist:

python ../example/image-classification/train_mnist.py
INFO:root:start with arguments Namespace(add_stn=False, batch_size=64,
disp_batches=100, dtype='float32', gc_threshold=0.5, gc_type='none',
gpus=None, image_shape='1, 28, 28', initializer='default',
kv_store='device', load_epoch=None, loss='', lr=0.05, lr_factor=0.1,
lr_step_epochs='10', macrobatch_size=0, model_prefix=None, mom=0.9,
monitor=0, network='mlp', num_classes=10, num_epochs=20,
num_examples=60000, num_layers=None, optimizer='sgd',
profile_server_suffix='', profile_worker_suffix='', save_period=1,
test_io=0, top_k=0, warmup_epochs=5, warmup_strategy='linear', wd=0.0001)

INTEL OMP:

ldd libmxnet.so | grep omp
        libomp.so =>
/home/chris/src/mxnet/cmake_omp/3rdparty/openmp/runtime/src/libomp.so
(0x00007f978fde7000)

INFO:root:Epoch[0] Batch [0-100]        Speed: 31548.09 samples/sec
accuracy=0.780012
INFO:root:Epoch[0] Batch [100-200]      Speed: 16073.21 samples/sec
accuracy=0.920469
INFO:root:Epoch[0] Batch [200-300]      Speed: 19075.91 samples/sec
accuracy=0.928281
INFO:root:Epoch[0] Batch [300-400]      Speed: 23211.36 samples/sec
accuracy=0.942813
INFO:root:Epoch[0] Batch [400-500]      Speed: 22139.79 samples/sec
accuracy=0.938750
INFO:root:Epoch[0] Batch [500-600]      Speed: 23225.52 samples/sec
accuracy=0.946562
INFO:root:Epoch[0] Batch [600-700]      Speed: 19547.41 samples/sec
accuracy=0.953281
INFO:root:Epoch[0] Batch [700-800]      Speed: 24111.73 samples/sec
accuracy=0.951562
INFO:root:Epoch[0] Batch [800-900]      Speed: 13959.88 samples/sec
accuracy=0.957500
INFO:root:Epoch[0] Train-accuracy=0.925423
INFO:root:Epoch[0] Time cost=3.806
INFO:root:Epoch[0] Validation-accuracy=0.962580
INFO:root:Epoch[1] Batch [0-100]        Speed: 24560.21 samples/sec
accuracy=0.968131
INFO:root:Epoch[1] Batch [100-200]      Speed: 23457.03 samples/sec
accuracy=0.966250


LIBGOMP:

ldd libmxnet.so | grep omp
        libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1
(0x00007f25c25dd000)

INFO:root:Epoch[0] Batch [0-100]        Speed: 1731.01 samples/sec
 accuracy=0.782488
INFO:root:Epoch[0] Batch [100-200]      Speed: 3551.32 samples/sec
 accuracy=0.907813
INFO:root:Epoch[0] Batch [200-300]      Speed: 1991.00 samples/sec
 accuracy=0.927188
INFO:root:Epoch[0] Batch [300-400]      Speed: 2175.45 samples/sec
 accuracy=0.937969
INFO:root:Epoch[0] Batch [400-500]      Speed: 1644.95 samples/sec
 accuracy=0.942187
INFO:root:Epoch[0] Batch [500-600]      Speed: 6444.58 samples/sec
 accuracy=0.950156
INFO:root:Epoch[0] Batch [600-700]      Speed: 7842.16 samples/sec
 accuracy=0.947969
INFO:root:Epoch[0] Batch [700-800]      Speed: 9412.07 samples/sec
 accuracy=0.953750
INFO:root:Epoch[0] Batch [800-900]      Speed: 12707.58 samples/sec
accuracy=0.953125
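
The per-window speeds above bounce around quite a bit, so it is easier to
compare the two runs on an average. A minimal Python sketch, assuming each
run's log has been saved as text in the `Speed: ... samples/sec` format
shown above (the function names are hypothetical):

```python
import re
from statistics import mean

# Matches the throughput field of train_mnist.py log lines as pasted above.
_SPEED = re.compile(r"Speed:\s*([0-9.]+)\s*samples/sec")


def mean_speed(log_text):
    """Average all 'Speed: N samples/sec' values found in a log."""
    speeds = [float(s) for s in _SPEED.findall(log_text)]
    if not speeds:
        raise ValueError("no 'Speed: ... samples/sec' entries found")
    return mean(speeds)


def speedup(log_a, log_b):
    """Ratio of mean throughput of run A to run B."""
    return mean_speed(log_a) / mean_speed(log_b)
```

Feeding the Intel OMP log as `log_a` and the libgomp log as `log_b` gives a
single ratio for the pair. A plain mean weights the early (warm-up) windows
the same as the later ones, so a median or dropping the first window may be
a fairer summary.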

That being said, there are other issues beyond speed.  The DEFAULT build from
the Makefile (not CMake) uses Intel OMP mkl (as I showed before) and
mysteriously it has no issues?  This seems highly suspicious.  All I see is a
lot of hand-waving and conjecture and pointing to StackOverflow posts made by
people who may be of questionable pedigree to begin with.  This smells of a
Pedro-ego-fight rather than one of purely technical merit.  Also, if one
knows how OMP works, one would be very suspicious of the "intermittent
hangs" claim -- that's probably just race conditions elsewhere until
proven otherwise.  It'd tend to freeze on the first use if something were
wrong (try using libgomp after a fork and see), since worker threads
wouldn't be assigned/joined properly.  Intel OMP is faster, but also has
other advantages, such as allowing OMP after a fork.
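
To make the "try it after a fork and see" check repeatable, here is a
minimal Python sketch of a hang probe. It only provides the harness: `work`
and `warmup` are hypothetical callables that the reader would replace with
something that actually exercises OpenMP (for example an MXNet operator
call), and the timeout value is an arbitrary assumption.

```python
import os
import signal
import time


def probe_after_fork(work, warmup=None, timeout=10.0):
    """Run warmup() in the parent, fork, then run work() in the child.

    Returns "ok" if the child exits cleanly, "error" if it fails, and
    "hang" if it is still running after `timeout` seconds.
    """
    if warmup is not None:
        warmup()  # e.g. something that starts the OpenMP thread pool

    pid = os.fork()
    if pid == 0:
        # Child: run the workload, then exit immediately without running
        # the parent's cleanup handlers. Exit status 1 if work() raised.
        status = 1
        try:
            work()
            status = 0
        finally:
            os._exit(status)

    # Parent: poll until the child exits or the deadline passes.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        done, wait_status = os.waitpid(pid, os.WNOHANG)
        if done == pid:
            if os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0:
                return "ok"
            return "error"
        time.sleep(0.05)

    # Still running: treat it as a hang and reap the child.
    os.kill(pid, signal.SIGKILL)
    os.waitpid(pid, 0)
    return "hang"
```

Running the same `warmup`/`work` pair against a libgomp build and an Intel
OMP build would show whether the child freezes on first use, as described
above, or only intermittently.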

I actually addressed a lot of issues and asked for clarification in the
original PRs way back when, but they were all just ignored.

-Chris
