mxnet-dev mailing list archives

From: Pedro Larroy <pedro.larroy.li...@gmail.com>
Subject: Re: OMP
Date: Tue, 18 Jun 2019 18:35:50 GMT
First of all, thanks for following up on this topic and not sweeping
the problem under the rug. You might very well be right, and you have
some numbers that corroborate your findings; that might be something
to celebrate. Before continuing our technical discussion I would like
to take a step back and remind you of the code of conduct, since I
think the way you are handling the communication about this issue is
not conducive to a healthy community. It is also not a good leadership
example from a respected engineer and Apache PMC member.

We are all trying to do the best we can for the project, and not
everyone is an expert on everything. There are technical decisions
made long ago, sometimes lacking proper documentation and
justification, which, even if they are right, constitute technical
debt: it takes a big effort to reverse-engineer or deep-dive to
understand all of their non-obvious ramifications. I called a vote to
clarify the issue and to have an opportunity to move forward a
long-standing problem that remains unaddressed and unclear. This is
not trolling, and it is nothing personal against anyone or their work.

I actually know just the basics about OpenMP, so this is hardly about
ego; it's not my contribution either. I tried to help by providing
some of the benchmarks that were requested, since I felt the original
contributors had given up trying to help. After we provided
information and benchmarks one after another, you closed the PR in a
way that was not well understood.

If there's a flaw in the benchmark, you are right to point it out. But
if someone doesn't have the time or willingness to coach contributors,
to properly explain why a PR is not doing the right thing, or to
document their technical contributions in a way that we can all align
behind and understand the tradeoffs, they shouldn't be exercising the
power to close PRs.

Please take some time to read the code of conduct:

https://www.apache.org/foundation/policies/conduct

There are also other materials about building healthy communities:
https://www.jonobacon.com/books/artofcommunity/

Since we don't all share your particular sense of humor, I would
suggest being prudent, being polite and patient when explaining your
technical decisions, refraining from name-calling and ad hominem
attacks, and assuming good intentions.

I suggested to you before, in a private channel, that you document
your findings and benchmarks in the wiki so we can have constructive
conversations and help contributors improve the existing issues with
OpenMP. People come and go on projects, so you can't assume that
everyone knows why something was done a certain way two years ago;
the reasons might also change with time.


Pedro.

On Tue, Jun 18, 2019 at 9:24 AM Chris Olivier <cjolivier01@gmail.com> wrote:
>
> I am very reluctant to feed the trolls again, and this will be the last
> time I address Pedro or Anton on the subject. But I think the numbers
> being presented are incorrect (either because the builders don't really
> understand what they are building, or possibly through intentional
> misdirection):
>
> Turning Intel OMP on and off (and MKL as well, since it tends to pull in
> omp, depending on which one is linked in) makes a HUGE difference. This is
> consistent with my experience before, when it was added.
>
>
> default mnist:
>
> python ../example/image-classification/train_mnist.py
> INFO:root:start with arguments Namespace(add_stn=False, batch_size=64,
> disp_batches=100, dtype='float32', gc_threshold=0.5, gc_type='none',
> gpus=None, image_shape='1, 28, 28', initializer='default',
> kv_store='device', load_epoch=None, loss='', lr=0.05, lr_factor=0.1,
> lr_step_epochs='10', macrobatch_size=0, model_prefix=None, mom=0.9,
> monitor=0, network='mlp', num_classes=10, num_epochs=20,
> num_examples=60000, num_layers=None, optimizer='sgd',
> profile_server_suffix='', profile_worker_suffix='', save_period=1,
> test_io=0, top_k=0, warmup_epochs=5, warmup_strategy='linear', wd=0.0001)
>
> INTEL OMP:
>
> ldd libmxnet.so | grep omp
>         libomp.so =>
> /home/chris/src/mxnet/cmake_omp/3rdparty/openmp/runtime/src/libomp.so
> (0x00007f978fde7000)
>
> INFO:root:Epoch[0] Batch [0-100]        Speed: 31548.09 samples/sec
> accuracy=0.780012
> INFO:root:Epoch[0] Batch [100-200]      Speed: 16073.21 samples/sec
> accuracy=0.920469
> INFO:root:Epoch[0] Batch [200-300]      Speed: 19075.91 samples/sec
> accuracy=0.928281
> INFO:root:Epoch[0] Batch [300-400]      Speed: 23211.36 samples/sec
> accuracy=0.942813
> INFO:root:Epoch[0] Batch [400-500]      Speed: 22139.79 samples/sec
> accuracy=0.938750
> INFO:root:Epoch[0] Batch [500-600]      Speed: 23225.52 samples/sec
> accuracy=0.946562
> INFO:root:Epoch[0] Batch [600-700]      Speed: 19547.41 samples/sec
> accuracy=0.953281
> INFO:root:Epoch[0] Batch [700-800]      Speed: 24111.73 samples/sec
> accuracy=0.951562
> INFO:root:Epoch[0] Batch [800-900]      Speed: 13959.88 samples/sec
> accuracy=0.957500
> INFO:root:Epoch[0] Train-accuracy=0.925423
> INFO:root:Epoch[0] Time cost=3.806
> INFO:root:Epoch[0] Validation-accuracy=0.962580
> INFO:root:Epoch[1] Batch [0-100]        Speed: 24560.21 samples/sec
> accuracy=0.968131
> INFO:root:Epoch[1] Batch [100-200]      Speed: 23457.03 samples/sec
> accuracy=0.966250
>
>
> LIBGOMP:
>
> ldd libmxnet.so | grep omp
>         libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1
> (0x00007f25c25dd000)
>
> INFO:root:Epoch[0] Batch [0-100]        Speed: 1731.01 samples/sec
>  accuracy=0.782488
> INFO:root:Epoch[0] Batch [100-200]      Speed: 3551.32 samples/sec
>  accuracy=0.907813
> INFO:root:Epoch[0] Batch [200-300]      Speed: 1991.00 samples/sec
>  accuracy=0.927188
> INFO:root:Epoch[0] Batch [300-400]      Speed: 2175.45 samples/sec
>  accuracy=0.937969
> INFO:root:Epoch[0] Batch [400-500]      Speed: 1644.95 samples/sec
>  accuracy=0.942187
> INFO:root:Epoch[0] Batch [500-600]      Speed: 6444.58 samples/sec
>  accuracy=0.950156
> INFO:root:Epoch[0] Batch [600-700]      Speed: 7842.16 samples/sec
>  accuracy=0.947969
> INFO:root:Epoch[0] Batch [700-800]      Speed: 9412.07 samples/sec
>  accuracy=0.953750
> INFO:root:Epoch[0] Batch [800-900]      Speed: 12707.58 samples/sec
> accuracy=0.953125
>
> That being said, there are other issues beyond speed. The DEFAULT build
> from the Makefile (not CMake) uses Intel OMP via MKL (I showed this
> before), and mysteriously it has no issues? That seems highly suspicious.
> All I see is a lot of hand-waving and conjecture, and pointing to
> StackOverflow posts made by people who may be of questionable pedigree to
> begin with. This smells of a Pedro-ego-fight rather than one of purely
> technical merit. Also, anyone who knows how OMP works would be very
> suspicious of the "intermittent hangs" claim -- those are probably just
> broken race conditions elsewhere until proven otherwise. OMP would tend to
> freeze on the first use if something were wrong (try using libgomp after a
> fork and see), since the worker threads wouldn't be assigned/joined
> properly. Intel OMP is faster, but it also has other advantages, such as
> allowing OMP after a fork.
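>
> As a minimal sketch of that fork hazard (a hypothetical repro, not from
> MXNet; assumes gcc with -fopenmp): warm up the OpenMP runtime in the
> parent, fork, then enter a parallel region in the child. Under libgomp the
> child's region can hang, because the parent's worker pool is not recreated
> after fork; Intel's libomp tolerates the pattern.
>
> /* fork_omp.c -- hypothetical repro; build: gcc -fopenmp fork_omp.c */
> #include <omp.h>
> #include <stdio.h>
> #include <sys/wait.h>
> #include <unistd.h>
>
> int main(void) {
>     /* Touch a parallel region so the runtime spawns its worker pool. */
>     #pragma omp parallel
>     { }
>
>     pid_t pid = fork();
>     if (pid == 0) {
>         /* Child: under libgomp this region can deadlock, because the
>            workers it would join only ever existed in the parent. */
>         #pragma omp parallel
>         printf("child saw thread %d\n", omp_get_thread_num());
>         _exit(0);
>     }
>     waitpid(pid, NULL, 0);
>     return 0;
> }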
>
> I actually addressed a lot of issues and asked for clarification in the
> original PRs way back when, but they were all just ignored.
>
> -Chris
