mxnet-dev mailing list archives

From kellen sunderland <kellen.sunderl...@gmail.com>
Subject Re: OMP
Date Thu, 20 Jun 2019 00:35:01 GMT
"if you’re linking in two then you’re doing something wrong." Correct,
that's one thing I believe we've got consensus on.  So let's call that out
as a bug to be fixed.

Let's move forward with some reproducible numbers and then discuss the pros
/ cons of which particular OMP implementation we should use.
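
To make that check mechanical, a small script along these lines could run in CI (a minimal sketch; the libmxnet.so path is an assumption and depends on the build):

    # check_omp_links.py - fail when libmxnet.so links more than one OpenMP runtime
    import re
    import subprocess

    LIB = "lib/libmxnet.so"  # hypothetical path; adjust to the build output

    ldd = subprocess.run(["ldd", LIB], capture_output=True, text=True, check=True)
    # Known OpenMP runtime sonames: GNU libgomp, LLVM/Intel libomp, legacy libiomp5.
    runtimes = sorted(set(re.findall(r"(libgomp|libomp|libiomp5)\.so", ldd.stdout)))

    print("OpenMP runtimes linked:", runtimes or "none")
    if len(runtimes) > 1:
        raise SystemExit("BUG: multiple OpenMP runtimes linked: " + ", ".join(runtimes))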

On Wed, Jun 19, 2019 at 3:06 PM Pedro Larroy <pedro.larroy.lists@gmail.com>
wrote:

> Hi Chris
>
> I would ask you to have a bit of patience and help us with your
> experience in this matter. Nobody is ignoring anything. I think we are
> individually gathering feedback and trying to understand the multiple
> contributions made to this topic, including yours, and then going step
> by step: understanding what is going on, running experiments, and
> reporting back to the list or the corresponding GitHub item. Kellen
> suggested preparing some containers, and this takes effort.
>
> Regarding your final comment, most of us also have many other things
> to do and other responsibilities, even if our daytime jobs might
> involve MXNet in some form or another. I think that's part of the
> privilege and responsibility of working closely with an open source
> project and the magic of collaboration across organizations. Let's all
> be patient and take some time to understand and reason about this
> topic, which is not simple. Since we decided to step back and gather
> more data, let's take the time to do it properly.
>
> Personally I hope to find time to look again into this issue before
> the end of the week.
>
> Thanks.
>
> Pedro.
>
> On Wed, Jun 19, 2019 at 2:43 PM Chris Olivier <cjolivier01@apache.org>
> wrote:
> >
> > if you’re linking in two then you’re doing something wrong. You can see
> > by my email yesterday that only one is linked in. This is also the case
> > with the mkl version built by the Makefile — only the Intel OMP library
> > is used (no libgomp).
> >
> > That being said, do you have clear evidence that using Intel OMP is both
> > problematic and the situation isn’t fixable?  The burden of proof is on
> > the ones requesting the change — it is not my responsibility to justify
> > the current state.  There must be something “terrible” and unfixable to
> > justify a change.  I have seen no proof of this in all this time.
> >
> > On a side note, I mentioned a couple of things in my email yesterday that
> > still are not being responded to (they were also ignored in the last
> > incarnation of this “discussion” — I have enough experience in this
> > matter to assume “discussion” is a waste of my time, seeing as I am not
> > paid to “work on” mxnet like y’all are).
> >
> > -C
> >
> > On Wed, Jun 19, 2019 at 10:28 AM kellen sunderland <
> > kellen.sunderland@gmail.com> wrote:
> >
> > > I've also quite often seen two versions of OpenMP linked.  I think we
> > > can all agree we probably want to avoid linking in two libraries that
> > > do effectively the same thing.
> > >
> > > The performance questions should be fairly straightforward to
> > > demonstrate, right?  Could we just collaborate on a few minimal
> > > Dockerfiles that show (or don't show) Intel OpenMP performance
> > > speedups with the workloads Chris is referencing?
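> > >
> > > One way to drive that comparison (a sketch; it assumes two hypothetical
> > > Dockerfiles, Dockerfile.gomp and Dockerfile.iomp, that differ only in
> > > which OpenMP runtime libmxnet.so links, and that the image ships the
> > > examples directory):
> > >
> > >     # compare_omp.py - build both images and run the same benchmark in each
> > >     import subprocess
> > >
> > >     for variant in ("gomp", "iomp"):
> > >         tag = "mxnet:" + variant
> > >         subprocess.run(["docker", "build", "-t", tag,
> > >                         "-f", "Dockerfile." + variant, "."], check=True)
> > >         # benchmark entrypoint inside the image is an assumption
> > >         subprocess.run(["docker", "run", "--rm", tag, "python",
> > >                         "example/image-classification/train_mnist.py"],
> > >                        check=True)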
> > >
> > > On Wed, Jun 19, 2019 at 4:44 AM Tsukrov, Stanislav <
> > > stanislav.tsukrov@gmail.com> wrote:
> > >
> > > > Hi, Chris!
> > > >
> > > > Stas here - I've gathered that performance data.
> > > > Sure thing, I can be wrong, but please elaborate a bit on what we are
> > > > missing.
> > > > Be assured, intentional misdirection was never the case.
> > > >
> > > > Thanks a lot for being constructive.
> > > >
> > > > > Turning Intel OMP on and off (and MKL as well, since it tends to
> > > > > pull in omp, depending which one is linked in).
> > > >
> > > > We never ever considered turning MKL off. We are on the same page
> > > > here - MKL is crucial for the performance.
> > > > Why should we? There's a GOMP-linked version of MKL that we can use.
> > > >
> > > > What we did was measure whether using the compiler's default OpenMP
> > > > implementation, instead of the referenced source-code distribution of
> > > > OpenMP, makes anything slower.
> > > > We found the impact to be hardly measurable.
> > > > The difference between GOMP and iOMP is <5% on our benchmarks, and
> > > > most of the time less than that.
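> > > >
> > > > A rough sketch of the kind of measurement we ran (illustrative, not
> > > > the exact harness; it assumes one build of libmxnet.so linked against
> > > > GOMP and one against iOMP, and times the same element-wise workload,
> > > > which runs through MXNet's OpenMP-parallelised CPU kernels, on each):
> > > >
> > > >     # bench_omp.py - run once per libmxnet.so build and compare
> > > >     import time
> > > >     import mxnet as mx
> > > >
> > > >     def bench(n_iters=50, size=4096):
> > > >         a = mx.nd.random.uniform(shape=(size, size))
> > > >         b = mx.nd.random.uniform(shape=(size, size))
> > > >         (a + b).wait_to_read()          # warm up the thread pool
> > > >         start = time.time()
> > > >         for _ in range(n_iters):
> > > >             (a * b + 1).wait_to_read()  # OpenMP-parallel kernel
> > > >         return n_iters / (time.time() - start)
> > > >
> > > >     print("throughput: %.2f it/s" % bench())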
> > > >
> > > > We just suggest simplifying the build of mxnet by removing the
> > > > unnecessary dependency.
> > > >
> > > > During that work we discovered, for example, the following amazing
> > > > issue:
> > > > https://github.com/apache/incubator-mxnet/issues/14087
> > > >
> > > > Best Regards
> > > >
> > > > Stas
> > > >
> > > > On 18.06.19, 18:24, "Chris Olivier" <cjolivier01@gmail.com> wrote:
> > > >
> > > >     I am very reluctant to feed the trolls again, and this will be the
> > > >     last time I address Pedro or Anton on the subject, but since I
> > > >     think the numbers being presented are incorrect (either by the
> > > >     builders not really understanding what they are building, or
> > > >     possibly intentional misdirection):
> > > >
> > > >     Turning Intel OMP on and off (and MKL as well, since it tends to
> > > >     pull in omp, depending which one is linked in) makes a HUGE
> > > >     difference.  This is consistent with my experience before, when it
> > > >     was added.
> > > >
> > > >
> > > >     default mnist:
> > > >
> > > >     python ../example/image-classification/train_mnist.py
> > > >     INFO:root:start with arguments Namespace(add_stn=False,
> > > >     batch_size=64, disp_batches=100, dtype='float32', gc_threshold=0.5,
> > > >     gc_type='none', gpus=None, image_shape='1, 28, 28',
> > > >     initializer='default', kv_store='device', load_epoch=None, loss='',
> > > >     lr=0.05, lr_factor=0.1, lr_step_epochs='10', macrobatch_size=0,
> > > >     model_prefix=None, mom=0.9, monitor=0, network='mlp',
> > > >     num_classes=10, num_epochs=20, num_examples=60000, num_layers=None,
> > > >     optimizer='sgd', profile_server_suffix='', profile_worker_suffix='',
> > > >     save_period=1, test_io=0, top_k=0, warmup_epochs=5,
> > > >     warmup_strategy='linear', wd=0.0001)
> > > >
> > > >     INTEL OMP:
> > > >
> > > >     ldd libmxnet.so | grep omp
> > > >             libomp.so => /home/chris/src/mxnet/cmake_omp/3rdparty/openmp/runtime/src/libomp.so (0x00007f978fde7000)
> > > >
> > > >     INFO:root:Epoch[0] Batch [0-100]        Speed: 31548.09 samples/sec  accuracy=0.780012
> > > >     INFO:root:Epoch[0] Batch [100-200]      Speed: 16073.21 samples/sec  accuracy=0.920469
> > > >     INFO:root:Epoch[0] Batch [200-300]      Speed: 19075.91 samples/sec  accuracy=0.928281
> > > >     INFO:root:Epoch[0] Batch [300-400]      Speed: 23211.36 samples/sec  accuracy=0.942813
> > > >     INFO:root:Epoch[0] Batch [400-500]      Speed: 22139.79 samples/sec  accuracy=0.938750
> > > >     INFO:root:Epoch[0] Batch [500-600]      Speed: 23225.52 samples/sec  accuracy=0.946562
> > > >     INFO:root:Epoch[0] Batch [600-700]      Speed: 19547.41 samples/sec  accuracy=0.953281
> > > >     INFO:root:Epoch[0] Batch [700-800]      Speed: 24111.73 samples/sec  accuracy=0.951562
> > > >     INFO:root:Epoch[0] Batch [800-900]      Speed: 13959.88 samples/sec  accuracy=0.957500
> > > >     INFO:root:Epoch[0] Train-accuracy=0.925423
> > > >     INFO:root:Epoch[0] Time cost=3.806
> > > >     INFO:root:Epoch[0] Validation-accuracy=0.962580
> > > >     INFO:root:Epoch[1] Batch [0-100]        Speed: 24560.21 samples/sec  accuracy=0.968131
> > > >     INFO:root:Epoch[1] Batch [100-200]      Speed: 23457.03 samples/sec  accuracy=0.966250
> > > >
> > > >
> > > >     LIBGOMP:
> > > >
> > > >     ldd libmxnet.so | grep omp
> > > >             libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f25c25dd000)
> > > >
> > > >     INFO:root:Epoch[0] Batch [0-100]        Speed: 1731.01 samples/sec   accuracy=0.782488
> > > >     INFO:root:Epoch[0] Batch [100-200]      Speed: 3551.32 samples/sec   accuracy=0.907813
> > > >     INFO:root:Epoch[0] Batch [200-300]      Speed: 1991.00 samples/sec   accuracy=0.927188
> > > >     INFO:root:Epoch[0] Batch [300-400]      Speed: 2175.45 samples/sec   accuracy=0.937969
> > > >     INFO:root:Epoch[0] Batch [400-500]      Speed: 1644.95 samples/sec   accuracy=0.942187
> > > >     INFO:root:Epoch[0] Batch [500-600]      Speed: 6444.58 samples/sec   accuracy=0.950156
> > > >     INFO:root:Epoch[0] Batch [600-700]      Speed: 7842.16 samples/sec   accuracy=0.947969
> > > >     INFO:root:Epoch[0] Batch [700-800]      Speed: 9412.07 samples/sec   accuracy=0.953750
> > > >     INFO:root:Epoch[0] Batch [800-900]      Speed: 12707.58 samples/sec  accuracy=0.953125
> > > >
> > > >     That being said, there are other issues beyond speed.  The DEFAULT
> > > >     build from the makefile (not CMake) uses Intel OMP mkl (I showed
> > > >     before) and mysteriously it has no issues?  This seems highly
> > > >     suspicious.  All I see is a lot of hand-waving and conjecture and
> > > >     pointing to StackOverflow posts made by people who may be of
> > > >     questionable pedigree to begin with.  This smells of a
> > > >     Pedro-ego-fight rather than one of purely technical merit.  Also,
> > > >     if one knows how OMP works, they would be very suspicious of the
> > > >     "intermittent hangs" claim -- that's probably just broken race
> > > >     conditions elsewhere until proven differently.  It'd tend to freeze
> > > >     on the first use if something is wrong (try using libgomp after a
> > > >     fork and see), since worker threads wouldn't be assigned/joined
> > > >     properly.  Intel OMP is faster, but also has other advantages, such
> > > >     as allowing OMP after a fork.
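> > > >
> > > >     A minimal sketch of that fork experiment, assuming an MXNet build
> > > >     whose CPU kernels run through the OpenMP runtime under test (with
> > > >     libgomp the child would tend to deadlock on its first parallel
> > > >     region after the fork; Intel OMP tolerates it):
> > > >
> > > >         # fork_omp.py - first OpenMP region inside a forked child
> > > >         import os
> > > >         import mxnet as mx
> > > >
> > > >         a = mx.nd.ones((4096, 4096))
> > > >         (a + 1).wait_to_read()      # parent spins up the OpenMP worker pool
> > > >
> > > >         pid = os.fork()
> > > >         if pid == 0:                # child: first parallel region after fork
> > > >             (a + 1).wait_to_read()  # a libgomp build tends to hang here
> > > >             os._exit(0)
> > > >         os.waitpid(pid, 0)
> > > >         print("child completed; this runtime tolerates fork")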
> > > >
> > > >     I actually addressed a lot of issues and asked for clarification
> > > >     in the original PRs way back when, but they were all just ignored.
> > > >
> > > >     -Chris
> > > >
