mxnet-dev mailing list archives

From Hagay Lupesko <lupe...@gmail.com>
Subject Re: Release blocker: non-determinstic forward in gluon
Date Mon, 30 Jul 2018 07:06:40 GMT
Thanks Tong for root-causing the issue!
Thanks Sheng for following up with an updated PyPi package.

What worries me is that we seem to build the MXNet PyPi distribution packages
with a build config different from the one used in CI, where all of the tests
are running.
Looking here [1], it seems that the MXNet CI Ubuntu build uses
libopenblas-dev v0.2.18, while the PyPi build for MXNet 1.2.1 used v0.3.2
(I would imagine, for the PyPi distribution?)

Needless to say, if we don't make sure the PyPi distribution is aligned
with the CI build, similar issues can happen again with other dependencies.
I'd think we want the build configs to be the same, or better yet, have the
PyPi package built from the output produced by the CI.
Thoughts?

[1]
https://github.com/apache/incubator-mxnet/blob/master/ci/docker/install/ubuntu_core.sh
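
A minimal sketch of the kind of alignment check suggested above, assuming each
build can export its pinned dependency versions as a simple name-to-version
mapping. The function name and the two dictionaries are hypothetical; the
OpenBLAS versions are the ones quoted in this message, and nothing here is
taken from the actual MXNet build scripts.

```python
from typing import Dict, Optional, Tuple


def find_mismatches(
    ci_deps: Dict[str, str], release_deps: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {name: (ci_version, release_version)} for every dependency
    whose version differs, or which is pinned on only one side."""
    names = sorted(set(ci_deps) | set(release_deps))
    return {
        name: (ci_deps.get(name), release_deps.get(name))
        for name in names
        if ci_deps.get(name) != release_deps.get(name)
    }


# Hypothetical manifests; the versions are those mentioned in this message.
ci_deps = {"openblas": "0.2.18"}
release_deps = {"openblas": "0.3.2"}

for name, (ci_ver, rel_ver) in find_mismatches(ci_deps, release_deps).items():
    print(f"{name}: CI uses {ci_ver}, release build uses {rel_ver}")
```

A check like this could fail the packaging job whenever the returned mapping
is non-empty, which is a weaker guarantee than building the package directly
from CI output but cheaper to add.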


On Fri, Jul 27, 2018 at 11:31 AM Sheng Zha <szha.pvg@gmail.com> wrote:

> Tong,
>
> That's great news. I'm glad that OpenBLAS people are responding so quickly.
> In that case it's probably a better idea to use that version instead. The
> latest OpenBLAS version brings many optimizations for all kinds of hardware.
>
> -sz
>
> On Fri, Jul 27, 2018 at 11:10 AM, Tong He <hetong007@gmail.com> wrote:
>
> > Hi Sheng,
> >
> > I also opened an issue on OpenBLAS repo:
> > https://github.com/xianyi/OpenBLAS/issues/1700 .
> >
> > Having been told that "0.3.2 should be released this weekend", I tested their
> > develop branch as well, and it seems the new version has fixed the bug.
> >
> > Since OpenBLAS 0.3.2 could also bring performance improvements, I
> > propose to wait for OpenBLAS 0.3.2 for our pip post release.
> >
> >
> > Best regards,
> >
> > Tong He
> >
> > 2018-07-27 10:54 GMT-07:00 Sheng Zha <szha.pvg@gmail.com>:
> >
> > > Forgot to mention, the post release version is a pip package version.
> > >
> > > -sz
> > >
> > > > On Jul 27, 2018, at 10:42 AM, Sheng Zha <szha.pvg@gmail.com> wrote:
> > > >
> > > > In this case we can regard it as a release problem, which is usually
> > > > what post release versions are for. It’s still the same release with
> > > > a different dependency, so there is no code change needed.
> > > >
> > > > -sz
> > > >
> > > >
> > > >> On Jul 27, 2018, at 8:31 AM, Steffen Rochel <steffenrochel@gmail.com> wrote:
> > > >>
> > > >> Hi Tong - thanks for root causing the problem.
> > > >> Sheng - what is 1.2.1.post0? Shouldn't a patch with fix be released as
> > > >> 1.2.2?
> > > >> Steffen
> > > >>
> > > >>> On Thu, Jul 26, 2018 at 5:33 PM Sheng Zha <szha.pvg@gmail.com> wrote:
> > > >>>
> > > >>> Dear users and developers of Apache MXNet (Incubating),
> > > >>>
> > > >>> Thanks to Tong's dedication, the root cause for this issue was identified
> > > >>> to be instability in OpenBLAS's latest stable version 0.3.1. For details,
> > > >>> see Tong's comment
> > > >>> <https://github.com/apache/incubator-mxnet/issues/11853#issuecomment-408272772>.
> > > >>>
> > > >>> Since both the nightly build and the 1.2.1 wheels are affected, we
> > > >>> recommend that we stay on OpenBLAS last known stable version 0.2.20 that
> > > >>> we've been using. I will assume lazy consensus and prepare the fix
> > > >>> (1.2.1.post0).
> > > >>>
> > > >>> -sz
> > > >>>
> > > >>>> On Tue, Jul 24, 2018 at 3:35 PM, Tong He <the@apache.org> wrote:
> > > >>>>
> > > >>>> Recently there's an issue regarding the inconsistent result from gluon
> > > >>>> forward:
> > > >>>>
> > > >>>> https://github.com/apache/incubator-mxnet/issues/11853
> > > >>>>
> > > >>>> Given a constant input image and loaded pretrained parameters, we expect a
> > > >>>> deterministic output from arbitrary repeats of forwards. However, from the
> > > >>>> issue I see that the forwarded result is non-deterministic. It is harmful as
> > > >>>> it makes the results from experiments/benchmarks/inference meaningless.
> > > >>>>
> > > >>>> Therefore I propose to block the 1.3 release until it gets resolved.
> > > >>>>
> > > >>>
> > >
> >
>
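
The expectation stated in the original report (a constant input and fixed
parameters should give the same output on every forward pass) can be turned
into a small repeat-and-compare check. The sketch below is only an
illustration under assumptions: `is_deterministic` and `toy_forward` are
hypothetical names, and the toy linear layer stands in for a real gluon
network, which is not reproduced here.

```python
from typing import Callable, List, Sequence


def is_deterministic(
    forward: Callable[[Sequence[float]], Sequence[float]],
    x: Sequence[float],
    repeats: int = 10,
) -> bool:
    """Return True if forward(x) gives identical output on every call."""
    reference = list(forward(x))
    for _ in range(repeats - 1):
        if list(forward(x)) != reference:
            return False
    return True


# Hypothetical stand-in for a network with fixed, preloaded parameters.
WEIGHTS = [[0.5, -1.0], [2.0, 0.25]]


def toy_forward(x: Sequence[float]) -> List[float]:
    # One dense layer: each output is the dot product of a weight row with x.
    return [sum(w * v for w, v in zip(row, x)) for row in WEIGHTS]


print(is_deterministic(toy_forward, [1.0, 2.0]))
```

The comparison is exact on purpose: a tolerance would hide the kind of small
run-to-run drift described in the issue.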
