mxnet-dev mailing list archives

From Hagay Lupesko <lupe...@gmail.com>
Subject Re: Release blocker: non-deterministic forward in gluon
Date Tue, 31 Jul 2018 01:34:37 GMT
Thanks Pedro.
Good to know you think it is important as well. I hope the community can
review a proposal on the CWiki soon; that would be great.

On Mon, Jul 30, 2018 at 4:26 AM Pedro Larroy <pedro.larroy.lists@gmail.com>
wrote:

> Hi Hagay
>
> We are aware of this and are working in this direction, which, as you
> point out, is more desirable.
> A huge amount of non-trivial work by Sheng has gone into building these
> distribution packages; it needs to be taken into consideration and adapted
> for our CI system.
>
> Pedro.
>
>
> On Mon, Jul 30, 2018 at 9:07 AM Hagay Lupesko <lupesko@gmail.com> wrote:
>
> > Thanks Tong for root-causing the issue!
> > Thanks Sheng for following up with an updated PyPi package.
> >
> > What worries me is that we seem to build MXNet PyPi distribution packages
> > with a build config different from the CI where all of the tests are
> > running.
> > Looking here [1], it seems that the MXNet CI Ubuntu build uses
> > libopenblas-dev v0.2.18, while the PyPi build for MXNet 1.2.1 used v0.3.2
> > (I would imagine PyPi distribution?)
> >
> > Needless to say, if we don't make sure the PyPi distribution is aligned
> > with the CI build, similar issues can happen again with other dependencies.
> > I'd think we want the build configs to be the same, or better yet have the
> > PyPi package be built from the output produced by the CI.
> > Thoughts?
> >
> > [1] https://github.com/apache/incubator-mxnet/blob/master/ci/docker/install/ubuntu_core.sh
> >
> >
> > On Fri, Jul 27, 2018 at 11:31 AM Sheng Zha <szha.pvg@gmail.com> wrote:
> >
> > > Tong,
> > >
> > > > That's great news. I'm glad that the OpenBLAS people are responding so
> > > > quickly. In that case it's probably a better idea to use that version
> > > > instead. The latest OpenBLAS version brings many optimizations for all
> > > > kinds of hardware.
> > >
> > > -sz
> > >
> > > On Fri, Jul 27, 2018 at 11:10 AM, Tong He <hetong007@gmail.com> wrote:
> > >
> > > > Hi Sheng,
> > > >
> > > > I also opened an issue on OpenBLAS repo:
> > > > https://github.com/xianyi/OpenBLAS/issues/1700 .
> > > >
> > > > > Having been told that "0.3.2 should be released this weekend", I tested
> > > > > their develop branch as well, and it seems the new version has fixed the
> > > > > bug.
> > > > >
> > > > > Since OpenBLAS 0.3.2 may also bring performance improvements, I propose
> > > > > to wait for OpenBLAS 0.3.2 for our pip post release.
> > > >
> > > >
> > > > Best regards,
> > > >
> > > > Tong He
> > > >
> > > > 2018-07-27 10:54 GMT-07:00 Sheng Zha <szha.pvg@gmail.com>:
> > > >
> > > > > > Forgot to mention, the post release version is a pip package version.
> > > > >
> > > > > -sz
> > > > >
> > > > > > > On Jul 27, 2018, at 10:42 AM, Sheng Zha <szha.pvg@gmail.com> wrote:
> > > > > > >
> > > > > > > In this case we can regard it as a release problem, which is usually
> > > > > > > what post release versions are for. It’s still the same release with a
> > > > > > > different dependency, so there is no code change needed.
> > > > > >
> > > > > > -sz
> > > > > >
> > > > > >
> > > > > > >> On Jul 27, 2018, at 8:31 AM, Steffen Rochel <steffenrochel@gmail.com> wrote:
> > > > > > >>
> > > > > > >> Hi Tong - thanks for root causing the problem.
> > > > > > >> Sheng - what is 1.2.1.post0? Shouldn't a patch with a fix be released as
> > > > > > >> 1.2.2?
> > > > > >> Steffen
> > > > > >>
> > > > > > >>> On Thu, Jul 26, 2018 at 5:33 PM Sheng Zha <szha.pvg@gmail.com> wrote:
> > > > > > >>>
> > > > > > >>> Dear users and developers of Apache MXNet (Incubating),
> > > > > > >>>
> > > > > > >>> Thanks to Tong's dedication, the root cause for this issue was identified
> > > > > > >>> to be instability in OpenBLAS's latest stable version, 0.3.1. For details,
> > > > > > >>> see Tong's comment
> > > > > > >>> <https://github.com/apache/incubator-mxnet/issues/11853#issuecomment-408272772>.
> > > > > > >>>
> > > > > > >>> Since both the nightly build and the 1.2.1 wheels are affected, we
> > > > > > >>> recommend that we stay on OpenBLAS's last known stable version, 0.2.20,
> > > > > > >>> which we've been using. I will assume lazy consensus and prepare the fix
> > > > > > >>> (1.2.1.post0).
> > > > > >>>
> > > > > >>> -sz
> > > > > >>>
> > > > > > >>>> On Tue, Jul 24, 2018 at 3:35 PM, Tong He <the@apache.org> wrote:
> > > > > > >>>>
> > > > > > >>>> Recently an issue was reported regarding inconsistent results from the
> > > > > > >>>> gluon forward pass:
> > > > > > >>>>
> > > > > > >>>> https://github.com/apache/incubator-mxnet/issues/11853
> > > > > > >>>>
> > > > > > >>>> Given a constant input image and loaded pretrained parameters, we expect
> > > > > > >>>> a deterministic output from arbitrary repeats of the forward pass.
> > > > > > >>>> However, from the issue I see that the forwarded result is
> > > > > > >>>> non-deterministic. This is harmful, as it makes the results from
> > > > > > >>>> experiments/benchmarks/inference meaningless.
> > > > > > >>>>
> > > > > > >>>> Therefore I propose to block the 1.3 release until it gets resolved.
> > > > > >>>>
> > > > > >>>
> > > > >
> > > >
> > >
> >
>
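
The expectation stated at the start of the thread (constant input, fixed parameters, identical output on every repeat of the forward pass) can be checked mechanically. The following is only a minimal sketch in plain Python: `check_deterministic`, `stable_forward` and `flaky_forward` are hypothetical names introduced here for illustration and are not part of MXNet or Gluon. The two toy functions stand in for a real network forward pass.

```python
import random


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two sequences."""
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def check_deterministic(forward, x, repeats=10, tol=0.0):
    """Call forward(x) `repeats` times and compare every output to the first.

    Returns (ok, worst), where `worst` is the largest deviation observed
    and `ok` is True when that deviation does not exceed `tol`.
    """
    reference = list(forward(x))
    worst = 0.0
    for _ in range(repeats - 1):
        worst = max(worst, max_abs_diff(reference, list(forward(x))))
    return worst <= tol, worst


# Hypothetical stand-ins for a model's forward pass (assumptions, not MXNet code).
def stable_forward(x):
    return [2.0 * v + 1.0 for v in x]


_noise = random.Random(0)


def flaky_forward(x):
    # Adds small random noise to mimic a numerically unstable backend.
    return [2.0 * v + 1.0 + _noise.uniform(-1e-3, 1e-3) for v in x]


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0]
    print(check_deterministic(stable_forward, data))
    print(check_deterministic(flaky_forward, data))
```

For a real Gluon network, `forward` could presumably be a small wrapper that runs the network on a fixed input and flattens the result into a list of floats; that wrapper is left out here because it depends on the model in question.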

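
The concern raised in the thread about the CI build and the pip distribution using different dependency versions could be caught by a simple comparison of the two sets of pinned versions. This is a rough sketch under stated assumptions: `find_version_drift` is a hypothetical helper, the dictionaries are hand-written for illustration (the OpenBLAS numbers loosely follow versions mentioned in the thread, and `examplelib` is made up), and nothing here reads the actual MXNet build scripts.

```python
def find_version_drift(ci_pins, release_pins):
    """Return {name: (ci_version, release_version)} for every dependency
    whose version differs between the two builds, or that is missing
    from one of them (reported as None on that side)."""
    drift = {}
    for name in sorted(set(ci_pins) | set(release_pins)):
        ci_version = ci_pins.get(name)
        release_version = release_pins.get(name)
        if ci_version != release_version:
            drift[name] = (ci_version, release_version)
    return drift


# Illustrative pins only; real values would come from the build configs.
ci_pins = {"openblas": "0.2.18", "examplelib": "1.0.0"}
release_pins = {"openblas": "0.3.1", "examplelib": "1.0.0"}

if __name__ == "__main__":
    for name, (ci_v, rel_v) in find_version_drift(ci_pins, release_pins).items():
        print(f"{name}: CI uses {ci_v}, release build uses {rel_v}")
```

A check like this could run as a gate before publishing a package, failing the release job whenever the returned dictionary is non-empty.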