mxnet-dev mailing list archives

From Roshani Nagmote <roshaninagmo...@gmail.com>
Subject Re: [VOTE] Release MXNet version 1.3.0.RC0
Date Wed, 12 Sep 2018 16:32:30 GMT
Thanks everyone for testing and voting for the release. I am working with
Sheng to finalize and post the release. Announcement will follow soon.

Regards,
Roshani

On Mon, Sep 10, 2018 at 7:03 AM kellen sunderland <
kellen.sunderland@gmail.com> wrote:

> Tracked down the issue referred to above and it's not a bug.   I'll update
> the ticket.
>
> Changing to +1.
>
> On Mon, Sep 10, 2018 at 3:00 PM kellen sunderland <
> kellen.sunderland@gmail.com> wrote:
>
> > -0.1
> >
> > There's one test failure I've run into (details below).  Following
> Indhu's
> > logic I don't think this should block the release as it's not relating
> to a
> > release feature introduced in this version.
> >
> > I'm trying to use the cpp-package examples as reference code for how to
> > run MXNet models from a native context. I'd like to run them with ASAN
> as a
> > sanity check for memory leaks and pointer errors.  I was continually
> > running into segfaults and crashes w/ and w/o ASAN.  A little googling
> > shows me that this issue has already been reported, and is related to
> > running tests on CPU, not to any changes I made:
> > https://github.com/apache/incubator-mxnet/issues/9814  Having what are
> > effectively our reference examples crash is not a good practice IMO.
> >
> > I also share some concerns around the fp16 failures.  I know developers
> > who are currently porting their models to Gluon who use fp16.  They'll be
> > disappointed with the error.
> >
> > In general though, release looks good.  Big thanks to Sheng and Roshani
> > for putting it together (and sorry for the late testing).
> >
> > -Kellen
> >
> >
> > On Fri, Sep 7, 2018 at 4:31 AM Anirudh <anirudh2290@gmail.com> wrote:
> >
> >> -1 Considering that using fp16 with gluon is much easier than the
> >> alternative where you need access to the model code, this fix is really
> >> useful. I understand the pain of doing mxnet release and appreciate
> >> Roshani
> >> and Sheng's efforts, but this seems like something we should fix.
> >>
> >> On Thu, Sep 6, 2018, 4:57 PM Haibin Lin <haibin.lin.aws@gmail.com>
> wrote:
> >>
> >> > +1 built from source and passes dist_sync_kvstore test on Ubuntu.
> >> >
> >> > Best,
> >> > Haibin
> >> >
> >> > On Thu, Sep 6, 2018 at 1:32 PM Indhu <indhubharathi@gmail.com> wrote:
> >> >
> >> > > +1
> >> > >
> >> > > The release candidate looks good. I'm able to build and run basic
> >> models.
> >> > >
> >> > > On the FP16 issue:
> >> > >
> >> > > Like others have pointed out, releases are expensive in terms of time
> >> and
> >> > > effort. There needs to be a high and more objective bar on what
> >> qualifies
> >> > > as a release blocker to make sure we are not setting a precedent for
> a
> >> lot
> >> > > of release blockers in future.
> >> > >
> >> > > I think a release blocker is justified only if there is a serious
> bug
> >> > > discovered in one of the features included in the release or if
> there
> >> is
> >> > a
> >> > > regression. Given FP16 support is not a new feature claimed in this
> >> > > release and this is not a regression in this release candidate, I'm
> >> > > inclined to release this candidate and include the FP16 fix in a
> >> > subsequent
> >> > > release.
> >> > >
> >> > > Thanks,
> >> > > Indu
> >> > >
> >> > > On Wed, Sep 5, 2018 at 10:21 AM Aaron Markham <
> >> aaron.s.markham@gmail.com
> >> > >
> >> > > wrote:
> >> > >
> >> > > > 0 (non-binding) If we have a problem that blocks users, and a
> >> solution
> >> > in
> >> > > > hand... then we should fix it, but not at the expense of starting
> >> the
> >> > > > release cycle again just for one fix. Users can cherry pick or
> build
> >> > from
> >> > > > master if they want the fix right away, right? I'd change my mind
> >> to -1
> >> > > if
> >> > > > this wasn't the case, with good reason, and if the user impact was
> >> > > critical
> >> > > > to adoption or risks abandonment.
> >> > > >
> >> > > >
> >> > > > On Wed, Sep 5, 2018 at 9:57 AM Roshani Nagmote <
> >> > > roshaninagmote2@gmail.com>
> >> > > > wrote:
> >> > > >
> >> > > > > I believe everyone here is working hard to make MXNet a better
> >> > > framework
> >> > > > > for users. It's completely okay to have different opinions, we
> can
> >> > > decide
> >> > > > > together if this issue is a blocker or not after voting time is
> >> over.
> >> > > > >
> >> > > > > As I mentioned before, voting will end at 7 pm today. So there
> is
> >> > still
> >> > > > > time to test the release. If there are any other issues anyone
> >> > finds, I
> >> > > > > will be happy to start the process again and work on RC1. For
> >> now, I
> >> > > want
> >> > > > > to encourage everyone to utilize this time and vote. :)
> >> > > > >
> >> > > > > Thanks,
> >> > > > > Roshani
> >> > > > >
> >> > > > > On Tue, Sep 4, 2018 at 10:35 PM sandeep krishnamurthy <
> >> > > > > sandeep.krishna98@gmail.com> wrote:
> >> > > > >
> >> > > > > >    1. As an Apache MXNet community member, I raised the concern
> >> of
> >> > > > broken
> >> > > > > >    functionality for the user. I explained and provided the
> data
> >> > > points
> >> > > > > on
> >> > > > > > the
> >> > > > > >    issue, workaround and why I think it is important. If after
> >> all
> >> > > > this,
> >> > > > > > you
> >> > > > > >    think my vote is biased on my employer just because a user
> I
> >> > > quoted
> >> > > > is
> >> > > > > > from
> >> > > > > >    Amazon, this is more concerning to me on my voting
> abilities.
> >> > > > > >    2. My -1 in no way undermines the huge amount of effort that
> >> goes
> >> > > > > behind
> >> > > > > >    the scene for a release to happen. Great respect and
> >> recognition
> >> > > for
> >> > > > > >    everyone involved in all the releases of MXNet in the past
> >> and
> >> > > > this. I
> >> > > > > >    voted on my judgement of what may be good for the users of
> >> > MXNet.
> >> > > > > >    3. As pointed out by Naveen & Chris, -1 votes are NOT vetoes. Feel free
> >> to
> >> > > > decide
> >> > > > > >    and progress on the release as we already have >3 +1 in
> this
> >> > > thread.
> >> > > > > >
> >> > > > > >
> >> > > > > > Best,
> >> > > > > >
> >> > > > > > Sandeep
> >> > > > > >
> >> > > > > > On Tue, Sep 4, 2018 at 8:29 PM Chris Olivier <
> >> > cjolivier01@gmail.com>
> >> > > > > > wrote:
> >> > > > > >
> >> > > > > > > btw, there are no vetoes on package releases:
> >> > > > > > >
> >> > > > > > > VOTES ON PACKAGE RELEASES
> >> > > > > > > <https://www.apache.org/foundation/voting.html#ReleaseVotes>
> >> > > > > > >
> >> > > > > > > Votes on whether a package is ready to be released use majority approval
> >> > > > > > > <https://www.apache.org/foundation/glossary.html#MajorityApproval> -- i.e.
> >> > > > > > > at least three PMC members must vote affirmatively for release, and there
> >> > > > > > > must be more positive than negative votes. Releases may not be vetoed.
> >> > > > > > > Generally the community will cancel the release vote if anyone identifies
> >> > > > > > > serious problems, but in most cases the ultimate decision lies with the
> >> > > > > > > individual serving as release manager. The specifics of the process may vary
> >> > > > > > > from project to project, but the 'minimum quorum of three +1 votes' rule is
> >> > > > > > > universal.
> >> > > > > > >
> >> > > > > > > On Tue, Sep 4, 2018 at 7:12 PM Sheng Zha <
> szha.pvg@gmail.com>
> >> > > wrote:
> >> > > > > > >
> >> > > > > > > > Thanks for sharing your opinions, Thomas. Your recognition
> >> and
> >> > > > > respect
> >> > > > > > of
> >> > > > > > > > people's efforts on preparing the release candidate are
> >> > certainly
> >> > > > > > > > appreciated.
> >> > > > > > > >
> >> > > > > > > > Now that the vote is set to fail thanks to the veto, there
> >> will
> >> > > be
> >> > > > > > plenty
> >> > > > > > > > of opportunities to include those bug fixes, including the
> >> one
> >> > > Zhi
> >> > > > > > > > mentioned [1], which was already merged in the master and
> >> yet
> >> > > chose
> >> > > > > not
> >> > > > > > > to
> >> > > > > > > > block this release with [2]. I will be happy to work with
> >> > Roshani
> >> > > > to
> >> > > > > > > > prepare another release candidate once ready.
> >> > > > > > > >
> >> > > > > > > > -sz
> >> > > > > > > >
> >> > > > > > > > [1] https://lists.apache.org/thread.html/f02e952bec22c82cb00a6741390a78f55373311c97464997bb455a6c@%3Cdev.mxnet.apache.org%3E
> >> > > > > > > > [2] https://lists.apache.org/thread.html/85d3fcabb3437ba7f1af455cf69aa13eb3afd1ea1d1f6f891e9c339c@%3Cdev.mxnet.apache.org%3E
> >> > > > > > > >
> >> > > > > > > > On Tue, Sep 4, 2018 at 6:02 PM Thomas DELTEIL <
> >> > > > > > thomas.delteil1@gmail.com
> >> > > > > > > >
> >> > > > > > > > wrote:
> >> > > > > > > >
> >> > > > > > > > > -0
> >> > > > > > > > > (non-binding)
> >> > > > > > > > >
> >> > > > > > > > > If I may add some nuancing plus a personal data point as
> >> one
> >> > of
> >> > > > the
> >> > > > > > > users
> >> > > > > > > > > commenting in the bug report in question:
> >> > > > > > > > >
> >> > > > > > > > > - Performance vs. Basic functionality => I don't think
> >> high
> >> > > > > > performance
> >> > > > > > > > > use-cases and basic functionality are two obviously
> >> opposed
> >> > > > > concepts
> >> > > > > > > and
> >> > > > > > > > > see no contradiction in Hagay's and Sandeep's
> statements.
> >> > > > > > > > > Float16 support is a feature of MXNet that provides more
> >> than
> >> > > twice
> >> > > > > the
> >> > > > > > > > > performance of Float32 on supported platforms, hence the
> >> high
> >> > > > > > > performance
> >> > > > > > > > > use-case. The bug is that the basic functionality of
> >> > reloading
> >> > > a
> >> > > > > > saved
> >> > > > > > > > > float16 model is currently broken.
> >> > > > > > > > >
> >> > > > > > > > > - This bug vs Other bugs => Contrary to the vast majority
> of
> >> the
> >> > > 140
> >> > > > > > open
> >> > > > > > > > bugs
> >> > > > > > > > > that are mentioned above, I would put to Sandeep's
> credit
> >> > that
> >> > > > this
> >> > > > > > one
> >> > > > > > > > bug
> >> > > > > > > > > has a PR open that provides a fix for it. This would
> make
> >> it
> >> > a
> >> > > > > better
> >> > > > > > > > > candidate to get included in this release than a bug
> that
> >> has
> >> > > no
> >> > > > > fix
> >> > > > > > > > ready
> >> > > > > > > > > for it.
> >> > > > > > > > >
> >> > > > > > > > > - Personal datapoint: I recently did some
> experimentation
> >> > with
> >> > > > > > float16
> >> > > > > > > > [1]
> >> > > > > > > > > and actually coincidentally just published a video on
> >> > > optimizing
> >> > > > > > > > > performance for Gluon. Float16 conversion is one of the
> >> most,
> >> > > if
> >> > > > > not
> >> > > > > > > the
> >> > > > > > > > > most effective way to get performance out of MXNet [2].
> I
> >> > > believe
> >> > > > > > there
> >> > > > > > > > is
> >> > > > > > > > > a lot of value in publicizing more its use and hence
> >> making
> >> > > sure
> >> > > > at
> >> > > > > > > least
> >> > > > > > > > > the basic support for normal use-cases is present.
> >> > > > > > > > >
> >> > > > > > > > > Of course this needs to be balanced with the overhead of
> >> > > > preparing
> >> > > > > a
> >> > > > > > > new
> >> > > > > > > > > release candidate once the fix is reviewed and merged,
> >> > which
> >> > > > > seems
> >> > > > > > to
> >> > > > > > > > be
> >> > > > > > > > > a lengthy and complex process in its own right, and the
> >> delay
> >> > > > with
> >> > > > > > > > > providing the other features present in 1.3 for users
> that
> >> > are
> >> > > > not
> >> > > > > > > > running
> >> > > > > > > > > off the nightly builds.
> >> > > > > > > > >
> >> > > > > > > > > All the best,
> >> > > > > > > > >
> >> > > > > > > > > Thomas
> >> > > > > > > > >
> >> > > > > > > > > [1] https://github.com/ThomasDelteil/PerformanceTricksMXNetGluon
> >> > > > > > > > > [2] https://www.youtube.com/watch?v=Cqo7FPftNyo&t=0s&list=PLkEvNnRk8uVk6U515Pj-jHQUxFC4eDi3m
> >> > > > > > > > >
> >> > > > > > > > > Le mar. 4 sept. 2018 à 17:11, Sheng Zha <
> >> szha.pvg@gmail.com>
> >> > a
> >> > > > > > écrit :
> >> > > > > > > > >
> >> > > > > > > > > > Sandeep,
> >> > > > > > > > > >
> >> > > > > > > > > > Thanks for explaining your veto. We have open bugs
> that
> >> > > > impacted
> >> > > > > a
> >> > > > > > > lot
> >> > > > > > > > > more
> >> > > > > > > > > > than just 3 customers, just by referring to the number
> >> of
> >> > > > > > commenters
> >> > > > > > > on
> >> > > > > > > > > the
> >> > > > > > > > > > issue [1].
> >> > > > > > > > > >
> >> > > > > > > > > > You said that this is for "high performance use
> cases",
> >> > which
> >> > > > > > > > contradicts
> >> > > > > > > > > > with Hagay's assessment that this is "basic
> functionality
> >> > > > broken".
> >> > > > > > > Given
> >> > > > > > > > > that
> >> > > > > > > > > > this is for advanced use cases of using half-precision
> >> > > > training,
> >> > > > > > why
> >> > > > > > > is
> >> > > > > > > > > it
> >> > > > > > > > > > so much more important than any other open bug
> reports,
> >> > that
> >> > > > for
> >> > > > > > this
> >> > > > > > > > > > specific bug fix, we have to delay the access of
> regular
> >> > > users
> >> > > > to
> >> > > > > > the
> >> > > > > > > > new
> >> > > > > > > > > > MXNet 1.3 release by at least another week?
> >> > > > > > > > > >
> >> > > > > > > > > > Honestly, I'm concerned that your vote is biased by
> >> Amazon
> >> > > > > > > involvement,
> >> > > > > > > > > > given that you quoted Amazon Rekognition.
> >> > > > > > > > > >
> >> > > > > > > > > > -sz
> >> > > > > > > > > >
> >> > > > > > > > > > [1] https://github.com/apache/incubator-mxnet/issues?q=is%3Aissue+is%3Aopen+label%3ABug+sort%3Acomments-desc
> >> > > > > > > > > >
> >> > > > > > > > > > On Tue, Sep 4, 2018 at 4:51 PM sandeep krishnamurthy <
> >> > > > > > > > > > sandeep.krishna98@gmail.com> wrote:
> >> > > > > > > > > >
> >> > > > > > > > > > > My initial vote of “-0” was due to lack of info
> from a
> >> > user
> >> > > > who
> >> > > > > > had
> >> > > > > > > > > said,
> >> > > > > > > > > > > he overcame this issue for FP16 model.
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > However, suggested workaround [1] for the issue is
> not
> >> > > > straight
> >> > > > > > > > forward
> >> > > > > > > > > > and
> >> > > > > > > > > > > generally usable for all users. Also, issue is not
> >> simple
> >> > > and
> >> > > > > > > > isolated
> >> > > > > > > > > to
> >> > > > > > > > > > > be listed in the Release Notes as known issue with a
> >> > > > > workaround.
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > Changing my vote to: "-1 (binding)" owing to the
> user
> >> > > impact
> >> > > > > [3]
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > @Sheng:
> >> > > > > > > > > > >
> >> > > > > > > > > > > 1. Agreed, bug existed from long time. However, FP16
> >> and
> >> > > such
> >> > > > > > > > > > optimizations
> >> > > > > > > > > > > were added later on. Followed by users [2] using
> this
> >> > > feature
> >> > > > > for
> >> > > > > > > > high
> >> > > > > > > > > > > performance use cases. It is not ok to measure
> >> severity
> >> > of
> >> > > > the
> >> > > > > > bug
> >> > > > > > > > > based
> >> > > > > > > > > > on
> >> > > > > > > > > > > its past existence, rather we can see who is
> impacted
> >> now
> >> > > and
> >> > > > > is
> >> > > > > > > it a
> >> > > > > > > > > > small
> >> > > > > > > > > > > subset with a simple workaround or large user
> >> impacting
> >> > > > issue.
> >> > > > > > > > > > >
> >> > > > > > > > > > > 2. Agreed bug was reported 7/21. However, I became
> >> aware
> >> > of
> >> > > > > this
> >> > > > > > > > issue
> >> > > > > > > > > on
> >> > > > > > > > > > > 08/29 and submitted the fix on 08/30. Also, I did
> >> bring
> >> > > this
> >> > > > to
> >> > > > > > the
> >> > > > > > > > > > notice
> >> > > > > > > > > > > of community, you and 1.3 release manager (Roshani)
> on
> >> > the
> >> > > > RC0
> >> > > > > > > > proposal
> >> > > > > > > > > > > thread. Also, I would focus on the issue and user
> >> impact
> >> > > than
> >> > > > > who
> >> > > > > > > > > > > identified and who is fixing the issue.
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > Based on my discussion with 2 users, I think it is an
> >> > > > important
> >> > > > > > > > feature
> >> > > > > > > > > > for
> >> > > > > > > > > > > them to see in Apache MXNet v1.3.0.
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > Best,
> >> > > > > > > > > > >
> >> > > > > > > > > > > Sandeep
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > [1] Workaround used by the user.
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > net_fp16 = mx.gluon.SymbolBlock.imports('resnet34_fp16-symbol.json', ['data'])
> >> > > > > > > > > > > params_fp16 = mx.nd.load('resnet34_fp16-0000.params')
> >> > > > > > > > > > >
> >> > > > > > > > > > > for k, v in params_fp16.items():
> >> > > > > > > > > > >     new_key = k.split(':')[1]
> >> > > > > > > > > > >     net_fp16.collect_params()[new_key].cast(v.dtype)
> >> > > > > > > > > > >
> >> > > > > > > > > > > net_fp16.collect_params().load('resnet34_fp16-0000.params', ctx)
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > [2] Amazon Rekognition
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > [3] User story: Train a model -> Cast it to FP16 ->
> >> Save
> >> > > the
> >> > > > > > model
> >> > > > > > > ->
> >> > > > > > > > > > Load
> >> > > > > > > > > > > back the model does not work. They have to cast
> every
> >> > > > parameter
> >> > > > > > > with
> >> > > > > > > > a
> >> > > > > > > > > > > workaround mentioned above [1].
> >> > > > > > > > > > >
> >> > > > > > > > > > > On Tue, Sep 4, 2018 at 4:14 PM Hagay Lupesko <
> >> > > > > lupesko@gmail.com>
> >> > > > > > > > > wrote:
> >> > > > > > > > > > >
> >> > > > > > > > > > > > Hi Sheng,
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > Addressing your questions:
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > - "why this specific bug is more important than
> all
> >> the
> >> > > > other
> >> > > > > > > known
> >> > > > > > > > > > bugs,
> >> > > > > > > > > > > > that this becomes a release blocker"
> >> > > > > > > > > > > > I do not consider it to be more or less important
> >> than
> >> > > > other
> >> > > > > > > fixes.
> >> > > > > > > > > It
> >> > > > > > > > > > > can
> >> > > > > > > > > > > > be fixed and included in the release alongside the
> >> rest
> >> > > of
> >> > > > > the
> >> > > > > > > > > release
> >> > > > > > > > > > > > content, right?
> >> > > > > > > > > > > > From the description of the issue it seems
> important
> >> > > since
> >> > > > it
> >> > > > > > is
> >> > > > > > > > > > blocking
> >> > > > > > > > > > > > users from loading models that were previously
> >> trained
> >> > > and
> >> > > > > > saved.
> >> > > > > > > > > There
> >> > > > > > > > > > > is
> >> > > > > > > > > > > > nothing stopping the community from including this
> >> fix
> >> > > into
> >> > > > > > > 1.3.0,
> >> > > > > > > > > > > > alongside the rest of the features and fixes.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > - "The bug exists since SymbolBlock was
> introduced a
> >> > year
> >> > > > ago
> >> > > > > > and
> >> > > > > > > > has
> >> > > > > > > > > > > > survived at least three releases, so this is not a
> >> > > > > regression."
> >> > > > > > > > > > > > I do not think I said it is a regression. However,
> >> the
> >> > > > fact a
> >> > > > > > bug
> >> > > > > > > > > > existed
> >> > > > > > > > > > > > before, does not mean it is OK to release it
> rather
> >> > than
> >> > > > fix
> >> > > > > > it.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > - "Timeline-wise, this bug was reported on 7/21,
> but
> >> > was
> >> > > > not
> >> > > > > > > > reported
> >> > > > > > > > > > as
> >> > > > > > > > > > > > release-blocker in the release discussion thread
> >> until
> >> > > 8/31
> >> > > > > > [1].
> >> > > > > > > > > > Neither
> >> > > > > > > > > > > > its reporting as release-blocker nor its fix made
> it
> >> > for
> >> > > > the
> >> > > > > > 8/3
> >> > > > > > > > code
> >> > > > > > > > > > > > freeze."
> >> > > > > > > > > > > > You are right, would have been better to have this
> >> > > > identified
> >> > > > > > and
> >> > > > > > > > > fixed
> >> > > > > > > > > > > > earlier and included before code freeze.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > - "The PR is still not ready yet as it doesn't
> have
> >> > > > > approval."
> >> > > > > > > > > > > > I think it is waiting for your review.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > - "it would be great if you could provide some
> >> > additional
> >> > > > > > > reasoning
> >> > > > > > > > > > > besides
> >> > > > > > > > > > > > "X mentions the issue" or "fix was done by X""
> >> > > > > > > > > > > > I have. Repeating what I wrote in my previous
> email
> >> for
> >> > > > > > clarity:
> >> > > > > > > > > Basic
> >> > > > > > > > > > > > functionality broken: loading a model (albeit one
> >> that
> >> > > > > was
> >> > > > > > > > saved
> >> > > > > > > > > > as
> >> > > > > > > > > > > > non FP32)
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > So, yes - this issue seems to have been out there
> >> for a
> >> > > > > while,
> >> > > > > > > > > somehow
> >> > > > > > > > > > > went
> >> > > > > > > > > > > > under the radar... but I think the key question is
> >> > > whether
> >> > > > > this
> >> > > > > > > > > blocks
> >> > > > > > > > > > a
> >> > > > > > > > > > > > basic functionality in MXNet. I believe so, hence
> >> my -1
> >> > > > vote.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > Hagay
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > On Tue, Sep 4, 2018 at 1:19 PM Sheng Zha <
> >> > > > szha.pvg@gmail.com
> >> > > > > >
> >> > > > > > > > wrote:
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > > Hi Hagay and Sandeep,
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > Could you help us understand why this specific
> >> bug is
> >> > > > more
> >> > > > > > > > > important
> >> > > > > > > > > > > than
> >> > > > > > > > > > > > > all the other known bugs, that this becomes a
> >> release
> >> > > > > > blocker?
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > Some facts to consider:
> >> > > > > > > > > > > > > - The bug exists since SymbolBlock was
> introduced
> >> a
> >> > > year
> >> > > > > ago
> >> > > > > > > and
> >> > > > > > > > > has
> >> > > > > > > > > > > > > survived at least three releases, so this is
> not a
> >> > > > > > regression.
> >> > > > > > > > > > > > > - Timeline-wise, this bug was reported on 7/21,
> >> but
> >> > was
> >> > > > not
> >> > > > > > > > > reported
> >> > > > > > > > > > as
> >> > > > > > > > > > > > > release-blocker in the release discussion thread
> >> > until
> >> > > > 8/31
> >> > > > > > > [1].
> >> > > > > > > > > > > Neither
> >> > > > > > > > > > > > > its reporting as release-blocker nor its fix
> made
> >> it
> >> > > for
> >> > > > > the
> >> > > > > > > 8/3
> >> > > > > > > > > code
> >> > > > > > > > > > > > > freeze.
> >> > > > > > > > > > > > > - The PR is still not ready yet as it doesn't
> have
> >> > > > > approval.
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > Hagay, it would be great if you could provide
> some
> >> > > > > additional
> >> > > > > > > > > > reasoning
> >> > > > > > > > > > > > > besides "X mentions the issue" or "fix was done
> by
> >> > X".
> >> > > > > > Thanks.
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > -sz
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > [1] https://lists.apache.org/thread.html/d1ed611f98c20d5d85c294b0c07c8bdebca13a209cf66a3872c9123e@%3Cdev.mxnet.apache.org%3E
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > On Tue, Sep 4, 2018 at 12:39 PM Hagay Lupesko <
> >> > > > > > > lupesko@gmail.com
> >> > > > > > > > >
> >> > > > > > > > > > > wrote:
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > > Sandeep mentions the issue of an error when
> user
> >> > > tries
> >> > > > to
> >> > > > > > > load
> >> > > > > > > > > > model
> >> > > > > > > > > > > > > params
> >> > > > > > > > > > > > > > trained/saved as FP16.
> >> > > > > > > > > > > > > >
> >> > > https://github.com/apache/incubator-mxnet/issues/11849
> >> > > > > > > > > > > > > > The fix was done by Sandeep:
> >> > > > > > > > > > > > > >
> >> > https://github.com/apache/incubator-mxnet/pull/12412
> >> > > > and
> >> > > > > > is
> >> > > > > > > > > ready
> >> > > > > > > > > > to
> >> > > > > > > > > > > > be
> >> > > > > > > > > > > > > > cherry picked into the release branch.
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > This seems like a release blocker to me:
> >> > > > > > > > > > > > > > - Basic functionality broken: loading a model
> >> > (albeit
> >> > > > one
> >> > > that
> >> > > > > > > > > > > was
> >> > > > > > > > > > > > > > saved as non FP32)
> >> > > > > > > > > > > > > > - Reported by 3 users (wgchang@, nicklhy@ and
> >> > > > > > ThomasDelteil@
> >> > > > > > > )
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > -1 (non binding)
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > Hagay
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > On Tue, Sep 4, 2018 at 12:01 PM sandeep
> >> > > krishnamurthy <
> >> > > > > > > > > > > > > > sandeep.krishna98@gmail.com> wrote:
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > "- 0"
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > I believe the bug #11849
> >> > > > > > > > > > > > > > > <
> >> > > > > https://github.com/apache/incubator-mxnet/issues/11849
> >> > > > > > >,
> >> > > > > > > > > unable
> >> > > > > > > > > > > to
> >> > > > > > > > > > > > > > import
> >> > > > > > > > > > > > > > > non-fp32 models into Gluon, fixed in this PR
> >> > #12412
> >> > > > > > > > > > > > > > > <
> >> > > > https://github.com/apache/incubator-mxnet/pull/12412>
> >> > > > > > is
> >> > > > > > > > > > > important
> >> > > > > > > > > > > > > for
> >> > > > > > > > > > > > > > > the
> >> > > > > > > > > > > > > > > users. I would rather pick this fix in this
> >> > release
> >> > > > > than
> >> > > > > > > > plan a
> >> > > > > > > > > > > minor
> >> > > > > > > > > > > > > > > release later.
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > Best,
> >> > > > > > > > > > > > > > > Sandeep
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > On Mon, Sep 3, 2018 at 2:34 PM Philip Cho <
> >> > > > > > > > > > > > chohyu01@cs.washington.edu>
> >> > > > > > > > > > > > > > > wrote:
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > Actually, the command "git clone
> --recursive
> >> > > > > > > > > > > > > > > > https://github.com/apache/incubator-mxnet
> >> -b
> >> > > > > > 1.3.0.rc0"
> >> > > > > > > > > works
> >> > > > > > > > > > > fine
> >> > > > > > > > > > > > > > now,
> >> > > > > > > > > > > > > > > > never mind.
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > On Mon, Sep 3, 2018 at 1:45 PM Philip Cho
> <
> >> > > > > > > > > > > > > chohyu01@cs.washington.edu>
> >> > > > > > > > > > > > > > > > wrote:
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > Unfortunately, MXNet was depending on a
> >> > branch
> >> > > of
> >> > > > > TVM
> >> > > > > > > > that
> >> > > > > > > > > is
> >> > > > > > > > > > > now
> >> > > > > > > > > > > > > > > > deleted.
> >> > > > > > > > > > > > > > > > > We will have to merge #12448
> >> > > > > > > > > > > > > > > > > <
> >> > > > > > https://github.com/apache/incubator-mxnet/pull/12448>
> >> > > > > > > > > > before
> >> > > > > > > > > > > > the
> >> > > > > > > > > > > > > > > > release.
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > Background: See dmlc/tvm#1394 <
> >> > > > > > > > > > > > > > https://github.com/dmlc/tvm/issues/1394
> >> > > > > > > > > > > > > > > >.
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > Philip.
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > On Mon, Sep 3, 2018 at 7:26 AM Carin
> >> Meier <
> >> > > > > > > > > > > carinmeier@gmail.com
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > wrote:
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > >> Checked out the tag, built and tested
> the
> >> > > > Clojure
> >> > > > > > > > package.
> >> > > > > > > > > > +1
> >> > > > > > > > > > > > > > > > >>
> >> > > > > > > > > > > > > > > > >> On Fri, Aug 31, 2018 at 10:59 PM
> Roshani
> >> > > > Nagmote <
> >> > > > > > > > > > > > > > > > >> roshaninagmote2@gmail.com>
> >> > > > > > > > > > > > > > > > >> wrote:
> >> > > > > > > > > > > > > > > > >>
> >> > > > > > > > > > > > > > > > >> > Hi all,
> >> > > > > > > > > > > > > > > > >> >
> >> > > > > > > > > > > > > > > > >> > I would like to propose a vote to
> >> release
> >> > > > Apache
> >> > > > > > > MXNet
> >> > > > > > > > > > > > > > (incubating)
> >> > > > > > > > > > > > > > > > >> version
> >> > > > > > > > > > > > > > > > >> > 1.3.0.RC0. Voting will start now
> >> (Friday,
> >> > > Aug
> >> > > > > > 31st)
> >> > > > > > > > and
> >> > > > > > > > > > end
> >> > > > > > > > > > > at
> >> > > > > > > > > > > > > > 7:00
> >> > > > > > > > > > > > > > > PM
> >> > > > > > > > > > > > > > > > >> > PDT, Wednesday, Sept 5th.
> >> > > > > > > > > > > > > > > > >> >
> >> > > > > > > > > > > > > > > > >> > Link to release notes:
> >> > > > > > > > > > > > > > > > >> >
> >> > > > > > https://github.com/apache/incubator-mxnet/releases
> >> > > > > > > > > > > > > > > > >> >
> >> > > > > > > > > > > > > > > > >> > Link to release candidate 1.3.0.rc0:
> >> > > > > > > > > > > > > > > > >> > https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc0
> >> > > > > > > > > > > > > > > > >> >
> >> > > > > > > > > > > > > > > > >> > View this page, click on "Build from
> >> > > Source",
> >> > > > > and
> >> > > > > > > use
> >> > > > > > > > > the
> >> > > > > > > > > > > > source
> >> > > > > > > > > > > > > > > code
> >> > > > > > > > > > > > > > > > >> > obtained from 1.3.0.rc0 tag:
> >> > > > > > > > > > > > > > > > >> >
> >> > > > > > > https://mxnet.incubator.apache.org/install/index.html
> >> > > > > > > > > > > > > > > > >> >
> >> > > > > > > > > > > > > > > > >> > Please remember to TEST first before
> >> > voting
> >> > > > > > > > accordingly:
> >> > > > > > > > > > > > > > > > >> >
> >> > > > > > > > > > > > > > > > >> > +1 = approve
> >> > > > > > > > > > > > > > > > >> > +0 = no opinion
> >> > > > > > > > > > > > > > > > >> > -1 = disapprove (provide reason)
> >> > > > > > > > > > > > > > > > >> >
> >> > > > > > > > > > > > > > > > >> > Thanks,
> >> > > > > > > > > > > > > > > > >> > Roshani
> >> > > > > > > > > > > > > > > > >> >
> >> > > > > > > > > > > > > > > > >>
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > --
> >> > > > > > > > > > > > > > > Sandeep Krishnamurthy
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > --
> >> > > > > > > > > > > Sandeep Krishnamurthy
> >> > > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > > --
> >> > > > > > Sandeep Krishnamurthy
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
>
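
The release-vote rule quoted in the thread (at least three affirmative PMC votes, more positive than negative votes, no vetoes) can be sketched in Python. This is only a minimal illustration, not an Apache tool: the `Vote` class, the `release_vote_passes` function, and the voter labels are hypothetical. It also assumes that only binding votes of +1 or -1 are counted, and that fractional values such as -0 or -0.1 are opinions that do not enter the tally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    voter: str     # hypothetical label, not a name taken from the thread
    value: float   # +1, 0, -1, or a fractional opinion such as -0.1
    binding: bool  # True if cast by a PMC member


def release_vote_passes(votes):
    """Majority approval: >= 3 binding +1 votes and more binding +1 than -1."""
    binding = [v for v in votes if v.binding]
    plus = sum(1 for v in binding if v.value >= 1)
    minus = sum(1 for v in binding if v.value <= -1)
    # A single -1 does not veto; it only counts against the +1 votes.
    return plus >= 3 and plus > minus


example = [
    Vote("pmc-a", 1, True),
    Vote("pmc-b", 1, True),
    Vote("pmc-c", 1, True),
    Vote("pmc-d", -1, True),
    Vote("user-e", -1, False),   # non-binding, ignored under this assumption
    Vote("pmc-f", -0.1, True),   # fractional opinion, ignored under this assumption
]
print(release_vote_passes(example))
```

Under these assumptions a binding -1 lowers the margin but cannot block the release on its own, which matches the "no vetoes on package releases" point made in the thread.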
