mxnet-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From kellen sunderland <kellen.sunderl...@gmail.com>
Subject Re: [VOTE] Release MXNet version 1.3.0.RC0
Date Mon, 10 Sep 2018 13:00:55 GMT
-0.1

There's one test failure I've run into (details below).  Following Indhu's
logic I don't think this should block the release as it's not relating to a
release feature introduced in this version.

I'm trying to use the cpp-package examples as reference code for how to run
MXNet models from a native context. I'd like to run them with ASAN as a
sanity check for memory leaks and pointer errors.  I was continually
running into segfaults and crashes w/ and w/o ASAN.  A little googling
shows me that this issue has already been reported, and is related to
running tests on CPU, not to any changes I made:
https://github.com/apache/incubator-mxnet/issues/9814  Having what our
effectively our reference examples crash is not a good practice IMO.

I also share some concerns around the fp16 failures.  I know developers who
are currently porting their models to Gluon who use fp16.  They'll be
disappointed with the error.

In general though, release looks good.  Big thanks to Sheng and Roshani for
putting it together (and sorry for the late testing).

-Kellen


On Fri, Sep 7, 2018 at 4:31 AM Anirudh <anirudh2290@gmail.com> wrote:

> -1 Considering that using fp16 with gluon is much easier than the
> alternative where you need access to the model code, this fix is really
> useful. I understand the pain of doing mxnet release and appreciate Roshani
> and Shengs efforts, but this seems like something we should fix.
>
> On Thu, Sep 6, 2018, 4:57 PM Haibin Lin <haibin.lin.aws@gmail.com> wrote:
>
> > +1 built from source and passes dist_sync_kvstore test on Ubuntu.
> >
> > Best,
> > Haibin
> >
> > On Thu, Sep 6, 2018 at 1:32 PM Indhu <indhubharathi@gmail.com> wrote:
> >
> > > +1
> > >
> > > The release candidate looks good. I'm able to build and run basic
> models.
> > >
> > > One the FP16 issue:
> > >
> > > Like others have pointed out, releases on expensive in terms of time
> and
> > > effort. There needs to be a high and more objective bar on what
> qualifies
> > > as a release blocker to make sure we are not setting precedence for a
> lot
> > > of release blockers in future.
> > >
> > > I think a release blocker is justified only if there is a serious bug
> > > discovered in one of the features included in the release or if there
> is
> > a
> > > regression. Given FP16 supports is not a new feature claimed in this
> > > release and this is not a regression in this release candidate, I'm
> > > inclined to release this candidate and include the FP16 fix in a
> > subsequent
> > > release.
> > >
> > > Thanks,
> > > Indu
> > >
> > > On Wed, Sep 5, 2018 at 10:21 AM Aaron Markham <
> aaron.s.markham@gmail.com
> > >
> > > wrote:
> > >
> > > > 0 (non-binding) If we have a problem that blocks users, and a
> solution
> > in
> > > > hand... then we should fix it, but not at the expense of starting the
> > > > release cycle again just for one fix. Users can cherry pick or build
> > from
> > > > master if they want the fix right away, right? I'd change my mind to
> -1
> > > if
> > > > this wasn't the case, with good reason, and if the user impact was
> > > critical
> > > > to adoption or risks abandonment.
> > > >
> > > >
> > > > On Wed, Sep 5, 2018 at 9:57 AM Roshani Nagmote <
> > > roshaninagmote2@gmail.com>
> > > > wrote:
> > > >
> > > > > I believe everyone here is working hard to make MXNet a better
> > > framework
> > > > > for users. It's completely okay to have different opinions, we can
> > > decide
> > > > > together if this issue is a blocker or not after voting time is
> over.
> > > > >
> > > > > As I mentioned before, voting will end at 7 pm today. So there is
> > still
> > > > > time to test the release. If there are any other issues anyone
> > finds, I
> > > > > will be happy to start the process again and work on RC1. For now,
> I
> > > want
> > > > > to encourage everyone to utilize this time and vote. :)
> > > > >
> > > > > Thanks,
> > > > > Roshani
> > > > >
> > > > > On Tue, Sep 4, 2018 at 10:35 PM sandeep krishnamurthy <
> > > > > sandeep.krishna98@gmail.com> wrote:
> > > > >
> > > > > >    1. As a Apache MXNet community member, I raised the concern of
> > > > broken
> > > > > >    functionality for the user. I explained and provided the data
> > > points
> > > > > on
> > > > > > the
> > > > > >    issue, workaround and why I think it is important. If after
> all
> > > > this,
> > > > > > you
> > > > > >    think my vote is biased on my employer just because a user I
> > > quoted
> > > > is
> > > > > > from
> > > > > >    Amazon, this is more concerning to me on my voting abilities.
> > > > > >    2. My -1 no where undermines the huge amount of effort that
> goes
> > > > > behind
> > > > > >    the scene for a release to happen. Great respect and
> recognition
> > > for
> > > > > >    everyone involved in all the releases of MXNet in the past and
> > > > this. I
> > > > > >    voted on my judgement of what may be good for the users of
> > MXNet.
> > > > > >    3. As pointed by Naveen & Chris, -1 are NOT veto. Feel free to
> > > > decide
> > > > > >    and progress on the release as we already have >3 +1 in this
> > > thread.
> > > > > >
> > > > > >
> > > > > > Best,
> > > > > >
> > > > > > Sandeep
> > > > > >
> > > > > > On Tue, Sep 4, 2018 at 8:29 PM Chris Olivier <
> > cjolivier01@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > btw, there are no vetoes on package releases:
> > > > > > >
> > > > > > > VOTES ON PACKAGE RELEASES
> > > > > > > <https://www.apache.org/foundation/voting.html#ReleaseVotes>
> > > > > > >
> > > > > > > Votes on whether a package is ready to be released use majority
> > > > > approval
> > > > > > > <
> > https://www.apache.org/foundation/glossary.html#MajorityApproval>
> > > > --
> > > > > > i.e.
> > > > > > > at least three PMC members must vote affirmatively for release,
> > and
> > > > > there
> > > > > > > must be more positive than negative votes.Releases may not be
> > > vetoed.
> > > > > > > Generally
> > > > > > > the community will cancel the release vote if anyone identifies
> > > > serious
> > > > > > > problems, but in most cases the ultimate decision, lies with
> the
> > > > > > individual
> > > > > > > serving as release manager. The specifics of the process may
> vary
> > > > from
> > > > > > > project to project, but the 'minimum quorum of three +1 votes'
> > rule
> > > > is
> > > > > > > universal.
> > > > > > >
> > > > > > > On Tue, Sep 4, 2018 at 7:12 PM Sheng Zha <szha.pvg@gmail.com>
> > > wrote:
> > > > > > >
> > > > > > > > Thanks for sharing your opinions, Thomas. Your recognition
> and
> > > > > respect
> > > > > > of
> > > > > > > > people's efforts on preparing the release candidate are
> > certainly
> > > > > > > > appreciated.
> > > > > > > >
> > > > > > > > Now that the vote is set to fail thanks to the veto, there
> will
> > > be
> > > > > > plenty
> > > > > > > > of opportunities to include those bug fixes, including the
> one
> > > Zhi
> > > > > > > > mentioned [1], which was already merged in the master and yet
> > > chose
> > > > > not
> > > > > > > to
> > > > > > > > block this release with [2]. I will be happy to work with
> > Roshani
> > > > to
> > > > > > > > prepare another release candidate once ready.
> > > > > > > >
> > > > > > > > -sz
> > > > > > > >
> > > > > > > > [1]
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/f02e952bec22c82cb00a6741390a78f55373311c97464997bb455a6c@%3Cdev.mxnet.apache.org%3E
> > > > > > > > [2]
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/85d3fcabb3437ba7f1af455cf69aa13eb3afd1ea1d1f6f891e9c339c@%3Cdev.mxnet.apache.org%3E
> > > > > > > >
> > > > > > > > On Tue, Sep 4, 2018 at 6:02 PM Thomas DELTEIL <
> > > > > > thomas.delteil1@gmail.com
> > > > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > -0
> > > > > > > > > (non-binding)
> > > > > > > > >
> > > > > > > > > If I may add some nuancing plus a personal data point as
> one
> > of
> > > > the
> > > > > > > users
> > > > > > > > > commenting in the bug report in question:
> > > > > > > > >
> > > > > > > > > - Performance vs. Basic functionality => I don't think high
> > > > > > performance
> > > > > > > > > use-cases and basic functionality are two obviously opposed
> > > > > concepts
> > > > > > > and
> > > > > > > > > see no contradiction in Hagay's and Sandeep's statements.
> > > > > > > > > Float16 support is feature of MXNet that provides more than
> > > twice
> > > > > the
> > > > > > > > > performance of Float32 on supported platforms, hence the
> high
> > > > > > > performance
> > > > > > > > > use-case. The bug is that the basic functionality of
> > reloading
> > > a
> > > > > > saved
> > > > > > > > > float16 models is currently broken.
> > > > > > > > >
> > > > > > > > > - This bug vs Other bugs => Contrary the vast majority of
> the
> > > 140
> > > > > > open
> > > > > > > > bugs
> > > > > > > > > that are mentioned above, I would put to Sandeep's credit
> > that
> > > > this
> > > > > > one
> > > > > > > > bug
> > > > > > > > > has a PR open that provides a fix for it. This would make
> it
> > a
> > > > > better
> > > > > > > > > candidate to get included in this release than a bug that
> has
> > > no
> > > > > fix
> > > > > > > > ready
> > > > > > > > > for it.
> > > > > > > > >
> > > > > > > > > - Personal datapoint: I recently did some experimentation
> > with
> > > > > > float16
> > > > > > > > [1]
> > > > > > > > > and actually coincidentally just published a video on
> > > optimizing
> > > > > > > > > performance for Gluon. Float16 conversion is one of the
> most,
> > > if
> > > > > not
> > > > > > > the
> > > > > > > > > most effective way to get performance out of MXNet [2]. I
> > > believe
> > > > > > there
> > > > > > > > is
> > > > > > > > > a lot of value in publicizing more its use and hence making
> > > sure
> > > > at
> > > > > > > least
> > > > > > > > > the basic support for normal use-cases is present.
> > > > > > > > >
> > > > > > > > > Of course this needs to be balanced with the overhead of
> > > > preparing
> > > > > a
> > > > > > > new
> > > > > > > > > release candidate once the fixed is reviewed and merged,
> > which
> > > > > seems
> > > > > > to
> > > > > > > > be
> > > > > > > > > a lengthy and complex process in its own right, and the
> delay
> > > > with
> > > > > > > > > providing the other features present in 1.3 for users that
> > are
> > > > not
> > > > > > > > running
> > > > > > > > > off the nightly builds.
> > > > > > > > >
> > > > > > > > > All the best,
> > > > > > > > >
> > > > > > > > > Thomas
> > > > > > > > >
> > > > > > > > > [1]
> > > https://github.com/ThomasDelteil/PerformanceTricksMXNetGluon
> > > > > > > > > [2]
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://www.youtube.com/watch?v=Cqo7FPftNyo&t=0s&list=PLkEvNnRk8uVk6U515Pj-jHQUxFC4eDi3m
> > > > > > > > >
> > > > > > > > > Le mar. 4 sept. 2018 à 17:11, Sheng Zha <
> szha.pvg@gmail.com>
> > a
> > > > > > écrit :
> > > > > > > > >
> > > > > > > > > > Sandeep,
> > > > > > > > > >
> > > > > > > > > > Thanks for explaining your veto. We have open bugs that
> > > > impacted
> > > > > a
> > > > > > > lot
> > > > > > > > > more
> > > > > > > > > > than just 3 customers, just by referring to the number of
> > > > > > commenters
> > > > > > > on
> > > > > > > > > the
> > > > > > > > > > issue [1].
> > > > > > > > > >
> > > > > > > > > > You said that this is for "high performance use cases",
> > which
> > > > > > > > contradicts
> > > > > > > > > > with Hagay's assement that this is "basic functionality
> > > > broken".
> > > > > > > Given
> > > > > > > > > that
> > > > > > > > > > this is for advanced use cases of using half-precision
> > > > training,
> > > > > > why
> > > > > > > is
> > > > > > > > > it
> > > > > > > > > > so much more important than any other open bug reports,
> > that
> > > > for
> > > > > > this
> > > > > > > > > > specific bug fix, we have to delay the access of regular
> > > users
> > > > to
> > > > > > the
> > > > > > > > new
> > > > > > > > > > MXNet 1.3 release by at least another week?
> > > > > > > > > >
> > > > > > > > > > Honestly, I'm concerned that your vote is biased by
> Amazon
> > > > > > > involvement,
> > > > > > > > > > given that you quoted Amazon Rekognition.
> > > > > > > > > >
> > > > > > > > > > -sz
> > > > > > > > > >
> > > > > > > > > > [1]
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/incubator-mxnet/issues?q=is%3Aissue+is%3Aopen+label%3ABug+sort%3Acomments-desc
> > > > > > > > > >
> > > > > > > > > > On Tue, Sep 4, 2018 at 4:51 PM sandeep krishnamurthy <
> > > > > > > > > > sandeep.krishna98@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > My initial vote of “-0” was due to lack of info from a
> > user
> > > > who
> > > > > > had
> > > > > > > > > said,
> > > > > > > > > > > he overcame this issue for FP16 model.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > However, suggested workaround [1] for the issue is not
> > > > straight
> > > > > > > > forward
> > > > > > > > > > and
> > > > > > > > > > > generally usable for all users. Also, issue is not
> simple
> > > and
> > > > > > > > isolated
> > > > > > > > > to
> > > > > > > > > > > be listed in the Release Notes as known issue with a
> > > > > workaround.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Changing my vote to: "-1 (binding)" owing to the user
> > > impact
> > > > > [3]
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > @Sheng:
> > > > > > > > > > >
> > > > > > > > > > > 1. Agreed, bug existed from long time. However, FP16
> and
> > > such
> > > > > > > > > > optimizations
> > > > > > > > > > > were added later on. Followed by users [2] using this
> > > feature
> > > > > for
> > > > > > > > high
> > > > > > > > > > > performance use cases. It is not ok to measure severity
> > of
> > > > the
> > > > > > bug
> > > > > > > > > based
> > > > > > > > > > on
> > > > > > > > > > > its past existence, rather we can see who is impacted
> now
> > > and
> > > > > is
> > > > > > > it a
> > > > > > > > > > small
> > > > > > > > > > > subset with a simple workaround or large user impacting
> > > > issue.
> > > > > > > > > > >
> > > > > > > > > > > 2. Agreed bug was reported 7/21. However, I became
> aware
> > of
> > > > > this
> > > > > > > > issue
> > > > > > > > > on
> > > > > > > > > > > 08/29 and submitted the fix on 08/30. Also, I did bring
> > > this
> > > > to
> > > > > > the
> > > > > > > > > > notice
> > > > > > > > > > > of community, you and 1.3 release manager (Roshani) on
> > the
> > > > RC0
> > > > > > > > proposal
> > > > > > > > > > > thread. Also, I would focus on the issue and user
> impact
> > > than
> > > > > who
> > > > > > > > > > > identified and who is fixing the issue.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Based on my discussion with 2 users, I think it is a
> > > > important
> > > > > > > > feature
> > > > > > > > > > for
> > > > > > > > > > > them to see in Apache MXNet v1.3.0.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > >
> > > > > > > > > > > Sandeep
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > [1] Workaround used by the user.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > net_fp16 =
> > > > > > > mx.gluon.SymbolBlock.imports('resnet34_fp16-symbol.json',
> > > > > > > > > > > ['data'])
> > > > > > > > > > >
> > > > > > > > > > > params_fp16 = mx.nd.load('resnet34_fp16-0000.params')
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > for k, v in params_fp16.items():
> > > > > > > > > > >
> > > > > > > > > > >     new_key = k.split(':')[1]
> > > > > > > > > > >
> > > > > > > > > > >     net_fp16.collect_params()[new_key].cast(v.dtype)
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > net_fp16.collect_params().load('resnet34_fp16-0000.params',
> > > > > ctx)
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > [2] Amazon Rekognition
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > [3] User story: Train a model -> Cast it to FP16 ->
> Save
> > > the
> > > > > > model
> > > > > > > ->
> > > > > > > > > > Load
> > > > > > > > > > > back the model does not work. They have to cast every
> > > > parameter
> > > > > > > with
> > > > > > > > a
> > > > > > > > > > > workaround mentioned above [1].
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Sep 4, 2018 at 4:14 PM Hagay Lupesko <
> > > > > lupesko@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Sheng,
> > > > > > > > > > > >
> > > > > > > > > > > > Addressing your questions:
> > > > > > > > > > > >
> > > > > > > > > > > > - "why this specific bug is more important than all
> the
> > > > other
> > > > > > > known
> > > > > > > > > > bugs,
> > > > > > > > > > > > that this becomes a release blocker"
> > > > > > > > > > > > I do not consider it to be more or less important
> than
> > > > other
> > > > > > > fixes.
> > > > > > > > > It
> > > > > > > > > > > can
> > > > > > > > > > > > be fixed and included in the release alongside the
> rest
> > > of
> > > > > the
> > > > > > > > > release
> > > > > > > > > > > > content, right?
> > > > > > > > > > > > From the description of the issue it seems important
> > > since
> > > > it
> > > > > > is
> > > > > > > > > > blocking
> > > > > > > > > > > > users from loading models that were previously
> trained
> > > and
> > > > > > saved.
> > > > > > > > > There
> > > > > > > > > > > is
> > > > > > > > > > > > nothing stopping the community from including this
> fix
> > > into
> > > > > > > 1.3.0,
> > > > > > > > > > > > alongside the rest of the features and fixes.
> > > > > > > > > > > >
> > > > > > > > > > > > - "The bug exists since SymbolBlock was introduced a
> > year
> > > > ago
> > > > > > and
> > > > > > > > has
> > > > > > > > > > > > survived at least three releases, so this is not a
> > > > > regression."
> > > > > > > > > > > > I do not think I said it is a regression. However,
> the
> > > > fact a
> > > > > > bug
> > > > > > > > > > existed
> > > > > > > > > > > > before, does not mean it is OK to release it rather
> > than
> > > > fix
> > > > > > it.
> > > > > > > > > > > >
> > > > > > > > > > > > - "Timeline-wise, this bug was reported on 7/21, but
> > was
> > > > not
> > > > > > > > reported
> > > > > > > > > > as
> > > > > > > > > > > > release-blocker in the release discussion thread
> until
> > > 8/31
> > > > > > [1].
> > > > > > > > > > Neither
> > > > > > > > > > > > its reporting as release-blocker nor its fix made it
> > for
> > > > the
> > > > > > 8/3
> > > > > > > > code
> > > > > > > > > > > > freeze."
> > > > > > > > > > > > You are right, would have been better to have this
> > > > identified
> > > > > > and
> > > > > > > > > fixed
> > > > > > > > > > > > earlier and included before code freeze.
> > > > > > > > > > > >
> > > > > > > > > > > > - "The PR is still not ready yet as it doesn't have
> > > > > approval."
> > > > > > > > > > > > I think it is waiting for your review.
> > > > > > > > > > > >
> > > > > > > > > > > > - "it would be great if you could provide some
> > additional
> > > > > > > reasoning
> > > > > > > > > > > besides
> > > > > > > > > > > > "X mentions the issue" or "fix was done by X""
> > > > > > > > > > > > I have. Repeating what I wrote in my previous email
> for
> > > > > > clarity:
> > > > > > > > > Basic
> > > > > > > > > > > > functionality broken: loading a model (albeit one
> that
> > > that
> > > > > was
> > > > > > > > saved
> > > > > > > > > > as
> > > > > > > > > > > > non FP32)
> > > > > > > > > > > >
> > > > > > > > > > > > So, yes - this issue seems to have been out there
> for a
> > > > > while,
> > > > > > > > > somehow
> > > > > > > > > > > went
> > > > > > > > > > > > under the radar... but I think the key question is
> > > whether
> > > > > this
> > > > > > > > > blocks
> > > > > > > > > > a
> > > > > > > > > > > > basic functionality in MXNet. I believe so, hence my
> -1
> > > > vote.
> > > > > > > > > > > >
> > > > > > > > > > > > Hagay
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Sep 4, 2018 at 1:19 PM Sheng Zha <
> > > > szha.pvg@gmail.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi Hagay and Sandeep,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Could you help us understand why this specific bug
> is
> > > > more
> > > > > > > > > important
> > > > > > > > > > > than
> > > > > > > > > > > > > all the other known bugs, that this becomes a
> release
> > > > > > blocker?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Some facts to consider:
> > > > > > > > > > > > > - The bug exists since SymbolBlock was introduced a
> > > year
> > > > > ago
> > > > > > > and
> > > > > > > > > has
> > > > > > > > > > > > > survived at least three releases, so this is not a
> > > > > > regression.
> > > > > > > > > > > > > - Timeline-wise, this bug was reported on 7/21, but
> > was
> > > > not
> > > > > > > > > reported
> > > > > > > > > > as
> > > > > > > > > > > > > release-blocker in the release discussion thread
> > until
> > > > 8/31
> > > > > > > [1].
> > > > > > > > > > > Neither
> > > > > > > > > > > > > its reporting as release-blocker nor its fix made
> it
> > > for
> > > > > the
> > > > > > > 8/3
> > > > > > > > > code
> > > > > > > > > > > > > freeze.
> > > > > > > > > > > > > - The PR is still not ready yet as it doesn't have
> > > > > approval.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hagay, it would be great if you could provide some
> > > > > additional
> > > > > > > > > > reasoning
> > > > > > > > > > > > > besides "X mentions the issue" or "fix was done by
> > X".
> > > > > > Thanks.
> > > > > > > > > > > > >
> > > > > > > > > > > > > -sz
> > > > > > > > > > > > >
> > > > > > > > > > > > > [1]
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/d1ed611f98c20d5d85c294b0c07c8bdebca13a209cf66a3872c9123e@%3Cdev.mxnet.apache.org%3E
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, Sep 4, 2018 at 12:39 PM Hagay Lupesko <
> > > > > > > lupesko@gmail.com
> > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Sandeep mentions the issue of an error when user
> > > tries
> > > > to
> > > > > > > load
> > > > > > > > > > model
> > > > > > > > > > > > > params
> > > > > > > > > > > > > > trained/saved as FP16.
> > > > > > > > > > > > > >
> > > https://github.com/apache/incubator-mxnet/issues/11849
> > > > > > > > > > > > > > The fix was done by Sandeep:
> > > > > > > > > > > > > >
> > https://github.com/apache/incubator-mxnet/pull/12412
> > > > and
> > > > > > is
> > > > > > > > > ready
> > > > > > > > > > to
> > > > > > > > > > > > be
> > > > > > > > > > > > > > cherry picked into the release branch.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > This seems like a release blocker to me:
> > > > > > > > > > > > > > - Basic functionality broken: loading a model
> > (albeit
> > > > one
> > > > > > > that
> > > > > > > > > that
> > > > > > > > > > > was
> > > > > > > > > > > > > > saved as non FP32)
> > > > > > > > > > > > > > - Reported by 3 users (wgchang@, nicklhy@ and
> > > > > > ThomasDelteil@
> > > > > > > )
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > -1 (non binding)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Hagay
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Tue, Sep 4, 2018 at 12:01 PM sandeep
> > > krishnamurthy <
> > > > > > > > > > > > > > sandeep.krishna98@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > "- 0"
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I believe the bug #11849
> > > > > > > > > > > > > > > <
> > > > > https://github.com/apache/incubator-mxnet/issues/11849
> > > > > > >,
> > > > > > > > > unable
> > > > > > > > > > > to
> > > > > > > > > > > > > > import
> > > > > > > > > > > > > > > non-fp32 models into Gluon, fixed in this PR
> > #12412
> > > > > > > > > > > > > > > <
> > > > https://github.com/apache/incubator-mxnet/pull/12412>
> > > > > > is
> > > > > > > > > > > important
> > > > > > > > > > > > > for
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > users. I would rather pick this fix in this
> > release
> > > > > than
> > > > > > > > plan a
> > > > > > > > > > > minor
> > > > > > > > > > > > > > > release later.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > Sandeep
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Mon, Sep 3, 2018 at 2:34 PM Philip Cho <
> > > > > > > > > > > > chohyu01@cs.washington.edu>
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Actually, the command "git clone --recursive
> > > > > > > > > > > > > > > > https://github.com/apache/incubator-mxnet -b
> > > > > > 1.3.0.rc0"
> > > > > > > > > works
> > > > > > > > > > > fine
> > > > > > > > > > > > > > now,
> > > > > > > > > > > > > > > > never mind.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Mon, Sep 3, 2018 at 1:45 PM Philip Cho <
> > > > > > > > > > > > > chohyu01@cs.washington.edu>
> > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Unfortunately, MXNet was depending on a
> > branch
> > > of
> > > > > TVM
> > > > > > > > that
> > > > > > > > > is
> > > > > > > > > > > now
> > > > > > > > > > > > > > > > deleted.
> > > > > > > > > > > > > > > > > We will have to merge #12448
> > > > > > > > > > > > > > > > > <
> > > > > > https://github.com/apache/incubator-mxnet/pull/12448>
> > > > > > > > > > before
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > > release.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Background: See dmlc/tvm#1394 <
> > > > > > > > > > > > > > https://github.com/dmlc/tvm/issues/1394
> > > > > > > > > > > > > > > >.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Philip.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Mon, Sep 3, 2018 at 7:26 AM Carin Meier
> <
> > > > > > > > > > > carinmeier@gmail.com
> > > > > > > > > > > > >
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >> Checked out the tag, built and tested the
> > > > Clojure
> > > > > > > > package.
> > > > > > > > > > +1
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >> On Fri, Aug 31, 2018 at 10:59 PM Roshani
> > > > Nagmote <
> > > > > > > > > > > > > > > > >> roshaninagmote2@gmail.com>
> > > > > > > > > > > > > > > > >> wrote:
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >> > Hi all,
> > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > >> > I would like to propose a vote to
> release
> > > > Apache
> > > > > > > MXNet
> > > > > > > > > > > > > > (incubating)
> > > > > > > > > > > > > > > > >> version
> > > > > > > > > > > > > > > > >> > 1.3.0.RC0. Voting will start now
> (Friday,
> > > Aug
> > > > > > 31st)
> > > > > > > > and
> > > > > > > > > > end
> > > > > > > > > > > at
> > > > > > > > > > > > > > 7:00
> > > > > > > > > > > > > > > PM
> > > > > > > > > > > > > > > > >> > PDT, Wednesday, Sept 5th.
> > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > >> > Link to release notes:
> > > > > > > > > > > > > > > > >> >
> > > > > > https://github.com/apache/incubator-mxnet/releases
> > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > >> > Link to release candidate 1.3.0.rc0:
> > > > > > > > > > > > > > > > >> > *
> > > > > > > > > > > > >
> > > > > > >
> https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc
> > > > > > > > > > > > > > > > >> > <
> > > > > > > > > > > > >
> > > > > > >
> https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc0
> > > > > > > > > > > > > > > >0*
> > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > >> > View this page, click on "Build from
> > > Source",
> > > > > and
> > > > > > > use
> > > > > > > > > the
> > > > > > > > > > > > source
> > > > > > > > > > > > > > > code
> > > > > > > > > > > > > > > > >> > obtained from 1.3.0.rc0 tag:
> > > > > > > > > > > > > > > > >> >
> > > > > > > https://mxnet.incubator.apache.org/install/index.html
> > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > >> > Please remember to TEST first before
> > voting
> > > > > > > > accordingly:
> > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > >> > +1 = approve
> > > > > > > > > > > > > > > > >> > +0 = no opinion
> > > > > > > > > > > > > > > > >> > -1 = disapprove (provide reason)
> > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > >> > Thanks,
> > > > > > > > > > > > > > > > >> > Roshani
> > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > Sandeep Krishnamurthy
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Sandeep Krishnamurthy
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Sandeep Krishnamurthy
> > > > > >
> > > > >
> > > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message