mxnet-dev mailing list archives

From kellen sunderland <kellen.sunderl...@gmail.com>
Subject Re: [VOTE] Release MXNet version 1.3.0.RC0
Date Mon, 10 Sep 2018 14:02:48 GMT
Tracked down the issue referred to above and it's not a bug.   I'll update
the ticket.

Changing to +1.

On Mon, Sep 10, 2018 at 3:00 PM kellen sunderland <
kellen.sunderland@gmail.com> wrote:

> -0.1
>
> There's one test failure I've run into (details below).  Following Indhu's
> logic I don't think this should block the release as it's not related to a
> feature introduced in this version.
>
> I'm trying to use the cpp-package examples as reference code for how to
> run MXNet models from a native context. I'd like to run them with ASAN as a
> sanity check for memory leaks and pointer errors.  I was continually
> running into segfaults and crashes w/ and w/o ASAN.  A little googling
> shows me that this issue has already been reported, and is related to
> running tests on CPU, not to any changes I made:
> https://github.com/apache/incubator-mxnet/issues/9814  Having what are
> effectively our reference examples crash is not a good practice IMO.
>
> I also share some concerns around the fp16 failures.  I know developers
> who are currently porting their models to Gluon who use fp16.  They'll be
> disappointed with the error.
>
> In general though, release looks good.  Big thanks to Sheng and Roshani
> for putting it together (and sorry for the late testing).
>
> -Kellen
>
>
> On Fri, Sep 7, 2018 at 4:31 AM Anirudh <anirudh2290@gmail.com> wrote:
>
>> -1 Considering that using fp16 with gluon is much easier than the
>> alternative where you need access to the model code, this fix is really
>> useful. I understand the pain of doing mxnet release and appreciate
>> Roshani
>> and Sheng's efforts, but this seems like something we should fix.
>>
>> On Thu, Sep 6, 2018, 4:57 PM Haibin Lin <haibin.lin.aws@gmail.com> wrote:
>>
>> > +1 built from source and passes dist_sync_kvstore test on Ubuntu.
>> >
>> > Best,
>> > Haibin
>> >
>> > On Thu, Sep 6, 2018 at 1:32 PM Indhu <indhubharathi@gmail.com> wrote:
>> >
>> > > +1
>> > >
>> > > The release candidate looks good. I'm able to build and run basic
>> models.
>> > >
>> > > On the FP16 issue:
>> > >
>> > > Like others have pointed out, releases are expensive in terms of time
>> and
>> > > effort. There needs to be a high and more objective bar on what
>> qualifies
>> > > as a release blocker to make sure we are not setting a precedent for a
>> lot
>> > > of release blockers in future.
>> > >
>> > > I think a release blocker is justified only if there is a serious bug
>> > > discovered in one of the features included in the release or if there
>> is
>> > a
>> > > regression. Given FP16 support is not a new feature claimed in this
>> > > release and this is not a regression in this release candidate, I'm
>> > > inclined to release this candidate and include the FP16 fix in a
>> > subsequent
>> > > release.
>> > >
>> > > Thanks,
>> > > Indu
>> > >
>> > > On Wed, Sep 5, 2018 at 10:21 AM Aaron Markham <
>> aaron.s.markham@gmail.com
>> > >
>> > > wrote:
>> > >
>> > > > 0 (non-binding) If we have a problem that blocks users, and a
>> solution
>> > in
>> > > > hand... then we should fix it, but not at the expense of starting
>> the
>> > > > release cycle again just for one fix. Users can cherry pick or build
>> > from
>> > > > master if they want the fix right away, right? I'd change my mind
>> to -1
>> > > if
>> > > > this wasn't the case, with good reason, and if the user impact was
>> > > critical
>> > > > to adoption or risks abandonment.
>> > > >
>> > > >
>> > > > On Wed, Sep 5, 2018 at 9:57 AM Roshani Nagmote <
>> > > roshaninagmote2@gmail.com>
>> > > > wrote:
>> > > >
>> > > > > I believe everyone here is working hard to make MXNet a better
>> > > framework
>> > > > > for users. It's completely okay to have different opinions, we can
>> > > decide
>> > > > > together if this issue is a blocker or not after voting time is
>> over.
>> > > > >
>> > > > > As I mentioned before, voting will end at 7 pm today. So there is
>> > still
>> > > > > time to test the release. If there are any other issues anyone
>> > finds, I
>> > > > > will be happy to start the process again and work on RC1. For
>> now, I
>> > > want
>> > > > > to encourage everyone to utilize this time and vote. :)
>> > > > >
>> > > > > Thanks,
>> > > > > Roshani
>> > > > >
>> > > > > On Tue, Sep 4, 2018 at 10:35 PM sandeep krishnamurthy <
>> > > > > sandeep.krishna98@gmail.com> wrote:
>> > > > >
>> > > > > >    1. As an Apache MXNet community member, I raised the concern
>> of
>> > > > broken
>> > > > > >    functionality for the user. I explained and provided the data
>> > > points
>> > > > > on
>> > > > > > the
>> > > > > >    issue, workaround and why I think it is important. If after
>> all
>> > > > this,
>> > > > > > you
>> > > > > >    think my vote is biased on my employer just because a user I
>> > > quoted
>> > > > is
>> > > > > > from
>> > > > > >    Amazon, this is more concerning to me on my voting abilities.
>> > > > > >    2. My -1 in no way undermines the huge amount of effort that
>> goes
>> > > > > behind
>> > > > > >    the scene for a release to happen. Great respect and
>> recognition
>> > > for
>> > > > > >    everyone involved in all the releases of MXNet in the past
>> and
>> > > > this. I
>> > > > > >    voted on my judgement of what may be good for the users of
>> > MXNet.
>> > > > > >    3. As pointed out by Naveen & Chris, -1 votes are NOT vetoes. Feel free
>> to
>> > > > decide
>> > > > > >    and progress on the release as we already have >3 +1 in this
>> > > thread.
>> > > > > >
>> > > > > >
>> > > > > > Best,
>> > > > > >
>> > > > > > Sandeep
>> > > > > >
>> > > > > > On Tue, Sep 4, 2018 at 8:29 PM Chris Olivier <
>> > cjolivier01@gmail.com>
>> > > > > > wrote:
>> > > > > >
>> > > > > > > btw, there are no vetoes on package releases:
>> > > > > > >
>> > > > > > > VOTES ON PACKAGE RELEASES
>> > > > > > > <https://www.apache.org/foundation/voting.html#ReleaseVotes>
>> > > > > > >
>> > > > > > > Votes on whether a package is ready to be released use
>> majority
>> > > > > approval
>> > > > > > > <
>> > https://www.apache.org/foundation/glossary.html#MajorityApproval>
>> > > > --
>> > > > > > i.e.
>> > > > > > > at least three PMC members must vote affirmatively for
>> release,
>> > and
>> > > > > there
>> > > > > > > must be more positive than negative votes. Releases may not be
>> > > vetoed.
>> > > > > > > Generally
>> > > > > > > the community will cancel the release vote if anyone
>> identifies
>> > > > serious
>> > > > > > > problems, but in most cases the ultimate decision, lies with
>> the
>> > > > > > individual
>> > > > > > > serving as release manager. The specifics of the process may
>> vary
>> > > > from
>> > > > > > > project to project, but the 'minimum quorum of three +1 votes'
>> > rule
>> > > > is
>> > > > > > > universal.
>> > > > > > >
>> > > > > > > On Tue, Sep 4, 2018 at 7:12 PM Sheng Zha <szha.pvg@gmail.com>
>> > > wrote:
>> > > > > > >
>> > > > > > > > Thanks for sharing your opinions, Thomas. Your recognition
>> and
>> > > > > respect
>> > > > > > of
>> > > > > > > > people's efforts on preparing the release candidate are
>> > certainly
>> > > > > > > > appreciated.
>> > > > > > > >
>> > > > > > > > Now that the vote is set to fail thanks to the veto, there
>> will
>> > > be
>> > > > > > plenty
>> > > > > > > > of opportunities to include those bug fixes, including the
>> one
>> > > Zhi
>> > > > > > > > mentioned [1], which was already merged in the master and
>> yet
>> > > chose
>> > > > > not
>> > > > > > > to
>> > > > > > > > block this release with [2]. I will be happy to work with
>> > Roshani
>> > > > to
>> > > > > > > > prepare another release candidate once ready.
>> > > > > > > >
>> > > > > > > > -sz
>> > > > > > > >
>> > > > > > > > [1] https://lists.apache.org/thread.html/f02e952bec22c82cb00a6741390a78f55373311c97464997bb455a6c@%3Cdev.mxnet.apache.org%3E
>> > > > > > > > [2] https://lists.apache.org/thread.html/85d3fcabb3437ba7f1af455cf69aa13eb3afd1ea1d1f6f891e9c339c@%3Cdev.mxnet.apache.org%3E
>> > > > > > > >
>> > > > > > > > On Tue, Sep 4, 2018 at 6:02 PM Thomas DELTEIL <
>> > > > > > thomas.delteil1@gmail.com
>> > > > > > > >
>> > > > > > > > wrote:
>> > > > > > > >
>> > > > > > > > > -0
>> > > > > > > > > (non-binding)
>> > > > > > > > >
>> > > > > > > > > If I may add some nuancing plus a personal data point as
>> one
>> > of
>> > > > the
>> > > > > > > users
>> > > > > > > > > commenting in the bug report in question:
>> > > > > > > > >
>> > > > > > > > > - Performance vs. Basic functionality => I don't think
>> high
>> > > > > > performance
>> > > > > > > > > use-cases and basic functionality are two obviously
>> opposed
>> > > > > concepts
>> > > > > > > and
>> > > > > > > > > see no contradiction in Hagay's and Sandeep's statements.
>> > > > > > > > > Float16 support is a feature of MXNet that provides more
>> than
>> > > twice
>> > > > > the
>> > > > > > > > > performance of Float32 on supported platforms, hence the
>> high
>> > > > > > > performance
>> > > > > > > > > use-case. The bug is that the basic functionality of
>> > reloading
>> > > a
>> > > > > > saved
>> > > > > > > > > float16 model is currently broken.
>> > > > > > > > >
>> > > > > > > > > - This bug vs Other bugs => Contrary to the vast majority of
>> the
>> > > 140
>> > > > > > open
>> > > > > > > > bugs
>> > > > > > > > > that are mentioned above, I would put to Sandeep's credit
>> > that
>> > > > this
>> > > > > > one
>> > > > > > > > bug
>> > > > > > > > > has a PR open that provides a fix for it. This would make
>> it
>> > a
>> > > > > better
>> > > > > > > > > candidate to get included in this release than a bug that
>> has
>> > > no
>> > > > > fix
>> > > > > > > > ready
>> > > > > > > > > for it.
>> > > > > > > > >
>> > > > > > > > > - Personal datapoint: I recently did some experimentation
>> > with
>> > > > > > float16
>> > > > > > > > [1]
>> > > > > > > > > and actually coincidentally just published a video on
>> > > optimizing
>> > > > > > > > > performance for Gluon. Float16 conversion is one of the
>> most,
>> > > if
>> > > > > not
>> > > > > > > the
>> > > > > > > > > most effective way to get performance out of MXNet [2]. I
>> > > believe
>> > > > > > there
>> > > > > > > > is
>> > > > > > > > > a lot of value in publicizing more its use and hence
>> making
>> > > sure
>> > > > at
>> > > > > > > least
>> > > > > > > > > the basic support for normal use-cases is present.
>> > > > > > > > >
>> > > > > > > > > Of course this needs to be balanced with the overhead of
>> > > > preparing
>> > > > > a
>> > > > > > > new
>> > > > > > > > > release candidate once the fix is reviewed and merged,
>> > which
>> > > > > seems
>> > > > > > to
>> > > > > > > > be
>> > > > > > > > > a lengthy and complex process in its own right, and the
>> delay
>> > > > with
>> > > > > > > > > providing the other features present in 1.3 for users that
>> > are
>> > > > not
>> > > > > > > > running
>> > > > > > > > > off the nightly builds.
>> > > > > > > > >
>> > > > > > > > > All the best,
>> > > > > > > > >
>> > > > > > > > > Thomas
>> > > > > > > > >
>> > > > > > > > > [1] https://github.com/ThomasDelteil/PerformanceTricksMXNetGluon
>> > > > > > > > > [2] https://www.youtube.com/watch?v=Cqo7FPftNyo&t=0s&list=PLkEvNnRk8uVk6U515Pj-jHQUxFC4eDi3m
>> > > > > > > > >
>> > > > > > > > > Le mar. 4 sept. 2018 à 17:11, Sheng Zha <
>> szha.pvg@gmail.com>
>> > a
>> > > > > > écrit :
>> > > > > > > > >
>> > > > > > > > > > Sandeep,
>> > > > > > > > > >
>> > > > > > > > > > Thanks for explaining your veto. We have open bugs that
>> > > > impacted
>> > > > > a
>> > > > > > > lot
>> > > > > > > > > more
>> > > > > > > > > > than just 3 customers, just by referring to the number
>> of
>> > > > > > commenters
>> > > > > > > on
>> > > > > > > > > the
>> > > > > > > > > > issue [1].
>> > > > > > > > > >
>> > > > > > > > > > You said that this is for "high performance use cases",
>> > which
>> > > > > > > > contradicts
>> > > > > > > > > > with Hagay's assessment that this is "basic functionality
>> > > > broken".
>> > > > > > > Given
>> > > > > > > > > that
>> > > > > > > > > > this is for advanced use cases of using half-precision
>> > > > training,
>> > > > > > why
>> > > > > > > is
>> > > > > > > > > it
>> > > > > > > > > > so much more important than any other open bug reports,
>> > that
>> > > > for
>> > > > > > this
>> > > > > > > > > > specific bug fix, we have to delay the access of regular
>> > > users
>> > > > to
>> > > > > > the
>> > > > > > > > new
>> > > > > > > > > > MXNet 1.3 release by at least another week?
>> > > > > > > > > >
>> > > > > > > > > > Honestly, I'm concerned that your vote is biased by
>> Amazon
>> > > > > > > involvement,
>> > > > > > > > > > given that you quoted Amazon Rekognition.
>> > > > > > > > > >
>> > > > > > > > > > -sz
>> > > > > > > > > >
>> > > > > > > > > > [1] https://github.com/apache/incubator-mxnet/issues?q=is%3Aissue+is%3Aopen+label%3ABug+sort%3Acomments-desc
>> > > > > > > > > >
>> > > > > > > > > > On Tue, Sep 4, 2018 at 4:51 PM sandeep krishnamurthy <
>> > > > > > > > > > sandeep.krishna98@gmail.com> wrote:
>> > > > > > > > > >
>> > > > > > > > > > > My initial vote of “-0” was due to lack of info from a
>> > user
>> > > > who
>> > > > > > had
>> > > > > > > > > said,
>> > > > > > > > > > > he overcame this issue for FP16 model.
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > However, suggested workaround [1] for the issue is not
>> > > > straight
>> > > > > > > > forward
>> > > > > > > > > > and
>> > > > > > > > > > > generally usable for all users. Also, issue is not
>> simple
>> > > and
>> > > > > > > > isolated
>> > > > > > > > > to
>> > > > > > > > > > > be listed in the Release Notes as known issue with a
>> > > > > workaround.
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > Changing my vote to: "-1 (binding)" owing to the user
>> > > impact
>> > > > > [3]
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > @Sheng:
>> > > > > > > > > > >
>> > > > > > > > > > > 1. Agreed, the bug has existed for a long time. However, FP16
>> and
>> > > such
>> > > > > > > > > > optimizations
>> > > > > > > > > > > were added later on. Followed by users [2] using this
>> > > feature
>> > > > > for
>> > > > > > > > high
>> > > > > > > > > > > performance use cases. It is not ok to measure
>> severity
>> > of
>> > > > the
>> > > > > > bug
>> > > > > > > > > based
>> > > > > > > > > > on
>> > > > > > > > > > > its past existence, rather we can see who is impacted
>> now
>> > > and
>> > > > > is
>> > > > > > > it a
>> > > > > > > > > > small
>> > > > > > > > > > > subset with a simple workaround or large user
>> impacting
>> > > > issue.
>> > > > > > > > > > >
>> > > > > > > > > > > 2. Agreed bug was reported 7/21. However, I became
>> aware
>> > of
>> > > > > this
>> > > > > > > > issue
>> > > > > > > > > on
>> > > > > > > > > > > 08/29 and submitted the fix on 08/30. Also, I did
>> bring
>> > > this
>> > > > to
>> > > > > > the
>> > > > > > > > > > notice
>> > > > > > > > > > > of community, you and 1.3 release manager (Roshani) on
>> > the
>> > > > RC0
>> > > > > > > > proposal
>> > > > > > > > > > > thread. Also, I would focus on the issue and user
>> impact
>> > > than
>> > > > > who
>> > > > > > > > > > > identified and who is fixing the issue.
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > Based on my discussion with 2 users, I think it is an
>> > > > important
>> > > > > > > > feature
>> > > > > > > > > > for
>> > > > > > > > > > > them to see in Apache MXNet v1.3.0.
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > Best,
>> > > > > > > > > > >
>> > > > > > > > > > > Sandeep
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > [1] Workaround used by the user.
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > net_fp16 = mx.gluon.SymbolBlock.imports('resnet34_fp16-symbol.json', ['data'])
>> > > > > > > > > > > params_fp16 = mx.nd.load('resnet34_fp16-0000.params')
>> > > > > > > > > > >
>> > > > > > > > > > > for k, v in params_fp16.items():
>> > > > > > > > > > >     new_key = k.split(':')[1]
>> > > > > > > > > > >     net_fp16.collect_params()[new_key].cast(v.dtype)
>> > > > > > > > > > >
>> > > > > > > > > > > net_fp16.collect_params().load('resnet34_fp16-0000.params', ctx)
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > [2] Amazon Rekognition
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > [3] User story: Train a model -> Cast it to FP16 ->
>> Save
>> > > the
>> > > > > > model
>> > > > > > > ->
>> > > > > > > > > > Load
>> > > > > > > > > > > back the model does not work. They have to cast every
>> > > > parameter
>> > > > > > > with
>> > > > > > > > a
>> > > > > > > > > > > workaround mentioned above [1].
>> > > > > > > > > > >
>> > > > > > > > > > > On Tue, Sep 4, 2018 at 4:14 PM Hagay Lupesko <
>> > > > > lupesko@gmail.com>
>> > > > > > > > > wrote:
>> > > > > > > > > > >
>> > > > > > > > > > > > Hi Sheng,
>> > > > > > > > > > > >
>> > > > > > > > > > > > Addressing your questions:
>> > > > > > > > > > > >
>> > > > > > > > > > > > - "why this specific bug is more important than all
>> the
>> > > > other
>> > > > > > > known
>> > > > > > > > > > bugs,
>> > > > > > > > > > > > that this becomes a release blocker"
>> > > > > > > > > > > > I do not consider it to be more or less important
>> than
>> > > > other
>> > > > > > > fixes.
>> > > > > > > > > It
>> > > > > > > > > > > can
>> > > > > > > > > > > > be fixed and included in the release alongside the
>> rest
>> > > of
>> > > > > the
>> > > > > > > > > release
>> > > > > > > > > > > > content, right?
>> > > > > > > > > > > > From the description of the issue it seems important
>> > > since
>> > > > it
>> > > > > > is
>> > > > > > > > > > blocking
>> > > > > > > > > > > > users from loading models that were previously
>> trained
>> > > and
>> > > > > > saved.
>> > > > > > > > > There
>> > > > > > > > > > > is
>> > > > > > > > > > > > nothing stopping the community from including this
>> fix
>> > > into
>> > > > > > > 1.3.0,
>> > > > > > > > > > > > alongside the rest of the features and fixes.
>> > > > > > > > > > > >
>> > > > > > > > > > > > - "The bug exists since SymbolBlock was introduced a
>> > year
>> > > > ago
>> > > > > > and
>> > > > > > > > has
>> > > > > > > > > > > > survived at least three releases, so this is not a
>> > > > > regression."
>> > > > > > > > > > > > I do not think I said it is a regression. However,
>> the
>> > > > fact a
>> > > > > > bug
>> > > > > > > > > > existed
>> > > > > > > > > > > > before, does not mean it is OK to release it rather
>> > than
>> > > > fix
>> > > > > > it.
>> > > > > > > > > > > >
>> > > > > > > > > > > > - "Timeline-wise, this bug was reported on 7/21, but
>> > was
>> > > > not
>> > > > > > > > reported
>> > > > > > > > > > as
>> > > > > > > > > > > > release-blocker in the release discussion thread
>> until
>> > > 8/31
>> > > > > > [1].
>> > > > > > > > > > Neither
>> > > > > > > > > > > > its reporting as release-blocker nor its fix made it
>> > for
>> > > > the
>> > > > > > 8/3
>> > > > > > > > code
>> > > > > > > > > > > > freeze."
>> > > > > > > > > > > > You are right, it would have been better to have this
>> > > > identified
>> > > > > > and
>> > > > > > > > > fixed
>> > > > > > > > > > > > earlier and included before code freeze.
>> > > > > > > > > > > >
>> > > > > > > > > > > > - "The PR is still not ready yet as it doesn't have
>> > > > > approval."
>> > > > > > > > > > > > I think it is waiting for your review.
>> > > > > > > > > > > >
>> > > > > > > > > > > > - "it would be great if you could provide some
>> > additional
>> > > > > > > reasoning
>> > > > > > > > > > > besides
>> > > > > > > > > > > > "X mentions the issue" or "fix was done by X""
>> > > > > > > > > > > > I have. Repeating what I wrote in my previous email
>> for
>> > > > > > clarity:
>> > > > > > > > > Basic
>> > > > > > > > > > > > functionality broken: loading a model (albeit one
>> > > that
>> > > > > was
>> > > > > > > > saved
>> > > > > > > > > > as
>> > > > > > > > > > > > non FP32)
>> > > > > > > > > > > >
>> > > > > > > > > > > > So, yes - this issue seems to have been out there
>> for a
>> > > > > while,
>> > > > > > > > > somehow
>> > > > > > > > > > > went
>> > > > > > > > > > > > under the radar... but I think the key question is
>> > > whether
>> > > > > this
>> > > > > > > > > blocks
>> > > > > > > > > > a
>> > > > > > > > > > > > basic functionality in MXNet. I believe so, hence
>> my -1
>> > > > vote.
>> > > > > > > > > > > >
>> > > > > > > > > > > > Hagay
>> > > > > > > > > > > >
>> > > > > > > > > > > > On Tue, Sep 4, 2018 at 1:19 PM Sheng Zha <
>> > > > szha.pvg@gmail.com
>> > > > > >
>> > > > > > > > wrote:
>> > > > > > > > > > > >
>> > > > > > > > > > > > > Hi Hagay and Sandeep,
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > Could you help us understand why this specific
>> bug is
>> > > > more
>> > > > > > > > > important
>> > > > > > > > > > > than
>> > > > > > > > > > > > > all the other known bugs, that this becomes a
>> release
>> > > > > > blocker?
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > Some facts to consider:
>> > > > > > > > > > > > > - The bug exists since SymbolBlock was introduced
>> a
>> > > year
>> > > > > ago
>> > > > > > > and
>> > > > > > > > > has
>> > > > > > > > > > > > > survived at least three releases, so this is not a
>> > > > > > regression.
>> > > > > > > > > > > > > - Timeline-wise, this bug was reported on 7/21,
>> but
>> > was
>> > > > not
>> > > > > > > > > reported
>> > > > > > > > > > as
>> > > > > > > > > > > > > release-blocker in the release discussion thread
>> > until
>> > > > 8/31
>> > > > > > > [1].
>> > > > > > > > > > > Neither
>> > > > > > > > > > > > > its reporting as release-blocker nor its fix made
>> it
>> > > for
>> > > > > the
>> > > > > > > 8/3
>> > > > > > > > > code
>> > > > > > > > > > > > > freeze.
>> > > > > > > > > > > > > - The PR is still not ready yet as it doesn't have
>> > > > > approval.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > Hagay, it would be great if you could provide some
>> > > > > additional
>> > > > > > > > > > reasoning
>> > > > > > > > > > > > > besides "X mentions the issue" or "fix was done by
>> > X".
>> > > > > > Thanks.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > -sz
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > [1] https://lists.apache.org/thread.html/d1ed611f98c20d5d85c294b0c07c8bdebca13a209cf66a3872c9123e@%3Cdev.mxnet.apache.org%3E
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > On Tue, Sep 4, 2018 at 12:39 PM Hagay Lupesko <
>> > > > > > > lupesko@gmail.com
>> > > > > > > > >
>> > > > > > > > > > > wrote:
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > > Sandeep mentions the issue of an error when user
>> > > tries
>> > > > to
>> > > > > > > load
>> > > > > > > > > > model
>> > > > > > > > > > > > > params
>> > > > > > > > > > > > > > trained/saved as FP16.
>> > > > > > > > > > > > > >
>> > > https://github.com/apache/incubator-mxnet/issues/11849
>> > > > > > > > > > > > > > The fix was done by Sandeep:
>> > > > > > > > > > > > > >
>> > https://github.com/apache/incubator-mxnet/pull/12412
>> > > > and
>> > > > > > is
>> > > > > > > > > ready
>> > > > > > > > > > to
>> > > > > > > > > > > > be
>> > > > > > > > > > > > > > cherry picked into the release branch.
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > This seems like a release blocker to me:
>> > > > > > > > > > > > > > - Basic functionality broken: loading a model
>> > (albeit
>> > > > one
>> > > > > > > that
>> > > > > > > > > > > was
>> > > > > > > > > > > > > > saved as non FP32)
>> > > > > > > > > > > > > > - Reported by 3 users (wgchang@, nicklhy@ and
>> > > > > > ThomasDelteil@
>> > > > > > > )
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > -1 (non binding)
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > Hagay
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > On Tue, Sep 4, 2018 at 12:01 PM sandeep
>> > > krishnamurthy <
>> > > > > > > > > > > > > > sandeep.krishna98@gmail.com> wrote:
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > "- 0"
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > I believe the bug #11849
>> > > > > > > > > > > > > > > <
>> > > > > https://github.com/apache/incubator-mxnet/issues/11849
>> > > > > > >,
>> > > > > > > > > unable
>> > > > > > > > > > > to
>> > > > > > > > > > > > > > import
>> > > > > > > > > > > > > > > non-fp32 models into Gluon, fixed in this PR
>> > #12412
>> > > > > > > > > > > > > > > <
>> > > > https://github.com/apache/incubator-mxnet/pull/12412>
>> > > > > > is
>> > > > > > > > > > > important
>> > > > > > > > > > > > > for
>> > > > > > > > > > > > > > > the
>> > > > > > > > > > > > > > > users. I would rather pick this fix in this
>> > release
>> > > > > than
>> > > > > > > > plan a
>> > > > > > > > > > > minor
>> > > > > > > > > > > > > > > release later.
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > Best,
>> > > > > > > > > > > > > > > Sandeep
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > On Mon, Sep 3, 2018 at 2:34 PM Philip Cho <
>> > > > > > > > > > > > chohyu01@cs.washington.edu>
>> > > > > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > Actually, the command "git clone --recursive
>> > > > > > > > > > > > > > > > https://github.com/apache/incubator-mxnet
>> -b
>> > > > > > 1.3.0.rc0"
>> > > > > > > > > works
>> > > > > > > > > > > fine
>> > > > > > > > > > > > > > now,
>> > > > > > > > > > > > > > > > never mind.
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > On Mon, Sep 3, 2018 at 1:45 PM Philip Cho <
>> > > > > > > > > > > > > chohyu01@cs.washington.edu>
>> > > > > > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > Unfortunately, MXNet was depending on a
>> > branch
>> > > of
>> > > > > TVM
>> > > > > > > > that
>> > > > > > > > > is
>> > > > > > > > > > > now
>> > > > > > > > > > > > > > > > deleted.
>> > > > > > > > > > > > > > > > > We will have to merge #12448
>> > > > > > > > > > > > > > > > > <
>> > > > > > https://github.com/apache/incubator-mxnet/pull/12448>
>> > > > > > > > > > before
>> > > > > > > > > > > > the
>> > > > > > > > > > > > > > > > release.
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > Background: See dmlc/tvm#1394 <
>> > > > > > > > > > > > > > https://github.com/dmlc/tvm/issues/1394
>> > > > > > > > > > > > > > > >.
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > Philip.
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > On Mon, Sep 3, 2018 at 7:26 AM Carin
>> Meier <
>> > > > > > > > > > > carinmeier@gmail.com
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >> Checked out the tag, built and tested the
>> > > > Clojure
>> > > > > > > > package.
>> > > > > > > > > > +1
>> > > > > > > > > > > > > > > > >>
>> > > > > > > > > > > > > > > > >> On Fri, Aug 31, 2018 at 10:59 PM Roshani
>> > > > Nagmote <
>> > > > > > > > > > > > > > > > >> roshaninagmote2@gmail.com>
>> > > > > > > > > > > > > > > > >> wrote:
>> > > > > > > > > > > > > > > > >>
>> > > > > > > > > > > > > > > > >> > Hi all,
>> > > > > > > > > > > > > > > > >> >
>> > > > > > > > > > > > > > > > >> > I would like to propose a vote to
>> release
>> > > > Apache
>> > > > > > > MXNet
>> > > > > > > > > > > > > > (incubating)
>> > > > > > > > > > > > > > > > >> version
>> > > > > > > > > > > > > > > > >> > 1.3.0.RC0. Voting will start now
>> (Friday,
>> > > Aug
>> > > > > > 31st)
>> > > > > > > > and
>> > > > > > > > > > end
>> > > > > > > > > > > at
>> > > > > > > > > > > > > > 7:00
>> > > > > > > > > > > > > > > PM
>> > > > > > > > > > > > > > > > >> > PDT, Wednesday, Sept 5th.
>> > > > > > > > > > > > > > > > >> >
>> > > > > > > > > > > > > > > > >> > Link to release notes:
>> > > > > > > > > > > > > > > > >> >
>> > > > > > https://github.com/apache/incubator-mxnet/releases
>> > > > > > > > > > > > > > > > >> >
>> > > > > > > > > > > > > > > > >> > Link to release candidate 1.3.0.rc0:
>> > > > > > > > > > > > > > > > >> > https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc0
>> > > > > > > > > > > > > > > > >> >
>> > > > > > > > > > > > > > > > >> > View this page, click on "Build from
>> > > Source",
>> > > > > and
>> > > > > > > use
>> > > > > > > > > the
>> > > > > > > > > > > > source
>> > > > > > > > > > > > > > > code
>> > > > > > > > > > > > > > > > >> > obtained from 1.3.0.rc0 tag:
>> > > > > > > > > > > > > > > > >> >
>> > > > > > > https://mxnet.incubator.apache.org/install/index.html
>> > > > > > > > > > > > > > > > >> >
>> > > > > > > > > > > > > > > > >> > Please remember to TEST first before
>> > voting
>> > > > > > > > accordingly:
>> > > > > > > > > > > > > > > > >> >
>> > > > > > > > > > > > > > > > >> > +1 = approve
>> > > > > > > > > > > > > > > > >> > +0 = no opinion
>> > > > > > > > > > > > > > > > >> > -1 = disapprove (provide reason)
>> > > > > > > > > > > > > > > > >> >
>> > > > > > > > > > > > > > > > >> > Thanks,
>> > > > > > > > > > > > > > > > >> > Roshani
>> > > > > > > > > > > > > > > > >> >
>> > > > > > > > > > > > > > > > >>
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > --
>> > > > > > > > > > > > > > > Sandeep Krishnamurthy
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > --
>> > > > > > > > > > > Sandeep Krishnamurthy
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > > >
>> > > > > > --
>> > > > > > Sandeep Krishnamurthy
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
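
A note on the workaround quoted in Sandeep's message above (his [1]): the loop renames each saved key with k.split(':')[1] before casting the matching Gluon parameter. The sketch below isolates just that renaming step in plain Python. It assumes the saved keys carry a tag such as 'arg:' or 'aux:' ahead of the parameter name, which is what the split implies; the parameter names and dtype strings are made up for illustration, and no MXNet call is involved.

```python
# Hypothetical saved entries: real .params files map keys to NDArrays,
# plain dtype strings stand in for them here.
saved = {
    "arg:conv0_weight": "float16",
    "aux:bn0_moving_mean": "float16",
}


def strip_prefix(key):
    """Drop the tag before the first ':' (same effect as k.split(':')[1]
    for keys that contain exactly one colon)."""
    return key.split(":", 1)[1]


# Map each bare parameter name to the dtype it was saved with, which is the
# information the workaround uses to cast parameters before loading.
dtypes_by_name = {strip_prefix(k): v for k, v in saved.items()}
print(dtypes_by_name)
```

With the bare names in hand, the quoted workaround casts each Gluon parameter to the saved dtype and only then loads the file.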

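The release-vote rule Chris quotes from the Apache voting page (at least three affirmative PMC votes, more positive than negative votes, no vetoes) can be written as a one-line check. This is only a sketch: the function name and the sample tallies are hypothetical, and it counts binding votes only, on the assumption that non-binding votes are advisory.

```python
def release_vote_passes(binding_plus, binding_minus):
    """Majority approval: at least three binding +1 votes and more +1 than -1.

    A single -1 does not act as a veto; it only counts against the total.
    """
    return binding_plus >= 3 and binding_plus > binding_minus


# Hypothetical tallies, not the actual count from this thread.
print(release_vote_passes(4, 2))
print(release_vote_passes(3, 3))
print(release_vote_passes(2, 0))
```

Only the first tally passes: the second lacks a majority and the third falls short of three affirmative votes.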