mxnet-dev mailing list archives

From Steffen Rochel <steffenroc...@gmail.com>
Subject Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release
Date Thu, 29 Nov 2018 22:28:20 GMT
All - Sergey has created the v1.4.x branch and I opened the first PR:
https://github.com/apache/incubator-mxnet/pull/13469

Please add critical - and only critical - bug fixes to the v1.4.x branch and
add me as approver.

Regards,
Steffen

On Thu, Nov 29, 2018 at 2:17 PM Lin Yuan <apeforest@gmail.com> wrote:

> https://github.com/apache/incubator-mxnet/pull/13452 is needed in 1.4.0 to
> support the Horovod integration project.
>
> Thanks!
>
> Lin
>
>
> On Thu, Nov 29, 2018 at 1:40 PM Davydenko, Denis <
> dzianis.davydzenka@gmail.com> wrote:
>
> > I suggest including this issue among those tracked for the release:
> > https://github.com/apache/incubator-mxnet/issues/12255. It has proven to
> > be a problem for MXNet start-up time, and it will cause even more problems
> > down the line with Elastic Training and EIA, where MXNet is a commodity
> > rather than a statically running process. It also already causes noticeable
> > issues with MMS (MXNet Model Server [1]). MMS users have already noticed
> > significant lag in MMS start-up time, especially on beefy instances like
> > C5.18xl with 72 vCPUs. MMS spins up multiple MXNet instances during its
> > start-up to ensure full utilization of CPU or GPU resources on the host. By
> > default it spins up as many MXNet instances as there are cores (either CPU
> > or GPU cores), so the bigger the host, the more MXNet instances are spun
> > up. And the more MXNet instances are spun up, the longer each instance
> > takes to start. For example, on C5.4xl users reported waiting as long as 2
> > minutes to have just 8 MXNet instances spun up with MXNet 1.3. The same
> > effort with MXNet 1.1 takes less than 0.5 sec.
> >
> > This is quite a significant regression in MXNet when it comes to the
> > start-up experience. I suggest considering this a blocker for 1.4.
> >
> > [1] https://github.com/awslabs/mxnet-model-server
> >
> > On 11/29/18, 12:51 PM, "Steffen Rochel" <steffenrochel@gmail.com> wrote:
> >
> >     added to 1.4.0 tracking list
> >     <https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status#ApacheMXNet(incubating)1.4.0ReleasePlanandStatus-OpenPRstotrack>.
> >     Steffen
> >
> >     On Thu, Nov 29, 2018 at 9:32 AM Zheng, Da <dzzhen@amazon.com.invalid>
> >     wrote:
> >
> >     > Hello Steffen,
> >     >
> >     > Can this bug be fixed in the 1.4.0 release? It's a significant
> >     > performance regression on sparse matrix multiplication.
> >     > https://github.com/apache/incubator-mxnet/issues/13449
> >     >
> >     > Thanks,
> >     > Da
> >     >
> >     > On 11/26/18, 6:42 AM, "Steffen Rochel" <steffenrochel@gmail.com> wrote:
> >     >
> >     >     Dear MXNet community,
> >     >
> >     >     I will be the release manager for the upcoming Apache MXNet 1.4.0
> >     >     release. Sergey Kolychev will be co-managing the release and
> >     >     providing help from the committers' side.
> >     >     A release candidate will be cut on November 29, 2018 and voting
> >     >     will start December 7, 2018. Release notes have been drafted here
> >     >     [1]. If you have any additional features in progress and would like
> >     >     to include them in this release, please ensure they have been
> >     >     merged by November 27, 2018. The release schedule is available here
> >     >     [2].
> >     >
> >     >     Feel free to add any other comments/suggestions. Please help to
> >     >     review and merge outstanding PRs and resolve issues impacting the
> >     >     quality of the 1.4.0 release.
> >     >
> >     >     Regards,
> >     >
> >     >     Steffen
> >     >
> >     >     [1]
> >     >     https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Notes
> >     >
> >     >     [2]
> >     >     https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status
> >     >
> >     >
> >     >
> >     >
> >     >     On Tue, Nov 20, 2018 at 7:15 PM kellen sunderland <
> >     >     kellen.sunderland@gmail.com> wrote:
> >     >
> >     >     > Spoke too soon [1], looks like others have been adding Turing
> >     >     > support as well (thanks to those helping with this).  I believe
> >     >     > there are still a few changes we'd have to make to claim support
> >     >     > though (mshadow CMake changes, PyPI package creation tweaks).
> >     >     >
> >     >     > 1:
> >     >     > https://github.com/apache/incubator-mxnet/commit/2c3357443ec3d49a11e93c89f278264ce10c2f08
> >     >     >
> >     >     > On Tue, Nov 20, 2018 at 7:00 PM kellen sunderland <
> >     >     > kellen.sunderland@gmail.com> wrote:
> >     >     >
> >     >     > > Hey Steffen, I'd like to be able to merge this PR for version
> >     >     > > 1.4: https://github.com/apache/incubator-mxnet/pull/13310 . It
> >     >     > > fixes a regression in master which causes incorrect feature
> >     >     > > vectors to be output when using the TensorRT feature.  (Thanks to
> >     >     > > Nathalie for helping me track down the root cause of the issue.)
> >     >     > > I'm currently blocked on a CI issue I haven't seen before, but
> >     >     > > hope to have it resolved by EOW.
> >     >     > >
> >     >     > > One call-out I would make is that we currently don't support the
> >     >     > > Turing architecture (sm_75).  I've been slowly trying to add
> >     >     > > support, but I don't think I'd have capacity to get this done by
> >     >     > > EOW.  Does anyone feel strongly we need this in the 1.4 release?
> >     >     > > From my perspective this will already be a strong release without
> >     >     > > it.
> >     >     > >
> >     >     > > On Tue, Nov 20, 2018 at 6:42 PM Steffen Rochel
> >     >     > > <steffenrochel@gmail.com> wrote:
> >     >     > >
> >     >     > >> Thanks Patric, let's target getting the PRs merged this week.
> >     >     > >>
> >     >     > >> Call for contributions from the community: Right now we have 10
> >     >     > >> PRs awaiting merge
> >     >     > >> <https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-merge+>
> >     >     > >> and we have 61 open PRs awaiting review
> >     >     > >> <https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-review>.
> >     >     > >> I would appreciate it if you all could help review the open PRs
> >     >     > >> and if the committers could drive the merges before code freeze
> >     >     > >> for 1.4.0.
> >     >     > >>
> >     >     > >> The contributors on the Java API are making progress, but not all
> >     >     > >> performance issues are resolved. With some luck it should be
> >     >     > >> possible to code freeze towards the end of this week.
> >     >     > >>
> >     >     > >> Are there other critical features/bugs/PRs you think need to be
> >     >     > >> included in 1.4.0? If so, please communicate as soon as possible.
> >     >     > >>
> >     >     > >> Regards,
> >     >     > >> Steffen
> >     >     > >>
> >     >     > >> On Mon, Nov 19, 2018 at 8:26 PM Zhao, Patric
> >     >     > >> <patric.zhao@intel.com> wrote:
> >     >     > >>
> >     >     > >> > Thanks, Steffen. I think there is NO open issue blocking
> >     >     > >> > MKLDNN from going GA now.
> >     >     > >> >
> >     >     > >> > BTW, several quantization-related PRs (#13297, #13260) are under
> >     >     > >> > review and I think they can be merged this week.
> >     >     > >> >
> >     >     > >> > Thanks,
> >     >     > >> >
> >     >     > >> > --Patric
> >     >     > >> >
> >     >     > >> >
> >     >     > >> > > -----Original Message-----
> >     >     > >> > > From: Steffen Rochel [mailto:steffenrochel@gmail.com]
> >     >     > >> > > Sent: Tuesday, November 20, 2018 2:57 AM
> >     >     > >> > > To: dev@mxnet.incubator.apache.org
> >     >     > >> > > Subject: Re: [Announce] Upcoming Apache MXNet (incubating)
> >     >     > >> > > 1.4.0 release
> >     >     > >> > >
> >     >     > >> > > On Friday the contributors working on the Java API discovered
> >     >     > >> > > a potential performance problem with inference using the Java
> >     >     > >> > > API vs. Python. Investigation is ongoing.
> >     >     > >> > > As the Java API is one of the main features for the upcoming
> >     >     > >> > > release, I suggest postponing the code freeze towards the end
> >     >     > >> > > of this week.
> >     >     > >> > >
> >     >     > >> > > Please provide feedback and concerns about the change in dates
> >     >     > >> > > for code freeze and the 1.4.0 release. I will provide updates
> >     >     > >> > > on progress resolving the potential performance problem.
> >     >     > >> > >
> >     >     > >> > > Patric - do you think it is possible to resolve the remaining
> >     >     > >> > > issues on MKL-DNN this week, so we can consider GA for MKL-DNN
> >     >     > >> > > with 1.4.0?
> >     >     > >> > >
> >     >     > >> > > Regards,
> >     >     > >> > > Steffen
> >     >     > >> > >
> >     >     > >> > > On Thu, Nov 15, 2018 at 5:26 AM Anton Chernov
> >     >     > >> > > <mechernov@gmail.com> wrote:
> >     >     > >> > >
> >     >     > >> > > > I'd like to remind everyone that 'code freeze' would mean
> >     >     > >> > > > cutting a v1.4.x release branch, and all following fixes
> >     >     > >> > > > would need to be backported. Development on master can
> >     >     > >> > > > continue as usual.
> >     >     > >> > > >
> >     >     > >> > > > Best
> >     >     > >> > > > Anton
> >     >     > >> > > >
> >     >     > >> > > > On Wed, Nov 14, 2018 at 6:04, Steffen Rochel
> >     >     > >> > > > <steffenrochel@gmail.com> wrote:
> >     >     > >> > > >
> >     >     > >> > > > > Dear MXNet community,
> >     >     > >> > > > > the agreed plan was to establish code freeze for the 1.4.0
> >     >     > >> > > > > release today. As the 1.3.1 patch release is still ongoing,
> >     >     > >> > > > > I suggest postponing the code freeze to Friday, 16th
> >     >     > >> > > > > November 2018.
> >     >     > >> > > > >
> >     >     > >> > > > > Sergey Kolychev has agreed to act as co-release manager for
> >     >     > >> > > > > all tasks which require committer privileges. If anybody is
> >     >     > >> > > > > interested in volunteering as release manager, now is the
> >     >     > >> > > > > time to speak up. Otherwise I will manage the release.
> >     >     > >> > > > >
> >     >     > >> > > > > Regards,
> >     >     > >> > > > > Steffen
> >     >     > >> > > > >
> >     >     > >> > > >
> >     >     > >> >
> >     >     > >>
> >     >     > >
> >     >     >
> >     >
> >     >
> >     >
> >
> >
> >
> >
>
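
Editorial note on the start-up regression reported in issue 12255 earlier in
this thread: one rough way to check whether start-up time grows with the
number of concurrently launched processes is sketched below. This is only a
sketch under stated assumptions, not how MMS launches its workers. The helper
name time_parallel_startup is hypothetical, and the default module "json" is
a stand-in so the script runs anywhere; pass "mxnet" on a host where MXNet is
installed to measure the case discussed in the thread.

```python
import subprocess
import sys
import time


def time_parallel_startup(num_procs, module="json"):
    """Launch num_procs interpreters that each import `module`.

    Returns the wall-clock seconds until all of them have exited.
    `module` defaults to a stdlib stand-in (assumption); use "mxnet"
    on a machine where it is installed.
    """
    cmd = [sys.executable, "-c", f"import {module}"]
    start = time.perf_counter()
    # Start every child first, then wait, so the imports overlap in time.
    procs = [subprocess.Popen(cmd) for _ in range(num_procs)]
    for proc in procs:
        if proc.wait() != 0:
            raise RuntimeError(f"import of {module!r} failed in a child process")
    return time.perf_counter() - start


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(f"{n} processes: {time_parallel_startup(n):.2f} s")
```

If the time per process rises sharply as the process count grows, that would
be consistent with the behaviour described in the report; comparing two
installed versions would need two separate environments.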
