mxnet-dev mailing list archives

From "Davydenko, Denis" <dzianis.davydze...@gmail.com>
Subject Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release
Date Thu, 29 Nov 2018 21:39:54 GMT
I suggest adding this issue to those tracked for the release: https://github.com/apache/incubator-mxnet/issues/12255.
It has proven to be a problem for MXNet start-up time, and it will cause even more problems
down the line with Elastic Training and EIA, where MXNet is a commodity rather than a statically
running process. It also already causes noticeable issues with MMS (MXNet Model Server [1]).
MMS users have noticed significant lag in MMS start-up time, especially on large instances
like C5.18xl with 72 vCPUs. MMS spins up multiple MXNet instances during start-up to ensure
full utilization of CPU or GPU resources on the host. By default it spins up as many MXNet
instances as there are cores (either CPU or GPU cores), so the bigger the host, the more MXNet
instances are spun up. And the more MXNet instances are spun up, the longer each instance takes
to start. For example, on C5.4xl users reported waiting as long as 2 minutes for
just 8 MXNet instances to spin up with MXNet 1.3. The same operation with MXNet 1.1 takes less than
0.5 sec.
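
For anyone who wants to compare versions locally, here is a rough sketch of one way to time several instances starting at once. It assumes the cost shows up when each process imports the module, which this thread does not confirm, and it does not reproduce how MMS itself launches workers. The function name time_parallel_imports is hypothetical and not part of MXNet or MMS.

```python
import subprocess
import sys
import time


def time_parallel_imports(module: str, workers: int) -> float:
    """Start `workers` Python processes that each import `module`.

    Returns the wall-clock seconds until every process has exited.
    Hypothetical helper for illustration only.
    """
    cmd = [sys.executable, "-c", f"import {module}"]
    start = time.perf_counter()
    # Launch all processes first so they start up concurrently.
    procs = [subprocess.Popen(cmd) for _ in range(workers)]
    # Wait for every process before raising, so no child is left running.
    exit_codes = [proc.wait() for proc in procs]
    elapsed = time.perf_counter() - start
    if any(code != 0 for code in exit_codes):
        raise RuntimeError(f"a worker failed to import {module!r}")
    return elapsed
```

Calling time_parallel_imports("mxnet", 8) under each installed MXNet version, and again with a different worker count, would show whether the total time grows with the number of instances on a given host.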

This is a significant regression in MXNet's start-up experience. I suggest
treating it as a blocker for 1.4.

[1] https://github.com/awslabs/mxnet-model-server 

On 11/29/18, 12:51 PM, "Steffen Rochel" <steffenrochel@gmail.com> wrote:

    added to 1.4.0 tracking list
    <https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status#ApacheMXNet(incubating)1.4.0ReleasePlanandStatus-OpenPRstotrack>
    .
    Steffen
    
    On Thu, Nov 29, 2018 at 9:32 AM Zheng, Da <dzzhen@amazon.com.invalid> wrote:
    
    > Hello Steffen,
    >
    > Can this bug be fixed in 1.4.0 release? It's a significant performance
    > regression on sparse matrix multiplication.
    > https://github.com/apache/incubator-mxnet/issues/13449
    >
    > Thanks,
    > Da
    >
    > On 11/26/18, 6:42 AM, "Steffen Rochel" <steffenrochel@gmail.com> wrote:
    >
    >     Dear MXNet community,
    >
    >     I will be the release manager for the upcoming Apache MXNet 1.4.0 release.
    >     Sergey Kolychev will be co-managing the release and providing help from the
    >     committers side.
    >     A release candidate will be cut on November 29, 2018 and voting will start
    >     December 7, 2018. Release notes have been drafted here [1]. If you have any
    >     additional features in progress and would like to include it in this
    >     release, please assure they have been merged by November 27, 2018. Release
    >     schedule is available here [2].
    >
    >     Feel free to add any other comments/suggestions. Please help to review and
    >     merge outstanding PR's and resolve issues impacting the quality of the
    >     1.4.0 release.
    >
    >     Regards,
    >
    >     Steffen
    >
    >     [1] https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Notes
    >
    >     [2] https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status
    >
    >
    >
    >
    >     On Tue, Nov 20, 2018 at 7:15 PM kellen sunderland <
    >     kellen.sunderland@gmail.com> wrote:
    >
    >     > Spoke too soon[1], looks like others have been adding Turing support as
    >     > well (thanks to those helping with this).  I believe there's still a few
    >     > changes we'd have to make to claim support though (mshadow CMake changes,
    >     > PyPi package creation tweaks).
    >     >
    >     > 1: https://github.com/apache/incubator-mxnet/commit/2c3357443ec3d49a11e93c89f278264ce10c2f08
    >     >
    >     > On Tue, Nov 20, 2018 at 7:00 PM kellen sunderland <
    >     > kellen.sunderland@gmail.com> wrote:
    >     >
    >     > > Hey Steffen, I'd like to be able to merge this PR for version 1.4:
    >     > > https://github.com/apache/incubator-mxnet/pull/13310 . It fixes a
    >     > > regression in master which causes incorrect feature vectors to be output
    >     > > when using the TensorRT feature.  (Thanks to Nathalie for helping me track
    >     > > down the root cause of the issue).   I'm currently blocked on a CI issue I
    >     > > haven't seen before, but hope to have it resolved by EOW.
    >     > >
    >     > > One call-out I would make is that we currently don't support Turing
    >     > > architecture (sm_75).  I've been slowly trying to add support, but I don't
    >     > > think I'd have capacity to do this done by EOW.  Does anyone feel strongly
    >     > > we need this in the 1.4 release?  From my perspective this will already be
    >     > > a strong release without it.
    >     > >
    >     > > On Tue, Nov 20, 2018 at 6:42 PM Steffen Rochel <steffenrochel@gmail.com>
    >     > > wrote:
    >     > >
    >     > >> Thanks Patrick, lets target to get the PR's merged this week.
    >     > >>
    >     > >> Call for contributions from the community: Right now we have 10 PR
    >     > >> awaiting merge
    >     > >> <https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-merge+>
    >     > >> and we have 61 open PR awaiting review.
    >     > >> <https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-review>
    >     > >> I would appreciate if you all can help to review the open PR and the
    >     > >> committers can drive the merge before code freeze for 1.4.0.
    >     > >>
    >     > >> The contributors on the Java API are making progress, but not all
    >     > >> performance issues are resolved. With some luck it should be
    > possible to
    >     > >> code freeze towards end of this week.
    >     > >>
    >     > >> Are there other critical features/bugs/PR you think need to be
    > included
    >     > in
    >     > >> 1.4.0? If so, please communicate as soon as possible.
    >     > >>
    >     > >> Regards,
    >     > >> Steffen
    >     > >>
    >     > >> On Mon, Nov 19, 2018 at 8:26 PM Zhao, Patric <
    > patric.zhao@intel.com>
    >     > >> wrote:
    >     > >>
    >     > >> > Thanks, Steffen. I think there is NO open issue to block the MKLDNN to GA
    >     > >> > now.
    >     > >> >
    >     > >> > BTW, several quantization related PRs (#13297,#13260) are under the review
    >     > >> > and I think it can be merged in this week.
    >     > >> >
    >     > >> > Thanks,
    >     > >> >
    >     > >> > --Patric
    >     > >> >
    >     > >> >
    >     > >> > > -----Original Message-----
    >     > >> > > From: Steffen Rochel [mailto:steffenrochel@gmail.com]
    >     > >> > > Sent: Tuesday, November 20, 2018 2:57 AM
    >     > >> > > To: dev@mxnet.incubator.apache.org
    >     > >> > > Subject: Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release
    >     > >> > >
    >     > >> > > On Friday the contributors working on Java API discovered a potential
    >     > >> > > performance problem with inference using Java API vs. Python. Investigation
    >     > >> > > is ongoing.
    >     > >> > > As the Java API is one of the main features for the upcoming release, I
    >     > >> > > suggest to post-pone the code freeze towards end of this week.
    >     > >> > >
    >     > >> > > Please provide feedback and concern about the change in dates for code
    >     > >> > > freeze and 1.4.0 release. I will provide updates on progress resolving the
    >     > >> > > potential performance problem.
    >     > >> > >
    >     > >> > > Patrick - do you think it is possible to resolve the remaining issues on
    >     > >> > > MKL-DNN this week, so we can consider GA for MKL-DNN with 1.4.0?
    >     > >> > >
    >     > >> > > Regards,
    >     > >> > > Steffen
    >     > >> > >
    >     > >> > > On Thu, Nov 15, 2018 at 5:26 AM Anton Chernov <mechernov@gmail.com>
    >     > >> > > wrote:
    >     > >> > >
    >     > >> > > > I'd like to remind everyone that 'code freeze' would mean cutting a
    >     > >> > > > v1.4.x release branch and all following fixes would need to be backported.
    >     > >> > > > Development on master can be continued as usual.
    >     > >> > > >
    >     > >> > > > Best
    >     > >> > > > Anton
    >     > >> > > >
    >     > >> > > > On Wed, Nov 14, 2018 at 6:04, Steffen Rochel <steffenrochel@gmail.com> wrote:
    >     > >> > > >
    >     > >> > > > > Dear MXNet community,
    >     > >> > > > > the agreed plan was to establish code freeze for 1.4.0 release
    >     > >> > > > > today. As the 1.3.1 patch release is still ongoing I suggest to
    >     > >> > > > > post-pone the code freeze to Friday 16th November 2018.
    >     > >> > > > >
    >     > >> > > > > Sergey Kolychev has agreed to act as co-release manager for all
    >     > >> > > > > tasks which require committer privileges. If anybody is interested to
    >     > >> > > > > volunteer as release manager - now is the time to speak up. Otherwise I
    >     > >> > > > > will manage the release.
    >     > >> > > > >
    >     > >> > > > > Regards,
    >     > >> > > > > Steffen
    >     > >> > > > >
    >     > >> > > >
    >     > >> >
    >     > >>
    >     > >
    >     >
    >
    >
    >
    


