mxnet-dev mailing list archives

From Pedro Larroy <pedro.larroy.li...@gmail.com>
Subject Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release
Date Thu, 29 Nov 2018 14:13:24 GMT
Hi all.

There are two important issues / fixes on my radar that should go into
the next release:

1) https://github.com/apache/incubator-mxnet/pull/13409/files
There is a bug in shape inference on CPU when not using MKL. Also, we
are running activation on CPU via MKL when we compile with CUDNN+MKLDNN.
I'm finishing a fix for these issues in the above PR.

2) https://github.com/apache/incubator-mxnet/issues/13438
We are seeing crashes due to unsafe setenv calls in multithreaded code.
Calling setenv / getenv from multiple threads is not safe and is causing
segfaults. This piece of code (the handlers in pthread_atfork) already
caused a very difficult-to-diagnose hang in a previous release, where
a fork inside cudnn would deadlock the engine.

I would remove setenv from 2) as a mitigation, but we would need to
check for regressions as we could be creating additional threads
inside the engine.
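
To illustrate the kind of mitigation I mean, here is a minimal sketch,
not the actual MXNet code. It assumes the setting can be read from the
environment once, before any worker thread exists, and cached in an
atomic. The class ThreadCountSetting, the helper SumFromWorkers and the
variable EXAMPLE_NUM_THREADS are all hypothetical names made up for this
example. A fork handler could then call Override() instead of setenv().

```cpp
#include <atomic>
#include <cstdlib>
#include <thread>
#include <vector>

// Hypothetical process-wide setting, cached so that worker threads never
// touch the environment. EXAMPLE_NUM_THREADS is an illustrative name only.
class ThreadCountSetting {
 public:
  // Call once from the main thread, before any worker thread is started.
  static void InitFromEnv() {
    const char* raw = std::getenv("EXAMPLE_NUM_THREADS");
    const int parsed = raw ? std::atoi(raw) : 0;
    value_.store(parsed > 0 ? parsed : 1);
  }

  // Safe from any thread: reads only the atomic, never the environment.
  static int Get() { return value_.load(); }

  // Stands in for a runtime setenv(): updates the cached value only.
  static void Override(int n) { value_.store(n > 0 ? n : 1); }

 private:
  static std::atomic<int> value_;
};

std::atomic<int> ThreadCountSetting::value_{1};

// Starts num_workers threads that each read the cached setting and
// returns the sum of the values they observed.
int SumFromWorkers(int num_workers) {
  std::atomic<int> total{0};
  std::vector<std::thread> workers;
  for (int i = 0; i < num_workers; ++i) {
    workers.emplace_back(
        [&total] { total.fetch_add(ThreadCountSetting::Get()); });
  }
  for (auto& w : workers) w.join();
  return total.load();
}
```

Whether this fits depends on who else reads the variable: a library that
calls getenv itself (an OpenMP runtime, for example) would not see the
cached override, which is part of why we need to check for regressions.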

I would suggest that we address these two major issues before the next release.

Pedro



On Sun, Nov 25, 2018 at 11:41 PM Steffen Rochel <steffenrochel@gmail.com> wrote:
>
> Dear MXNet community,
>
> I will be the release manager for the upcoming Apache MXNet 1.4.0 release.
> Sergey Kolychev will be co-managing the release and providing help from the
> committers side.
> A release candidate will be cut on November 29, 2018 and voting will start
> December 7, 2018. Release notes have been drafted here [1]. If you have any
> additional features in progress and would like to include them in this
> release, please ensure they have been merged by November 27, 2018. The
> release schedule is available here [2].
>
> Feel free to add any other comments/suggestions. Please help to review and
> merge outstanding PRs and resolve issues impacting the quality of the
> 1.4.0 release.
>
> Regards,
>
> Steffen
>
> [1]
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Notes
>
> [2] https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status
>
>
>
>
> On Tue, Nov 20, 2018 at 7:15 PM kellen sunderland <
> kellen.sunderland@gmail.com> wrote:
>
> > Spoke too soon[1], looks like others have been adding Turing support as
> > well (thanks to those helping with this).  I believe there's still a few
> > changes we'd have to make to claim support though (mshadow CMake changes,
> > PyPi package creation tweaks).
> >
> > 1:
> >
> > https://github.com/apache/incubator-mxnet/commit/2c3357443ec3d49a11e93c89f278264ce10c2f08
> >
> > On Tue, Nov 20, 2018 at 7:00 PM kellen sunderland <
> > kellen.sunderland@gmail.com> wrote:
> >
> > > Hey Steffen, I'd like to be able to merge this PR for version 1.4:
> > > https://github.com/apache/incubator-mxnet/pull/13310 . It fixes a
> > > regression in master which causes incorrect feature vectors to be output
> > > when using the TensorRT feature.  (Thanks to Nathalie for helping me
> > > track down the root cause of the issue).  I'm currently blocked on a CI
> > > issue I haven't seen before, but hope to have it resolved by EOW.
> > >
> > > One call-out I would make is that we currently don't support Turing
> > > architecture (sm_75).  I've been slowly trying to add support, but I
> > > don't think I'd have capacity to get this done by EOW.  Does anyone feel
> > > strongly we need this in the 1.4 release?  From my perspective this will
> > > already be a strong release without it.
> > >
> > > On Tue, Nov 20, 2018 at 6:42 PM Steffen Rochel <steffenrochel@gmail.com>
> > > wrote:
> > >
> > >> Thanks Patrick, let's target getting the PRs merged this week.
> > >>
> > >> Call for contributions from the community: Right now we have 10 PRs
> > >> awaiting merge
> > >> <https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-merge+>
> > >> and we have 61 open PRs awaiting review
> > >> <https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-review>.
> > >> I would appreciate it if you all can help to review the open PRs and the
> > >> committers can drive the merge before code freeze for 1.4.0.
> > >>
> > >> The contributors on the Java API are making progress, but not all
> > >> performance issues are resolved. With some luck it should be possible to
> > >> code freeze towards the end of this week.
> > >>
> > >> Are there other critical features/bugs/PRs you think need to be included
> > >> in 1.4.0? If so, please communicate as soon as possible.
> > >>
> > >> Regards,
> > >> Steffen
> > >>
> > >> On Mon, Nov 19, 2018 at 8:26 PM Zhao, Patric <patric.zhao@intel.com>
> > >> wrote:
> > >>
> > >> > Thanks, Steffen. I think there is NO open issue blocking MKLDNN from
> > >> > going GA now.
> > >> >
> > >> > BTW, several quantization-related PRs (#13297, #13260) are under
> > >> > review and I think they can be merged this week.
> > >> >
> > >> > Thanks,
> > >> >
> > >> > --Patric
> > >> >
> > >> >
> > >> > > -----Original Message-----
> > >> > > From: Steffen Rochel [mailto:steffenrochel@gmail.com]
> > >> > > Sent: Tuesday, November 20, 2018 2:57 AM
> > >> > > To: dev@mxnet.incubator.apache.org
> > >> > > Subject: Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release
> > >> > >
> > >> > > On Friday the contributors working on the Java API discovered a
> > >> > > potential performance problem with inference using the Java API
> > >> > > vs. Python. Investigation is ongoing.
> > >> > > As the Java API is one of the main features for the upcoming
> > >> > > release, I suggest postponing the code freeze towards the end of
> > >> > > this week.
> > >> > >
> > >> > > Please provide feedback and concerns about the change in dates for
> > >> > > code freeze and the 1.4.0 release. I will provide updates on progress
> > >> > > resolving the potential performance problem.
> > >> > >
> > >> > > Patrick - do you think it is possible to resolve the remaining
> > >> > > issues on MKL-DNN this week, so we can consider GA for MKL-DNN with
> > >> > > 1.4.0?
> > >> > >
> > >> > > Regards,
> > >> > > Steffen
> > >> > >
> > >> > > On Thu, Nov 15, 2018 at 5:26 AM Anton Chernov <mechernov@gmail.com>
> > >> > > wrote:
> > >> > >
> > >> > > > I'd like to remind everyone that 'code freeze' would mean cutting
> > >> > > > a v1.4.x release branch and all following fixes would need to be
> > >> > > > backported. Development on master can be continued as usual.
> > >> > > >
> > >> > > > Best
> > >> > > > Anton
> > >> > > >
> > >> > > > On Wed, Nov 14, 2018 at 6:04, Steffen Rochel <steffenrochel@gmail.com> wrote:
> > >> > > >
> > >> > > > > Dear MXNet community,
> > >> > > > > the agreed plan was to establish code freeze for the 1.4.0 release
> > >> > > > > today. As the 1.3.1 patch release is still ongoing, I suggest
> > >> > > > > postponing the code freeze to Friday, 16th November 2018.
> > >> > > > >
> > >> > > > > Sergey Kolychev has agreed to act as co-release manager for all
> > >> > > > > tasks which require committer privileges. If anybody is interested
> > >> > > > > in volunteering as release manager - now is the time to speak up.
> > >> > > > > Otherwise I will manage the release.
> > >> > > > >
> > >> > > > > Regards,
> > >> > > > > Steffen
> > >> > > > >
> > >> > > >
> > >> >
> > >>
> > >
> >
