mxnet-dev mailing list archives

From Steffen Rochel <steffenroc...@gmail.com>
Subject Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release
Date Fri, 30 Nov 2018 02:45:58 GMT
Hi Zhi - thanks for the improvement, which we should consider for 1.4.0.
However, I don't see any tests with the PR and think it is too risky to add
changes without tests. I will add your PR to the tracking list, but would
like to ask you to add functional tests before completing the PR to master
and v1.4.x branch.

Steffen

On Thu, Nov 29, 2018 at 5:01 PM Joshua Z. Zhang <cheungchih@gmail.com>
wrote:

> Hi, I would like to bring a critical performance and stability patch of
> existing gluon dataloader to 1.4.0:
> https://github.com/apache/incubator-mxnet/pull/13447
>
> This PR is finished, waiting for CI to pass.
>
> Steffen, could you help me add that to the tracked list?
>
> Best,
> Zhi
>
> > On Nov 29, 2018, at 4:25 PM, Naveen Swamy <mnnaveen@gmail.com> wrote:
> >
> > The tests are randomly failing in different stages:
> > http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-13105/
> > This PR has failed 8 times so far.
> >
> > On Thu, Nov 29, 2018 at 3:43 PM Steffen Rochel <steffenrochel@gmail.com>
> > wrote:
> >
> >> Pedro - ok. Please add the PR to the v1.4.x branch after it is merged to
> >> master, and please update the tracking page:
> >> <https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status#ApacheMXNet(incubating)1.4.0ReleasePlanandStatus-OpenPRstotrack>
> >> Steffen
> >>
> >> On Thu, Nov 29, 2018 at 3:00 PM Pedro Larroy <pedro.larroy.lists@gmail.com>
> >> wrote:
> >>
> >>> The PR is ready from my side and passes the tests. Unless somebody raises
> >>> any concerns, it's good to go.
> >>> On Thu, Nov 29, 2018 at 9:50 PM Steffen Rochel <steffenrochel@gmail.com>
> >>> wrote:
> >>>>
> >>>> Pedro - added to the 1.4.0 tracking list:
> >>>> <https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status#ApacheMXNet(incubating)1.4.0ReleasePlanandStatus-OpenPRstotrack>
> >>>>
> >>>> Do you already have an ETA?
> >>>> Steffen
> >>>>
> >>>> On Thu, Nov 29, 2018 at 6:13 AM Pedro Larroy <pedro.larroy.lists@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> Hi all.
> >>>>>
> >>>>> There are two important issues / fixes on my radar that should go into
> >>>>> the next release:
> >>>>>
> >>>>> 1) https://github.com/apache/incubator-mxnet/pull/13409/files
> >>>>> There is a bug in shape inference on CPU when not using MKL. Also, we
> >>>>> are running activation on CPU via MKL when we compile CUDNN+MKLDNN.
> >>>>> I'm finishing a fix for these issues in the above PR.
> >>>>>
> >>>>> 2) https://github.com/apache/incubator-mxnet/issues/13438
> >>>>> We are seeing crashes due to unsafe setenv in multithreaded code.
> >>>>> Calling setenv / getenv from multiple threads is not safe and is causing
> >>>>> segfaults. This piece of code (the handlers in pthread_atfork) already
> >>>>> caused a very difficult-to-diagnose hang in a previous release, where
> >>>>> a fork inside cudnn would deadlock the engine.
> >>>>>
> >>>>> I would remove setenv from 2) as a mitigation, but we would need to
> >>>>> check for regressions, as we could be creating additional threads
> >>>>> inside the engine.
> >>>>>
> >>>>> I would suggest that we address these two major issues before the next
> >>>>> release.
> >>>>>
> >>>>> Pedro
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Sun, Nov 25, 2018 at 11:41 PM Steffen Rochel <steffenrochel@gmail.com>
> >>>>> wrote:
> >>>>>>
> >>>>>> Dear MXNet community,
> >>>>>>
> >>>>>> I will be the release manager for the upcoming Apache MXNet 1.4.0
> >>>>>> release. Sergey Kolychev will be co-managing the release and providing
> >>>>>> help from the committers' side.
> >>>>>> A release candidate will be cut on November 29, 2018 and voting will
> >>>>>> start December 7, 2018. Release notes have been drafted here [1]. If you
> >>>>>> have any additional features in progress and would like to include them
> >>>>>> in this release, please ensure they have been merged by November 27,
> >>>>>> 2018. The release schedule is available here [2].
> >>>>>>
> >>>>>> Feel free to add any other comments/suggestions. Please help to review
> >>>>>> and merge outstanding PRs and resolve issues impacting the quality of
> >>>>>> the 1.4.0 release.
> >>>>>>
> >>>>>> Regards,
> >>>>>>
> >>>>>> Steffen
> >>>>>>
> >>>>>> [1]
> >>>>>> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Notes
> >>>>>>
> >>>>>> [2]
> >>>>>> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Tue, Nov 20, 2018 at 7:15 PM kellen sunderland <
> >>>>>> kellen.sunderland@gmail.com> wrote:
> >>>>>>
> >>>>>>> Spoke too soon [1], looks like others have been adding Turing support
> >>>>>>> as well (thanks to those helping with this). I believe there are still
> >>>>>>> a few changes we'd have to make to claim support though (mshadow CMake
> >>>>>>> changes, PyPI package creation tweaks).
> >>>>>>>
> >>>>>>> 1:
> >>>>>>> https://github.com/apache/incubator-mxnet/commit/2c3357443ec3d49a11e93c89f278264ce10c2f08
> >>>>>>>
> >>>>>>> On Tue, Nov 20, 2018 at 7:00 PM kellen sunderland <
> >>>>>>> kellen.sunderland@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> Hey Steffen, I'd like to be able to merge this PR for version 1.4:
> >>>>>>>> https://github.com/apache/incubator-mxnet/pull/13310
> >>>>>>>> It fixes a regression in master which causes incorrect feature
> >>>>>>>> vectors to be output when using the TensorRT feature. (Thanks to
> >>>>>>>> Nathalie for helping me track down the root cause of the issue.)
> >>>>>>>> I'm currently blocked on a CI issue I haven't seen before, but hope
> >>>>>>>> to have it resolved by EOW.
> >>>>>>>>
> >>>>>>>> One call-out I would make is that we currently don't support the
> >>>>>>>> Turing architecture (sm_75). I've been slowly trying to add support,
> >>>>>>>> but I don't think I'd have capacity to get this done by EOW. Does
> >>>>>>>> anyone feel strongly that we need this in the 1.4 release? From my
> >>>>>>>> perspective this will already be a strong release without it.
> >>>>>>>>
> >>>>>>>> On Tue, Nov 20, 2018 at 6:42 PM Steffen Rochel <steffenrochel@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Thanks Patrick, let's target getting the PRs merged this week.
> >>>>>>>>>
> >>>>>>>>> Call for contributions from the community: Right now we have 10 PRs
> >>>>>>>>> awaiting merge
> >>>>>>>>> <https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-merge+>
> >>>>>>>>> and 61 open PRs awaiting review
> >>>>>>>>> <https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-review>.
> >>>>>>>>> I would appreciate it if you all could help review the open PRs and
> >>>>>>>>> if the committers could drive the merges before code freeze for 1.4.0.
> >>>>>>>>>
> >>>>>>>>> The contributors on the Java API are making progress, but not all
> >>>>>>>>> performance issues are resolved. With some luck it should be possible
> >>>>>>>>> to code freeze towards the end of this week.
> >>>>>>>>>
> >>>>>>>>> Are there other critical features/bugs/PRs you think need to be
> >>>>>>>>> included in 1.4.0? If so, please communicate as soon as possible.
> >>>>>>>>>
> >>>>>>>>> Regards,
> >>>>>>>>> Steffen
> >>>>>>>>>
> >>>>>>>>> On Mon, Nov 19, 2018 at 8:26 PM Zhao, Patric <patric.zhao@intel.com>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Thanks, Steffen. I think there is NO open issue blocking MKLDNN from
> >>>>>>>>>> GA now.
> >>>>>>>>>>
> >>>>>>>>>> BTW, several quantization-related PRs (#13297, #13260) are under
> >>>>>>>>>> review and I think they can be merged this week.
> >>>>>>>>>>
> >>>>>>>>>> Thanks,
> >>>>>>>>>>
> >>>>>>>>>> --Patric
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> -----Original Message-----
> >>>>>>>>>>> From: Steffen Rochel [mailto:steffenrochel@gmail.com]
> >>>>>>>>>>> Sent: Tuesday, November 20, 2018 2:57 AM
> >>>>>>>>>>> To: dev@mxnet.incubator.apache.org
> >>>>>>>>>>> Subject: Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release
> >>>>>>>>>>>
> >>>>>>>>>>> On Friday the contributors working on the Java API discovered a
> >>>>>>>>>>> potential performance problem with inference using the Java API vs.
> >>>>>>>>>>> Python. Investigation is ongoing.
> >>>>>>>>>>> As the Java API is one of the main features for the upcoming
> >>>>>>>>>>> release, I suggest postponing the code freeze towards the end of
> >>>>>>>>>>> this week.
> >>>>>>>>>>>
> >>>>>>>>>>> Please provide feedback and concerns about the change in dates for
> >>>>>>>>>>> code freeze and the 1.4.0 release. I will provide updates on
> >>>>>>>>>>> progress resolving the potential performance problem.
> >>>>>>>>>>>
> >>>>>>>>>>> Patrick - do you think it is possible to resolve the remaining
> >>>>>>>>>>> issues on MKL-DNN this week, so we can consider GA for MKL-DNN with
> >>>>>>>>>>> 1.4.0?
> >>>>>>>>>>>
> >>>>>>>>>>> Regards,
> >>>>>>>>>>> Steffen
> >>>>>>>>>>>
> >>>>>>>>>>> On Thu, Nov 15, 2018 at 5:26 AM Anton Chernov <mechernov@gmail.com>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> I'd like to remind everyone that 'code freeze' would mean cutting
> >>>>>>>>>>>> a v1.4.x release branch and all following fixes would need to be
> >>>>>>>>>>>> backported. Development on master can be continued as usual.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Best
> >>>>>>>>>>>> Anton
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Wed, Nov 14, 2018 at 6:04, Steffen Rochel <steffenrochel@gmail.com>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Dear MXNet community,
> >>>>>>>>>>>>> the agreed plan was to establish code freeze for the 1.4.0 release
> >>>>>>>>>>>>> today. As the 1.3.1 patch release is still ongoing, I suggest
> >>>>>>>>>>>>> postponing the code freeze to Friday, 16th November 2018.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Sergey Kolychev has agreed to act as co-release manager for all
> >>>>>>>>>>>>> tasks which require committer privileges. If anybody is interested
> >>>>>>>>>>>>> in volunteering as release manager - now is the time to speak up.
> >>>>>>>>>>>>> Otherwise I will manage the release.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>> Steffen
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>
> >>>
> >>
>
>
