mxnet-dev mailing list archives

From "Joshua Z. Zhang" <cheungc...@gmail.com>
Subject Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release
Date Fri, 30 Nov 2018 01:01:43 GMT
Hi, I would like to bring a critical performance and stability patch for the existing Gluon DataLoader
into 1.4.0: https://github.com/apache/incubator-mxnet/pull/13447


This PR is finished and is waiting for CI to pass.

Steffen, could you help me add that to the tracked list?

Best,
Zhi

> On Nov 29, 2018, at 4:25 PM, Naveen Swamy <mnnaveen@gmail.com> wrote:
> 
> The tests are failing randomly in different stages:
> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-13105/
> This PR has failed 8 times so far.
> 
> On Thu, Nov 29, 2018 at 3:43 PM Steffen Rochel <steffenrochel@gmail.com>
> wrote:
> 
>> Pedro - ok. Please add the PR to the v1.4.x branch after it is merged to master, and please
>> update the tracking page:
>> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status#ApacheMXNet(incubating)1.4.0ReleasePlanandStatus-OpenPRstotrack
>>
>> Steffen
>> 
>> On Thu, Nov 29, 2018 at 3:00 PM Pedro Larroy <pedro.larroy.lists@gmail.com> wrote:
>> 
>>> The PR is ready from my side and passes the tests; unless somebody raises
>>> any concerns, it's good to go.
>>> On Thu, Nov 29, 2018 at 9:50 PM Steffen Rochel <steffenrochel@gmail.com>
>>> wrote:
>>>> 
>>>> Pedro - added to the 1.4.0 tracking list:
>>>> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status#ApacheMXNet(incubating)1.4.0ReleasePlanandStatus-OpenPRstotrack
>>>>
>>>> Do you already have an ETA?
>>>> Steffen
>>>> 
>>>> On Thu, Nov 29, 2018 at 6:13 AM Pedro Larroy <pedro.larroy.lists@gmail.com> wrote:
>>>> 
>>>>> Hi all.
>>>>> 
>>>>> There are two important issues / fixes on my radar that should go into
>>>>> the next release:
>>>>> 
>>>>> 1) https://github.com/apache/incubator-mxnet/pull/13409/files
>>>>> There is a bug in shape inference on CPU when not using MKL. Also, we
>>>>> are running activation on CPU via MKL when we compile with CUDNN+MKLDNN.
>>>>> I'm finishing a fix for these issues in the above PR.
>>>>> 
>>>>> 2) https://github.com/apache/incubator-mxnet/issues/13438
>>>>> We are seeing crashes due to unsafe setenv in multithreaded code.
>>>>> Calling setenv / getenv from multiple threads is not safe and is causing
>>>>> segfaults. This piece of code (the handlers in pthread_atfork) already
>>>>> caused a very difficult-to-diagnose hang in a previous release, where
>>>>> a fork inside cudnn would deadlock the engine.
>>>>> 
>>>>> I would remove setenv from 2) as a mitigation, but we would need to
>>>>> check for regressions as we could be creating additional threads
>>>>> inside the engine.
>>>>> 
>>>>> I would suggest that we address these two major issues before the next
>>>>> release.
>>>>> 
>>>>> Pedro
>>>>> 
>>>>> 
>>>>> 
>>>>> On Sun, Nov 25, 2018 at 11:41 PM Steffen Rochel <steffenrochel@gmail.com> wrote:
>>>>>> 
>>>>>> Dear MXNet community,
>>>>>> 
>>>>>> I will be the release manager for the upcoming Apache MXNet 1.4.0 release.
>>>>>> Sergey Kolychev will be co-managing the release and providing help from the
>>>>>> committers' side.
>>>>>> A release candidate will be cut on November 29, 2018 and voting will start on
>>>>>> December 7, 2018. Release notes have been drafted here [1]. If you have any
>>>>>> additional features in progress and would like to include them in this
>>>>>> release, please ensure they have been merged by November 27, 2018. The release
>>>>>> schedule is available here [2].
>>>>>> 
>>>>>> Feel free to add any other comments/suggestions. Please help to review and
>>>>>> merge outstanding PRs and resolve issues impacting the quality of the
>>>>>> 1.4.0 release.
>>>>>> 
>>>>>> Regards,
>>>>>> 
>>>>>> Steffen
>>>>>> 
>>>>>> [1] https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Notes
>>>>>>
>>>>>> [2] https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Tue, Nov 20, 2018 at 7:15 PM kellen sunderland <
>>>>>> kellen.sunderland@gmail.com> wrote:
>>>>>> 
>>>>>>> Spoke too soon [1], looks like others have been adding Turing support as
>>>>>>> well (thanks to those helping with this). I believe there are still a few
>>>>>>> changes we'd have to make to claim support though (mshadow CMake changes,
>>>>>>> PyPI package creation tweaks).
>>>>>>> 
>>>>>>> [1] https://github.com/apache/incubator-mxnet/commit/2c3357443ec3d49a11e93c89f278264ce10c2f08
>>>>>>> 
>>>>>>> On Tue, Nov 20, 2018 at 7:00 PM kellen sunderland <
>>>>>>> kellen.sunderland@gmail.com> wrote:
>>>>>>> 
>>>>>>>> Hey Steffen, I'd like to be able to merge this PR for version 1.4:
>>>>>>>> https://github.com/apache/incubator-mxnet/pull/13310 . It fixes a
>>>>>>>> regression in master which causes incorrect feature vectors to be output
>>>>>>>> when using the TensorRT feature. (Thanks to Nathalie for helping me track
>>>>>>>> down the root cause of the issue.) I'm currently blocked on a CI issue I
>>>>>>>> haven't seen before, but hope to have it resolved by EOW.
>>>>>>>> 
>>>>>>>> One call-out I would make is that we currently don't support the Turing
>>>>>>>> architecture (sm_75). I've been slowly trying to add support, but I don't
>>>>>>>> think I'd have capacity to get this done by EOW. Does anyone feel strongly
>>>>>>>> that we need this in the 1.4 release? From my perspective this will already
>>>>>>>> be a strong release without it.
>>>>>>>> 
>>>>>>>> On Tue, Nov 20, 2018 at 6:42 PM Steffen Rochel <steffenrochel@gmail.com> wrote:
>>>>>>>> 
>>>>>>>>> Thanks Patric, let's target getting the PRs merged this week.
>>>>>>>>>
>>>>>>>>> Call for contributions from the community: right now we have 10 PRs
>>>>>>>>> awaiting merge:
>>>>>>>>> https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-merge+
>>>>>>>>> and we have 61 open PRs awaiting review:
>>>>>>>>> https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-review
>>>>>>>>> I would appreciate it if you all could help review the open PRs and if the
>>>>>>>>> committers could drive the merges before the code freeze for 1.4.0.
>>>>>>>>> 
>>>>>>>>> The contributors on the Java API are making progress, but not all
>>>>>>>>> performance issues are resolved. With some luck it should be possible to
>>>>>>>>> code freeze towards the end of this week.
>>>>>>>>> 
>>>>>>>>> Are there other critical features/bugs/PRs you think need to be included in
>>>>>>>>> 1.4.0? If so, please communicate as soon as possible.
>>>>>>>>> 
>>>>>>>>> Regards,
>>>>>>>>> Steffen
>>>>>>>>> 
>>>>>>>>> On Mon, Nov 19, 2018 at 8:26 PM Zhao, Patric <patric.zhao@intel.com> wrote:
>>>>>>>>> 
>>>>>>>>>> Thanks, Steffen. I think there is NO open issue blocking MKLDNN from going
>>>>>>>>>> GA now.
>>>>>>>>>> 
>>>>>>>>>> BTW, several quantization-related PRs (#13297, #13260) are under
>>>>>>>>>> review and I think they can be merged this week.
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> 
>>>>>>>>>> --Patric
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: Steffen Rochel [mailto:steffenrochel@gmail.com]
>>>>>>>>>>> Sent: Tuesday, November 20, 2018 2:57 AM
>>>>>>>>>>> To: dev@mxnet.incubator.apache.org
>>>>>>>>>>> Subject: Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release
>>>>>>>>>>> 
>>>>>>>>>>> On Friday the contributors working on the Java API discovered a potential
>>>>>>>>>>> performance problem with inference using the Java API vs. Python.
>>>>>>>>>>> Investigation is ongoing.
>>>>>>>>>>> As the Java API is one of the main features for the upcoming release, I
>>>>>>>>>>> suggest postponing the code freeze towards the end of this week.
>>>>>>>>>>> 
>>>>>>>>>>> Please provide feedback and concerns about the change in dates for the code
>>>>>>>>>>> freeze and the 1.4.0 release. I will provide updates on progress resolving
>>>>>>>>>>> the potential performance problem.
>>>>>>>>>>> 
>>>>>>>>>>> Patric - do you think it is possible to resolve the remaining issues on
>>>>>>>>>>> MKL-DNN this week, so we can consider GA for MKL-DNN with 1.4.0?
>>>>>>>>>>> 
>>>>>>>>>>> Regards,
>>>>>>>>>>> Steffen
>>>>>>>>>>> 
>>>>>>>>>>> On Thu, Nov 15, 2018 at 5:26 AM Anton Chernov <mechernov@gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> I'd like to remind everyone that 'code freeze' would mean cutting a
>>>>>>>>>>>> v1.4.x release branch and all following fixes would need to be backported.
>>>>>>>>>>>> Development on master can be continued as usual.
>>>>>>>>>>>> 
>>>>>>>>>>>> Best
>>>>>>>>>>>> Anton
>>>>>>>>>>>> 
>>>>>>>>>>>> On Wed, Nov 14, 2018 at 6:04, Steffen Rochel <steffenrochel@gmail.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Dear MXNet community,
>>>>>>>>>>>>> the agreed plan was to establish the code freeze for the 1.4.0 release
>>>>>>>>>>>>> today. As the 1.3.1 patch release is still ongoing, I suggest
>>>>>>>>>>>>> postponing the code freeze to Friday, November 16, 2018.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Sergey Kolychev has agreed to act as co-release manager for all
>>>>>>>>>>>>> tasks which require committer privileges. If anybody is interested in
>>>>>>>>>>>>> volunteering as release manager, now is the time to speak up. Otherwise
>>>>>>>>>>>>> I will manage the release.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Steffen
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>> 
>>> 
>> 

