mxnet-dev mailing list archives

From Sheng Zha <szha....@gmail.com>
Subject Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release
Date Tue, 06 Nov 2018 23:59:27 GMT
Similar to the two PRs that Haibin suggested, PR 12992 introduces a new interface for
controlling determinism, which is better suited to a minor release.

I think that, other than the lack of a release manager to drive the 1.4.0 release, there’s
no reason we cannot do two releases (1.4.0 & 1.3.1) at the same time. I’m willing to help
with the 1.4.0 release to make these new features available one month sooner, if there are
no other concerns.

-sz

> On Nov 6, 2018, at 3:30 PM, Lin Yuan <apeforest@gmail.com> wrote:
> 
> Hi Anton,
> 
> Thanks for helping with the release.
> The following PRs are needed by customers who want to use deterministic
> CUDNN convolution algorithms:
> 
> https://github.com/apache/incubator-mxnet/pull/12992
> https://github.com/apache/incubator-mxnet/pull/13049
> 
> Thanks!
> 
> Lin
> 
> 
> On Tue, Nov 6, 2018 at 1:51 PM Aaron Markham <aaron.s.markham@gmail.com>
> wrote:
> 
>> Hi Anton,
>> I have the following suggestions for fixes to include in 1.3.1. These each
>> have updates to files that will impact docs generation for the 1.3.x
>> version of the website's Python API docs:
>> 
>> https://github.com/apache/incubator-mxnet/pull/12879
>> https://github.com/apache/incubator-mxnet/pull/12871
>> https://github.com/apache/incubator-mxnet/pull/12856
>> 
>> Thanks,
>> Aaron
>> 
>>> On Tue, Nov 6, 2018 at 1:29 PM Lai Wei <royweilai@gmail.com> wrote:
>>> 
>>> Hi Anton,
>>> 
>>> Thanks for driving this, I would like to include the following fix in
>>> 1.3.1:
>>> Allow infer shape partial on foreach operator:
>>> https://github.com/apache/incubator-mxnet/pull/12471
>>> 
>>> Keras-MXNet needs this functionality to infer shape partially
>>> on foreach operator. (Used in RNN operators)
>>> 
>>> Thanks a lot!
>>> 
>>> 
>>> Best Regards
>>> Lai Wei
>>> 
>>> 
>>> 
>>> On Tue, Nov 6, 2018 at 10:44 AM Haibin Lin <haibin.lin.aws@gmail.com>
>>> wrote:
>>> 
>>>> Hi Naveen and Anton,
>>>> 
>>>> Thanks for pointing that out. You are right that these are not critical
>>>> fixes. Putting them in 1.4.0 is more appropriate. PRs are closed.
>>>> 
>>>> Best,
>>>> Haibin
>>>> 
>>>> On Tue, Nov 6, 2018 at 7:35 AM Naveen Swamy <mnnaveen@gmail.com> wrote:
>>>> 
>>>>> Please note that this is a patch release (1.3.1) to address critical bugs!
>>>>> For everything else, please wait for 1.4.0, which is planned very shortly
>>>>> after 1.3.1.
>>>>> 
>>>>>> On Nov 6, 2018, at 7:17 AM, Anton Chernov <mechernov@gmail.com> wrote:
>>>>>> 
>>>>>> The following PRs have been created so far:
>>>>>> 
>>>>>> Infer dtype in SymbolBlock import from input symbol (v1.3.x)
>>>>>> https://github.com/apache/incubator-mxnet/pull/13117
>>>>>> 
>>>>>> [MXNET-953] Fix oob memory read (v1.3.x)
>>>>>> https://github.com/apache/incubator-mxnet/pull/13118
>>>>>> 
>>>>>> [MXNET-969] Fix buffer overflow in RNNOp (v1.3.x)
>>>>>> https://github.com/apache/incubator-mxnet/pull/13119
>>>>>> 
>>>>>> [MXNET-922] Fix memleak in profiler (v1.3.x)
>>>>>> https://github.com/apache/incubator-mxnet/pull/13120
>>>>>> 
>>>>>> Set correct update on kvstore flag in dist_device_sync mode (v1.3.x)
>>>>>> https://github.com/apache/incubator-mxnet/pull/13121
>>>>>> 
>>>>>> update mshadow (v1.3.x)
>>>>>> https://github.com/apache/incubator-mxnet/pull/13122
>>>>>> 
>>>>>> CudnnFind() usage improvements (v1.3.x)
>>>>>> https://github.com/apache/incubator-mxnet/pull/13123
>>>>>> 
>>>>>> Fix lazy record io when used with dataloader and multi_worker > 0 (v1.3.x)
>>>>>> https://github.com/apache/incubator-mxnet/pull/13124
>>>>>> 
>>>>>> 
>>>>>> As stated previously, I would be rather opposed to having the following PRs
>>>>>> in the patch release:
>>>>>> 
>>>>>> Gluon LSTM Projection and Clipping Support (#13055) v1.3.x
>>>>>> https://github.com/apache/incubator-mxnet/pull/13129
>>>>>> 
>>>>>> sample_like operators (#13034) v1.3.x
>>>>>> https://github.com/apache/incubator-mxnet/pull/13130
>>>>>> 
>>>>>> 
>>>>>> Best
>>>>>> Anton
>>>>>> 
>>>>>> On Tue, Nov 6, 2018 at 16:06, Anton Chernov <mechernov@gmail.com> wrote:
>>>>>> 
>>>>>>> Hi Haibin,
>>>>>>> 
>>>>>>> I have a few comments regarding the proposed performance improvement
>>>>>>> changes.
>>>>>>> 
>>>>>>> CUDNN support for LSTM with projection & clipping
>>>>>>> https://github.com/apache/incubator-mxnet/pull/13056
>>>>>>> 
>>>>>>> There is no doubt that this change brings value, but I don't see it as a
>>>>>>> critical bug fix. I would rather leave it for the next major release.
>>>>>>> 
>>>>>>> sample_like operators
>>>>>>> https://github.com/apache/incubator-mxnet/pull/13034
>>>>>>> 
>>>>>>> Even if it's related to performance, this is an addition of functionality
>>>>>>> and I would also push this to be in the next major release only.
>>>>>>> 
>>>>>>> 
>>>>>>> Best
>>>>>>> Anton
>>>>>>> 
>>>>>>> 
>>>>>>> On Tue, Nov 6, 2018 at 15:55, Anton Chernov <mechernov@gmail.com> wrote:
>>>>>>> 
>>>>>>>> Hi Patric,
>>>>>>>> 
>>>>>>>> This change was listed in the 'PR candidates suggested for consideration
>>>>>>>> for v1.3.1 patch release' section [1].
>>>>>>>> 
>>>>>>>> You are right, I also think that this is not a critical hotfix change
>>>>>>>> that should be included in the 1.3.1 patch release.
>>>>>>>> 
>>>>>>>> Thus I'm not making any further efforts to bring it in.
>>>>>>>> 
>>>>>>>> Best
>>>>>>>> Anton
>>>>>>>> 
>>>>>>>> [1] https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release#PR_candidates
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Tue, Nov 6, 2018 at 1:14, Zhao, Patric <patric.zhao@intel.com> wrote:
>>>>>>>> 
>>>>>>>>> Hi Anton,
>>>>>>>>> 
>>>>>>>>> Thanks for looking into the MKL-DNN PR.
>>>>>>>>> 
>>>>>>>>> As I understand the cwiki
>>>>>>>>> (https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release),
>>>>>>>>> these features will go into 1.4 rather than the 1.3.1 patch release.
>>>>>>>>> 
>>>>>>>>> Feel free to correct me :)
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> 
>>>>>>>>> --Patric
>>>>>>>>> 
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Anton Chernov [mailto:mechernov@gmail.com]
>>>>>>>>>> Sent: Tuesday, November 6, 2018 3:11 AM
>>>>>>>>>> To: dev@mxnet.apache.org
>>>>>>>>>> Subject: Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch
>>>>>>>>>> release
>>>>>>>>>> 
>>>>>>>>>> It seems that there is a problem porting the following changes to the
>>>>>>>>>> v1.3.x release branch:
>>>>>>>>>> 
>>>>>>>>>> Implement mkldnn convolution fusion and quantization
>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12530
>>>>>>>>>> 
>>>>>>>>>> MKL-DNN Quantization Examples and README
>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12808
>>>>>>>>>> 
>>>>>>>>>> The bases are different.
>>>>>>>>>> 
>>>>>>>>>> I would need help from the authors of these changes to make a backport PR.
>>>>>>>>>> 
>>>>>>>>>> @ZhennanQin, @xinyu-intel would you be able to assist me and create the
>>>>>>>>>> corresponding PRs?
>>>>>>>>>> 
>>>>>>>>>> Without proper history and domain knowledge I would not be able to create
>>>>>>>>>> them on my own in a reasonable amount of time, I'm afraid.
>>>>>>>>>> 
>>>>>>>>>> Best regards,
>>>>>>>>>> Anton
>>>>>>>>>> 
>>>>>>>>>> On Mon, Nov 5, 2018 at 19:45, Anton Chernov <mechernov@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> As part of:
>>>>>>>>>>> 
>>>>>>>>>>> Implement mkldnn convolution fusion and quantization
>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12530
>>>>>>>>>>> 
>>>>>>>>>>> I propose to add the examples and documentation PR as well:
>>>>>>>>>>> 
>>>>>>>>>>> MKL-DNN Quantization Examples and README
>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12808
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Best regards,
>>>>>>>>>>> Anton
>>>>>>>>>>> 
>>>>>>>>>>> On Mon, Nov 5, 2018 at 19:02, Anton Chernov <mechernov@gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Dear MXNet community,
>>>>>>>>>>>> 
>>>>>>>>>>>> I will be the release manager for the upcoming 1.3.1 patch release.
>>>>>>>>>>>> Naveen will be co-managing the release and providing help from the
>>>>>>>>>>>> committers side.
>>>>>>>>>>>> 
>>>>>>>>>>>> The following dates have been set:
>>>>>>>>>>>> 
>>>>>>>>>>>> Code Freeze: 31st October 2018
>>>>>>>>>>>> Release published: 13th November 2018
>>>>>>>>>>>> 
>>>>>>>>>>>> Release notes have been drafted here [1].
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> * Known issues
>>>>>>>>>>>> 
>>>>>>>>>>>> Update MKL-DNN dependency
>>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12953
>>>>>>>>>>>> 
>>>>>>>>>>>> This PR hasn't been merged even to master yet. Requires additional
>>>>>>>>>>>> discussion and merge.
>>>>>>>>>>>> 
>>>>>>>>>>>> distributed kvstore bug in MXNet
>>>>>>>>>>>> https://github.com/apache/incubator-mxnet/issues/12713
>>>>>>>>>>>> 
>>>>>>>>>>>>> When distributed kvstore is used, by default gluon.Trainer doesn't
>>>>>>>>>>>>> work with mx.optimizer.LRScheduler if a worker has more than 1 GPU.
>>>>>>>>>>>>> To be more specific, the trainer updates once per GPU, the LRScheduler
>>>>>>>>>>>>> object is shared across GPUs and gets a wrong update count.
>>>>>>>>>>>> 
>>>>>>>>>>>> This needs to be fixed. [6]
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> * Changes
>>>>>>>>>>>> 
>>>>>>>>>>>> The following changes will be ported to the release branch, per [2]:
>>>>>>>>>>>> 
>>>>>>>>>>>> Infer dtype in SymbolBlock import from input symbol [3]
>>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12412
>>>>>>>>>>>> 
>>>>>>>>>>>> [MXNET-953] Fix oob memory read
>>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12631
>>>>>>>>>>>> 
>>>>>>>>>>>> [MXNET-969] Fix buffer overflow in RNNOp
>>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12603
>>>>>>>>>>>> 
>>>>>>>>>>>> [MXNET-922] Fix memleak in profiler
>>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12499
>>>>>>>>>>>> 
>>>>>>>>>>>> Implement mkldnn convolution fusion and quantization (MXNet Graph
>>>>>>>>>>>> Optimization and Quantization based on subgraph and MKL-DNN proposal
>>>>>>>>>>>> [4])
>>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12530
>>>>>>>>>>>> 
>>>>>>>>>>>> The following items (test cases) should already be part of 1.3.0:
>>>>>>>>>>>> 
>>>>>>>>>>>> [MXNET-486] Create CPP test for concat MKLDNN operator
>>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/11371
>>>>>>>>>>>> 
>>>>>>>>>>>> [MXNET-489] MKLDNN Pool test
>>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/11608
>>>>>>>>>>>> 
>>>>>>>>>>>> [MXNET-484] MKLDNN C++ test for LRN operator
>>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/11831
>>>>>>>>>>>> 
>>>>>>>>>>>> [MXNET-546] Add unit test for MKLDNNSum
>>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/11272
>>>>>>>>>>>> 
>>>>>>>>>>>> [MXNET-498] Test MKLDNN backward operators
>>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/11232
>>>>>>>>>>>> 
>>>>>>>>>>>> [MXNET-500] Test cases improvement for MKLDNN on Gluon
>>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/10921
>>>>>>>>>>>> 
>>>>>>>>>>>> Set correct update on kvstore flag in dist_device_sync mode (as part
>>>>>>>>>>>> of fixing [5])
>>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12786
>>>>>>>>>>>> 
>>>>>>>>>>>> upgrade mshadow version
>>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12692
>>>>>>>>>>>> But another PR will be used instead:
>>>>>>>>>>>> update mshadow
>>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12674
>>>>>>>>>>>> 
>>>>>>>>>>>> CudnnFind() usage improvements
>>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12804
>>>>>>>>>>>> A critical CUDNN fix that reduces GPU memory consumption and
>>>>>>>>>>>> addresses this memory leak issue. This is an important fix to include
>>>>>>>>>>>> in 1.3.1.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> From discussion about gluon toolkits:
>>>>>>>>>>>> 
>>>>>>>>>>>> disable opencv threading for forked process
>>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12025
>>>>>>>>>>>> 
>>>>>>>>>>>> Fix lazy record io when used with dataloader and multi_worker > 0
>>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12554
>>>>>>>>>>>> 
>>>>>>>>>>>> fix potential floating number overflow, enable float16
>>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12118
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> * Resolved issues
>>>>>>>>>>>> 
>>>>>>>>>>>> MxNet 1.2.1–module get_outputs()
>>>>>>>>>>>> https://discuss.mxnet.io/t/mxnet-1-2-1-module-get-outputs/1882
>>>>>>>>>>>> 
>>>>>>>>>>>> As far as I can see from the comments, the issue has been resolved;
>>>>>>>>>>>> no action needs to be taken for this release. [7] is mentioned in this
>>>>>>>>>>>> regard, but I don't see any action points here either.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> With Naveen's help, I will start porting the mentioned PRs to the
>>>>>>>>>>>> 1.3.x branch.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Best regards,
>>>>>>>>>>>> Anton
>>>>>>>>>>>> 
>>>>>>>>>>>> [1] https://cwiki.apache.org/confluence/x/eZGzBQ
>>>>>>>>>>>> [2] https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release
>>>>>>>>>>>> [3] https://github.com/apache/incubator-mxnet/issues/11849
>>>>>>>>>>>> [4] https://cwiki.apache.org/confluence/display/MXNET/MXNet+Graph+Optimization+and+Quantization+based+on+subgraph+and+MKL-DNN
>>>>>>>>>>>> [5] https://github.com/apache/incubator-mxnet/issues/12713
>>>>>>>>>>>> [6] https://github.com/apache/incubator-mxnet/issues/12713#issuecomment-435773777
>>>>>>>>>>>> [7] https://github.com/apache/incubator-mxnet/pull/11005
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>> 
>>>> 
>>> 
>> 

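For reference, the distributed kvstore issue quoted in the thread above (issue 12713) can be
illustrated with a minimal sketch. This is plain Python and not MXNet code: StepScheduler and
final_lr are hypothetical names, and the step size, decay factor and GPU count are made-up
values. It only models the reported mechanism, in which the shared scheduler's update count
advances once per GPU instead of once per iteration, so the learning rate decays too early.

```python
class StepScheduler:
    """Hypothetical stand-in for a step-decay learning-rate scheduler."""

    def __init__(self, base_lr, step, factor):
        self.base_lr = base_lr
        self.step = step
        self.factor = factor

    def __call__(self, num_update):
        # Multiply the rate by `factor` once every `step` counted updates.
        return self.base_lr * self.factor ** (num_update // self.step)


def final_lr(iterations, gpus_per_worker, count_per_gpu):
    """Return (learning rate, update count) after `iterations` training steps.

    count_per_gpu=True models the reported behaviour: the shared update
    count grows once per GPU on every iteration. count_per_gpu=False models
    the intended behaviour: one counted update per iteration.
    """
    scheduler = StepScheduler(base_lr=0.1, step=100, factor=0.5)
    num_update = 0
    for _ in range(iterations):
        num_update += gpus_per_worker if count_per_gpu else 1
    return scheduler(num_update), num_update


if __name__ == "__main__":
    print("per-GPU counting:      ", final_lr(100, 4, count_per_gpu=True))
    print("per-iteration counting:", final_lr(100, 4, count_per_gpu=False))
```

With 4 GPUs the per-GPU count reaches 400 after 100 iterations, so the sketch applies four
decay steps where only one was intended.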