mxnet-dev mailing list archives

From Anton Chernov <mecher...@gmail.com>
Subject Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release
Date Mon, 12 Nov 2018 21:17:47 GMT
Unfortunately, merging the following PR

Set correct update on kvstore flag in dist_device_sync mode (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/13121

broke the `dist-kvstore tests CPU` test stage:

http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/v1.3.x/82/pipeline

A revert PR has been opened:

Revert "Set correct update on kvstore flag in dist_device_sync mode
(v1.3.x) (#13121)"
https://github.com/apache/incubator-mxnet/pull/13228

The test has already passed, so the PR is good to go. The initial fix will
not be considered for the release and will instead be mentioned in the known
issues section.

Added a version bump to the release branch:

news, readme update for v1.3.1 release
https://github.com/apache/incubator-mxnet/pull/13225

Since patch releases are now done on branches, the master branch needs a
version update. The following PR introduces the change:

Bumped minor version to 1.4.0 as 1.3.1 will be continued in the v1.3x branch
https://github.com/apache/incubator-mxnet/pull/13231


The Confluence page 'Apache MXNet (incubating) 1.3.1 Release Notes' has
been updated:
https://cwiki.apache.org/confluence/x/eZGzBQ


Best
Anton

On Sat, Nov 10, 2018 at 11:59, Anton Chernov <mechernov@gmail.com> wrote:

> Due to various problems we had to postpone the tagging and vote for the
> release till Monday, the 12th of November 2018.
>
> The following change has been updated and is waiting to be merged:
>
> Disable flaky test test_operator.test_dropout (v1.3.x)
> https://github.com/apache/incubator-mxnet/pull/13200
>
> Indeed, the macOS tests timed out on the branch as well. The proposed
> change thus contains only the build:
>
> [MXNET-908] Enable minimal OSX Travis build (v1.3.x)
> https://github.com/apache/incubator-mxnet/pull/13179
>
>
> Best
> Anton
>
> On Fri, Nov 9, 2018 at 13:11, Anton Chernov <mechernov@gmail.com> wrote:
>
>> I created the following PR to disable the test:
>>
>> Disable flaky test test_operator.test_dropout (v1.3.x)
>> https://github.com/apache/incubator-mxnet/pull/13200
>>
>> I suppose the second failure is related to:
>>
>> distributed kvstore bug in MXNet
>> https://github.com/apache/incubator-mxnet/issues/12713
>>
>> which was partially fixed by
>>
>> Set correct update on kvstore flag in dist_device_sync mode (v1.3.x)
>> https://github.com/apache/incubator-mxnet/pull/13121
>>
>> But another part of the issue is still open and does not have a fix yet:
>>
>> "When distributed kvstore is used, by default gluon.Trainer doesn't work
>> with mx.optimizer.LRScheduler if a worker has more than 1 GPU. To be more
>> specific, the trainer updates once per GPU, the LRScheduler object is
>> shared across GPUs and get a wrong update count."
>>
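The update-count problem quoted above can be illustrated without MXNet. The following is only a minimal sketch under stated assumptions: `StepDecay` and `final_lr` are hypothetical names invented for this illustration, not the `gluon.Trainer` or `mx.optimizer.LRScheduler` API. It models one shared scheduler whose update counter advances once per GPU instead of once per iteration.

```python
class StepDecay:
    """Hypothetical scheduler: halve (or scale) the rate every `step` updates."""

    def __init__(self, base_lr, step, factor):
        self.base_lr = base_lr
        self.step = step
        self.factor = factor

    def __call__(self, num_update):
        # The learning rate depends only on the update count it is given.
        return self.base_lr * self.factor ** (num_update // self.step)


def final_lr(iterations, num_gpus, count_per_gpu):
    """Return the learning rate after `iterations` training iterations.

    count_per_gpu=True models the reported behaviour (assumption): the shared
    counter advances once for every GPU in each iteration.
    count_per_gpu=False models the intended behaviour: once per iteration.
    """
    scheduler = StepDecay(base_lr=1.0, step=10, factor=0.5)
    num_update = 0
    lr = scheduler(num_update)
    for _ in range(iterations):
        num_update += num_gpus if count_per_gpu else 1
        lr = scheduler(num_update)
    return lr


if __name__ == "__main__":
    print("intended:", final_lr(20, 4, count_per_gpu=False))
    print("per-GPU :", final_lr(20, 4, count_per_gpu=True))
```

With four GPUs the per-GPU count reaches the decay steps four times sooner, so the rate drops much faster than configured; with a single GPU both variants agree.
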
>>
>> Best
>> Anton
>>
>>
>> On Fri, Nov 9, 2018 at 11:48, Anton Chernov <mechernov@gmail.com> wrote:
>>
>>> If the macOS tests time out as well, we can disable them
>>> and keep at least the build stage, as in:
>>>
>>> Disable travis tests
>>> https://github.com/apache/incubator-mxnet/pull/13137
>>>
>>> Best
>>> Anton
>>>
>>> On Fri, Nov 9, 2018 at 11:17, Anton Chernov <mechernov@gmail.com> wrote:
>>>
>>>>
>>>> Hi Naveen,
>>>>
>>>> I believe that the timeout is not an issue for the branch. And I see
>>>> great benefit in having tests for MACOS on the release branch. The travis
>>>> build is not blocking anyway, so I don't see any risk in adding it.
>>>>
>>>> * test_dropout
>>>>
>>>> Currently, there is a problem with test_dropout that fails consistently
>>>> on the branch:
>>>>
>>>>
>>>> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/v1.3.x/97/pipeline
>>>>
>>>> Error reported:
>>>>
>>>> ======================================================================
>>>> FAIL: test_operator.test_dropout
>>>> ----------------------------------------------------------------------
>>>> Traceback (most recent call last):
>>>>   File "C:\Anaconda3\envs\py3\lib\site-packages\nose\case.py", line 197, in runTest
>>>>     self.test(*self.arg)
>>>>   File "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\unittest\common.py", line 173, in test_new
>>>>     orig_test(*args, **kwargs)
>>>>   File "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\unittest\test_operator.py", line 5853, in test_dropout
>>>>     check_dropout_ratio(0.0, shape)
>>>>   File "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\unittest\test_operator.py", line 5797, in check_dropout_ratio
>>>>     assert exe.outputs[0].asnumpy().min() == min_value
>>>> AssertionError:
>>>> -------------------- >> begin captured logging << --------------------
>>>> common: INFO: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=428273587 to reproduce.
>>>> --------------------- >> end captured logging << ---------------------
>>>>
>>>> The test is enabled on master:
>>>>
>>>> Re-enables test_operator.test_dropout
>>>> https://github.com/apache/incubator-mxnet/pull/12717
>>>>
>>>> And there are no failures for it [1].
>>>>
>>>> * KVStore tests
>>>>
>>>> Unfortunately, KVStore tests fail as well.
>>>>
>>>>
>>>> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/v1.3.x/96/pipeline
>>>>
>>>> Error reported:
>>>>
>>>> AssertionError
>>>> test_gluon_trainer_type()
>>>>     assert trainer._update_on_kvstore is update_on_kv\
>>>>   File "dist_sync_kvstore.py", line 388, in test_gluon_trainer_type
>>>>
>>>> If nobody has a fix for these issues, I will disable the tests and add
>>>> information to the known issues section.
>>>>
>>>> Best
>>>> Anton
>>>>
>>>> [1]
>>>> http://jenkins.mxnet-ci.amazon-ml.com/job/incubator-mxnet/job/master/
>>>>
>>>> On Thu, Nov 8, 2018 at 21:44, Naveen Swamy <mnnaveen@gmail.com> wrote:
>>>>
>>>>> Anton, I don't think we need to add the Mac OS tests for the 1.3.1 branch,
>>>>> since Travis CI is timing out and creates blockers. It also did not exist
>>>>> for v1.3.0.
>>>>>
>>>>>
>>>>> On Thu, Nov 8, 2018 at 10:04 AM Anton Chernov <mechernov@gmail.com>
>>>>> wrote:
>>>>>
>>>>> > A PR to fix the tests:
>>>>> >
>>>>> > Remove test for non existing index copy operator (v1.3.x)
>>>>> > https://github.com/apache/incubator-mxnet/pull/13180
>>>>> >
>>>>> >
>>>>> > Best
>>>>> > Anton
>>>>> >
>>>>> > On Thu, Nov 8, 2018 at 10:05, Anton Chernov <mechernov@gmail.com> wrote:
>>>>> >
>>>>> > > An addition has been made to include MacOS tests for the v1.3.x branch:
>>>>> > >
>>>>> > > [MXNET-908] Enable minimal OSX Travis build (v1.3.x)
>>>>> > > https://github.com/apache/incubator-mxnet/pull/13179
>>>>> > >
>>>>> > > It includes the following PRs for master:
>>>>> > >
>>>>> > > [MXNET-908] Enable minimal OSX Travis build
>>>>> > > https://github.com/apache/incubator-mxnet/pull/12462
>>>>> > >
>>>>> > > [MXNET-908] Enable python tests in Travis
>>>>> > > https://github.com/apache/incubator-mxnet/pull/12550
>>>>> > >
>>>>> > > [MXNET-968] Fix MacOS python tests
>>>>> > > https://github.com/apache/incubator-mxnet/pull/12590
>>>>> > >
>>>>> > >
>>>>> > > Best
>>>>> > > Anton
>>>>> > >
>>>>> > >
>>>>> > > On Thu, Nov 8, 2018 at 9:38, Anton Chernov <mechernov@gmail.com> wrote:
>>>>> > >
>>>>> > >> Thank you everyone for your support and suggestions. All proposed PR's
>>>>> > >> have been merged. We will tag the release candidate and start the vote
>>>>> > >> on Friday, the 9th of November 2018.
>>>>> > >>
>>>>> > >> Unfortunately after the merges the tests started to fail:
>>>>> > >>
>>>>> > >>
>>>>> > >> http://jenkins.mxnet-ci.amazon-ml.com/job/incubator-mxnet/job/v1.3.x/
>>>>> > >>
>>>>> > >> I will look into the failures, but any help is, as usual, very much
>>>>> > >> appreciated.
>>>>> > >>
>>>>> > >> The nightly tests are fine:
>>>>> > >>
>>>>> > >> http://jenkins.mxnet-ci.amazon-ml.com/job/NightlyTests/job/v1.3.x/
>>>>> > >>
>>>>> > >>
>>>>> > >> Best
>>>>> > >> Anton
>>>>> > >>
>>>>> > >>
>>>>> > >>
>>>>> > >>
>>>>> > >> On Wed, Nov 7, 2018 at 17:19, Anton Chernov <mechernov@gmail.com> wrote:
>>>>> > >>
>>>>> > >>> Yes, you are right about the versions wording, thanks for the
>>>>> > >>> clarification.
>>>>> > >>>
>>>>> > >>> A performance improvement can be considered a bugfix as well. I see no
>>>>> > >>> big risks in including the PRs by Haibin and Lin in the patch release.
>>>>> > >>>
>>>>> > >>> @Haibin, if you can reopen the PRs, they should be good to go for the
>>>>> > >>> release, considering the importance of the improvements.
>>>>> > >>>
>>>>> > >>> I propose the following bugfixes for the release as well (the
>>>>> > >>> corresponding PRs have already been created):
>>>>> > >>>
>>>>> > >>> Fixed __setattr__ method of _MXClassPropertyMetaClass (v1.3.x)
>>>>> > >>> https://github.com/apache/incubator-mxnet/pull/13157
>>>>> > >>>
>>>>> > >>> fixed symbols naming in RNNCell, LSTMCell, GRUCell (v1.3.x)
>>>>> > >>> https://github.com/apache/incubator-mxnet/pull/13158
>>>>> > >>>
>>>>> > >>> We will start merging the PRs shortly. If there are no more proposals
>>>>> > >>> for backporting, I would consider the list as set.
>>>>> > >>>
>>>>> > >>> Best
>>>>> > >>> Anton
>>>>> > >>>
>>>>> > >>> On Wed, Nov 7, 2018 at 17:01, Sheng Zha <szha.pvg@gmail.com> wrote:
>>>>> > >>>
>>>>> > >>>> Hi Anton,
>>>>> > >>>>
>>>>> > >>>> I hear your concern about a simultaneous 1.4.0 release and it
>>>>> > >>>> certainly is a valid one.
>>>>> > >>>>
>>>>> > >>>> Regarding the release, let’s agree on the language first. According to
>>>>> > >>>> semver.org, a 1.3.1 release is considered a patch release, which is for
>>>>> > >>>> backward compatible bug fixes, while a 1.4.0 release is considered a
>>>>> > >>>> minor release, which is for backward compatible new features. A major
>>>>> > >>>> release would mean 2.0.
>>>>> > >>>>
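The patch/minor/major distinction described above can be written down as a small check. This is only a sketch: `parse_version` and `classify_bump` are hypothetical helper names made up for this illustration, and it assumes plain `MAJOR.MINOR.PATCH` version strings with no pre-release or build suffixes.

```python
def parse_version(text):
    """Split a 'MAJOR.MINOR.PATCH' string into a tuple of three integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def classify_bump(old, new):
    """Name the kind of release that moves version `old` to version `new`."""
    old_v, new_v = parse_version(old), parse_version(new)
    if new_v <= old_v:
        raise ValueError("new version must be greater than old version")
    if new_v[0] != old_v[0]:
        return "major"  # incompatible changes
    if new_v[1] != old_v[1]:
        return "minor"  # backward compatible new features
    return "patch"      # backward compatible bug fixes


if __name__ == "__main__":
    print(classify_bump("1.3.0", "1.3.1"))
    print(classify_bump("1.3.1", "1.4.0"))
    print(classify_bump("1.4.0", "2.0.0"))
```

Under these rules, going from 1.3.0 to 1.3.1 is a patch release, 1.3.1 to 1.4.0 is a minor release, and anything reaching 2.0.0 is a major release.
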
>>>>> > >>>> The three PRs suggested by Haibin and Lin all introduce new features.
>>>>> > >>>> If they go into a patch release, it would require an exception
>>>>> > >>>> accepted by the community. Also, if other violations happen, it could
>>>>> > >>>> be grounds for declining a release during the vote.
>>>>> > >>>>
>>>>> > >>>> -sz
>>>>> > >>>>
>>>>> > >>>> > On Nov 7, 2018, at 2:25 AM, Anton Chernov <mechernov@gmail.com> wrote:
>>>>> > >>>> >
>>>>> > >>>> > [MXNET-1179] Enforce deterministic algorithms in convolution layers
