mxnet-dev mailing list archives

From Anton Chernov <mecher...@gmail.com>
Subject Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release
Date Sat, 10 Nov 2018 10:59:15 GMT
Due to various problems we had to postpone the tagging and vote for the
release till Monday, the 12th of November 2018.

The following change has been updated and is waiting to be merged:

Disable flaky test test_operator.test_dropout (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/13200

Indeed, the macOS tests timed out on the branch as well. The proposed
change therefore contains only the build stage:

[MXNET-908] Enable minimal OSX Travis build (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/13179


Best
Anton

On Fri, Nov 9, 2018 at 13:11, Anton Chernov <mechernov@gmail.com> wrote:

> I created the following PR to disable the test:
>
> Disable flaky test test_operator.test_dropout (v1.3.x)
> https://github.com/apache/incubator-mxnet/pull/13200
>
> I suppose the second failure is related to:
>
> distributed kvstore bug in MXNet
> https://github.com/apache/incubator-mxnet/issues/12713
>
> It was partially fixed by
>
> Set correct update on kvstore flag in dist_device_sync mode (v1.3.x)
> https://github.com/apache/incubator-mxnet/pull/13121
>
> But another part of the issue is still open and does not have a fix yet:
>
> "When distributed kvstore is used, by default gluon.Trainer doesn't work
> with mx.optimizer.LRScheduler if a worker has more than 1 GPU. To be more
> specific, the trainer updates once per GPU, the LRScheduler object is
> shared across GPUs and get a wrong update count."
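
As a rough illustration of the quoted problem, the toy model below counts updates the way the description reads: one shared counter that is incremented once per GPU in every training step. It does not import MXNet, and the class and function names (StepDecayScheduler, SharedUpdateCounter, lr_after) are hypothetical; the real gluon.Trainer and mx.optimizer internals may differ.

```python
# Toy model only: no MXNet imports, every name here is hypothetical.

class StepDecayScheduler:
    """Halves the learning rate every `step` updates it is told about."""

    def __init__(self, base_lr, step):
        self.base_lr = base_lr
        self.step = step

    def __call__(self, num_update):
        return self.base_lr * 0.5 ** (num_update // self.step)


class SharedUpdateCounter:
    """Stands in for optimizer state shared by all GPUs of one worker."""

    def __init__(self, scheduler):
        self.scheduler = scheduler
        self.num_update = 0

    def update(self):
        # Every call counts as one update, regardless of which GPU made it.
        self.num_update += 1
        return self.scheduler(self.num_update)


def lr_after(iterations, num_gpus):
    """Return (learning rate, update count) after `iterations` training
    steps when the update routine runs once per GPU in each step."""
    counter = SharedUpdateCounter(StepDecayScheduler(base_lr=0.1, step=100))
    lr = counter.scheduler.base_lr
    for _ in range(iterations):
        for _ in range(num_gpus):
            lr = counter.update()
    return lr, counter.num_update


lr_1gpu, count_1gpu = lr_after(iterations=100, num_gpus=1)
lr_4gpu, count_4gpu = lr_after(iterations=100, num_gpus=4)
print(count_1gpu, lr_1gpu)
print(count_4gpu, lr_4gpu)
```

In this model the 4-GPU worker reports four times as many updates for the same number of training steps, so the schedule decays faster than intended.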
>
>
> Best
> Anton
>
>
> On Fri, Nov 9, 2018 at 11:48, Anton Chernov <mechernov@gmail.com> wrote:
>
>> If the macOS tests time out as well, we can disable them and
>> keep at least the build stage, as in:
>>
>> Disable travis tests
>> https://github.com/apache/incubator-mxnet/pull/13137
>>
>> Best
>> Anton
>>
>> On Fri, Nov 9, 2018 at 11:17, Anton Chernov <mechernov@gmail.com> wrote:
>>
>>>
>>> Hi Naveen,
>>>
>>> I believe that the timeout is not an issue for the branch, and I see
>>> great benefit in having macOS tests on the release branch. The Travis
>>> build is not blocking anyway, so I don't see any risk in adding it.
>>>
>>> * test_dropout
>>>
>>> Currently, test_dropout fails consistently on the branch:
>>>
>>>
>>> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/v1.3.x/97/pipeline
>>>
>>> Error reported:
>>>
>>> ======================================================================
>>> FAIL: test_operator.test_dropout
>>> ----------------------------------------------------------------------
>>> Traceback (most recent call last):
>>>   File "C:\Anaconda3\envs\py3\lib\site-packages\nose\case.py", line 197, in runTest
>>>     self.test(*self.arg)
>>>   File "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\unittest\common.py", line 173, in test_new
>>>     orig_test(*args, **kwargs)
>>>   File "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\unittest\test_operator.py", line 5853, in test_dropout
>>>     check_dropout_ratio(0.0, shape)
>>>   File "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\unittest\test_operator.py", line 5797, in check_dropout_ratio
>>>     assert exe.outputs[0].asnumpy().min() == min_value
>>> AssertionError:
>>> -------------------- >> begin captured logging << --------------------
>>> common: INFO: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=428273587 to reproduce.
>>> --------------------- >> end captured logging << ---------------------
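
The captured log names the seed needed to reproduce the failure. A minimal sketch of how one might re-run just this test with that seed follows. It assumes the command is started from the repository root with nose installed and a built MXNet available; the helper name reproduction_command is hypothetical, and the test path is taken from the traceback with forward slashes.

```python
import os


def reproduction_command(seed, test="tests/python/unittest/test_operator.py:test_dropout"):
    """Build the command and environment for re-running one nose test
    with the seed reported in the captured log (hypothetical helper)."""
    # Copy the current environment and add the seed variable from the log.
    env = dict(os.environ, MXNET_TEST_SEED=str(seed))
    # nose selects a single test function with the file.py:function syntax.
    cmd = ["nosetests", "--verbose", test]
    return cmd, env


cmd, env = reproduction_command(428273587)
print(" ".join(cmd))

# To actually run it (assumption: repository root, nose and MXNet installed):
# import subprocess
# subprocess.run(cmd, env=env, check=False)
```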
>>>
>>> The test is enabled on master:
>>>
>>> Re-enables test_operator.test_dropout
>>> https://github.com/apache/incubator-mxnet/pull/12717
>>>
>>> And there are no failures for it [1].
>>>
>>> * KVStore tests
>>>
>>> Unfortunately, KVStore tests fail as well.
>>>
>>>
>>> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/v1.3.x/96/pipeline
>>>
>>> Error reported:
>>>
>>> AssertionError
>>> test_gluon_trainer_type()
>>>     assert trainer._update_on_kvstore is update_on_kv\
>>>   File "dist_sync_kvstore.py", line 388, in test_gluon_trainer_type
>>>
>>> If nobody has a fix for these issues, I will disable the tests and add
>>> information to the known issues section.
>>>
>>> Best
>>> Anton
>>>
>>> [1]
>>> http://jenkins.mxnet-ci.amazon-ml.com/job/incubator-mxnet/job/master/
>>>
>>> On Thu, Nov 8, 2018 at 21:44, Naveen Swamy <mnnaveen@gmail.com> wrote:
>>>
>>>> Anton, I don't think we need to add the macOS tests to the 1.3.1 branch,
>>>> since Travis CI is timing out and creating blockers; they also did not
>>>> exist for v1.3.0.
>>>>
>>>>
>>>> On Thu, Nov 8, 2018 at 10:04 AM Anton Chernov <mechernov@gmail.com>
>>>> wrote:
>>>>
>>>> > A PR to fix the tests:
>>>> >
>>>> > Remove test for non existing index copy operator (v1.3.x)
>>>> > https://github.com/apache/incubator-mxnet/pull/13180
>>>> >
>>>> >
>>>> > Best
>>>> > Anton
>>>> >
>>>> > On Thu, Nov 8, 2018 at 10:05, Anton Chernov <mechernov@gmail.com> wrote:
>>>> >
>>>> > > An addition has been made to include macOS tests for the v1.3.x branch:
>>>> > >
>>>> > > [MXNET-908] Enable minimal OSX Travis build (v1.3.x)
>>>> > > https://github.com/apache/incubator-mxnet/pull/13179
>>>> > >
>>>> > > It includes the following PRs from master:
>>>> > >
>>>> > > [MXNET-908] Enable minimal OSX Travis build
>>>> > > https://github.com/apache/incubator-mxnet/pull/12462
>>>> > >
>>>> > > [MXNET-908] Enable python tests in Travis
>>>> > > https://github.com/apache/incubator-mxnet/pull/12550
>>>> > >
>>>> > > [MXNET-968] Fix MacOS python tests
>>>> > > https://github.com/apache/incubator-mxnet/pull/12590
>>>> > >
>>>> > >
>>>> > > Best
>>>> > > Anton
>>>> > >
>>>> > >
>>>> > > On Thu, Nov 8, 2018 at 9:38, Anton Chernov <mechernov@gmail.com> wrote:
>>>> > >
>>>> > >> Thank you everyone for your support and suggestions. All proposed PRs
>>>> > >> have been merged. We will tag the release candidate and start the vote
>>>> > >> on Friday, the 9th of November 2018.
>>>> > >>
>>>> > >> Unfortunately, after the merges the tests started to fail:
>>>> > >>
>>>> > >> http://jenkins.mxnet-ci.amazon-ml.com/job/incubator-mxnet/job/v1.3.x/
>>>> > >>
>>>> > >> I will look into the failures, but any help is, as usual, very much
>>>> > >> appreciated.
>>>> > >>
>>>> > >> The nightly tests are fine:
>>>> > >> http://jenkins.mxnet-ci.amazon-ml.com/job/NightlyTests/job/v1.3.x/
>>>> > >>
>>>> > >>
>>>> > >> Best
>>>> > >> Anton
>>>> > >>
>>>> > >>
>>>> > >>
>>>> > >>
>>>> > >> On Wed, Nov 7, 2018 at 17:19, Anton Chernov <mechernov@gmail.com> wrote:
>>>> > >>
>>>> > >>> Yes, you are right about the version wording; thanks for the
>>>> > >>> clarification.
>>>> > >>>
>>>> > >>> A performance improvement can be considered a bugfix as well. I see no
>>>> > >>> big risk in including the PRs by Haibin and Lin in the patch release.
>>>> > >>>
>>>> > >>> @Haibin, if you can reopen the PRs, they should be good to go for the
>>>> > >>> release, considering the importance of the improvements.
>>>> > >>>
>>>> > >>> I propose the following bugfixes for the release as well (the
>>>> > >>> corresponding PRs have already been created):
>>>> > >>>
>>>> > >>> Fixed __setattr__ method of _MXClassPropertyMetaClass (v1.3.x)
>>>> > >>> https://github.com/apache/incubator-mxnet/pull/13157
>>>> > >>>
>>>> > >>> fixed symbols naming in RNNCell, LSTMCell, GRUCell (v1.3.x)
>>>> > >>> https://github.com/apache/incubator-mxnet/pull/13158
>>>> > >>>
>>>> > >>> We will start merging the PRs shortly. If there are no more proposals
>>>> > >>> for backporting, I will consider the list final.
>>>> > >>>
>>>> > >>> Best
>>>> > >>> Anton
>>>> > >>>
>>>> > >>> On Wed, Nov 7, 2018 at 17:01, Sheng Zha <szha.pvg@gmail.com> wrote:
>>>> > >>>
>>>> > >>>> Hi Anton,
>>>> > >>>>
>>>> > >>>> I hear your concern about a simultaneous 1.4.0 release, and it certainly
>>>> > >>>> is a valid one.
>>>> > >>>>
>>>> > >>>> Regarding the release, let’s agree on the language first. According to
>>>> > >>>> semver.org, the 1.3.1 release is considered a patch release, which is for
>>>> > >>>> backward-compatible bug fixes, while the 1.4.0 release is considered a
>>>> > >>>> minor release, which is for backward-compatible new features. A major
>>>> > >>>> release would mean 2.0.
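
The naming above can be sketched as a small classifier. This is only an illustration of the semver.org terms used in the message: the function name release_type is hypothetical, and the sketch ignores pre-release and build metadata and does not check that lower components are reset to zero.

```python
def release_type(previous, new):
    """Classify the step from `previous` to `new` (both "X.Y.Z" strings)
    as "major", "minor", or "patch" in semver.org terms."""
    old = tuple(int(part) for part in previous.split("."))
    cur = tuple(int(part) for part in new.split("."))
    if len(old) != 3 or len(cur) != 3:
        raise ValueError("expected versions of the form X.Y.Z")
    if cur <= old:
        raise ValueError("new version must be greater than the previous one")
    if cur[0] != old[0]:
        return "major"
    if cur[1] != old[1]:
        return "minor"
    return "patch"


print(release_type("1.3.0", "1.3.1"))
print(release_type("1.3.0", "1.4.0"))
print(release_type("1.3.0", "2.0.0"))
```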
>>>> > >>>>
>>>> > >>>> The three PRs suggested by Haibin and Lin all introduce new features. If
>>>> > >>>> they go into a patch release, it would require an exception accepted by
>>>> > >>>> the community. Also, any other violation could be grounds for declining
>>>> > >>>> a release during the vote.
>>>> > >>>>
>>>> > >>>> -sz
>>>> > >>>>
>>>> > >>>> > On Nov 7, 2018, at 2:25 AM, Anton Chernov <mechernov@gmail.com> wrote:
>>>> > >>>> >
>>>> > >>>> > [MXNET-1179] Enforce deterministic algorithms in convolution layers
