mxnet-dev mailing list archives

From Marco de Abreu <marco.g.ab...@googlemail.com>
Subject Re: Problems with test_sparse_operator.test_sparse_mathematical_core
Date Fri, 11 May 2018 09:05:37 GMT
Thanks Haibin, it was in fact related to SciPy: the latest version, 1.1.0,
causes this test to fail, while switching back to 1.0.1 makes it pass. I
have created an issue at [1].
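As a side note on the "Error nan" in the traceback below: when the actual
gradient is +inf and the expected one is -inf, a relative-error comparison
degenerates to nan (inf / inf), which is why the tolerance message itself
reports nan. A minimal sketch of the effect (the exact formula inside
mxnet's assert_almost_equal may differ):

```python
import numpy as np

# Actual vs. expected gradient values from the failing run: +inf vs. -inf.
a = np.array([np.inf])   # actual
b = np.array([-np.inf])  # expected

# |a - b| = inf and |a| + |b| = inf, so the relative error is inf / inf = nan,
# and any "error <= rtol" check on nan fails.
with np.errstate(invalid="ignore"):
    rel_err = np.abs(a - b) / (np.abs(a) + np.abs(b))

print(rel_err)  # [nan]
```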

I'm not familiar with SciPy's internals and don't really know where to
start on a fix. For now, I have submitted a PR [2] to pin the version to
1.0.1. Nonetheless, it would be good if we could get this fixed properly.
Could somebody assist me here?

Best regards,
Marco

[1]: https://github.com/apache/incubator-mxnet/issues/10901
[2]: https://github.com/apache/incubator-mxnet/pull/10902
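
For anyone who wants to apply the same workaround locally until a proper
fix lands, the pin from [2] amounts to a one-line version constraint (the
file name below is illustrative; the actual location in the MXNet CI
install scripts may differ):

```
# requirements.txt (illustrative): pin scipy to the last known-good release
scipy==1.0.1
```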



On Wed, May 9, 2018 at 11:08 PM, Marco de Abreu <
marco.g.abreu@googlemail.com> wrote:

> Hi Haibin,
>
> auto scaling is currently not enabled on MXNet Apache CI. This only
> happens on my test environment. Thanks for the hint with Scipy, I will
> definitely look into this!
>
> That's a good idea. I have spoken to Steffen over the last few days and
> we brainstormed some ideas on how to handle test failures. We will let
> the community know once we have a more detailed plan.
>
> Best regards,
> Marco
>
> On Wed, May 9, 2018 at 7:19 PM, Haibin Lin <haibin.lin.aws@gmail.com>
> wrote:
>
>> Hi Marco,
>>
>> Is auto scaling already enabled on the MXNet Apache CI, or does this
>> only happen on your setup? I see the test is using scipy. Do both
>> environments have the same version of scipy installed?
>>
>> I have recently seen lots of test failures on mxnet master. One thing on
>> my wish list is a database that stores all occurrences of test failures
>> along with their commit ids, which would be very helpful for the initial
>> diagnosis of which code changes potentially introduced bugs. Otherwise,
>> clicking through all past tests and reading those logs requires a lot of
>> manual work.
>>
>> Best,
>> Haibin
>>
>> On Wed, May 9, 2018 at 5:32 AM, Marco de Abreu <
>> marco.g.abreu@googlemail.com
>> > wrote:
>>
>> > Hello,
>> >
>> > I'm currently working on auto scaling and encountering a consistent test
>> > failure on CPU. At the moment, I'm not really sure what's causing this,
>> > considering the setup should be identical.
>> >
>> > http://jenkins.mxnet-ci-dev.amazon-ml.com/blue/organizations/jenkins/
>> > incubator-mxnet/detail/ci-master/557/pipeline/694
>> >
>> > ======================================================================
>> >
>> > FAIL: test_sparse_operator.test_sparse_mathematical_core
>> >
>> > ----------------------------------------------------------------------
>> >
>> > Traceback (most recent call last):
>> >
>> >   File "/usr/local/lib/python3.5/dist-packages/nose/case.py", line
>> 198, in
>> > runTest
>> >
>> >     self.test(*self.arg)
>> >
>> >   File "/work/mxnet/tests/python/unittest/common.py", line 157, in
>> > test_new
>> >
>> >     orig_test(*args, **kwargs)
>> >
>> >   File "/work/mxnet/tests/python/unittest/test_sparse_operator.py",
>> line
>> > 1084, in test_sparse_mathematical_core
>> >
>> >     density=density, ograd_density=ograd_density)
>> >
>> >   File "/work/mxnet/tests/python/unittest/test_sparse_operator.py",
>> line
>> > 1056, in check_mathematical_core
>> >
>> >     density=density, ograd_density=ograd_density)
>> >
>> >   File "/work/mxnet/tests/python/unittest/test_sparse_operator.py",
>> line
>> > 698, in check_sparse_mathematical_core
>> >
>> >     assert_almost_equal(arr_grad, input_grad, equal_nan=True)
>> >
>> >   File "/work/mxnet/python/mxnet/test_utils.py", line 493, in
>> > assert_almost_equal
>> >
>> >     raise AssertionError(msg)
>> >
>> > AssertionError:
>> >
>> > Items are not equal:
>> >
>> > Error nan exceeds tolerance rtol=0.000010, atol=0.000000.  Location of
>> > maximum error:(0, 0), a=inf, b=-inf
>> >
>> >  a: array([[inf],
>> >
>> >        [inf],
>> >
>> >        [inf],...
>> >
>> >  b: array([[-inf],
>> >
>> >        [-inf],
>> >
>> >        [-inf],...
>> >
>> > -------------------- >> begin captured stdout << ---------------------
>> >
>> > pass 0
>> >
>> > 0.0, 0.0, False
>> >
>> > --------------------- >> end captured stdout << ----------------------
>> >
>> > -------------------- >> begin captured logging << --------------------
>> >
>> > common: INFO: Setting test np/mx/python random seeds, use
>> > MXNET_TEST_SEED=2103230797 to reproduce.
>> >
>> > --------------------- >> end captured logging << ---------------------
>> >
>> >
>> > Does this ring any bells?
>> >
>> > Thanks in advance!
>> >
>> > -Marco
>> >
>>
>
>
