mxnet-dev mailing list archives

From Chris Olivier <cjolivie...@gmail.com>
Subject Re: Improving and rationalizing unit tests
Date Mon, 16 Oct 2017 16:45:58 GMT
My argument is that I am actually categorically against having a
requirement that the same input values be used for testing for every run.

I don't personally view "convenience in reproducing" as outweighing
"finding edge cases that I didn't think of or that haven't been tried
before".
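[Editor's note: a middle ground between the two positions is to draw a fresh seed per run but log it and allow an override, so an edge case found by chance can still be replayed. A minimal sketch, assuming nothing about MXNet's test harness; the TEST_SEED variable name is hypothetical:]

```python
import os
import random

def setup_test_rng(env_var="TEST_SEED"):
    """Seed the RNG from the environment if a seed is supplied;
    otherwise draw a fresh one from the system entropy pool and
    log it so a failing run can be reproduced exactly."""
    value = os.environ.get(env_var)
    if value is not None:
        seed = int(value)
    else:
        seed = random.SystemRandom().randint(0, 2**31 - 1)
    print("test seed: %d (set %s=%d to reproduce)" % (seed, env_var, seed))
    random.seed(seed)
    return seed
```

Each CI run then exercises new inputs, while the logged seed makes any failure as reproducible as a fixed one.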

On Mon, Oct 16, 2017 at 9:34 AM, Pedro Larroy <pedro.larroy.lists@gmail.com>
wrote:

> It's always going to be deterministic one way or another unless you use
> randomness from the entropy pool, such as /dev/random. I don't think it's
> good practice not to seed properly and to have values depend on execution
> order / parallelism / time or whatever, but that's just my opinion. I
> would want to use the same values for all test runs, for reproducibility.
>
> I think your argument goes more towards the previously mentioned "property
> based testing" approach, which is in the spirit of what you are supporting,
> if I'm not mistaken.
>
> On Mon, Oct 16, 2017 at 6:22 PM, Chris Olivier <cjolivier01@gmail.com>
> wrote:
>
> > My take on the suggestion of purely deterministic inputs (including
> > deterministic seeding) is:
> >
> > "I want the same values to be used for all test runs because it is
> > inconvenient when a unit test fails for some edge cases.  I prefer that
> > unforeseen edge case failures occur in the field and not during testing".
> >
> > Is this the motivation?  Seems strange to me.
> >
> >
> > On Mon, Oct 16, 2017 at 9:09 AM, Pedro Larroy <pedro.larroy.lists@gmail.com>
> > wrote:
> >
> > > I think using a properly seeded and initialized (pseudo)random source
> > > is actually beneficial (and deterministic); hand-picked examples are
> > > usually too simplistic and miss corner cases.
> > >
> > > Better yet is to use property-based testing, which will pick corner
> > > cases and do fuzzing automatically to check with a high degree of
> > > confidence that a testing condition holds.
> > >
> > > It would probably be good to use a property-based testing library in
> > > addition to nose to check invariants.
> > >
> > > A quick search yields this one for Python, for example:
> > > https://hypothesis.readthedocs.io/en/latest/quickstart.html
> > > Does anyone have experience with, or can recommend, a nice
> > > property-based testing library for Python?
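[Editor's note: for a flavor of the idea without pulling in a dependency, a property-based test asserts an invariant over many generated inputs. The sketch below hand-rolls the generation loop that libraries like Hypothesis automate; Hypothesis additionally shrinks failing inputs to a minimal counterexample and replays known failures:]

```python
import random

def check_property(prop, n_runs=200, seed=42):
    """Assert that `prop` holds for many randomly generated integer
    lists. A tiny sketch of property-based testing; real libraries
    such as Hypothesis also shrink failing inputs for you."""
    rng = random.Random(seed)
    for _ in range(n_runs):
        xs = [rng.randint(-10**6, 10**6) for _ in range(rng.randint(0, 50))]
        assert prop(xs), "property failed for input: %r" % xs

# Invariant: reversing a list twice yields the original list.
check_property(lambda xs: list(reversed(list(reversed(xs)))) == xs)
```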
> > >
> > >
> > > Regards
> > >
> > > On Mon, Oct 16, 2017 at 4:56 PM, Bhavin Thaker <bhavinthaker@gmail.com>
> > > wrote:
> > >
> > > > I agree with Pedro.
> > > >
> > > > Based on various observations of unit-test failures, I would like to
> > > > propose a few guidelines for the unit tests. Even though I use the
> > > > word "must" for my humble opinions below, please feel free to
> > > > suggest alternatives or modifications to these guidelines:
> > > >
> > > > 1a) Each unit test must have a run-time budget <= X minutes. Say,
> > > > X = 2 minutes max.
> > > > 1b) The total run-time budget for all unit tests must be <= Y
> > > > minutes. Say, Y = 60 minutes max.
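[Editor's note: a per-test budget like 1a can be enforced with a small decorator; nose itself ships `nose.tools.timed` for the same purpose. The sketch below is framework-agnostic and the names are illustrative:]

```python
import functools
import time

def time_budget(seconds):
    """Fail a test that exceeds its run-time budget (guideline 1a).
    Note: this checks elapsed time after the test finishes; it does
    not interrupt a hung test."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.time()
            result = fn(*args, **kwargs)
            elapsed = time.time() - start
            if elapsed > seconds:
                raise AssertionError(
                    "test exceeded budget: %.1fs > %.1fs" % (elapsed, seconds))
            return result
        return wrapper
    return decorate

@time_budget(120)  # 2-minute per-test budget, per guideline 1a
def test_something_fast():
    assert 1 + 1 == 2
```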
> > > >
> > > > 2) All unit tests must have deterministic (not stochastic) behavior.
> > > > That is, instead of using the random() function to test a range of
> > > > input values, each input test value must be carefully hand-picked to
> > > > represent the commonly used input scenarios. The correct place to
> > > > stochastically test random input values is in continuously running
> > > > nightly tests, NOT in the sanity/smoke/unit tests for each PR.
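[Editor's note: as an illustration of guideline 2, a deterministic test enumerates hand-picked values around the interesting boundaries instead of sampling them. `clip` here is a stand-in function under test, not an MXNet API:]

```python
def clip(x, lo, hi):
    """Stand-in function under test: clamp x into [lo, hi]."""
    return max(lo, min(hi, x))

def test_clip_deterministic():
    # Hand-picked cases: below, at, inside, at, and above the bounds.
    cases = [(-5, 0), (0, 0), (3, 3), (10, 10), (99, 10)]
    for x, expected in cases:
        assert clip(x, 0, 10) == expected
```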
> > > >
> > > > 3) All unit tests must be as self-contained and independent of
> > > > external components as possible. For example, datasets required for
> > > > a unit test must NOT be hosted on an external website which, if
> > > > unreachable, can cause test-run failures. Instead, all datasets must
> > > > be available locally.
> > > >
> > > > 4) It is impossible to test everything in unit tests, so only common
> > > > use-cases and code-paths must be tested in unit tests. Less common
> > > > scenarios, like integration with 3rd-party products, must be tested
> > > > in nightly/weekly tests.
> > > >
> > > > 5) A unit test must NOT be disabled on a failure unless proven to
> > > > exhibit unreliable behavior. The burden of proof for a test failure
> > > > must be on the PR submitter, and the PR must NOT be merged without
> > > > opening a new GitHub issue explaining the problem. If the unit test
> > > > is disabled for some reason, then it must NOT be removed from the
> > > > unit-test list; instead, the unit test must be modified to add the
> > > > following lines at the start of the test:
> > > >     Print("Unit Test DISABLED; see GitHub issue: NNNN")
> > > >     Exit(0)
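[Editor's note: the Print/Exit pattern in guideline 5 can be expressed with the test framework's own skip mechanism, which keeps the test visible and counted as skipped in reports. With unittest, which nose understands, a disabled test might look like the following; the issue number is left as the NNNN placeholder from the proposal:]

```python
import unittest

@unittest.skip("Unit Test DISABLED; see GitHub issue: NNNN")
def test_known_flaky_case():
    # Original test body stays in place so the test is easy to re-enable.
    assert True
```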
> > > >
> > > > Please suggest modifications to the above proposal so that we can
> > > > make the unit-test framework a rock-solid foundation for the active
> > > > development of Apache MXNet (Incubating).
> > > >
> > > > Regards,
> > > > Bhavin Thaker.
> > > >
> > > >
> > > > On Mon, Oct 16, 2017 at 5:56 AM Pedro Larroy <pedro.larroy.lists@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi
> > > > >
> > > > > Some of the unit tests are extremely costly in terms of memory and
> > > > compute.
> > > > >
> > > > > As an example, in the Gluon tests we are loading all the datasets:
> > > > >
> > > > > test_gluon_data.test_datasets
> > > > >
> > > > > We are also running huge networks, like ResNets, in
> > > > > test_gluon_model_zoo.
> > > > >
> > > > > This is ridiculously slow, outright impossible on some embedded /
> > > > > memory-constrained devices, and in any case makes tests run for
> > > > > longer than needed.
> > > > >
> > > > > Unit tests should be small, self-contained, and if possible pure
> > > > > (avoiding this kind of dataset IO).
> > > > >
> > > > > I think it would be better to split them into real unit tests and
> > > > > extended integration-test suites that do more intensive
> > > > > computation. This would also shorten feedback time on PRs and
> > > > > reduce load on the CI infrastructure.
> > > > >
> > > > >
> > > > > Thoughts?
> > > > >
> > > >
> > >
> >
>
