mxnet-dev mailing list archives

From kellen sunderland <kellen.sunderl...@gmail.com>
Subject Re: Improving and rationalizing unit tests
Date Mon, 16 Oct 2017 20:28:33 GMT
Interesting package, Sheng, but I think one problem with that approach is
that tests are already taking an extremely long time to run.  Re-running
tests that fail could make it even more difficult to merge PRs.

On Mon, Oct 16, 2017 at 8:45 PM, Zha, Sheng <zhasheng@amazon.com> wrote:

> Here’s a package that may help us with flaky tests:
> https://pypi.python.org/pypi/flaky. It can retry tests that are marked
> flaky, and can pass or fail them based on a specified threshold. In the
> short term we can use this to pass tests that are not 100% reliable.
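>
> For illustration, a minimal sketch of how that decorator could be
> applied (the retry counts and the test body here are hypothetical):
>
>     from flaky import flaky
>
>     @flaky(max_runs=3, min_passes=1)  # rerun up to 3 times; pass if any run passes
>     def test_occasionally_unstable():
>         result = run_unstable_operation()  # placeholder for the real test body
>         assert result is not None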
>
> Best regards,
> -sz
>
> On 10/16/17, 10:32 AM, "kellen sunderland" <kellen.sunderland@gmail.com>
> wrote:
>
>     I think you’ve covered the pros and cons of having determinism in
>     your tests.  To me it comes down to a trade-off between maintenance
>     effort and forced code robustness.  I’d suggest you have a separate
>     vote on this topic.
>
>     For me the main topic to focus on is making the CI system fast and
>     stable for PR builds.  I think the best road to doing this, as
>     previously suggested, is to segment the tests and move those that
>     are long-running, fail occasionally, or require external components
>     into a nightly/weekly suite.
>
>     I would also propose that:
>
>     6) Test fixtures be created to test a subset of functionality, and
>     that we not have test fixtures like test_operator.py that are nearly
>     5000 lines long and take 20 minutes to run.  There are a few
>     advantages to breaking these tests into smaller files:
>
>     We will have fewer merge conflicts, because fewer people will be
>     editing the same test files across PRs.  Debugging issues with tests
>     will become easier, as hopefully there will be fewer potential side
>     effects between tests (this does happen now).  We may be a little
>     more confident that the tests run independently, eventually meaning
>     that we could run them in parallel more easily, which would reduce
>     test run latency (but not throughput).  Last, we will be able to
>     disable tests at some convenient level of granularity, for example
>     when running on IoT devices or without OpenCV; at the moment we’d
>     have to disable tests individually (see the sketch below).
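>
>     As one possible way to get that granularity with our current nose
>     runner, attribute tags could be used (the tag name below is
>     hypothetical):
>
>         from nose.plugins.attrib import attr
>
>         @attr('opencv')  # hypothetical tag for tests that require OpenCV
>         def test_image_resize():
>             pass  # real test body would go here
>
>     Such tests could then be excluded on constrained devices with e.g.
>     nosetests -a '!opencv'.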
>
>     7) We clean up tests that are no longer needed:
>
>     I’ve personally found it quite unintuitive in MXNet to discover
>     which tests are actually needed, where they are run, how often, etc.
>     Are the nightly tests actually being run nightly?  Are the cpp tests
>     run?  Why is the Travis CI folder still there?  What is the
>     difference between the ci_build folder and the Jenkins folder?  If
>     we’re going to take a look at revamping the tests folder, I’d
>     recommend we clean up the folder structure a bit and delete the
>     non-relevant files to make it easier for newcomers to know what’s
>     happening.  We’ll always have these files for reference in source
>     control.
>
>     -Kellen
>
>     From: Chris Olivier
>     Sent: Monday, October 16, 2017 6:46 PM
>     To: dev@mxnet.incubator.apache.org
>     Subject: Re: Improving and rationalizing unit tests
>
>     My argument is that I am actually categorically against having a
>     requirement that the same input values be used for testing for
>     every run.
>
>     I don't personally view "convenience in reproducing" as outweighing
>     "finding edge cases that I didn't think of or that haven't been tried
>     before".
>
>     On Mon, Oct 16, 2017 at 9:34 AM, Pedro Larroy
>     <pedro.larroy.lists@gmail.com> wrote:
>
>     > It's always going to be deterministic one way or another unless
>     > you use randomness from the entropy pool, such as /dev/random. I
>     > don't think it's good practice not to seed properly and to have
>     > values depend on execution order / parallelism / time or whatever,
>     > but that's just my opinion. I would want to use the same values
>     > for all test runs for reproducibility.
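>     >
>     > One pattern that could give both reproducibility and varied inputs
>     > (a sketch, not current MXNet practice; the MXNET_TEST_SEED variable
>     > name is made up):
>     >
>     >     import os
>     >     import numpy as np
>     >
>     >     # use a fixed seed when MXNET_TEST_SEED is set, otherwise draw one
>     >     seed = int(os.environ.get('MXNET_TEST_SEED',
>     >                               np.random.randint(0, 2**31)))
>     >     np.random.seed(seed)
>     >     print('test seed: %d' % seed)  # log it so any failure can be replayed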
>     >
>     > I think your argument goes more towards the previously mentioned
>     > "property based testing" approach, which is in the spirit of what
>     > you are supporting, if I'm not mistaken.
>     >
>     > On Mon, Oct 16, 2017 at 6:22 PM, Chris Olivier
>     > <cjolivier01@gmail.com> wrote:
>     >
>     > > My take on the suggestion of purely deterministic inputs
>     > > (including deterministic seeding) is:
>     > >
>     > > "I want the same values to be used for all test runs because it
>     > > is inconvenient when a unit test fails for some edge cases. I
>     > > prefer that unforeseen edge-case failures occur in the field and
>     > > not during testing."
>     > >
>     > > Is this the motivation?  Seems strange to me.
>     > >
>     > >
>     > > On Mon, Oct 16, 2017 at 9:09 AM, Pedro Larroy
>     > > <pedro.larroy.lists@gmail.com> wrote:
>     > >
>     > > > I think using a properly seeded and initialized (pseudo)random
>     > > > generator is actually beneficial (and deterministic);
>     > > > handpicked examples are usually too simplistic and miss corner
>     > > > cases.
>     > > >
>     > > > Better yet is to use property-based testing, which will pick
>     > > > corner cases and do fuzzing automatically, to check with a high
>     > > > degree of confidence that a testing condition holds.
>     > > >
>     > > > It would probably be good to use a property-based testing
>     > > > library in addition to nose to check invariants.
>     > > >
>     > > > A quick googling yields this one for Python, for example:
>     > > > https://hypothesis.readthedocs.io/en/latest/quickstart.html
>     > > > Does anyone have experience with, or can recommend, a nice
>     > > > property-based testing library for Python?
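>     > > >
>     > > > As a flavor of what that could look like, a sketch with
>     > > > Hypothesis (the softmax invariant here is just an example):
>     > > >
>     > > >     import numpy as np
>     > > >     from hypothesis import given
>     > > >     import hypothesis.strategies as st
>     > > >
>     > > >     # Hypothesis generates the inputs, including corner cases,
>     > > >     # and shrinks failing examples automatically.
>     > > >     @given(st.lists(st.floats(min_value=-1e6, max_value=1e6),
>     > > >                     min_size=1))
>     > > >     def test_softmax_sums_to_one(xs):
>     > > >         x = np.array(xs)
>     > > >         e = np.exp(x - x.max())  # numerically stable softmax
>     > > >         s = e / e.sum()
>     > > >         assert abs(s.sum() - 1.0) < 1e-6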
>     > > >
>     > > >
>     > > > Regards
>     > > >
>     > > > On Mon, Oct 16, 2017 at 4:56 PM, Bhavin Thaker
>     > > > <bhavinthaker@gmail.com> wrote:
>     > > >
>     > > > > I agree with Pedro.
>     > > > >
>     > > > > Based on various observations of unit test failures, I would
>     > > > > like to propose a few guidelines for the unit tests. Even
>     > > > > though I use the word “must” for my humble opinions below,
>     > > > > please feel free to suggest alternatives or modifications to
>     > > > > these guidelines:
>     > > > >
>     > > > > 1a) Each unit test must have a run-time budget <= X minutes;
>     > > > > say, X = 2 minutes max.
>     > > > > 1b) The total run-time budget for all unit tests must be
>     > > > > <= Y minutes; say, Y = 60 minutes max.
>     > > > >
>     > > > > 2) All unit tests must have deterministic (not stochastic)
>     > > > > behavior. That is, instead of using the random() function to
>     > > > > test a range of input values, each input test value must be
>     > > > > carefully hand-picked to represent commonly used input
>     > > > > scenarios. The correct place to stochastically test random
>     > > > > input values is in continuously running nightly tests, NOT in
>     > > > > the sanity/smoke/unit tests run for each PR.
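>     > > > >
>     > > > > For instance, a hand-picked set for a ReLU test might look
>     > > > > like this (the values are illustrative):
>     > > > >
>     > > > >     import numpy as np
>     > > > >
>     > > > >     def test_relu_hand_picked():
>     > > > >         # inputs chosen to cover sign, zero, and magnitude extremes
>     > > > >         x = np.array([-1e6, -1.0, -0.0, 0.0, 1.0, 1e6])
>     > > > >         expected = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1e6])
>     > > > >         assert np.array_equal(np.maximum(x, 0), expected)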
>     > > > >
>     > > > > 3) All unit tests must be as self-contained and independent
>     > > > > of external components as possible. For example, datasets
>     > > > > required for a unit test must NOT live only on an external
>     > > > > website which, if unreachable, can cause test-run failures.
>     > > > > Instead, all datasets must be available locally.
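>     > > > >
>     > > > > One way to avoid dataset downloads entirely is to generate
>     > > > > synthetic data in the test (a sketch; shapes and sizes are
>     > > > > arbitrary):
>     > > > >
>     > > > >     import numpy as np
>     > > > >
>     > > > >     def make_fake_image_batch(batch_size=8):
>     > > > >         # local, synthetic stand-in for a downloaded dataset
>     > > > >         data = np.random.uniform(0, 1, (batch_size, 3, 32, 32))
>     > > > >         labels = np.random.randint(0, 10, (batch_size,))
>     > > > >         return data, labels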
>     > > > >
>     > > > > 4) It is impossible to test everything in unit tests, so only
>     > > > > common use-cases and code-paths must be tested there. Less
>     > > > > common scenarios, like integration with 3rd-party products,
>     > > > > must be tested in nightly/weekly tests.
>     > > > >
>     > > > > 5) A unit test must NOT be disabled on a failure unless it is
>     > > > > proven to exhibit unreliable behavior. The burden of proof
>     > > > > for a test failure must be on the PR submitter, and the PR
>     > > > > must NOT be merged without opening a new GitHub issue
>     > > > > explaining the problem. If the unit test is disabled for some
>     > > > > reason, then it must NOT be removed from the unit tests list;
>     > > > > instead, the unit test must be modified to add the following
>     > > > > lines at the start of the test:
>     > > > >
>     > > > >     print("Unit test DISABLED; see GitHub issue: NNNN")
>     > > > >     return
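>     > > > >
>     > > > > (An alternative sketch that makes the disablement visible in
>     > > > > the test report as a skip rather than a silent pass:)
>     > > > >
>     > > > >     import unittest
>     > > > >
>     > > > >     def test_something_unreliable():
>     > > > >         # nose reports this as a skip, keeping it on the radar
>     > > > >         raise unittest.SkipTest("disabled; see GitHub issue: NNNN")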
>     > > > >
>     > > > > Please suggest modifications to the above proposal so that we
>     > > > > can make the unit test framework a rock-solid foundation for
>     > > > > the active development of Apache MXNet (Incubating).
>     > > > >
>     > > > > Regards,
>     > > > > Bhavin Thaker.
>     > > > >
>     > > > >
>     > > > > On Mon, Oct 16, 2017 at 5:56 AM Pedro Larroy
>     > > > > <pedro.larroy.lists@gmail.com> wrote:
>     > > > >
>     > > > > > Hi
>     > > > > >
>     > > > > > Some of the unit tests are extremely costly in terms of
>     > > > > > memory and compute.
>     > > > > >
>     > > > > > As an example, in the gluon tests we are loading all the
>     > > > > > datasets:
>     > > > > >
>     > > > > > test_gluon_data.test_datasets
>     > > > > >
>     > > > > > We are also running huge networks like ResNets in
>     > > > > > test_gluon_model_zoo.
>     > > > > >
>     > > > > > This is ridiculously slow, straight-up impossible on some
>     > > > > > embedded / memory-constrained devices, and in any case it
>     > > > > > makes the tests run longer than needed.
>     > > > > >
>     > > > > > Unit tests should be small and self-contained, and if
>     > > > > > possible pure (avoiding this kind of dataset I/O).
>     > > > > >
>     > > > > > I think it would be better to split them into real unit
>     > > > > > tests and extended integration test suites that do the more
>     > > > > > intensive computation. This would also help with feedback
>     > > > > > time on PRs and with the CI infrastructure.
>     > > > > >
>     > > > > >
>     > > > > > Thoughts?
>     > > > > >
>     > > > >
>     > > >
>     > >
>     >
>
>
>
>
