mxnet-dev mailing list archives

From Steffen Rochel <steffenroc...@gmail.com>
Subject Re: PR validation and runtime of CI
Date Fri, 08 Jun 2018 01:53:37 GMT
Agree, we need to get serious about reliably fixing and re-enabling the
tests.
A mapping from tests to folders might be a good enough approximation for
the mxnet repo. In general you would have to trace code-test dependencies.
Steffen

On Thu, Jun 7, 2018 at 6:48 PM Marco de Abreu <marco.g.abreu@googlemail.com>
wrote:

> We already have GitHub issues for most of the flaky tests. Even for some
> that have been disabled for almost half a year - they have never been
> re-enabled and thus we still lack coverage.
>
> I think I have an idea of how to do it. I will check whether it actually
> works and then provide a small POC. Basically a mapping from tests to
> folders: if a specific folder has been updated, run the test. I have
> checked whether there are any ready-made solutions but haven't found
> anything useful.
>
> -Marco
>
> Steffen Rochel <steffenrochel@gmail.com> schrieb am Fr., 8. Juni 2018,
> 03:43:
>
> > I support creating GitHub/Jira issues for flaky tests and disabling them
> > for now. However, we need to get serious and prioritize fixing the
> > disabled tests.
> > Making PR checks smart and testing only code impacted by a change is a
> > good idea - does anybody have experience with tools enabling smart
> > validation?
> >
> > I'm concerned about moving tests from the PR check to nightly. We should
> > review which tests are long-running and evaluate options to reduce their
> > runtime. We must prevent bad code from getting into the repo through the
> > automated process, as it will be cumbersome to find out which PR caused
> > failures or performance degradation in the nightly regression. Unless we
> > get code coverage measurements in place, it is difficult to decide which
> > tests are redundant and can either be removed or moved to nightly.
> > Is the PR validation turnaround time the bottleneck? I see a number of
> > PRs waiting for reviews.
> > Steffen
> >
> > On Thu, Jun 7, 2018 at 6:00 PM Naveen Swamy <mnnaveen@gmail.com> wrote:
> >
> > > Sorry, I missed reading that Pedro was asking to move the tests that
> > > run training. I agree with that.
> > >
> > > Additionally we should make the CI smart as I mentioned above.
> > >
> > > -Naveen
> > >
> > >
> > > > On Thu, Jun 7, 2018 at 3:59 PM, Naveen Swamy <mnnaveen@gmail.com>
> > > > wrote:
> > >
> > > > -1 for moving to nightly. I think that would be detrimental.
> > > >
> > > > We have to make our CI a little smarter and only build the required
> > > > components rather than everything, to reduce cost and the time it
> > > > takes to run CI. A Scala build need not build everything and run
> > > > tests related to Python, etc.
> > > >
> > > > Thanks, Naveen
> > > >
> > > > On Thu, Jun 7, 2018 at 9:57 AM, Marco de Abreu
> > > > <marco.g.abreu@googlemail.com> wrote:
> > > >
> > > >> Thanks a lot for your input, Thomas! You are right, 3h is only hit
> > > >> if somebody makes changes in their Dockerfiles and thus every node
> > > >> has to rebuild its containers - but this is expected and inevitable.
> > > >>
> > > >> So far there have not been any big attempts to resolve the number of
> > > >> flaky tests. We had a few people fixing some tests (and that's very
> > > >> appreciated!!!), but it feels like we're introducing more than we
> > > >> can fix. I'd definitely love a flaky-test-bash week and have
> > > >> proposed it a few times, but without success so far, unfortunately.
> > > >>
> > > >> We will definitely not drop any platforms. What we will probably do
> > > >> after the nightly tests are in place is move some things like CentOS
> > > >> or overlapping Python2/Python3 tests to nightly. We don't need to
> > > >> test Python2 and Python3 compatibility half a dozen times on
> > > >> different platforms for every commit.
> > > >>
> > > >> I've been thinking about merging the integration and unit test
> > > >> stages and I'm pretty tempted to do it. My only concern so far has
> > > >> been the increased cost. I expect the nightly tests to be in place
> > > >> in about 2 weeks. I'd propose we wait until then and then revisit
> > > >> which runs are actually required for every PR and which ones can be
> > > >> moved. During that process, we will probably consolidate a lot of
> > > >> tests and put them into one stage.
> > > >>
> > > >> But I agree, the past has shown that disabling tests only masks the
> > > >> problem and won't get them fixed. Also, quite a lot of failures have
> > > >> proven to be actual bugs in our code. So from a customer
> > > >> perspective, we should actually give these failures a high priority.
> > > >> I hope they will get into the spotlight after I provide the proper
> > > >> statistics.
> > > >>
> > > >> -Marco
> > > >>
> > > >>
> > > >> On Thu, Jun 7, 2018 at 6:35 PM Thomas DELTEIL
> > > >> <thomas.delteil1@gmail.com> wrote:
> > > >>
> > > >> > Thanks for bringing up the issue of CI stability!
> > > >> >
> > > >> > However, I disagree with some points in this thread:
> > > >> >
> > > >> > - "We are at approximately 3h for a full successful run."
> > > >> > => Looking at Jenkins, I see the last successful runs oscillating
> > > >> > between 1h53 and 2h42, with a mean that seems to be around 2h20.
> > > >> > Or are you talking about something different from the Jenkins CI
> > > >> > run?
> > > >> >
> > > >> > - "For this I propose working towards moving tests from CI to
> > > >> > nightly, especially the ones that take the most time or do black
> > > >> > box testing with full training of models. And addressing flaky
> > > >> > tests by either fixing them or *disabling them*."
> > > >> > => Is there any evidence that serious effort has been spent
> > > >> > trying to fix the flaky tests? I know Sheng and Marco have worked
> > > >> > on consolidating a list of flaky tests, but I think simply
> > > >> > disabling tests will just make the platform weaker. Let's
> > > >> > organize a flaky test week where we each take on a couple of
> > > >> > these flaky tests, and hopefully we should make good progress
> > > >> > towards stabilizing the CI.
> > > >> >
> > > >> > - "I'd like to disable flaky tests until they're fixed."
> > > >> > => Wishful thinking IMO - we know this never happens. If we can't
> > > >> > make time now to fix them, we'll never go back and fix them.
> > > >> >
> > > >> > - "I would want a turnaround time of less than 30 minutes and 0%
> > > >> > failure rate on master."
> > > >> > => With the current timing, this means barely finishing the build
> > > >> > step. Do we propose dropping some platforms from the build?
> > > >> >
> > > >> > I agree with some points:
> > > >> >
> > > >> > "Won't we end up in the same situation with so many flaky tests?"
> > > >> > => Pretty sure we will.
> > > >> > "This could be set to 100% for nightly, for example." [for the
> > > >> > release] => That would be a given to me.
> > > >> > "I'm also currently working on a system that tracks all test
> > > >> > failures, so this will also cover nightly tests. This will give
> > > >> > us actionable data." => Awesome, it would be great to have data
> > > >> > on that to help prioritize what to fix!
> > > >> >
> > > >> > I personally think that if we disable most tests and move them to
> > > >> > nightly tests, we will decrease the trust in and stability of the
> > > >> > platform, and it leaves the door open to conflicting changes
> > > >> > creating hard-to-debug failures. I think the biggest potential win
> > > >> > here is reducing test flakiness. That's what is killing
> > > >> > productivity. We could redesign the test pipeline to run
> > > >> > integration and unit tests in parallel, and that would straight
> > > >> > away shave 30 minutes off the CI run. Then we'd always be at <2h
> > > >> > for a build, which seems reasonable if it never fails for no
> > > >> > reason.
> > > >> >
> > > >> > Thomas
> > > >> >
> > > >> > 2018-06-07 8:27 GMT-07:00 Marco de Abreu
> > > >> > <marco.g.abreu@googlemail.com>:
> > > >> >
> > > >> > > Yeah, I think we are at the point at which we have to disable
> > > >> > > tests...
> > > >> > >
> > > >> > > If a test fails in nightly, the commit would not be reverted,
> > > >> > > since it's hard to pin a failure to a specific PR. We will have
> > > >> > > reporting for failures on nightly (the nightly tests have proven
> > > >> > > to be stable, so we can enable it right from the beginning). I'm
> > > >> > > also currently working on a system that tracks all test
> > > >> > > failures, so this will also cover nightly tests. This will give
> > > >> > > us actionable data which allows us to define acceptance criteria
> > > >> > > for a release - e.g. if the test success rate is below X%, a
> > > >> > > release cannot be made. This could be set to 100% for nightly,
> > > >> > > for example.
> > > >> > >
> > > >> > > It would definitely be good if we could determine which tests
> > > >> > > are required to run and which ones are unnecessary. I don't
> > > >> > > really like the flag in the comment (and it's also hard to
> > > >> > > integrate). A good idea would be some analytics on the changed
> > > >> > > file content. If we have this data, we could easily enable and
> > > >> > > disable different jobs. Since this behaviour is entirely defined
> > > >> > > in GitHub, I'd like to invite everybody to submit a PR.
> > > >> > > -Marco
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > > On Thu, Jun 7, 2018 at 5:20 PM Aaron Markham
> > > >> > > <aaron.s.markham@gmail.com> wrote:
> > > >> > >
> > > >> > > > I'd like to disable flaky tests until they're fixed.
> > > >> > > > What would the process be for fixing a failure if the tests
> > > >> > > > are run nightly? Would the commit be reverted? Won't we end up
> > > >> > > > in the same situation with so many flaky tests?
> > > >> > > >
> > > >> > > > I'd like to see if we can separate the test pipelines based on
> > > >> > > > the content of the commit. I think that md, html, and js
> > > >> > > > updates should fly through and not have to go through GPU
> > > >> > > > tests.
> > > >> > > >
> > > >> > > > Maybe some special flag added to the comment?
> > > >> > > > Is this possible?
> > > >> > > >
> > > >> > > >
> > > >> > > > On Wed, Jun 6, 2018 at 10:37 PM, Pedro Larroy
> > > >> > > > <pedro.larroy.lists@gmail.com> wrote:
> > > >> > > >
> > > >> > > > > Hi Team,
> > > >> > > > >
> > > >> > > > > The time to validate a PR is growing, due to our number of
> > > >> > > > > supported platforms and the increased time spent testing and
> > > >> > > > > running models. We are at approximately 3h for a full
> > > >> > > > > successful run.
> > > >> > > > >
> > > >> > > > > This is compounded with a build failure rate due to flaky
> > > >> > > > > tests of more than 50%, which is a big drag on developer
> > > >> > > > > productivity if you can only get one or two CI runs for a
> > > >> > > > > change per day.
> > > >> > > > >
> > > >> > > > > I would want a turnaround time of less than 30 minutes and
> > > >> > > > > a 0% failure rate on master.
> > > >> > > > >
> > > >> > > > > For this I propose working towards moving tests from CI to
> > > >> > > > > nightly, especially the ones that take the most time or do
> > > >> > > > > black box testing with full training of models. And
> > > >> > > > > addressing flaky tests by either fixing them or disabling
> > > >> > > > > them.
> > > >> > > > >
> > > >> > > > > I would like to check if there's consensus on this plan so
> > > >> > > > > we are aligned on pursuing this common goal as a shared
> > > >> > > > > effort.
> > > >> > > > >
> > > >> > > > > Pedro.
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > > >
> > > >
> > >
> >
>
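
The release acceptance criterion raised in the thread ("if the test success rate is below X%, a release can not be made") could be sketched as follows. The input format (a list of pass/fail booleans) and the default threshold are illustrative assumptions, not an agreed specification.

```python
# Hypothetical sketch of a release gate based on test success rate.
# The result format and threshold values are illustrative assumptions.
def release_allowed(results, threshold=0.95):
    """Return True when the fraction of passing runs meets the threshold.

    results: list of booleans, one per test run (True = passed).
    """
    if not results:
        return False  # no data: do not release
    success_rate = sum(results) / len(results)
    return success_rate >= threshold
```

Setting threshold=1.0 would implement the "100% for nightly" idea: a single failing run blocks the release.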
