reef-dev mailing list archives

From: Mariia Mykhailova <mamyk...@microsoft.com.INVALID>
Subject: RE: Disable tests with transient failures
Date: Wed, 19 Jul 2017 05:56:01 GMT
Markus,
Integration tests are not preventing us from improving our code, and disabling them is definitely
not going to help us improve. We run additional tests on YARN not because we don't trust
the results of AppVeyor runs, but because YARN tests and AppVeyor tests have different coverage;
AppVeyor simply can't run YARN tests.

We are already very good at ignoring CI failures during code review. With all the CI and tests
we have in place, we managed to break our master build twice that I know of in the past four
months, and several more times it was only the integration tests in CI that caught that something
was wrong (https://github.com/apache/reef/pull/1328 from a week ago is a fresh example: it's
exactly the .NET integration tests that are failing; without them the change would look good).
Disabling *all* integration tests will just take us one step further down a road I'd rather we
not go: admitting not only that our code is so bad that we can't keep a set of tests from
failing, but also that we can't be bothered to invest in improving them and would rather go
without tests completely.

Would you really prefer to do a release without tests rather than a release with some known
transient issues tracked in CI and JIRA?

Also, keep in mind that adding integration tests back one by one, ensuring that each is solid,
takes an order of magnitude more work than investigating transient failures in the current setup.
With integration tests being part of CI, we have statistics on which tests have been unstable
over several months' worth of runs - when I was putting together the list of transient failures,
I could go three months back and see that some tests started or stopped failing around a certain
time, etc. With the tests disabled, we immediately lose this information, and whoever looks into
the tests next has to gather it manually for every single one of our 70 tests, not just the few
failing ones. Disabling integration tests will quite possibly be the end of our .NET integration
tests.
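
For illustration, here is a minimal sketch of the kind of flakiness statistics that CI history
makes possible. The CSV file name and format below are hypothetical, not REEF's actual tooling:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical sketch: tally per-test failure rates from an exported CI history.
    // Assumes a headerless CSV with lines like "2017-07-18,TestHelloREEF,FAIL";
    // the file name and format are illustrative, not actual REEF tooling.
    public class FlakinessReport {
        public static void main(String[] args) throws IOException {
            Map<String, int[]> stats = new HashMap<>(); // test name -> {runs, failures}
            for (String line : Files.readAllLines(Paths.get("ci-history.csv"))) {
                String[] fields = line.split(",");
                int[] s = stats.computeIfAbsent(fields[1], k -> new int[2]);
                s[0]++;                               // total runs
                if ("FAIL".equals(fields[2])) {
                    s[1]++;                           // failures
                }
            }
            stats.forEach((test, s) -> System.out.printf(
                "%s: %d/%d runs failed (%.1f%%)%n", test, s[1], s[0], 100.0 * s[1] / s[0]));
        }
    }

Run over months of history, output like this is exactly what distinguishes a genuinely flaky
test from one that started failing at a specific commit.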

With that said, 
+1 for doing the release with known transient issues - so that your desire to do a release
doesn’t push you to bad engineering decisions, 
-1000 for disabling any tests.

-Mariia

-----Original Message-----
From: Julia Wang (QIUHE) [mailto:Qiuhe.Wang@microsoft.com.INVALID] 
Sent: Tuesday, July 18, 2017 4:28 PM
To: dev@reef.apache.org
Subject: RE: Disable tests with transient failures

I would keep the e2e tests; those are important tests that ensure the system still functions.
They detect issues if we change the bridge code or its contract, and help us find regressions
where a Java-side change breaks things on the .NET side. Most unit tests use mocks, so they only
test a code path without a complete e2e scenario. Over the past years we added many e2e
functional tests, mainly to detect regressions and ensure we are not badly broken.
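
As a rough illustration of the difference (the Bridge interface and all names below are
hypothetical, not REEF's actual bridge API), a mocked unit test exercises only the caller's
code path, while an e2e test would drive a real driver through the Java bridge:

    // Hypothetical sketch of why mocked unit tests miss bridge regressions.
    // The Bridge interface and all names are illustrative, not REEF's actual API.
    interface Bridge {
        String submit(String jobName);
    }

    // The mock returns a canned answer: the real serialization, process launch,
    // and Java-side protocol never run, so a contract change there goes unnoticed.
    class MockBridge implements Bridge {
        @Override
        public String submit(String jobName) {
            return "COMPLETED";
        }
    }

    public class BridgeTestSketch {
        static boolean runJob(Bridge bridge) {
            return "COMPLETED".equals(bridge.submit("HelloREEF"));
        }

        public static void main(String[] args) {
            // The unit-level check passes even if the real bridge is broken.
            System.out.println("mocked unit test passed: " + runJob(new MockBridge()));
            // An e2e test would instead construct the real bridge and launch an
            // actual driver; that is what catches cross-language regressions.
        }
    }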

The transient test issues are a separate matter. Depending on log files to verify results is one
problem, but I think we may also have an issue where the driver does not end properly. I recently
got a few repros in my local test runs: when it happened, the system was apparently still running
and never ended until I stopped it. We should not ignore those issues. However, I also don't
think they must block releases. We should allow bugs in a release, as long as they don't impact
the major functionality of the system and we track them.
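
To make the two failure modes concrete, here is a minimal sketch of log-based verification; the
marker string, log path, and timeout are hypothetical, not REEF's actual test code:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    // Hypothetical sketch of log-based result verification and its failure modes.
    // The marker string, log path, and timeout are illustrative only.
    public class LogVerifier {
        public static void main(String[] args) throws IOException, InterruptedException {
            Path driverLog = Paths.get("driver.log");
            long deadline = System.currentTimeMillis() + 60_000; // 60-second timeout
            while (System.currentTimeMillis() < deadline) {
                // Any change to log wording, buffering, or flush timing makes this
                // check fail even when the job itself succeeded ...
                if (Files.exists(driverLog)
                        && new String(Files.readAllBytes(driverLog)).contains("Driver completed")) {
                    System.out.println("PASS");
                    return;
                }
                Thread.sleep(1_000);
            }
            // ... and a driver that hangs instead of exiting shows up only here,
            // as a timeout, with no clue about the root cause.
            System.out.println("FAIL: no completion marker before timeout");
        }
    }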

Julia

-----Original Message-----
From: Douglas Service [mailto:dsopsrc@gmail.com]
Sent: Tuesday, July 18, 2017 3:18 PM
To: dev@reef.apache.org
Subject: Re: Disable tests with transient failures

I would only disable the ones with transient failures, as we need some level of testing.

On Tue, Jul 18, 2017 at 2:31 PM, Markus Weimer <markus@weimo.de> wrote:

> Coming back to the original question: I am actually +1 on disabling
> *all* integration tests in REEF.NET.
>
> Reasoning: The integration tests are flaky, and we don't really know 
> why. It is plausible that our test approach based on log files is to 
> blame for the failures. We don't even trust them and always run 
> HelloREEF or such on a YARN cluster to know whether or not we broke 
> something. At the same time, we know that our current approach to 
> integration tests in REEF.NET is broken and at best allows us to make 
> statements about the local runtime. Meanwhile, the failing tests
> prevent releases, which in turn prevents us from moving forward with the
> work on REEF.NET on Linux, which would broaden the developer base and
> therefore help with testing. This creates a situation where our bad 
> approach to testing prevents us from making progress towards quality 
> software, which is the opposite of what tests should do.
>
> Hence, I propose the following:
>
>   1. Disable all REEF.NET integration tests. Leave them in the code 
> base for reference, but ignore them when doing CI builds or releases.
>   2. Make a release with zero integration tests of REEF.NET. Be open 
> about the fact that we have no integration tests that convince us or 
> anyone that REEF.NET actually works.
>   3. Add integration tests to REEF.NET, one at a time. This time, we 
> make sure that they are solid and indicative of the actual quality of
> the code.
>
>
> WDYT?
>
> Markus
>