hbase-dev mailing list archives

From Stack <st...@duboce.net>
Subject Re: Smart Flaky Handler
Date Fri, 20 May 2016 16:25:11 GMT
The system seems to be working nicely, Appy. We are getting green precommit
builds for the first time in ages.

Should we change the includes and excludes lists so they have a file type
ending? .txt? Then I could open them easily in the browser. Currently I
have to download them.

Includes are the tests that are currently considered 'flaky'?

TestGenerateDelegationToken,TestMobCompactor,TestRegionServerMetrics,TestAcidGuarantees,TestMasterReplication,TestRowProcessorEndpoint,TestAsyncLogRolling,DynamicLogicExpressionSuite,TestMasterFailoverWithProcedures,TestChoreService,TestScannerHeartbeatMessages,TestWALProcedureStore,TestRegionMergeTransactionOnCluster,TestSaslFanOutOneBlockAsyncDFSOutput,TestReplicationEndpointWithMultipleWAL

We have a nice list.

Excludes are:

**/TestGenerateDelegationToken.java,**/TestMobCompactor.java,**/TestRegionServerMetrics.java,**/TestAcidGuarantees.java,**/TestMasterReplication.java,**/TestRowProcessorEndpoint.java,**/TestAsyncLogRolling.java,**/DynamicLogicExpressionSuite.java,**/TestMasterFailoverWithProcedures.java,**/TestChoreService.java,**/TestScannerHeartbeatMessages.java,**/TestWALProcedureStore.java,**/TestRegionMergeTransactionOnCluster.java,**/TestSaslFanOutOneBlockAsyncDFSOutput.java,**/TestReplicationEndpointWithMultipleWAL.java,

What's the '**/' about? Is it supposed to have opening/closing versions?
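
For reference, '**/' in Ant-style path patterns (the kind Maven's test
include/exclude settings accept) matches any number of directory levels, so
'**/TestChoreService.java' matches that file in any module or package. It is
a prefix, not an opening/closing pair. Assuming the excludes list is simply
derived from the includes list, the mapping between the two lists above can be
sketched as follows (`to_exclude_patterns` is a hypothetical helper, not code
from HBASE-15839):

```python
def to_exclude_patterns(test_names):
    """Turn bare test class names into '**/Name.java' path patterns.

    Hypothetical helper for illustration only; it mirrors the shape of
    the two lists quoted in this mail, not the actual job script.
    """
    return ",".join("**/{}.java".format(name) for name in test_names)


# A few names taken from the includes list above.
includes = "TestGenerateDelegationToken,TestMobCompactor,TestRegionServerMetrics"
print(to_exclude_patterns(includes.split(",")))
```

Unlike the published excludes list, this sketch does not emit a trailing comma.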

Thanks boss,
St.



On Mon, May 16, 2016 at 4:45 PM, Stack <stack@duboce.net> wrote:

> Sweet!
>
> On Mon, May 16, 2016 at 4:38 PM, Apekshit Sharma <appy@cloudera.com>
> wrote:
>
>> This mail is to introduce the work to tackle the flaky tests in our build.
>>
>> *Why is it important?*
>> - Our build history sucks: the last 175 post-commit runs failed. We need
>> to make it useful.
>> - To better understand our code’s testing status and, more importantly,
>> its weak points.
>> - We know those 2-3 tests which keep failing every now and then, but not
>> those ~10 nasty ones which fail about 1 out of 50 times and screw our build.
>> - This isn’t something that can be done manually on a daily basis. We
>> need automation.
>>
>> *Changes made so far:*
>> Code changes: HBASE-15839
>> <https://issues.apache.org/jira/browse/HBASE-15839>  (Umbrella issue)
>>
>> *Jenkins changes:*
>>
>>
>> [Diagram link:
>> https://issues.apache.org/jira/secure/attachment/12804292/Screen%20Shot%202016-05-16%20at%204.02.46%20PM.png
>> ]
>>
>> *(new job) HBase-Find-Flaky-Tests*: Gets the test reports of recent builds
>> of the post-commit job (TRUNK_matrix) and the HBase-Flaky-Tests job (see
>> below) to find flaky tests. How often it runs determines how fast we catch
>> test regressions. So if we run it every 4 hours, any test which started
>> failing in the post-commit job (TRUNK_matrix) in the last 4 hours will be
>> blacklisted.
>>
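A minimal sketch of the idea behind HBase-Find-Flaky-Tests, assuming each
build's test report has already been reduced to a mapping of test name to
pass/fail. The function name and the input shape are hypothetical; the real
job reads Jenkins test reports and its selection rule may differ.

```python
from collections import defaultdict


def find_flaky_tests(build_results):
    """Return the sorted names of tests that failed in at least one build.

    build_results: list of dicts, one per recent build, mapping
    test name -> True (passed) or False (failed). Assumed input shape.
    """
    failures = defaultdict(int)
    for results in build_results:
        for test, passed in results.items():
            if not passed:
                failures[test] += 1
    return sorted(failures)


recent_builds = [
    {"TestChoreService": True, "TestMobCompactor": False},
    {"TestChoreService": True, "TestMobCompactor": True},
]
print(find_flaky_tests(recent_builds))
```

With the sample data, only TestMobCompactor is reported, because it failed in
one of the two builds.
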
>> *(new job) HBase-Flaky-Tests*: This job runs only the flaky tests. The
>> aim is to run this job back-to-back to collect as many runs as we can.
>> The higher the run rate, the better our system will be at catching the
>> flaky tests. We currently run it hourly, so we’ll be able to keep track of
>> flaky tests with a ~5% failure rate or more.
>>
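One way to relate run rate to the failure rates we can catch, assuming each
run of a test fails independently with a fixed probability (an assumption of
this sketch; the mail does not say how the ~5% figure was derived). Both
function names are hypothetical.

```python
import math


def detection_probability(failure_rate, runs):
    """Chance of seeing at least one failure in `runs` independent runs."""
    return 1.0 - (1.0 - failure_rate) ** runs


def runs_needed(failure_rate, confidence):
    """Smallest number of runs that shows a failure with the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


# Hourly runs give 24 samples per day.
print(detection_probability(0.05, 24))
print(runs_needed(0.05, 0.95))
```

Under that assumption, a test failing 5% of the time shows up at least once in
a day of hourly runs with probability of about 0.7, and needs roughly 59 runs
(about two and a half days of hourly runs) to show up with 95% confidence.
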
>> *Post-commit (TRUNK_matrix) and pre-commit jobs*: Exclude these flaky
>> tests.
>>
>>
>> *So what if a bad commit makes a good test bad?*
>> Since the test is not on the flaky list, it’ll run in the next post-commit
>> build and fail. The next run of HBase-Find-Flaky-Tests will pick it up and
>> blacklist it. Blacklisting will help keep the post-commit job and, more
>> importantly, the pre-commit job clean, a problem we face quite often.
>>
>> *Are we just tucking away our shit?*
>> Nope, this will help us:
>> - first, maintain a list of bad tests (we lack that today).
>> - second, make our build greener to the point that a failed/red build is
>> something we worry about seriously.
>>
>> Once we are confident that the system is working fine, we’ll set up the
>> HBase-Find-Flaky-Tests job to send reports to dev@hbase so that devs
>> know about the bad tests. If the list remains hidden somewhere in a Jenkins
>> job’s archive, it’s unlikely that we’ll actively work on getting them
>> fixed :).
>>
>> I'll keep posting further updates on this thread.
>>
>> -- Appy
>>
>
>
