hadoop-common-issues mailing list archives

From "Yongjun Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-11045) Introducing a tool to detect flaky tests of hadoop jenkins test job
Date Thu, 09 Oct 2014 18:30:35 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-11045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14165489#comment-14165489 ]

Yongjun Zhang commented on HADOOP-11045:
----------------------------------------

Results of recent PreCommit-HDFS-Build runs:

{code}
****Recently FAILED builds in url: https://builds.apache.org//job/PreCommit-HDFS-Build
     THERE ARE 44 builds (out of 52) that have failed tests in the past 3 days, as listed below:
...
Among 52 runs examined, all failed tests <#failedRuns: testName>:
    15: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress
    3: org.apache.hadoop.hdfs.server.balancer.TestBalancer.testUnknownDatanode
    2: org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS.testIsEncryptedMethod
    2: org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS.testListEncryptionZonesAsNonSuperUser
    2: org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS.testRenameFileSystem
    2: org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS.testSnapshotsOnEncryptionZones
    1: org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS.testBasicOperations
    1: org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshot.testSnapshot
    1: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover.testFailoverRightBeforeCommitSynchronization
    1: org.apache.hadoop.fs.contract.hdfs.TestHDFSContractAppend.testAppendToEmptyFile
    1: org.apache.hadoop.hdfs.TestReplaceDatanodeOnFailure.testAppend
    1: org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes.testAddVolumesDuringWrite
    1: org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS.testFsckOnEncryptionZones
......
{code}
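The summary above counts, for each test, how many of the examined runs it failed in, and lists the tests in descending order of that count. A minimal Python sketch of that aggregation follows. It assumes the run data has already been fetched from Jenkins and that each run is a hypothetical dict with a `timestamp` (milliseconds since the epoch, the unit Jenkins uses for build times) and a hypothetical `failed_tests` list; it is not the script's actual implementation, and breaking ties by test name is an arbitrary choice made here.

```python
import time
from collections import Counter

MS_PER_DAY = 24 * 60 * 60 * 1000


def recent_runs(runs, num_days, now_ms=None):
    """Keep runs started within the last num_days days.

    Assumes each run is a dict whose "timestamp" is in milliseconds since the epoch.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    cutoff = now_ms - num_days * MS_PER_DAY
    return [run for run in runs if run["timestamp"] >= cutoff]


def summarize_failures(runs):
    """Return (test_name, failed_run_count) pairs, most frequent first."""
    counts = Counter()
    for run in runs:
        # Use a set so a test is counted at most once per run.
        counts.update(set(run["failed_tests"]))
    # Sort by descending count; ties are broken by test name (arbitrary here).
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sample data: three runs, the oldest one outside a 3-day window.
now = 10 * MS_PER_DAY
runs = [
    {"number": 101, "timestamp": now - 1 * MS_PER_DAY,
     "failed_tests": ["TestA.testX", "TestB.testY"]},
    {"number": 102, "timestamp": now - 2 * MS_PER_DAY,
     "failed_tests": ["TestA.testX"]},
    {"number": 90, "timestamp": now - 5 * MS_PER_DAY,
     "failed_tests": ["TestC.testZ"]},
]

# Print in the same "<#failedRuns: testName>" layout as the summary section.
for name, count in summarize_failures(recent_runs(runs, 3, now_ms=now)):
    print("    %d: %s" % (count, name))
```

With the sample data, run 90 falls outside the 3-day window, so only `TestA.testX` (2 runs) and `TestB.testY` (1 run) are reported.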




> Introducing a tool to detect flaky tests of hadoop jenkins test job
> -------------------------------------------------------------------
>
>                 Key: HADOOP-11045
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11045
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: build, tools
>    Affects Versions: 2.5.0
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>         Attachments: HADOOP-11045.001.patch, HADOOP-11045.002.patch, HADOOP-11045.003.patch
>
>
> File this jira to introduce a tool to detect flaky tests of Hadoop Jenkins test jobs. It can certainly be adapted to projects other than Hadoop.
> I developed the tool on top of some initial work [~tlipcon] did, and we find it quite useful. With Todd's agreement, I'd like to push it upstream so all of us can share it (thanks, Todd, for the initial work and support). I hope you find the tool useful too.
> The idea is that when someone wants to know whether a test failure seen in a pre-build Jenkins run is flaky, they can run this tool to get a good indication. The tool can also be used to look at the failure trend of a test case in a given Jenkins job. I hope people find it useful.
> This tool is for Hadoop contributors rather than Hadoop users. Thanks [~tedyu] for the advice to put it in the dev-support directory.
> Description of the tool:
> {code}
> #
> # Given a Jenkins test job, this script examines all runs of the job done
> # within a specified period of time (number of days prior to the execution
> # time of this script), and reports all failed tests.
> #
> # The output of this script includes a section for each run that has failed
> # tests, with each failed test name listed.
> #
> # More importantly, at the end, it outputs a summary section that lists all
> # tests that failed within the examined runs, indicates in how many runs each
> # test failed, and sorts the failed tests by that count.
> #
> # This way, when we see failed tests in a PreCommit build, we can quickly tell
> # whether a failed test is a new failure or has failed before, in which case
> # it may just be a flaky test.
> #
> # Of course, to be 100% sure about the reason for a failed test, a closer
> # look at the failed test in the specific run is necessary.
> #
> {code}
> How to use the tool:
> {code}
> Usage: determine-flaky-tests-hadoop.py [options]
> Options:
>   -h, --help            show this help message and exit
>   -J JENKINS_URL, --jenkins-url=JENKINS_URL
>                         Jenkins URL
>   -j JOB_NAME, --job-name=JOB_NAME
>                         Job name to look at
>   -n NUM_PREV_DAYS, --num-days=NUM_PREV_DAYS
>                         Number of days to examine
> {code}
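The usage text above appears to follow the layout that Python's standard `optparse` module generates (`-h, --help` is added automatically, and each option's metavar defaults to its `dest` in upper case). As an illustrative sketch only, a parser producing equivalent options could be declared as follows; the `dest` names are inferred from the metavars shown, and treating `--num-days` as an integer is an assumption, not something the usage text confirms.

```python
from optparse import OptionParser


def build_parser():
    """Declare options matching the documented usage (dest names inferred)."""
    parser = OptionParser(usage="determine-flaky-tests-hadoop.py [options]")
    parser.add_option("-J", "--jenkins-url", dest="jenkins_url",
                      help="Jenkins URL")
    parser.add_option("-j", "--job-name", dest="job_name",
                      help="Job name to look at")
    # type="int" is an assumption; the usage text does not state the type.
    parser.add_option("-n", "--num-days", dest="num_prev_days", type="int",
                      help="Number of days to examine")
    return parser


# Example invocation matching the PreCommit-HDFS-Build report above.
options, _ = build_parser().parse_args(
    ["-J", "https://builds.apache.org", "-j", "PreCommit-HDFS-Build", "-n", "3"])
print(options.jenkins_url, options.job_name, options.num_prev_days)
```

Calling `build_parser().print_help()` prints option lines such as `-n NUM_PREV_DAYS, --num-days=NUM_PREV_DAYS`, which is why the `dest` names above were chosen.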



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
