hadoop-common-dev mailing list archives

From "mike Zarrin" <m...@unitedrmr.com>
Subject RE: a script to find out flaky tests of Hadoop jenkins job
Date Tue, 02 Sep 2014 04:07:22 GMT

-----Original Message-----
From: Yongjun Zhang [mailto:yzhang@cloudera.com] 
Sent: Monday, September 01, 2014 7:43 PM
To: common-dev@hadoop.apache.org
Subject: Re: a script to find out flaky tests of Hadoop jenkins job

Hi Ted, thanks a lot, your suggestion is well taken!

Hi All,

I created HADOOP-11045 and uploaded the tool script. I hope you find it useful. Thanks for
reviewing and providing feedback.

Best regards.

--Yongjun

On Sun, Aug 31, 2014 at 12:27 AM, Ted Yu <yuzhihong@gmail.com> wrote:

> How about putting this tool in the dev-support directory?
>
> Thanks
>
> On Aug 30, 2014, at 11:10 PM, Yongjun Zhang <yzhang@cloudera.com> wrote:
>
> > Hi,
> >
> > I developed a tool to detect flaky tests of hadoop jenkins test
> > jobs, on top of the initial work Todd Lipcon did. We find it quite
> > useful, and with Todd's agreement, I'd like to push it to upstream
> > so all of us can share (thanks Todd for the initial work and
> > support). I hope you find the tool useful.
> >
> > This is a tool for hadoop contributors rather than hadoop users. And
> > it can certainly be adapted to projects other than hadoop. I wonder
> > where would be a good place to put it. Your advice is very much
> > appreciated.
> >
> > Please see below the description and example output of the tool.
> >
> > Thanks a lot.
> >
> > --Yongjun
> >
> > Description of the tool:
> >
> > #
> > # Given a jenkins test job, this script examines all runs of the job done
> > # within a specified period of time (number of days prior to the execution
> > # time of this script), and reports all failed tests.
> > #
> > # The output of this script includes a section for each run that has failed
> > # tests, with each failed test name listed.
> > #
> > # More importantly, at the end, it outputs a summary section that lists all
> > # failed tests within all examined runs, indicates how many runs each test
> > # failed in, and sorts the failed tests by that count.
> > #
> > # This way, when we see failed tests in a PreCommit build, we can quickly tell
> > # whether a failed test is a new failure or has failed before, in which case
> > # it may just be a flaky test.
> > #
> > # Of course, to be 100% sure about the reason for a failed test, a closer look
> > # at the failed test for the specific run is necessary.
> > #
> >
> > Example usage and output of the tool for job Hadoop-Common-0.23-Build,
> > which indicates that the same test failed five times in a row:
> >
> > ./determine-flaky-tests-hadoop.py -j Hadoop-Common-0.23-Build 
> > ****Recently FAILED builds in url:
> > https://builds.apache.org//job/Hadoop-Common-0.23-Build
> >    THERE ARE 5 builds (out of 5) that have failed tests in the past 14 days, as listed below:
> >
> > ==>https://builds.apache.org/job/Hadoop-Common-0.23-Build/1057/testReport (2014-08-30 02:01:30)
> >    Failed test: org.apache.hadoop.io.compress.TestCodec.testSnappyCodec
> > ==>https://builds.apache.org/job/Hadoop-Common-0.23-Build/1056/testReport (2014-08-29 02:01:30)
> >    Failed test: org.apache.hadoop.io.compress.TestCodec.testSnappyCodec
> > ==>https://builds.apache.org/job/Hadoop-Common-0.23-Build/1055/testReport (2014-08-28 02:01:30)
> >    Failed test: org.apache.hadoop.io.compress.TestCodec.testSnappyCodec
> > ==>https://builds.apache.org/job/Hadoop-Common-0.23-Build/1054/testReport (2014-08-27 02:01:29)
> >    Failed test: org.apache.hadoop.io.compress.TestCodec.testSnappyCodec
> > ==>https://builds.apache.org/job/Hadoop-Common-0.23-Build/1053/testReport (2014-08-26 02:01:30)
> >    Failed test: org.apache.hadoop.io.compress.TestCodec.testSnappyCodec
> >
> > All failed tests <#occurrences: testName>:
> >    5: org.apache.hadoop.io.compress.TestCodec.testSnappyCodec
> >
> >
> > Another example (for job Hadoop-Hdfs-trunk):
> >
> > [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -n 7
> > ****Recently FAILED builds in url:
> > https://builds.apache.org//job/Hadoop-Hdfs-trunk
> >    THERE ARE 7 builds (out of 8) that have failed tests in the past 7 days, as listed below:
> >
> > ==>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1856/testReport (2014-08-30 09:46:54)
> >    Failed test: org.apache.hadoop.hdfs.TestDFSClientRetries.testIdempotentAllocateBlockAndClose
> >    Failed test: org.apache.hadoop.hdfs.TestDFSClientRetries.testFailuresArePerOperation
> >    Failed test: org.apache.hadoop.hdfs.TestDFSClientRetries.testRetryOnChecksumFailure
> >    Failed test: org.apache.hadoop.hdfs.TestDFSClientRetries.testWriteTimeoutAtDataNode
> >    Failed test: org.apache.hadoop.hdfs.TestDFSClientRetries.testDFSClientRetriesOnBusyBlocks
> >    Failed test: org.apache.hadoop.hdfs.TestDFSClientRetries.testClientDNProtocolTimeout
> >    Failed test: org.apache.hadoop.hdfs.TestDFSClientRetries.testGetFileChecksum
> >    Failed test: org.apache.hadoop.hdfs.TestDFSClientRetries.testNamenodeRestart
> > ==>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1855/testReport (2014-08-30 04:31:30)
> >    Failed test: org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.testBalancer
> >    Failed test: org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.testUnevenDistribution
> > ==>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1854/testReport (2014-08-29 04:31:30)
> >   Could not open testReport
> > ==>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1853/testReport (2014-08-28 09:37:18)
> >   Could not open testReport
> > ==>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1852/testReport (2014-08-28 09:28:48)
> >   Could not open testReport
> > ==>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1850/testReport (2014-08-27 04:31:30)
> >    Failed test: org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS.testEnd2End
> > ==>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1849/testReport (2014-08-26 04:31:29)
> >    Failed test: org.apache.hadoop.hdfs.server.balancer.TestBalancerWithSaslDataTransfer.testBalancer0Integrity
> >
> > All failed tests <#occurrences: testName>:
> >    1: org.apache.hadoop.hdfs.server.balancer.TestBalancerWithSaslDataTransfer.testBalancer0Integrity
> >    1: org.apache.hadoop.hdfs.TestDFSClientRetries.testIdempotentAllocateBlockAndClose
> >    1: org.apache.hadoop.hdfs.TestDFSClientRetries.testFailuresArePerOperation
> >    1: org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS.testEnd2End
> >    1: org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.testUnevenDistribution
> >    1: org.apache.hadoop.hdfs.TestDFSClientRetries.testRetryOnChecksumFailure
> >    1: org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.testBalancer
> >    1: org.apache.hadoop.hdfs.TestDFSClientRetries.testWriteTimeoutAtDataNode
> >    1: org.apache.hadoop.hdfs.TestDFSClientRetries.testDFSClientRetriesOnBusyBlocks
> >    1: org.apache.hadoop.hdfs.TestDFSClientRetries.testClientDNProtocolTimeout
> >    1: org.apache.hadoop.hdfs.TestDFSClientRetries.testGetFileChecksum
> >    1: org.apache.hadoop.hdfs.TestDFSClientRetries.testNamenodeRestart
> > [yzhang@localhost jenkinsftf]$
> >
> >
> >
> > On Thu, Aug 28, 2014 at 8:04 PM, Yongjun Zhang <yzhang@cloudera.com> wrote:
> >
> >> Hi,
> >>
> >> I just noticed that the recent Jenkins test reports don't include a
> >> link to the test results; however, the email notice does show the
> >> failed tests:
> >>
> >> E.g.
> >>
> >> https://builds.apache.org/job/PreCommit-HDFS-Build/7846//
> >>
> >> Example old job report that has the link:
> >>
> >> https://builds.apache.org/job/PreCommit-HDFS-Build/7590/
> >>
> >> Would anyone please take a look?
> >>
> >> Thanks a lot.
> >>
> >> --Yongjun
> >>
> >> On Thu, Aug 28, 2014 at 4:21 PM, Karthik Kambatla <kasha@cloudera.com> wrote:
> >>
> >>> Thanks Giri and Ted for fixing the builds.
> >>>
> >>>
> >>> On Thu, Aug 28, 2014 at 9:49 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> >>>
> >>>> Charles:
> >>>> QA build is running for your JIRA:
> >>>> https://builds.apache.org/job/PreCommit-hdfs-Build/7828/parameters/
> >>>>
> >>>> Cheers
> >>>>
> >>>>
> >>>> On Thu, Aug 28, 2014 at 9:41 AM, Charles Lamb <clamb@cloudera.com> wrote:
> >>>>
> >>>>> On 8/28/2014 12:07 PM, Giridharan Kesavan wrote:
> >>>>>
> >>>>>> Fixed all the 3 pre-commit builds. test-patch's git reset --hard
> >>>>>> is removing the patchprocess dir, so moved it off the
> >>>>>> workspace.
> >>>>> Thanks Giri. Should I resubmit HDFS-6954's patch? I've gotten 3
> >>>>> or 4 jenkins messages that indicated the problem so something is
> >>>>> resubmitting, but now that you've fixed it, should I resubmit it again?
> >>>>>
> >>>>> Charles
> >>
> >>
>
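
The summary step described in the tool's header comment (count how many runs
each test failed in, then sort by that count) can be sketched roughly as
follows. This is only an illustration under stated assumptions, not the script
attached to HADOOP-11045: the function names summarize_failures and
format_summary are hypothetical, and the input is assumed to be a mapping from
build URL to the failed test names already extracted for that build. The
sample data reuses the Hadoop-Common-0.23-Build output quoted above.

```python
from collections import Counter


def summarize_failures(runs):
    """Count, for each test, the number of runs it failed in.

    `runs` is assumed to map a build URL to the list of test names that
    failed in that build (hypothetical input shape for this sketch).
    """
    counts = Counter()
    for failed_tests in runs.values():
        # Count a test at most once per run, even if it is listed twice.
        counts.update(set(failed_tests))
    # Most frequently failing tests first; ties are ordered by test name.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def format_summary(summary):
    """Render the summary in the same layout as the output quoted above."""
    lines = ["All failed tests <#occurrences: testName>:"]
    lines.extend(f"   {count}: {name}" for name, count in summary)
    return "\n".join(lines)


if __name__ == "__main__":
    base = "https://builds.apache.org/job/Hadoop-Common-0.23-Build"
    test = "org.apache.hadoop.io.compress.TestCodec.testSnappyCodec"
    # Builds 1053 through 1057 each failed the same test.
    runs = {f"{base}/{n}/testReport": [test] for n in range(1053, 1058)}
    print(format_summary(summarize_failures(runs)))
```

A test that appears with a high count across otherwise unrelated runs is a
candidate flaky test, while a test that appears only in the run under review
is more likely a new failure; as the description notes, a closer look at the
specific run is still needed to be sure.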

