hadoop-hdfs-dev mailing list archives

From Chris Nauroth <cnaur...@hortonworks.com>
Subject Re: we need a fix: precommit failures correlate to hdfs patches
Date Mon, 04 May 2015 21:23:32 GMT
If we suspect long run times are a potential root cause, then another
thing we could try is turning on parallel test execution.  To do that,
we'd add the -Pparallel-tests argument and possibly tune
-DtestsThreadCount=N.  (The default for N is 4.)
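
As a rough sketch of what that invocation could look like: only the two
flags come from this message; the `test` goal and the thread count of 8 are
placeholder assumptions, and the variable names are purely illustrative.

```shell
# Assumed values for illustration; only the flag names come from the message.
THREADS=8
MVN_CMD="mvn test -Pparallel-tests -DtestsThreadCount=${THREADS}"

# Print the command so it can be reviewed before running it from a
# Hadoop source checkout.
echo "$MVN_CMD"
```

Leaving out -DtestsThreadCount would fall back to the default of 4.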


This has given some of us significant speed-ups while running tests in our
dev environments.  I haven't tried it in a while though, so we might
surface some test isolation problems, such as if 2 test suites tried to
work in the same directory for data.  We cleaned up a lot of issues like
that before committing the parallel-tests patches, but it's possible new
problems have crept in.
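
To illustrate the kind of isolation I mean, here is a minimal sketch in
which each suite gets its own scratch directory instead of sharing one.
The helper name and suite names are hypothetical, and this is not how the
Hadoop tests actually allocate their data directories.

```shell
# Hypothetical helper: create a unique scratch directory per test suite.
make_suite_dir() {
  # $1 is a suite name, used only as a readable prefix.
  mktemp -d "${TMPDIR:-/tmp}/$1.XXXXXX"
}

dir_a=$(make_suite_dir suiteA)
dir_b=$(make_suite_dir suiteB)

# The two suites now write under different paths, so they cannot
# clobber each other's data when run in parallel.
echo "suiteA data dir: $dir_a"
echo "suiteB data dir: $dir_b"

# Clean up the scratch directories when done.
rm -rf "$dir_a" "$dir_b"
```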

--Chris Nauroth

On 5/3/15, 9:02 PM, "Sean Busbey" <busbey@cloudera.com> wrote:

>The patch artifact directory in the mainline hadoop jenkins jobs is
>outside of the workspace. I'm not sure what, if anything, jenkins
>guarantees about files out of the main workspace.
>They all write to ${WORKSPACE}/../patchProcess, which will probably collide
>if multiple runs happen on the same machine. They also all blindly move
>that directory at the end of the run.
>On Sun, May 3, 2015 at 3:02 PM, Allen Wittenauer <aw@altiscale.com> wrote:
>>         So, as some may have noticed, I slammed the Jenkins servers over
>> the weekend to get some recent patch test runs in JIRA for the bug bash
>> this week.  I've had a suspicion for a while now that either the long
>> times of the hadoop-hdfs module unit tests (typically 2+ hours) or the
>> tests themselves were related to the patch process directory getting
>> removed out from underneath test-patch.
>>         To test the hypothesis, I submitted all of the non-HDFS patches
>> so that they were first in the queue.  Let them run for a very long time.
>> Jenkins bounced back and forth between YARN, MR, and HADOOP.   No issues
>> encountered.  Added HDFS patches into the mix. BOOM. The dreaded "The
>> artifact directory has been removed!" started to appear here and there.
>> This seems to provide some evidence that, yes, hdfs unit tests are
>> directly or indirectly related to the failures.
>>         IMO, I think we need to take a serious look at:
>>         * splitting up the hadoop-hdfs module into multiple modules to
>> reduce unit test run times
>>         * checking to see if the pre commit hooks in hdfs are different
>> than the rest (I do know that the YARN bits are different and appear to
>> have some bugs as well)
>>         * increasing the timeout for jenkins job runs
>>         FWIW, I've also found some minor things here and there with the
>> rewritten test-patch.sh.  JIRAs have been filed.  One critical, one
>> and a handful of minor things.
