hadoop-common-dev mailing list archives

From Chris Nauroth <cnaur...@hortonworks.com>
Subject Re: upstream jenkins build broken?
Date Wed, 11 Mar 2015 21:51:56 GMT
Sure, thanks Sean!  Do we just look in the dev-support folder in the HBase
repo?  Is there any additional context we need to be aware of?

Chris Nauroth
Hortonworks
http://hortonworks.com/






On 3/11/15, 2:44 PM, "Sean Busbey" <busbey@cloudera.com> wrote:

>+dev@hbase
>
>HBase has recently been cleaning up our precommit jenkins jobs to make them
>more robust. From what I can tell our stuff started off as an earlier
>version of what Hadoop uses for testing.
>
>Folks on either side open to an experiment of combining our precommit check
>tooling? In principle we should be looking for the same kinds of things.
>
>Naturally we'll still need different jenkins jobs to handle different
>resource needs and we'd need to figure out where stuff eventually lives,
>but that could come later.
>
>On Wed, Mar 11, 2015 at 4:34 PM, Chris Nauroth <cnauroth@hortonworks.com>
>wrote:
>
>> The only thing I'm aware of is the failOnError option:
>>
>> 
>> http://maven.apache.org/plugins/maven-clean-plugin/examples/ignoring-errors.html
>>
>>
>> I prefer that we don't disable this, because ignoring different kinds of
>> failures could leave our build directories in an indeterminate state.  For
>> example, we could end up with an old class file on the classpath for test
>> runs that was supposedly deleted.
>>
>> I think it's worth exploring Eddy's suggestion to try simulating failure
>> by placing a file where the code expects to see a directory.  That might
>> even let us enable some of these tests that are skipped on Windows,
>> because Windows allows access for the owner even after permissions have
>> been stripped.
>>
>> Chris Nauroth
>> Hortonworks
>> http://hortonworks.com/
>>
>>
>>
>>
>>
>>
>> On 3/11/15, 2:10 PM, "Colin McCabe" <cmccabe@alumni.cmu.edu> wrote:
>>
>> >Is there a maven plugin or setting we can use to simply remove
>> >directories that have no executable permissions on them?  Clearly we
>> >have the permission to do this from a technical point of view (since
>> >we created the directories as the jenkins user), it's simply that the
>> >code refuses to do it.
>> >
>> >Otherwise I guess we can just fix those tests...
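To illustrate the point that the owning user can technically remove such directories, here is a minimal Java sketch that restores owner permissions on each directory before deleting the tree. It is not a Maven plugin or setting; the class name `ForceClean` and method `deleteTree` are hypothetical, and whether something like this should run before `mvn clean` is a separate question.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;

public class ForceClean {
    // Hypothetical helper: recursively delete a tree, first giving the owner
    // back read/write/execute on every directory so its entries can be
    // listed and unlinked. Symbolic links are deleted, never followed.
    public static void deleteTree(File f) throws IOException {
        if (Files.isDirectory(f.toPath(), LinkOption.NOFOLLOW_LINKS)) {
            // The single-argument setters apply to the owner only.
            f.setReadable(true);
            f.setWritable(true);
            f.setExecutable(true);
            File[] children = f.listFiles();
            if (children == null) {
                throw new IOException("Cannot list directory: " + f);
            }
            for (File child : children) {
                deleteTree(child);
            }
        }
        Files.delete(f.toPath());
    }

    public static void main(String[] args) throws IOException {
        // Build a small tree that mimics a leftover read-only data directory.
        File root = Files.createTempDirectory("forceclean").toFile();
        File data3 = new File(root, "data3");
        data3.mkdir();
        new File(data3, "blk_1").createNewFile();
        data3.setWritable(false);

        deleteTree(root);
        System.out.println("root still exists: " + root.exists());
    }
}
```

Permissions are restored top-down because a directory must be searchable before its children can even be inspected.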
>> >
>> >Colin
>> >
>> >On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <lei@cloudera.com> wrote:
>> >> Thanks a lot for looking into HDFS-7722, Chris.
>> >>
>> >> In HDFS-7722:
>> >> TestDataNodeVolumeFailureXXX tests reset data dir permissions in
>> >> TearDown().
>> >> TestDataNodeHotSwapVolumes resets permissions in a finally clause.
>> >>
>> >> Also I ran mvn test several times on my machine and all tests passed.
>> >>
>> >> However, DiskChecker#checkDirAccess() does the following:
>> >>
>> >> private static void checkDirAccess(File dir) throws DiskErrorException {
>> >>   if (!dir.isDirectory()) {
>> >>     throw new DiskErrorException("Not a directory: "
>> >>                                  + dir.toString());
>> >>   }
>> >>
>> >>   checkAccessByFileMethods(dir);
>> >> }
>> >>
>> >> Since it rejects anything that is not a directory, one potentially
>> >> safer alternative is replacing the data dir with a regular file to
>> >> simulate disk failures.
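The idea of replacing the data dir with a regular file could be sketched roughly as follows. This is only an illustration under assumptions, not code from HDFS-7722 or the Hadoop test suites: the class `SimulatedVolumeFailure`, its methods, and the `.bak` sibling naming are all hypothetical. The directory is moved aside, not deleted, so it can be put back afterwards.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SimulatedVolumeFailure {
    // Hypothetical helper: move the data directory aside and leave a regular
    // file in its place, so an isDirectory() check on the path fails.
    public static Path failVolume(Path dataDir) throws IOException {
        Path backup = dataDir.resolveSibling(dataDir.getFileName() + ".bak");
        Files.move(dataDir, backup);
        Files.createFile(dataDir);
        return backup;
    }

    // Hypothetical helper: remove the placeholder file and move the original
    // directory back.
    public static void restoreVolume(Path dataDir, Path backup) throws IOException {
        Files.deleteIfExists(dataDir);
        Files.move(backup, dataDir);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("volfail");
        Path dataDir = Files.createDirectory(root.resolve("data3"));

        Path backup = failVolume(dataDir);
        try {
            // A check like the one in checkDirAccess() would now reject the path.
            System.out.println("isDirectory while failed: " + dataDir.toFile().isDirectory());
        } finally {
            restoreVolume(dataDir, backup);
        }
        System.out.println("isDirectory after restore: " + dataDir.toFile().isDirectory());
    }
}
```

Unlike stripping permissions, a leftover placeholder file remains deletable by its owner, so an interrupted test run should not block a later clean of the build directory.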
>> >>
>> >> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth
>> >><cnauroth@hortonworks.com> wrote:
>> >>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>> >>> TestDataNodeVolumeFailureReporting, and
>> >>> TestDataNodeVolumeFailureToleration all remove executable permissions
>> >>> from directories like the one Colin mentioned to simulate disk failures
>> >>> at data nodes.  I reviewed the code for all of those, and they all
>> >>> appear to be doing the necessary work to restore executable permissions
>> >>> at the end of the test.  The only recent uncommitted patch I've seen
>> >>> that makes changes in these test suites is HDFS-7722.  That patch still
>> >>> looks fine though.  I don't know if there are other uncommitted patches
>> >>> that changed these test suites.
>> >>>
>> >>> I suppose it's also possible that the JUnit process unexpectedly died
>> >>> after removing executable permissions but before restoring them.  That
>> >>> always would have been a weakness of these test suites, regardless of
>> >>> any recent changes.
>> >>>
>> >>> Chris Nauroth
>> >>> Hortonworks
>> >>> http://hortonworks.com/
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> On 3/10/15, 1:47 PM, "Aaron T. Myers" <atm@cloudera.com> wrote:
>> >>>
>> >>>>Hey Colin,
>> >>>>
>> >>>>I asked Andrew Bayer, who works with Apache Infra, what's going on with
>> >>>>these boxes. He took a look and concluded that some perms are being set
>> >>>>in those directories by our unit tests which are precluding those files
>> >>>>from getting deleted. He's going to clean up the boxes for us, but we
>> >>>>should expect this to keep happening until we can fix the test in
>> >>>>question to properly clean up after itself.
>> >>>>
>> >>>>To help narrow down which commit it was that started this, Andrew sent
>> >>>>me this info:
>> >>>>
>> >>>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
>> >>>>has 500 perms, so I'm guessing that's the problem. Been that way since
>> >>>>9:32 UTC on March 5th."
>> >>>>
>> >>>>--
>> >>>>Aaron T. Myers
>> >>>>Software Engineer, Cloudera
>> >>>>
>> >>>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe <cmccabe@apache.org>
>> >>>>wrote:
>> >>>>
>> >>>>> Hi all,
>> >>>>>
>> >>>>> A very quick (and not thorough) survey shows that I can't find any
>> >>>>> jenkins jobs that succeeded from the last 24 hours.  Most of them seem
>> >>>>> to be failing with some variant of this message:
>> >>>>>
>> >>>>> [ERROR] Failed to execute goal
>> >>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean (default-clean)
>> >>>>> on project hadoop-hdfs: Failed to clean project: Failed to delete
>> >>>>> /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
>> >>>>> -> [Help 1]
>> >>>>>
>> >>>>> Any ideas how this happened?  Bad disk, unit test setting wrong
>> >>>>> permissions?
>> >>>>>
>> >>>>> Colin
>> >>>>>
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> Lei (Eddy) Xu
>> >> Software Engineer, Cloudera
>>
>>
>
>
>-- 
>Sean
