hadoop-common-dev mailing list archives

From Colin McCabe <cmcc...@alumni.cmu.edu>
Subject Re: upstream jenkins build broken?
Date Wed, 11 Mar 2015 23:23:19 GMT
On Wed, Mar 11, 2015 at 2:34 PM, Chris Nauroth <cnauroth@hortonworks.com> wrote:
> The only thing I'm aware of is the failOnError option:
>
> http://maven.apache.org/plugins/maven-clean-plugin/examples/ignoring-errors.html
>
>
> I prefer that we don't disable this, because ignoring different kinds of
> failures could leave our build directories in an indeterminate state.  For
> example, we could end up with an old class file on the classpath for test
> runs that was supposedly deleted.
>
> I think it's worth exploring Eddy's suggestion to try simulating failure
> by placing a file where the code expects to see a directory.  That might
> even let us enable some of these tests that are skipped on Windows,
> because Windows allows access for the owner even after permissions have
> been stripped.

+1.  JIRA?

Colin
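[Editor's note: Eddy's suggestion above (put a regular file where the code expects a directory) can be sketched in a few lines of plain Java. This is a hypothetical, self-contained illustration, not Hadoop code; only the isDirectory() check mirrors the DiskChecker#checkDirAccess snippet quoted below. Because no permissions are ever stripped, "mvn clean" style deletion always works afterwards, and the approach is not owner-permission-dependent, so it could also run on Windows.]

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class DiskFailureSimulation {

    // Mirrors the directory check quoted from DiskChecker#checkDirAccess.
    static void checkDirAccess(File dir) throws IOException {
        if (!dir.isDirectory()) {
            throw new IOException("Not a directory: " + dir);
        }
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("demo");
        Path dataDir = base.resolve("data3");
        Files.createDirectory(dataDir);

        // Simulate a failed disk: replace the data dir with a regular file.
        Files.delete(dataDir);
        Files.createFile(dataDir);

        try {
            checkDirAccess(dataDir.toFile());
            System.out.println("unexpected: check passed");
        } catch (IOException e) {
            System.out.println("detected: " + e.getMessage());
        }

        // Cleanup needs no chmod, so a dead JUnit process can't leave
        // behind an undeletable directory.
        Files.delete(dataDir);
        Files.delete(base);
    }
}
```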

>
> Chris Nauroth
> Hortonworks
> http://hortonworks.com/
>
>
>
>
>
>
> On 3/11/15, 2:10 PM, "Colin McCabe" <cmccabe@alumni.cmu.edu> wrote:
>
>>Is there a maven plugin or setting we can use to simply remove
>>directories that have no executable permissions on them?  Clearly we
>>have permission to do this from a technical point of view (since we
>>created the directories as the jenkins user); it's simply that the
>>code refuses to do it.
>>
>>Otherwise I guess we can just fix those tests...
>>
>>Colin
>>
>>On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <lei@cloudera.com> wrote:
>>> Thanks a lot for looking into HDFS-7722, Chris.
>>>
>>> In HDFS-7722:
>>> TestDataNodeVolumeFailureXXX tests reset data dir permissions in
>>> TearDown().  TestDataNodeHotSwapVolumes resets permissions in a
>>> finally clause.
>>>
>>> Also I ran mvn test several times on my machine and all tests passed.
>>>
>>> However, since DiskChecker#checkDirAccess() does the following check:
>>>
>>> private static void checkDirAccess(File dir) throws DiskErrorException {
>>>   if (!dir.isDirectory()) {
>>>     throw new DiskErrorException("Not a directory: "
>>>                                  + dir.toString());
>>>   }
>>>
>>>   checkAccessByFileMethods(dir);
>>> }
>>>
>>> One potentially safer alternative is replacing the data dir with a
>>> regular file to simulate disk failures.
>>>
>>> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth
>>><cnauroth@hortonworks.com> wrote:
>>>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>>>> TestDataNodeVolumeFailureReporting, and
>>>> TestDataNodeVolumeFailureToleration all remove executable permissions
>>>> from directories like the one Colin mentioned to simulate disk
>>>> failures at data nodes.  I reviewed the code for all of those, and
>>>> they all appear to be doing the necessary work to restore executable
>>>> permissions at the end of the test.  The only recent uncommitted
>>>> patch I've seen that makes changes in these test suites is HDFS-7722.
>>>> That patch still looks fine though.  I don't know if there are other
>>>> uncommitted patches that changed these test suites.
>>>>
>>>> I suppose it's also possible that the JUnit process unexpectedly died
>>>> after removing executable permissions but before restoring them.
>>>> That always would have been a weakness of these test suites,
>>>> regardless of any recent changes.
>>>>
>>>> Chris Nauroth
>>>> Hortonworks
>>>> http://hortonworks.com/
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 3/10/15, 1:47 PM, "Aaron T. Myers" <atm@cloudera.com> wrote:
>>>>
>>>>>Hey Colin,
>>>>>
>>>>>I asked Andrew Bayer, who works with Apache Infra, what's going on
>>>>>with these boxes. He took a look and concluded that some perms are
>>>>>being set in those directories by our unit tests which are precluding
>>>>>those files from getting deleted. He's going to clean up the boxes
>>>>>for us, but we should expect this to keep happening until we can fix
>>>>>the test in question to properly clean up after itself.
>>>>>
>>>>>To help narrow down which commit it was that started this, Andrew
>>>>>sent me this info:
>>>>>
>>>>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
>>>>>has 500 perms, so I'm guessing that's the problem. Been that way
>>>>>since 9:32 UTC on March 5th."
>>>>>
>>>>>--
>>>>>Aaron T. Myers
>>>>>Software Engineer, Cloudera
>>>>>
>>>>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe <cmccabe@apache.org>
>>>>>wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> A very quick (and not thorough) survey shows that I can't find any
>>>>>> jenkins jobs that succeeded in the last 24 hours.  Most of them
>>>>>> seem to be failing with some variant of this message:
>>>>>>
>>>>>> [ERROR] Failed to execute goal
>>>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean (default-clean)
>>>>>> on project hadoop-hdfs: Failed to clean project: Failed to delete
>>>>>>
>>>>>> /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
>>>>>> -> [Help 1]
>>>>>>
>>>>>> Any ideas how this happened?  Bad disk, unit test setting wrong
>>>>>> permissions?
>>>>>>
>>>>>> Colin
>>>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Lei (Eddy) Xu
>>> Software Engineer, Cloudera
>
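[Editor's note: for reference, the maven-clean-plugin option Chris links at the top of the thread is configured as below. This is a sketch based on the plugin's documented failOnError parameter; the version number is taken from the error message in the thread and may differ in your build. As Chris argues, enabling it can leave stale class files on the test classpath, so the thread leans against it.]

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-clean-plugin</artifactId>
  <version>2.5</version>
  <configuration>
    <!-- Don't fail the build when a delete fails; the build directory
         may then be left in an indeterminate state. -->
    <failOnError>false</failOnError>
  </configuration>
</plugin>
```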

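[Editor's note: the underlying failure mode is easy to reproduce outside Jenkins. Removing an entry from a directory requires write permission on that directory, so a mode-500 (r-x) data dir blocks the deletes that maven-clean attempts, even for the owner who created it. A minimal sketch, assuming a POSIX shell and a non-root user (root bypasses permission checks); all paths are hypothetical:]

```shell
#!/bin/sh
# Reproduce the cleanup failure: mode 500 on a directory forbids
# unlinking its entries.
base=$(mktemp -d)
mkdir "$base/data3"
touch "$base/data3/blk_1"
chmod 500 "$base/data3"

if rm -f "$base/data3/blk_1" 2>/dev/null; then
  status=deleted
else
  status=denied          # what maven-clean runs into on the build slaves
fi
echo "first attempt: $status"

# What a test TearDown() (or a Jenkins cleanup step) must do first:
chmod 755 "$base/data3"
rm -rf "$base"
echo "after chmod: cleaned up"
```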