hadoop-common-dev mailing list archives

From Chris Nauroth <cnaur...@hortonworks.com>
Subject Re: upstream jenkins build broken?
Date Wed, 11 Mar 2015 21:34:10 GMT
The only thing I'm aware of is the failOnError option:


I prefer that we don't disable this, because ignoring different kinds of
failures could leave our build directories in an indeterminate state.  For
example, we could end up with an old class file on the classpath for test
runs that was supposedly deleted.
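For reference, the option Chris is describing would look roughly like this in a pom.xml (a sketch, not the actual Hadoop build config; the plugin version is taken from the error message quoted below):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-clean-plugin</artifactId>
  <version>2.5</version>
  <configuration>
    <!-- defaults to true; false would ignore deletion failures,
         which is what Chris is arguing against -->
    <failOnError>false</failOnError>
  </configuration>
</plugin>
```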

I think it's worth exploring Eddy's suggestion to try simulating failure
by placing a file where the code expects to see a directory.  That might
even let us enable some of these tests that are skipped on Windows,
because Windows allows access for the owner even after permissions have
been stripped.
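A minimal sketch of that idea (the class and directory names here are illustrative, not taken from the Hadoop test code): replace the data directory with a regular file, which trips the isDirectory() check in DiskChecker#checkDirAccess without touching permission bits, so it behaves the same on Windows.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class DirAsFileSimulation {
    public static void main(String[] args) throws IOException {
        // Illustrative stand-in for a DataNode data dir
        File dataDir = Files.createTempDirectory("dataDir").toFile();

        // Simulate a failed volume: remove the directory and put a
        // plain file at the same path.
        if (!dataDir.delete()) {
            throw new IOException("could not remove " + dataDir);
        }
        if (!dataDir.createNewFile()) {
            throw new IOException("could not create file at " + dataDir);
        }

        // This is the condition DiskChecker#checkDirAccess tests first,
        // independent of permission bits.
        System.out.println("isDirectory=" + dataDir.isDirectory());

        dataDir.delete();  // clean up
    }
}
```

Because no permissions are ever stripped, a test torn down this way cannot leave an undeletable directory behind for the next Jenkins run.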

Chris Nauroth

On 3/11/15, 2:10 PM, "Colin McCabe" <cmccabe@alumni.cmu.edu> wrote:

>Is there a maven plugin or setting we can use to simply remove
>directories that have no executable permissions on them?  Clearly we
>have the permission to do this from a technical point of view (since
>we created the directories as the jenkins user), it's simply that the
>code refuses to do it.
>Otherwise I guess we can just fix those tests...
>On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <lei@cloudera.com> wrote:
>> Thanks a lot for looking into HDFS-7722, Chris.
>> In HDFS-7722:
>> TestDataNodeVolumeFailureXXX tests reset data dir permissions in
>> their teardown, and TestDataNodeHotSwapVolumes resets permissions in a finally clause.
>> Also I ran mvn test several times on my machine and all tests passed.
>> However, since in DiskChecker#checkDirAccess():
>> private static void checkDirAccess(File dir) throws DiskErrorException {
>>   if (!dir.isDirectory()) {
>>     throw new DiskErrorException("Not a directory: "
>>                                  + dir.toString());
>>   }
>>   checkAccessByFileMethods(dir);
>> }
>> One potentially safer alternative is replacing data dir with a regular
>> file to simulate disk failures.
>> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth
>><cnauroth@hortonworks.com> wrote:
>>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>>> TestDataNodeVolumeFailureReporting, and
>>> TestDataNodeVolumeFailureToleration all remove executable permissions
>>> from directories like the one Colin mentioned to simulate disk failures
>>> at nodes.  I reviewed the code for all of those, and they all appear to
>>> be doing the necessary work to restore executable permissions at the end
>>> of the test.  The only recent uncommitted patch I've seen that makes
>>> changes in these test suites is HDFS-7722.  That patch still looks fine
>>>though.  I
>>> don't know if there are other uncommitted patches that changed these
>>> suites.
>>> I suppose it's also possible that the JUnit process unexpectedly died
>>> after removing executable permissions but before restoring them.  That
>>> always would have been a weakness of these test suites, regardless of
>>> recent changes.
>>> Chris Nauroth
>>> Hortonworks
>>> http://hortonworks.com/
>>> On 3/10/15, 1:47 PM, "Aaron T. Myers" <atm@cloudera.com> wrote:
>>>>Hey Colin,
>>>>I asked Andrew Bayer, who works with Apache Infra, what's going on with
>>>>these boxes. He took a look and concluded that some perms are being
>>>>set in
>>>>those directories by our unit tests, which are preventing those files
>>>>from being deleted. He's going to clean up the boxes for us, but we should
>>>>expect this to keep happening until we can fix the test in question to
>>>>properly clean up after itself.
>>>>To help narrow down which commit it was that started this, Andrew sent
>>>>this info:
>>>>500 perms, so I'm guessing that's the problem. Been that way since 9:32
>>>>on March 5th."
>>>>Aaron T. Myers
>>>>Software Engineer, Cloudera
>>>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe <cmccabe@apache.org> wrote:
>>>>> Hi all,
>>>>> A very quick (and not thorough) survey shows that I can't find any
>>>>> jenkins jobs that succeeded from the last 24 hours.  Most of them seem
>>>>> to be failing with some variant of this message:
>>>>> [ERROR] Failed to execute goal
>>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean (default-clean)
>>>>> on project hadoop-hdfs: Failed to clean project: Failed to delete
>>>>> -> [Help 1]
>>>>> Any ideas how this happened?  Bad disk, unit test setting wrong
>>>>> permissions?
>>>>> Colin
>> --
>> Lei (Eddy) Xu
>> Software Engineer, Cloudera
