hadoop-hdfs-dev mailing list archives

From Colin McCabe <cmcc...@alumni.cmu.edu>
Subject Re: upstream jenkins build broken?
Date Wed, 11 Mar 2015 21:10:00 GMT
Is there a maven plugin or setting we can use to simply remove
directories that have no executable permissions on them?  Clearly we
have the permission to do this from a technical point of view (since
we created the directories as the jenkins user), it's simply that the
code refuses to do it.

Otherwise I guess we can just fix those tests...
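
As a rough illustration of what such a cleanup step would have to do, here is a minimal sketch in plain Java. It is not an existing Maven plugin or setting; the names ForceClean and forceDelete are made up for this example. It assumes the process owns the directories (as the jenkins user does here), so it can restore owner permissions before deleting.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical helper, not part of Maven or Hadoop.
class ForceClean {
    /**
     * Recursively deletes f. Before descending into a directory, the owner's
     * read/write/execute bits are restored so that listing and deleting its
     * children cannot fail because of permissions a test left behind.
     */
    static void forceDelete(File f) throws IOException {
        // Do not follow symlinks; only the link itself is removed below.
        if (f.isDirectory() && !Files.isSymbolicLink(f.toPath())) {
            f.setReadable(true);    // needed to list the children
            f.setWritable(true);    // needed to remove entries
            f.setExecutable(true);  // needed to traverse the directory
            File[] children = f.listFiles();
            if (children != null) {
                for (File child : children) {
                    forceDelete(child);
                }
            }
        }
        Files.deleteIfExists(f.toPath());
    }
}
```

Calling ForceClean.forceDelete(new File("target/test/data")) before the clean step would be one way to use it; wiring that into the build is left open here.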

Colin

On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <lei@cloudera.com> wrote:
> Thanks a lot for looking into HDFS-7722, Chris.
>
> In HDFS-7722:
> The TestDataNodeVolumeFailureXXX tests reset data dir permissions in tearDown().
> TestDataNodeHotSwapVolumes resets permissions in a finally clause.
>
> Also I ran mvn test several times on my machine and all tests passed.
>
> However, note that DiskChecker#checkDirAccess() rejects anything that is not a directory:
>
> private static void checkDirAccess(File dir) throws DiskErrorException {
>   if (!dir.isDirectory()) {
>     throw new DiskErrorException("Not a directory: "
>                                  + dir.toString());
>   }
>
>   checkAccessByFileMethods(dir);
> }
>
> One potentially safer alternative is to replace the data dir with a regular
> file to simulate disk failures.
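
To make the regular-file idea concrete, here is a minimal sketch in plain Java. Everything in it is hypothetical: VolumeFailureSim, failVolume, restoreVolume, and passesDirCheck are names invented for illustration, and passesDirCheck only mirrors the isDirectory() test quoted above rather than calling the real DiskChecker. It assumes the data dir can be renamed within the same parent directory.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical test helper, not part of Hadoop.
class VolumeFailureSim {
    /** Stand-in for the isDirectory() check in the quoted method. */
    static boolean passesDirCheck(File dir) {
        return dir.isDirectory();
    }

    /**
     * Moves the data dir aside and puts a regular file in its place,
     * so the directory check fails without touching any permissions.
     * Returns the backup location needed to undo the change.
     */
    static Path failVolume(Path dataDir) throws IOException {
        Path backup = dataDir.resolveSibling(dataDir.getFileName() + ".bak");
        Files.move(dataDir, backup);
        Files.createFile(dataDir);
        return backup;
    }

    /** Removes the placeholder file and moves the original dir back. */
    static void restoreVolume(Path dataDir, Path backup) throws IOException {
        Files.delete(dataDir);
        Files.move(backup, dataDir);
    }
}
```

Even if the test process dies between failVolume and restoreVolume, everything left behind keeps normal permissions, so a later clean should still be able to delete it.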
>
> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth <cnauroth@hortonworks.com> wrote:
>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>> TestDataNodeVolumeFailureReporting, and
>> TestDataNodeVolumeFailureToleration all remove executable permissions from
>> directories like the one Colin mentioned to simulate disk failures at data
>> nodes.  I reviewed the code for all of those, and they all appear to be
>> doing the necessary work to restore executable permissions at the end of
>> the test.  The only recent uncommitted patch I've seen that makes changes
>> in these test suites is HDFS-7722.  That patch still looks fine though.  I
>> don't know if there are other uncommitted patches that changed these test
>> suites.
>>
>> I suppose it's also possible that the JUnit process unexpectedly died
>> after removing executable permissions but before restoring them.  That
>> always would have been a weakness of these test suites, regardless of any
>> recent changes.
>>
>> Chris Nauroth
>> Hortonworks
>> http://hortonworks.com/
>>
>> On 3/10/15, 1:47 PM, "Aaron T. Myers" <atm@cloudera.com> wrote:
>>
>>>Hey Colin,
>>>
>>>I asked Andrew Bayer, who works with Apache Infra, what's going on with
>>>these boxes. He took a look and concluded that some perms are being set in
>>>those directories by our unit tests which are precluding those files from
>>>getting deleted. He's going to clean up the boxes for us, but we should
>>>expect this to keep happening until we can fix the test in question to
>>>properly clean up after itself.
>>>
>>>To help narrow down which commit it was that started this, Andrew sent me
>>>this info:
>>>
>>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
>>>has 500 perms, so I'm guessing that's the problem. Been that way since
>>>9:32 UTC on March 5th."
>>>
>>>--
>>>Aaron T. Myers
>>>Software Engineer, Cloudera
>>>
>>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe <cmccabe@apache.org>
>>>wrote:
>>>
>>>> Hi all,
>>>>
>>>> A very quick (and not thorough) survey shows that I can't find any
>>>> jenkins jobs that succeeded from the last 24 hours.  Most of them seem
>>>> to be failing with some variant of this message:
>>>>
>>>> [ERROR] Failed to execute goal
>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean (default-clean)
>>>> on project hadoop-hdfs: Failed to clean project: Failed to delete
>>>> /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
>>>> -> [Help 1]
>>>>
>>>> Any ideas how this happened?  Bad disk, unit test setting wrong
>>>> permissions?
>>>>
>>>> Colin
>>>>
>>
>
>
>
> --
> Lei (Eddy) Xu
> Software Engineer, Cloudera
