hadoop-hdfs-dev mailing list archives

From Colin McCabe <cmcc...@alumni.cmu.edu>
Subject Re: upstream jenkins build broken?
Date Wed, 11 Mar 2015 23:23:19 GMT
On Wed, Mar 11, 2015 at 2:34 PM, Chris Nauroth <cnauroth@hortonworks.com> wrote:
> The only thing I'm aware of is the failOnError option:
> http://maven.apache.org/plugins/maven-clean-plugin/examples/ignoring-errors.html
> I prefer that we don't disable this, because ignoring different kinds of
> failures could leave our build directories in an indeterminate state.  For
> example, we could end up with an old class file on the classpath for test
> runs that was supposedly deleted.
> I think it's worth exploring Eddy's suggestion to try simulating failure
> by placing a file where the code expects to see a directory.  That might
> even let us enable some of these tests that are skipped on Windows,
> because Windows allows access for the owner even after permissions have
> been stripped.

+1.  JIRA?
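
A minimal sketch of the "file where the code expects a directory" idea, for illustration only. The class SimulatedVolumeFailure and the helpers injectFailure, restore, and checkDir are hypothetical names, not Hadoop APIs; checkDir only mirrors the isDirectory() test from the DiskChecker#checkDirAccess() snippet quoted further down and throws a plain IOException instead of DiskErrorException. No permission bits are changed, so a test process that dies midway leaves nothing that "mvn clean" cannot delete.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class SimulatedVolumeFailure {
    // Stand-in for the isDirectory() check quoted from DiskChecker#checkDirAccess();
    // this is NOT the Hadoop class.
    static void checkDir(File dir) throws IOException {
        if (!dir.isDirectory()) {
            throw new IOException("Not a directory: " + dir);
        }
    }

    // Hypothetical helper: move the data dir aside and put a regular file
    // in its place. Returns the backup location so the caller can restore it.
    static Path injectFailure(Path dataDir) throws IOException {
        Path backup = dataDir.resolveSibling(dataDir.getFileName() + ".bak");
        Files.move(dataDir, backup);
        Files.createFile(dataDir);
        return backup;
    }

    // Hypothetical helper: remove the placeholder file and move the
    // original directory back.
    static void restore(Path dataDir, Path backup) throws IOException {
        Files.deleteIfExists(dataDir);
        Files.move(backup, dataDir);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("volfail");
        Path dataDir = Files.createDirectory(root.resolve("data1"));

        Path backup = injectFailure(dataDir);
        boolean failed = false;
        try {
            checkDir(dataDir.toFile());
        } catch (IOException e) {
            failed = true;  // the simulated "disk failure" was detected
        } finally {
            restore(dataDir, backup);  // always undo the injection
        }

        System.out.println("failure detected: " + failed);
        System.out.println("restored: " + Files.isDirectory(dataDir));

        // Clean up the scratch tree.
        Files.delete(dataDir);
        Files.delete(root);
    }
}
```

Because the approach does not depend on permission semantics, it should behave the same for the owner on Windows, which is the reason given above for possibly re-enabling the skipped tests.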


> Chris Nauroth
> Hortonworks
> http://hortonworks.com/
> On 3/11/15, 2:10 PM, "Colin McCabe" <cmccabe@alumni.cmu.edu> wrote:
>>Is there a maven plugin or setting we can use to simply remove
>>directories that have no executable permissions on them?  Clearly we
>>have the permission to do this from a technical point of view (since
>>we created the directories as the jenkins user), it's simply that the
>>code refuses to do it.
>>Otherwise I guess we can just fix those tests...
>>On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <lei@cloudera.com> wrote:
>>> Thanks a lot for looking into HDFS-7722, Chris.
>>> In HDFS-7722:
>>> TestDataNodeVolumeFailureXXX tests reset data dir permissions in
>>> TestDataNodeHotSwapVolumes resets permissions in a finally clause.
>>> Also I ran mvn test several times on my machine and all tests passed.
>>> However, note the check in DiskChecker#checkDirAccess():
>>> private static void checkDirAccess(File dir) throws DiskErrorException {
>>>   if (!dir.isDirectory()) {
>>>     throw new DiskErrorException("Not a directory: "
>>>                                  + dir.toString());
>>>   }
>>>   checkAccessByFileMethods(dir);
>>> }
>>> One potentially safer alternative is replacing the data dir with a regular
>>> file to simulate disk failures.
>>> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth
>>><cnauroth@hortonworks.com> wrote:
>>>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>>>> TestDataNodeVolumeFailureReporting, and
>>>> TestDataNodeVolumeFailureToleration all remove executable permissions from
>>>> directories like the one Colin mentioned to simulate disk failures at
>>>> nodes.  I reviewed the code for all of those, and they all appear to be
>>>> doing the necessary work to restore executable permissions at the end of
>>>> the test.  The only recent uncommitted patch I've seen that makes changes
>>>> in these test suites is HDFS-7722.  That patch still looks fine though.  I
>>>> don't know if there are other uncommitted patches that changed these
>>>> suites.
>>>> I suppose it's also possible that the JUnit process unexpectedly died
>>>> after removing executable permissions but before restoring them.  That
>>>> always would have been a weakness of these test suites, regardless of
>>>> recent changes.
>>>> Chris Nauroth
>>>> Hortonworks
>>>> http://hortonworks.com/
>>>> On 3/10/15, 1:47 PM, "Aaron T. Myers" <atm@cloudera.com> wrote:
>>>>>Hey Colin,
>>>>>I asked Andrew Bayer, who works with Apache Infra, what's going on with
>>>>>these boxes. He took a look and concluded that some perms are being set in
>>>>>those directories by our unit tests which are precluding those files from
>>>>>getting deleted. He's going to clean up the boxes for us, but we should
>>>>>expect this to keep happening until we can fix the test in question to
>>>>>properly clean up after itself.
>>>>>To help narrow down which commit it was that started this, Andrew sent
>>>>>this info:
>>>>>500 perms, so I'm guessing that's the problem. Been that way since 9:32
>>>>>on March 5th."
>>>>>Aaron T. Myers
>>>>>Software Engineer, Cloudera
>>>>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe <cmccabe@apache.org>
>>>>>wrote:
>>>>>> Hi all,
>>>>>> A very quick (and not thorough) survey shows that I can't find any
>>>>>> jenkins jobs that succeeded from the last 24 hours.  Most of them seem
>>>>>> to be failing with some variant of this message:
>>>>>> [ERROR] Failed to execute goal
>>>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean (default-clean)
>>>>>> on project hadoop-hdfs: Failed to clean project: Failed to delete
>>>>>> -> [Help 1]
>>>>>> Any ideas how this happened?  Bad disk, unit test setting wrong
>>>>>> permissions?
>>>>>> Colin
>>> --
>>> Lei (Eddy) Xu
>>> Software Engineer, Cloudera
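
On the question raised earlier in the thread about removing directories that have lost their executable permission: a rough sketch, assuming the process owns the tree (as the jenkins user does for directories it created). ForceDelete and forceDelete are hypothetical names, not part of Maven or Hadoop. The method restores the owner's read, write, and execute bits on each directory before descending, then deletes bottom-up. It could serve as a pre-clean step or a test teardown; it is not a maven-clean-plugin feature.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

final class ForceDelete {
    // Recursively delete f, first restoring owner rwx on every directory so
    // that its entries can be listed and unlinked. Symbolic links are removed
    // without being followed.
    static void forceDelete(File f) throws IOException {
        if (f.isDirectory() && !Files.isSymbolicLink(f.toPath())) {
            // The single-argument setters affect the owner's bits only.
            f.setReadable(true);
            f.setWritable(true);
            f.setExecutable(true);
            File[] children = f.listFiles();
            if (children != null) {
                for (File child : children) {
                    forceDelete(child);
                }
            }
        }
        Files.deleteIfExists(f.toPath());
    }

    public static void main(String[] args) throws IOException {
        // Build a small tree and strip permissions the way the tests do.
        File root = Files.createTempDirectory("forcedelete").toFile();
        File data = new File(root, "data1");
        if (!data.mkdir()) {
            throw new IOException("could not create " + data);
        }
        Files.createFile(new File(data, "blk_0").toPath());
        data.setExecutable(false);
        data.setWritable(false);

        forceDelete(root);
        System.out.println("still exists: " + root.exists());
    }
}
```

This only helps when the leftover directories belong to the user running the build; it does nothing about the underlying problem of tests that exit without restoring permissions.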
