hadoop-mapreduce-issues mailing list archives

From "Greg Roelofs (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-2238) Undeletable build directories
Date Thu, 06 Jan 2011 22:54:51 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978551#action_12978551 ]

Greg Roelofs commented on MAPREDUCE-2238:

bq. I guess it could be a test timing out right as a setPermissions is done, interrupting
in the middle... but seems pretty unlikely, don't you think?

Yes.  I'm guessing it's more subtle than that and lies within the core MR code or the JVM.
The fact that I see it semi-frequently on NFS (that is, more frequently than on Hudson or in
production) suggests either a timing problem (NFS is slow), perhaps via an erroneous assumption
of synchronous behavior, or else an erroneous assumption that some system call cannot fail.
It could be other things as well, of course, but those seem to me like the most probable candidates.

bq. I agree we could work around it for the tests, but I'm nervous whether we will see this
issue crop up in production. Have you guys at Yahoo seen this on any clusters running secure

To clarify, I was suggesting working around it in the MR code itself, not realizing that the
Hudson backtrace wasn't using MR code at all.  (Well, apparently.)  So I'm not sure where
that leaves us, other than trying to fix the actual set-permissions problem.  Seems like no
one's basic deleteRecursive() implementation includes an option to attempt a chmod() before
failing on bad permissions?
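
For illustration only, here is a minimal sketch of what such an option might look like: a
recursive delete that restores owner rwx on any directory lacking it before descending. The
class and method names (ChmodThenDelete, grantOwnerAccess) are hypothetical; this is not the
Hadoop or Hudson implementation, and it assumes the process owns the files in question.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

/** Hypothetical helper: recursive delete that tries a chmod before giving up. */
public final class ChmodThenDelete {
    private ChmodThenDelete() {}

    /**
     * Deletes path and everything beneath it. Directories that are missing
     * read, write, or execute permission get owner rwx restored first, since
     * listing needs r+x and removing entries needs w+x on the directory.
     * Throws IOException if anything (including a nonexistent path) cannot
     * be deleted.
     */
    public static void deleteRecursive(File path) throws IOException {
        // Do not follow symlinks to directories; just remove the link itself.
        if (path.isDirectory() && !Files.isSymbolicLink(path.toPath())) {
            if (!(path.canRead() && path.canWrite() && path.canExecute())) {
                grantOwnerAccess(path);
            }
            File[] children = path.listFiles();
            if (children == null) {
                throw new IOException("Unable to list " + path);
            }
            for (File child : children) {
                deleteRecursive(child);
            }
        }
        if (!path.delete()) {
            throw new IOException("Unable to delete " + path);
        }
    }

    /** Best-effort chmod u+rwx; failures surface later as a failed list or delete. */
    private static void grantOwnerAccess(File dir) {
        dir.setReadable(true, true);
        dir.setWritable(true, true);
        dir.setExecutable(true, true);
    }
}
```

Note that this only papers over the symptom: it does not fix the parent directory of the
top-level path, and it does nothing about whatever left the permissions wrong in the first place.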

Anyway, yes, I _think_ we've seen it in production with 0.20S or later, but it wasn't while
I was on call, so I might be remembering a different issue with similar symptoms.  Sorry...there
are lots of interesting failure modes in Hadoop, and my memory is finite. :-)

> Undeletable build directories 
> ------------------------------
>                 Key: MAPREDUCE-2238
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2238
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: build, test
>    Affects Versions: 0.23.0
>            Reporter: Eli Collins
> The MR hudson job is failing, looks like it's due to a test chmod'ing a build directory so the checkout can't clean the build dir.
> https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk/549/console
> Building remotely on hadoop7
> hudson.util.IOException2: remote file operation failed: /grid/0/hudson/hudson-slave/workspace/Hadoop-Mapreduce-trunk at hudson.remoting.Channel@2545938c:hadoop7
> 	at hudson.FilePath.act(FilePath.java:749)
> 	at hudson.FilePath.act(FilePath.java:735)
> 	at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:589)
> 	at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:537)
> 	at hudson.model.AbstractProject.checkout(AbstractProject.java:1116)
> 	at hudson.model.AbstractBuild$AbstractRunner.checkout(AbstractBuild.java:479)
> 	at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:411)
> 	at hudson.model.Run.run(Run.java:1324)
> 	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
> 	at hudson.model.ResourceController.execute(ResourceController.java:88)
> 	at hudson.model.Executor.run(Executor.java:139)
> Caused by: java.io.IOException: Unable to delete /grid/0/hudson/hudson-slave/workspace/Hadoop-Mapreduce-trunk/trunk/build/test/logs/userlogs/job_20101230131139886_0001/attempt_20101230131139886_0001_m_000000_0

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
