hadoop-mapreduce-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-2238) Undeletable build directories
Date Tue, 18 Jan 2011 17:05:46 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12983267#action_12983267 ]

Todd Lipcon commented on MAPREDUCE-2238:
----------------------------------------

Spent some time adding logging and looping the tests to figure out this problem. I think I
have it cracked.

The issue is not multiple threads in the same process calling setPermissions(), but rather
a case where one thread is calling setPermissions() on the *parent* directory of a file on which
another thread (actually another entire process) is calling setPermissions().

In particular, these two invocations race:

2011-01-18 09:00:40,958 INFO  tasktracker.Localizer (Localizer.java:setPermissions(129)) -
Thread[TaskLauncher for MAP tasks,5,main]: About to set permissions on /data/1/todd/cdh/repos/cdh3/hadoop-0.20/build/test/logs/userlogs/job_20110118090037816_0001
java.lang.Exception
  at org.apache.hadoop.mapreduce.server.tasktracker.Localizer$PermissionsHandler.setPermissions(Localizer.java:129)
  at org.apache.hadoop.mapreduce.server.tasktracker.Localizer.initializeJobLogDir(Localizer.java:429)
  at org.apache.hadoop.mapred.TaskTracker.initializeJobLogDir(TaskTracker.java:1072)
  at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:969)
  at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:2209)
2011-01-18 09:00:40,985 INFO  tasktracker.Localizer (Localizer.java:setPermissions(129)) -
Thread[Thread-213,5,main]: About to set permissions on /data/1/todd/cdh/repos/cdh3/hadoop-0.20/build/test/logs/userlogs/job_20110118090037816_0001/attempt_20110118090037816_0001_m_000005_0
java.lang.Exception
  at org.apache.hadoop.mapreduce.server.tasktracker.Localizer$PermissionsHandler.setPermissions(Localizer.java:129)
  at org.apache.hadoop.mapred.TaskRunner.prepareLogFiles(TaskRunner.java:285)
  at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:198)

The above traces are from an 0.20 branch but I imagine it's the same deal on trunk.

The issue is that the top invocation momentarily flips the job_<id> directory to mode 000.
During that window, the stat/chmod calls for the attempt directory fail with EACCES, which can
leave the attempt directory with the wrong permissions. I have strace output that shows this
as well.
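
To illustrate the window described above, here is a minimal sketch, not the actual Localizer code. It assumes the permissions handler expresses an owner-only mode with the java.io.File setters by first clearing each bit for everyone and then re-granting it to the owner. The class and method names (TwoStepChmodSketch, setOwnerOnly, mode, demo) are hypothetical, the java.nio calls are used only to observe the mode, and a POSIX filesystem is assumed.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.attribute.PosixFilePermissions;

public class TwoStepChmodSketch {

    // Reads the current POSIX mode as a string such as "rwx------".
    static String mode(File f) {
        try {
            return PosixFilePermissions.toString(
                    Files.getPosixFilePermissions(f.toPath()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical helper: sets an owner-only mode using only java.io.File
    // setters, and returns the mode observed between the two steps.
    static String setOwnerOnly(File dir) {
        // Step 1: clear read/write/execute for everyone (owner included).
        dir.setReadable(false, false);
        dir.setWritable(false, false);
        dir.setExecutable(false, false);

        // Any other thread or process that touches a child of 'dir' at this
        // point cannot traverse it, so its stat/chmod can fail with EACCES.
        String between = mode(dir);

        // Step 2: grant read/write/execute back to the owner only.
        dir.setReadable(true, true);
        dir.setWritable(true, true);
        dir.setExecutable(true, true);
        return between;
    }

    // Runs the two-step change on a scratch directory and reports
    // "<mode between steps> -> <final mode>".
    static String demo() {
        try {
            File dir = Files.createTempDirectory("job_sketch").toFile();
            String between = setOwnerOnly(dir);
            String after = mode(dir);
            dir.delete();
            return between + " -> " + after;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The directory ends up with the intended mode, but there is an interval in which it has no permission bits at all, and that interval is where a concurrent chmod on a child directory can lose.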

I think we should do away with this Java API nonsense altogether, link in a normal native chmod
call, and by default fall back to forking chmod when the native code isn't available.
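
As a rough sketch of the fork fallback only (the native path is not shown), the following assumes a chmod binary is on the PATH. The names ForkChmodSketch, chmod, mode and demo are hypothetical and are not taken from the Hadoop source. The point is that the mode is applied by a single chmod call, so the directory never passes through an intermediate mode.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.attribute.PosixFilePermissions;

public class ForkChmodSketch {

    // Hypothetical helper: applies an octal mode such as "700" by forking
    // the system chmod, which changes all permission bits in one call.
    static void chmod(File f, String octalMode) {
        try {
            Process p = new ProcessBuilder("chmod", octalMode, f.getAbsolutePath())
                    .inheritIO()   // let chmod write any error text to our stderr
                    .start();
            int rc = p.waitFor();
            if (rc != 0) {
                throw new IllegalStateException("chmod exited with " + rc);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Reads the current POSIX mode as a string such as "rwx------".
    static String mode(File f) {
        try {
            return PosixFilePermissions.toString(
                    Files.getPosixFilePermissions(f.toPath()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Applies two modes in turn to a scratch directory and reports both.
    static String demo() {
        try {
            File dir = Files.createTempDirectory("attempt_sketch").toFile();
            chmod(dir, "755");
            String first = mode(dir);
            chmod(dir, "700");
            String second = mode(dir);
            dir.delete();
            return first + " -> " + second;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Forking a process per call is slower than a native call, which is presumably why it would only be the fallback.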

> Undeletable build directories 
> ------------------------------
>
>                 Key: MAPREDUCE-2238
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2238
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: build, test
>    Affects Versions: 0.23.0
>            Reporter: Eli Collins
>         Attachments: mapreduce-2238.txt
>
>
> The MR hudson job is failing, looks like it's due to a test chmod'ing a build directory so the checkout can't clean the build dir.
> https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk/549/console
> Building remotely on hadoop7
> hudson.util.IOException2: remote file operation failed: /grid/0/hudson/hudson-slave/workspace/Hadoop-Mapreduce-trunk at hudson.remoting.Channel@2545938c:hadoop7
> 	at hudson.FilePath.act(FilePath.java:749)
> 	at hudson.FilePath.act(FilePath.java:735)
> 	at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:589)
> 	at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:537)
> 	at hudson.model.AbstractProject.checkout(AbstractProject.java:1116)
> 	at hudson.model.AbstractBuild$AbstractRunner.checkout(AbstractBuild.java:479)
> 	at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:411)
> 	at hudson.model.Run.run(Run.java:1324)
> 	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
> 	at hudson.model.ResourceController.execute(ResourceController.java:88)
> 	at hudson.model.Executor.run(Executor.java:139)
> Caused by: java.io.IOException: Unable to delete /grid/0/hudson/hudson-slave/workspace/Hadoop-Mapreduce-trunk/trunk/build/test/logs/userlogs/job_20101230131139886_0001/attempt_20101230131139886_0001_m_000000_0

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

