I resolved this by setting the umask so that files created by mapred have group read permission.  I discovered that, for whatever reason, the tasktracker running as mapred was trying to read the job.xml file before the child had effectively set up the permissions.  I was running strace -f on the parent tasktracker process and I would see:

   * Child: open job.xml (with O_WRONLY|O_CREAT|O_TRUNC and mode 0666)
   * Child: chmod 777 job.xml
   * Child: chmod 640 job.xml
   * Parent: stat job.xml (permission denied)

I'm assuming that either strace isn't showing the parent's stat in the right spot in the sequence, or some caching effect is causing the stat to see the original permissions from the open.  If I run ls in a tight loop (a while `true` loop), I can see that job.xml is created with 600 permissions and stays that way for dozens of iterations.  This strikes me as some kind of bug.
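
For anyone who wants to reproduce the umask effect outside Hadoop, here is a minimal Python sketch. It opens a file with the same flags and mode (0666) as in the strace output above and reports the permissions the file actually gets. The helper name created_mode and the two umask values are assumptions for illustration: 077 is simply a umask consistent with the 600 I saw, and 027 is one example of a umask that leaves group read in place. It does not reproduce the tasktracker race itself.

```python
import os
import stat
import tempfile


def created_mode(umask_value, requested_mode=0o666):
    """Create a file with requested_mode under umask_value and
    return the permission bits the file actually ends up with."""
    old = os.umask(umask_value)  # os.umask returns the previous mask
    try:
        with tempfile.TemporaryDirectory() as d:
            path = os.path.join(d, "job.xml")
            # Same flags as the open() seen in the strace output.
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC,
                         requested_mode)
            os.close(fd)
            return stat.S_IMODE(os.stat(path).st_mode)
    finally:
        os.umask(old)  # always restore the caller's umask


if __name__ == "__main__":
    # 0o077 and 0o027 are illustrative values, not taken from the cluster.
    for mask in (0o077, 0o027):
        print(f"umask {mask:03o} -> file mode {created_mode(mask):03o}")
```

With a umask of 077 the file comes out as 600 until something chmods it, which matches what ls showed; with 027 it is created as 640, so a reader in the file's group does not depend on the later chmod.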

    Chris
 


On Wed, Sep 18, 2013 at 3:06 PM, Christopher Penney <cpenney@gmail.com> wrote:

Here is some more info.  I realized that if I run the tasktracker as root it works, but if I run it as mapred (which I assume is what I'm supposed to do) I get the errors below.  When a job attempts to run I see this under mapred.local.dir.

taskTracker:
total 4
drwxr-s--- 3 cpenney mapred 4096 Sep 18 14:53 cpenney

taskTracker/cpenney:
total 4
drwx--S--- 3 cpenney mapred 4096 Sep 18 14:53 jobcache

taskTracker/cpenney/jobcache:
total 4
drwx--S--- 4 cpenney mapred 4096 Sep 18 14:53 job_201309181359_0029

taskTracker/cpenney/jobcache/job_201309181359_0029:
total 88
drwx--S--- 3 cpenney mapred  4096 Sep 18 14:53 jars
-rw------- 1 cpenney mapred 72974 Sep 18 14:53 job.xml
-rw------- 1 cpenney mapred   230 Sep 18 14:53 jobToken
drwx--S--- 2 cpenney mapred  4096 Sep 18 14:53 work

taskTracker/cpenney/jobcache/job_201309181359_0029/jars:
total 6672
-rw------- 1 cpenney mapred   52780 Sep 18 14:53 .job.jar.crc
-rwxrwxrwx 1 cpenney mapred 6754700 Sep 18 14:53 job.jar
drwx--S--- 3 cpenney mapred    4096 Sep 18 14:53 org

taskTracker/cpenney/jobcache/job_201309181359_0029/jars/org:
total 4
drwx--S--- 3 cpenney mapred 4096 Sep 18 14:53 apache

taskTracker/cpenney/jobcache/job_201309181359_0029/jars/org/apache:
total 4
drwx--S--- 10 cpenney mapred 4096 Sep 18 14:53 pig
[snipped]

But in the log I see:

2013-09-18 14:53:59,951 WARN org.apache.hadoop.mapred.TaskTracker: Error initializing attempt_201309181359_0029_m_000002_0:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/cpenney/jobcache/job_201309181359_0029/job.xml in any of the configured local directories

2013-09-18 14:53:59,952 ERROR org.apache.hadoop.mapred.TaskStatus: Trying to set finish time for task attempt_201309181359_0029_m_000002_0 when no start time is set, stackTrace is : java.lang.Exception

2013-09-18 14:54:00,303 WARN org.apache.hadoop.mapred.TaskTracker: Exception while localization java.io.IOException: Job initialization failed (255) with output: Reading task controller config from /etc/hadoop/taskcontroller.cfg

My taskcontroller.cfg file has:

mapred.local.dir=/tmp/hadoop/mapred
hadoop.log.dir=/var/log/hadoop
mapred.tasktracker.tasks.sleeptime-before-sigkill=30
mapreduce.tasktracker.group=mapred
banned.users=mapred,hdfs

In /etc/hadoop I have:

---Sr-s--- 1 root   mapred 63382 Nov 19  2012 task-controller
-rw-r--r-- 1 root   mapred   196 Sep 18 14:30 taskcontroller.cfg


   Chris




On Wed, Sep 18, 2013 at 1:26 PM, Vinod Kumar Vavilapalli <vinodkv@apache.org> wrote:
What is your config set to for mapred local dirs? And what are the permissions to those directories?

All users need execute permission on every directory in the path leading up to the local dir so that they can create their own directories in there. For example, if one of the mapred local dirs is /a/b/c/mapred, then all of /a, /a/b, /a/b/c, etc. need to be executable by everyone: execute permission on a Linux directory is needed for someone to be able to create files or directories in any of its subdirectories.
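
A quick way to check this is to walk up from the local dir and list any ancestor directory that lacks the execute bit for "others". The Python sketch below does that; the function name dirs_missing_other_exec is made up for illustration, and it only inspects the plain mode bits, so ACLs and mount options are ignored.

```python
import os
import stat


def dirs_missing_other_exec(path):
    """Return the ancestor directories of path (not path itself)
    that lack the execute/search bit for 'others'."""
    missing = []
    current = os.path.dirname(os.path.abspath(path))
    while True:
        # os.stat raises PermissionError if the caller cannot traverse
        # this far, which is itself a sign of the same problem.
        mode = os.stat(current).st_mode
        if not mode & stat.S_IXOTH:
            missing.append(current)
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return missing


if __name__ == "__main__":
    # Example value only; substitute one of your mapred local dirs.
    local_dir = "/a/b/c/mapred"
    if os.path.isdir(local_dir):
        for d in dirs_missing_other_exec(local_dir):
            print("not executable by others:", d)
```

An empty result means every directory above the local dir can be traversed by any user, as far as the mode bits go.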

Thanks,
+Vinod Kumar Vavilapalli
Hortonworks Inc.
http://hortonworks.com/

On Sep 18, 2013, at 7:26 AM, Christopher Penney wrote:

I have a test environment with hadoop 1.1.1 set up with Kerberos, and yesterday I zapped my mapred.local.dir on the job and task trackers as part of some cleanup.  When I started the task trackers back up I was unable to run MR jobs.  This seems like a permission issue, but I can't figure out what it would be since it auto-creates everything.  I didn't make any changes to taskcontroller.cfg or mapred-site.xml.  Below is a log from the task tracker.

   Chris

2013-09-18 10:21:27,040 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201309180916_0024_m_000002_0 task's state:UNASSIGNED
2013-09-18 10:21:27,040 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201309180916_0024_m_000002_0 which needs 1 slots
2013-09-18 10:21:27,040 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 16 and trying to launch attempt_201309180916_0024_m_000002_0 which needs 1 slots
2013-09-18 10:21:28,524 WARN org.apache.hadoop.mapred.TaskTracker: Error initializing attempt_201309180916_0024_m_000002_0:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/cpenney/jobcache/job_201309180916_0024/job.xml in any of the configured local directories
 at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:429)
 at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:160)
 at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1341)
 at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1213)
 at org.apache.hadoop.mapred.TaskTracker$5.run(TaskTracker.java:2568)
 at java.lang.Thread.run(Thread.java:662)

2013-09-18 10:21:28,525 ERROR org.apache.hadoop.mapred.TaskStatus: Trying to set finish time for task attempt_201309180916_0024_m_000002_0 when no start time is set, stackTrace is : java.lang.Exception
 at org.apache.hadoop.mapred.TaskStatus.setFinishTime(TaskStatus.java:145)
 at org.apache.hadoop.mapred.TaskTracker$TaskInProgress.kill(TaskTracker.java:3285)
 at org.apache.hadoop.mapred.TaskTracker$5.run(TaskTracker.java:2578)
 at java.lang.Thread.run(Thread.java:662)

2013-09-18 10:21:28,525 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 16
2013-09-18 10:21:28,554 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201309180916_0024_m_000002_1 task's state:UNASSIGNED
2013-09-18 10:21:28,554 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201309180916_0024_m_000002_1 which needs 1 slots
2013-09-18 10:21:28,554 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 16 and trying to launch attempt_201309180916_0024_m_000002_1 which needs 1 slots
2013-09-18 10:21:28,595 INFO org.apache.hadoop.mapred.TaskController: Reading task controller config from /etc/hadoop/taskcontroller.cfg
2013-09-18 10:21:28,595 INFO org.apache.hadoop.mapred.TaskController: main : command provided 0
2013-09-18 10:21:28,595 INFO org.apache.hadoop.mapred.TaskController: main : user is cpenney
2013-09-18 10:21:28,595 INFO org.apache.hadoop.mapred.TaskController: Good mapred-local-dirs are /tmp/hadoop/mapred
2013-09-18 10:21:28,595 INFO org.apache.hadoop.mapred.TaskController: Can't open /tmp/hadoop/mapred/taskTracker/cpenney/jobcache/job_201309180916_0024/jobToken for output - File exists
2013-09-18 10:21:28,596 WARN org.apache.hadoop.mapred.TaskTracker: Exception while localization java.io.IOException: Job initialization failed (255) with output: Reading task controller config from /etc/hadoop/taskcontroller.cfg
main : command provided 0
main : user is cpenney
Good mapred-local-dirs are /tmp/hadoop/mapred
Can't open /tmp/hadoop/mapred/taskTracker/cpenney/jobcache/job_201309180916_0024/jobToken for output - File exists

 at org.apache.hadoop.mapred.LinuxTaskController.initializeJob(LinuxTaskController.java:193)
 at org.apache.hadoop.mapred.TaskTracker$4.run(TaskTracker.java:1323)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
 at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1298)
 at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1213)
 at org.apache.hadoop.mapred.TaskTracker$5.run(TaskTracker.java:2568)
 at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.hadoop.util.Shell$ExitCodeException:
 at org.apache.hadoop.util.Shell.runCommand(Shell.java:255)
 at org.apache.hadoop.util.Shell.run(Shell.java:182)
 at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:375)
 at org.apache.hadoop.mapred.LinuxTaskController.initializeJob(LinuxTaskController.java:186)
 ... 8 more

2013-09-18 10:21:28,596 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:cpenney cause:java.io.IOException: Job initialization failed (255) with output: Reading task controller config from /etc/hadoop/taskcontroller.cfg
main : command provided 0
main : user is cpenney
Good mapred-local-dirs are /tmp/hadoop/mapred
Can't open /tmp/hadoop/mapred/taskTracker/cpenney/jobcache/job_201309180916_0024/jobToken for output - File exists

2013-09-18 10:21:28,596 WARN org.apache.hadoop.mapred.TaskTracker: Error initializing attempt_201309180916_0024_m_000002_1:
java.io.IOException: Job initialization failed (255) with output: Reading task controller config from /etc/hadoop/taskcontroller.cfg
main : command provided 0
main : user is cpenney
Good mapred-local-dirs are /tmp/hadoop/mapred
Can't open /tmp/hadoop/mapred/taskTracker/cpenney/jobcache/job_201309180916_0024/jobToken for output - File exists

 at org.apache.hadoop.mapred.LinuxTaskController.initializeJob(LinuxTaskController.java:193)
 at org.apache.hadoop.mapred.TaskTracker$4.run(TaskTracker.java:1323)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
 at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1298)
 at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1213)
 at org.apache.hadoop.mapred.TaskTracker$5.run(TaskTracker.java:2568)
 at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.hadoop.util.Shell$ExitCodeException:
 at org.apache.hadoop.util.Shell.runCommand(Shell.java:255)
 at org.apache.hadoop.util.Shell.run(Shell.java:182)
 at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:375)
 at org.apache.hadoop.mapred.LinuxTaskController.initializeJob(LinuxTaskController.java:186)
 ... 8 more


