hadoop-mapreduce-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jaikannan Ramamoorthy (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-4770) Hadoop jobs failing with FileNotFound Exception while the job is still running
Date Mon, 05 Nov 2012 02:20:12 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-4770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jaikannan Ramamoorthy updated MAPREDUCE-4770:
---------------------------------------------

    Description: 
We are having a strange issue in our Hadoop cluster. We have noticed that some of our jobs
fail with the with a file not found exception[see below]. Basically the files in the "attempt_*"
directory and the directory itself are getting deleted while the task is still being run on
the host. Looking through some of the hadoop documentation I see that the job directory gets
wiped out when it gets a KillJobAction however I am not sure why it gets wiped out while the
job is still running.

My question is what could be deleting it while the job is running? Any thoughts or pointers
on how to debug this would be helpful.

Thanks!

java.io.FileNotFoundException: /hadoop/mapred/local_data/taskTracker//jobcache/job_201211030344_15383/attempt_201211030344_15383_m_000169_0/output/spill29.out
(Permission denied) at java.io.FileInputStream.open(Native Method) at java.io.FileInputStream.(FileInputStream.java:120)
at org.apache.hadoop.fs.RawLocalFileSystem$TrackingFileInputStream.(RawLocalFileSystem.java:71)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.(RawLocalFileSystem.java:107)
at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:177) at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:400)
at org.apache.hadoop.mapred.Merger$Segment.init(Merger.java:205) at org.apache.hadoop.mapred.Merger$Segment.access$100(Merger.java:165)
at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:418) at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:381)
at org.apache.hadoop.mapred.Merger.merge(Merger.java:77) at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1692)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1322) at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:698)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:765) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
at org.apache.hadoop.mapred.Child$4.run(Child.java:259) at java.security.AccessController.doPrivileged(Native
Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.mapred.Child.main(Child.java:253)

    
> Hadoop jobs failing with FileNotFound Exception while the job is still running
> ------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4770
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4770
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>            Reporter: Jaikannan Ramamoorthy
>
> We are having a strange issue in our Hadoop cluster. We have noticed that some of our
jobs fail with the with a file not found exception[see below]. Basically the files in the
"attempt_*" directory and the directory itself are getting deleted while the task is still
being run on the host. Looking through some of the hadoop documentation I see that the job
directory gets wiped out when it gets a KillJobAction however I am not sure why it gets wiped
out while the job is still running.
> My question is what could be deleting it while the job is running? Any thoughts or pointers
on how to debug this would be helpful.
> Thanks!
> java.io.FileNotFoundException: /hadoop/mapred/local_data/taskTracker//jobcache/job_201211030344_15383/attempt_201211030344_15383_m_000169_0/output/spill29.out
(Permission denied) at java.io.FileInputStream.open(Native Method) at java.io.FileInputStream.(FileInputStream.java:120)
at org.apache.hadoop.fs.RawLocalFileSystem$TrackingFileInputStream.(RawLocalFileSystem.java:71)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.(RawLocalFileSystem.java:107)
at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:177) at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:400)
at org.apache.hadoop.mapred.Merger$Segment.init(Merger.java:205) at org.apache.hadoop.mapred.Merger$Segment.access$100(Merger.java:165)
at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:418) at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:381)
at org.apache.hadoop.mapred.Merger.merge(Merger.java:77) at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1692)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1322) at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:698)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:765) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
at org.apache.hadoop.mapred.Child$4.run(Child.java:259) at java.security.AccessController.doPrivileged(Native
Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.mapred.Child.main(Child.java:253)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message