hadoop-mapreduce-issues mailing list archives

From "Jaikannan Ramamoorthy (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-4770) Hadoop jobs failing with FileNotFound Exception while the job is still running
Date Mon, 05 Nov 2012 02:38:12 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-4770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jaikannan Ramamoorthy updated MAPREDUCE-4770:
---------------------------------------------

    Affects Version/s: 0.20.203.0
    
> Hadoop jobs failing with FileNotFound Exception while the job is still running
> ------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4770
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4770
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 0.20.203.0
>            Reporter: Jaikannan Ramamoorthy
>
> We are seeing a strange issue in our Hadoop cluster. Some of our jobs fail with a
> FileNotFoundException (see below). The files in the "attempt_*" directory, and the
> directory itself, are being deleted while the task is still running on the host.
> According to the Hadoop documentation, the job directory is wiped out when a
> KillJobAction is received, but I am not sure why it would be wiped out while the
> job is still running.
> My question is: what could be deleting it while the job is running? Any thoughts or
> pointers on how to debug this would be helpful.
> Thanks!
> java.io.FileNotFoundException: /hadoop/mapred/local_data/taskTracker//jobcache/job_201211030344_15383/attempt_201211030344_15383_m_000169_0/output/spill29.out (Permission denied)
>     at java.io.FileInputStream.open(Native Method)
>     at java.io.FileInputStream.<init>(FileInputStream.java:120)
>     at org.apache.hadoop.fs.RawLocalFileSystem$TrackingFileInputStream.<init>(RawLocalFileSystem.java:71)
>     at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.<init>(RawLocalFileSystem.java:107)
>     at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:177)
>     at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:400)
>     at org.apache.hadoop.mapred.Merger$Segment.init(Merger.java:205)
>     at org.apache.hadoop.mapred.Merger$Segment.access$100(Merger.java:165)
>     at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:418)
>     at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:381)
>     at org.apache.hadoop.mapred.Merger.merge(Merger.java:77)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1692)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1322)
>     at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:698)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:765)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>     at org.apache.hadoop.mapred.Child.main(Child.java:253)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
