hadoop-mapreduce-issues mailing list archives

From "Philip Su (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-3638) Yarn trying to download cacheFile to container but Path is a local file
Date Mon, 30 Jan 2012 21:23:10 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13196427#comment-13196427 ]

Philip Su commented on MAPREDUCE-3638:
--------------------------------------

I did some more follow-up testing on this, and I think I know more precisely where the problem is.

1) The failure occurs when running a streaming job with the -cacheFile option on a file in the local file system, specified as file:///<path>.
2) I ran hdfs dfs -ls file:///<path> to make sure the file exists.
3) I ran the same streaming job with the same value as in 1), but using -files instead of the deprecated -cacheFile option. The job ran and passed.

So it seems that when a streaming job is run with the deprecated -cacheFile option on a local file system, the file is not getting the correct file permissions.
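The original report quoted below passes the cache file as file://Streaming/data/streaming-610//InputFile#testlink (two slashes), while the test above used file:///<path> (three slashes). As a minimal sketch, assuming only standard java.net.URI semantics (Hadoop's own org.apache.hadoop.fs.Path and FSDownload handling are not exercised here), the snippet shows how the two forms split into authority, path and fragment. The class and method names are hypothetical.

```java
import java.net.URI;

/**
 * Hypothetical helper: shows how java.net.URI splits a file: URI.
 * Illustrative only; it does not call any Hadoop code.
 */
public class CacheFileUriCheck {

    /** Returns "authority|path|fragment" for the given URI string. */
    public static String describe(String uriString) {
        // URI.create does not normalize, so a doubled slash in the path is kept as-is.
        URI uri = URI.create(uriString);
        return uri.getAuthority() + "|" + uri.getPath() + "|" + uri.getFragment();
    }

    public static void main(String[] args) {
        // Two slashes: "Streaming" is parsed as the authority (host), not as a directory.
        System.out.println(describe("file://Streaming/data/streaming-610//InputFile#testlink"));
        // Three slashes: empty authority (getAuthority() returns null), so "Streaming" stays in the path.
        System.out.println(describe("file:///Streaming/data/streaming-610/InputFile#testlink"));
    }
}
```

With two slashes, "Streaming" is read as a host name rather than the first directory of the path, which may be worth ruling out alongside the -cacheFile versus -files difference.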

                
> Yarn trying to download cacheFile to container but Path is a local file
> -----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3638
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3638
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.0
>            Reporter: Thomas Graves
>            Assignee: Mahadev konar
>
> It looks like the AM, which is running on host1.com, is trying to access a local file, but the file is on host2.com
> (where the command was run).
> ran:
> hadoop --config conf/hadoop/ jar hadoop-streaming.jar -Dmapreduce.job.acl-view-job=* \
>   -input Streaming/streaming-610/input.txt -mapper 'xargs cat' -reducer cat \
>   -output Streaming/streaming-610/Output \
>   -cacheFile file://Streaming/data/streaming-610//InputFile#testlink \
>   -jobconf mapred.map.tasks=1 -jobconf mapred.reduce.tasks=1 \
>   -jobconf mapred.job.name=streamingTest-610 -jobconf mapreduce.job.acl-view-job=*
> failure:
> 11/11/10 07:48:06 INFO mapreduce.Job: Job job_1320887371559_0215 failed with state FAILED due to: Application
> application_1320887371559_0215 failed 1 times due to AM Container for appattempt_1320887371559_0215_000001 exited with
> exitCode: -1000 due to: java.io.FileNotFoundException: File file:/Streaming/data/streaming-610/InputFile does not exist
>         at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:431)
>         at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:315)
>         at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:85)
>         at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:152)
>         at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:50)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:619)

--
This message is automatically generated by JIRA.

        
