hadoop-mapreduce-issues mailing list archives

From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5912) Task.calculateOutputSize does not handle Windows files after MAPREDUCE-5196
Date Thu, 12 Jun 2014 18:49:04 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029597#comment-14029597 ]

Chris Nauroth commented on MAPREDUCE-5912:
------------------------------------------

+1 for this patch.

[~rusanu], [~curino] and [~chris.douglas], my understanding is that MAPREDUCE-5196 accidentally
introduced this bug, but this part of the change is not strictly necessary for the goals of
MAPREDUCE-5196.  Based on that, I'm in favor of committing this patch to revert just the part
of MAPREDUCE-5196 that caused the bug.  The alternative patch on the {{Path}} class posted
in HADOOP-10663 has some other potential side effects, so I prefer doing a localized fix here
in MR.  (I'll enter more details on HADOOP-10663.)

If in the future we want to revisit the idea of map outputs going somewhere other than
the local file system, then I think we'd need a different patch.  We'd want to make
sure that the map output's {{Path}} instance contains an explicit scheme, so that the code
here doesn't need to assume local vs. default vs. something else.
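To illustrate the explicit-scheme point, here is a minimal, hypothetical sketch in plain Java. It uses only {{java.net.URI}}, not Hadoop's {{Path}} or {{FileSystem}} classes, and the names {{SchemeResolution}} and {{resolveScheme}} are invented for illustration. It models the assumption that a path without a scheme is resolved against the default file system, while a path such as {{file:///c:/...}} names its file system explicitly. It also assumes, as the log in this issue suggests, that a one-letter prefix like {{c:}} is a Windows drive letter rather than a scheme.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical, simplified model (NOT Hadoop's Path/FileSystem code):
// decide which file system scheme a path string would be resolved against.
final class SchemeResolution {

    /**
     * Returns the scheme explicitly present in the path, or defaultScheme if none.
     * Assumption: a one-letter "scheme" such as "c" in "c:/Hadoop/..." is a
     * Windows drive letter, not a scheme, so it falls back to the default.
     * Note: backslash-separated paths are not handled by this sketch.
     */
    static String resolveScheme(String path, String defaultScheme) {
        try {
            String scheme = new URI(path).getScheme();
            if (scheme == null || scheme.length() == 1) {
                return defaultScheme;
            }
            return scheme;
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Unparseable path: " + path, e);
        }
    }

    public static void main(String[] args) {
        String winPath = "c:/Hadoop/Data/file.out";
        // No explicit scheme: follows the default FS (e.g. hdfs on a cluster).
        System.out.println(resolveScheme(winPath, "hdfs"));              // hdfs
        // Explicit scheme: the default FS no longer matters.
        System.out.println(resolveScheme("file:///" + winPath, "hdfs")); // file
    }
}
```

With an explicit scheme, the result no longer depends on what the default file system happens to be, which is the property this comment asks the map output {{Path}} to have.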

Can you let me know if you agree with committing this and not committing HADOOP-10663?  I'll
hold off on committing until I hear from one of you.

> Task.calculateOutputSize does not handle Windows files after MAPREDUCE-5196
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5912
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5912
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>            Reporter: Remus Rusanu
>            Assignee: Remus Rusanu
>             Fix For: 3.0.0
>
>         Attachments: MAPREDUCE-5912.1.patch
>
>
> {code}
> @@ -1098,8 +1120,8 @@ private long calculateOutputSize() throws IOException {
>      if (isMapTask() && conf.getNumReduceTasks() > 0) {
>        try {
>          Path mapOutput =  mapOutputFile.getOutputFile();
> -        FileSystem localFS = FileSystem.getLocal(conf);
> -        return localFS.getFileStatus(mapOutput).getLen();
> +        FileSystem fs = mapOutput.getFileSystem(conf);
> +        return fs.getFileStatus(mapOutput).getLen();
>        } catch (IOException e) {
>          LOG.warn ("Could not find output size " , e);
>        }
> {code}
> causes Windows local output files to be routed through HDFS:
> {code}
> 2014-06-02 00:14:53,891 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.IllegalArgumentException: Pathname /c:/Hadoop/Data/Hadoop/local/usercache/HadoopUser/appcache/application_1401693085139_0001/output/attempt_1401693085139_0001_m_000000_0/file.out from c:/Hadoop/Data/Hadoop/local/usercache/HadoopUser/appcache/application_1401693085139_0001/output/attempt_1401693085139_0001_m_000000_0/file.out is not a valid DFS filename.
>        at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:187)
>        at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:101)
>        at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1024)
>        at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1020)
>        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1020)
>        at org.apache.hadoop.mapred.Task.calculateOutputSize(Task.java:1124)
>        at org.apache.hadoop.mapred.Task.sendLastUpdate(Task.java:1102)
>        at org.apache.hadoop.mapred.Task.done(Task.java:1048)
> {code}
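A minimal, hypothetical model of why the trace above ends in {{IllegalArgumentException}}: once {{mapOutput.getFileSystem(conf)}} resolves the scheme-less path to HDFS, the Windows path {{/c:/...}} is checked as a DFS name. The sketch assumes, based on the "is not a valid DFS filename" message, that HDFS rejects absolute paths whose components contain a colon (our understanding is that the real check lives in HDFS's {{DFSUtil.isValidName}}). The class {{DfsNameCheck}} and its method are invented for illustration and are not Hadoop code.

```java
// Hypothetical, simplified model of the DFS path-name check; NOT the actual HDFS code.
final class DfsNameCheck {

    static boolean isValidDfsName(String src) {
        // DFS path names must be absolute.
        if (!src.startsWith("/")) {
            return false;
        }
        for (String component : src.split("/")) {
            // Assumption: mirrors HDFS rejecting ".", "..", and components containing ':'.
            if (component.equals(".") || component.equals("..") || component.contains(":")) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The first component "c:" contains a colon, so the Windows path is rejected,
        // matching the "is not a valid DFS filename" error in the log.
        System.out.println(isValidDfsName("/c:/Hadoop/Data/file.out"));  // false
        System.out.println(isValidDfsName("/user/HadoopUser/file.out")); // true
    }
}
```

Under this model, the drive-letter component alone is enough to make the path invalid for HDFS, which is why routing the local map output through the default file system fails on Windows while {{FileSystem.getLocal(conf)}} does not hit this check.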



--
This message was sent by Atlassian JIRA
(v6.2#6252)
