hadoop-mapreduce-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alejandro Abdelnur (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4933) MR1 final merge asks for length of file it just wrote before flushing it
Date Fri, 11 Jan 2013 00:24:13 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550596#comment-13550596
] 

Alejandro Abdelnur commented on MAPREDUCE-4933:
-----------------------------------------------

The patch looks right to me even if it is not losing data in practice. Bobby, would you agree?

                
> MR1 final merge asks for length of file it just wrote before flushing it
> ------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4933
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4933
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1, task
>    Affects Versions: 1.1.1
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>            Priority: Blocker
>         Attachments: MAPREDUCE-4933-branch-1.patch
>
>
> createKVIterator in ReduceTask contains the following code:
> {code}
>           try {
>             Merger.writeFile(rIter, writer, reporter, job);
>             addToMapOutputFilesOnDisk(fs.getFileStatus(outputPath));
>           } catch (Exception e) {
>             if (null != outputPath) {
>               fs.delete(outputPath, true);
>             }
>             throw new IOException("Final merge failed", e);
>           } finally {
>             if (null != writer) {
>               writer.close();
>             }
>           }
> {code}
> Merger#writeFile() does not close the file after writing it, so when fs.getFileStatus()
is called on it, it may not return the correct length.  This causes bad accounting further
down the line, which can lead to map output data being lost.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message