hadoop-mapreduce-issues mailing list archives

From "Sandy Ryza (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4933) MR1 final merge asks for length of file it just wrote before flushing it
Date Thu, 10 Jan 2013 23:02:12 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550547#comment-13550547 ]

Sandy Ryza commented on MAPREDUCE-4933:
---------------------------------------

I came across this while trying to apply MAPREDUCE-2264, which caused a bunch of test
failures for me due to missing map data.  MAPREDUCE-2264 annotates the FileStatus with
extra data, but doesn't change the timing of writer.close()/getFileStatus().  The code has
been around since 2008 and must be working, so I would not be surprised if the map data loss
is a false alarm, but at least for MAPREDUCE-2264's sake, it should be fixed nonetheless.

                
> MR1 final merge asks for length of file it just wrote before flushing it
> ------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4933
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4933
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1, task
>    Affects Versions: 1.1.1
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>            Priority: Blocker
>         Attachments: MAPREDUCE-4933-branch-1.patch
>
>
> createKVIterator in ReduceTask contains the following code:
> {code}
>           try {
>             Merger.writeFile(rIter, writer, reporter, job);
>             addToMapOutputFilesOnDisk(fs.getFileStatus(outputPath));
>           } catch (Exception e) {
>             if (null != outputPath) {
>               fs.delete(outputPath, true);
>             }
>             throw new IOException("Final merge failed", e);
>           } finally {
>             if (null != writer) {
>               writer.close();
>             }
>           }
> {code}
> Merger#writeFile() does not close the file after writing it, so when fs.getFileStatus()
> is called on it, it may not return the correct length.  This causes bad accounting further
> down the line, which can lead to map output data being lost.
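
The ordering problem described above can be illustrated outside Hadoop. The following is a minimal sketch, not the attached patch: it uses a plain java.io.BufferedOutputStream on the local filesystem as a stand-in for the merge writer, and the class name FlushBeforeLength, the method lengthsAroundClose, and the 64 KB buffer size are all hypothetical choices made for illustration. It records the file length once before close() and once after, mirroring the getFileStatus()-before-writer.close() sequence in the quoted code.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class: a local-filesystem analogy, not Hadoop code.
class FlushBeforeLength {

    // Writes payload through a buffered stream and returns
    // { length observed before close(), length observed after close() }.
    static long[] lengthsAroundClose(Path path, byte[] payload) throws IOException {
        // Assumed buffer size, chosen to be larger than the payload so the
        // bytes stay in memory until the stream is flushed or closed.
        OutputStream out = new BufferedOutputStream(Files.newOutputStream(path), 64 * 1024);
        long before;
        try {
            out.write(payload);
            // Same ordering as the quoted snippet: ask for the length
            // while the writer is still open and unflushed.
            before = Files.size(path);
        } finally {
            out.close();
        }
        // Asking again after close() sees every byte that was written.
        long after = Files.size(path);
        return new long[] { before, after };
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("final-merge", ".out");
        try {
            long[] lengths = lengthsAroundClose(path, new byte[1024]);
            System.out.println("length before close: " + lengths[0]);
            System.out.println("length after close:  " + lengths[1]);
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```

In this sketch the length read before close() misses the buffered bytes, while the length read after close() matches the payload. By analogy, one way to avoid the bad accounting would be to close the writer before calling fs.getFileStatus(outputPath); whether the attached patch takes exactly that approach is not stated in this message.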

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
