hadoop-mapreduce-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-5211) Reducer intermediate files can collide during merge
Date Mon, 06 May 2013 22:12:16 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-5211:
----------------------------------

    Attachment: MAPREDUCE-5211.branch-0.23.patch

Patch for branch-0.23 that adds the reduce task attempt ID and an increasing sequence
number to the output path to keep the output files from colliding.

No unit test, but manually tested to verify that output paths for on-disk merges are emitted properly.
                
> Reducer intermediate files can collide during merge
> ---------------------------------------------------
>
>                 Key: MAPREDUCE-5211
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5211
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.7
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>         Attachments: MAPREDUCE-5211.branch-0.23.patch
>
>
> The OnDiskMerger.merge method constructs an output path that is not unique to a reduce
> attempt, so it can collide with files written by other reducers from the same app that
> are running on the same node.  In addition, the name of the output file is based on
> MapOutput.toString, which may not be unique across multi-pass merges on disk: the
> mapId will be null and the basename ends up as "MapOutput(null, DISK)".
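
The collision described above, and the kind of naming the patch note describes (reduce attempt ID plus an increasing sequence number), can be sketched roughly as follows. This is a minimal illustration, not the actual patch: the class MergeOutputNames, its methods, the "_merge_" separator, and the sample attempt IDs are all hypothetical. Only the string "MapOutput(null, DISK)" comes from the report.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper for illustration only; not part of Hadoop.
public class MergeOutputNames {
    // Mimics the problematic basename from the report: a toString()-style
    // name for a disk output whose mapId is null after an earlier merge pass.
    public static String collidingName(String mapId) {
        return "MapOutput(" + mapId + ", DISK)";
    }

    private final String reduceAttemptId;
    private final AtomicInteger nextSeq = new AtomicInteger();

    public MergeOutputNames(String reduceAttemptId) {
        this.reduceAttemptId = reduceAttemptId;
    }

    // Assumed naming scheme: the attempt ID separates reducers on the same
    // node, and the sequence number separates merge passes within one attempt.
    public String nextUniqueName() {
        return reduceAttemptId + "_merge_" + nextSeq.getAndIncrement();
    }

    public static void main(String[] args) {
        // Two merge passes with a null mapId produce the same basename.
        System.out.println(collidingName(null).equals(collidingName(null)));

        // Made-up attempt IDs for two reducers of the same app.
        MergeOutputNames r0 = new MergeOutputNames("attempt_example_r_000000_0");
        MergeOutputNames r1 = new MergeOutputNames("attempt_example_r_000001_0");
        System.out.println(r0.nextUniqueName());
        System.out.println(r0.nextUniqueName());
        System.out.println(r1.nextUniqueName());
    }
}
```

In this sketch each reduce attempt owns its own counter, so names stay distinct both across reducers sharing a node and across successive merge passes of a single reducer.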

