hadoop-mapreduce-issues mailing list archives

From "anty.rao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-3685) There are some bugs in implementation of MergeManager
Date Sun, 06 Jan 2013 07:22:15 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13545336#comment-13545336 ]

anty.rao commented on MAPREDUCE-3685:

> Ravi Prakash The patch looks good. One question: why pass along uncompressed size for the
> new MapOutput ctor, shouldn't we be using compressed size so we get the smallest on-disk files?

I agree with this.

> One nit: we should use MergeManager.getDiskMapOutputs in OnDiskMerger.merge too... maybe MergeManager.getDiskMapOutputs
> should just return Path[] and then we can fix MergeManager.finalMerge to use Path[] rather
> than List<Path>. Thoughts?

I think MergeManager.finalMerge is better off using List<Path>, because finalMerge may
need to change the contents of the list. With Path[], you would have to create a new
Path[] to make the modification.
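
To illustrate the List<Path> versus Path[] point, here is a minimal sketch. It is not the Hadoop code: the class and method names are hypothetical, java.nio.file.Path stands in for org.apache.hadoop.fs.Path so the example runs without Hadoop on the classpath, and the inputs are assumed to be non-empty.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; java.nio.file.Path is a stand-in for Hadoop's Path.
class ListVersusArraySketch {

    // With a List, the callee can shrink the caller's collection in place.
    static void dropFirstInPlace(List<Path> outputs) {
        outputs.remove(0);
    }

    // An array has a fixed length, so "removing" an entry means allocating
    // a new array and handing it back to the caller.
    static Path[] dropFirstCopy(Path[] outputs) {
        return Arrays.copyOfRange(outputs, 1, outputs.length);
    }

    public static void main(String[] args) {
        List<Path> list = new ArrayList<>(
                Arrays.asList(Paths.get("map_0.out"), Paths.get("map_1.out")));
        dropFirstInPlace(list);
        System.out.println("list size after in-place removal: " + list.size());

        Path[] array = { Paths.get("map_0.out"), Paths.get("map_1.out") };
        Path[] shorter = dropFirstCopy(array);
        System.out.println("original array length: " + array.length
                + ", new array length: " + shorter.length);
    }
}
```

The array version leaves the original untouched, so every caller has to remember to switch to the returned copy.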

OnDiskMerger.merge can't use MergeManager.getDiskMapOutputs, because OnDiskMerger.merge modifies
MergeManager#onDiskMapOutputs according to its merge policy (e.g. mergeFactor).
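
A rough sketch of why a snapshot getter does not help a merge step that has to consume entries from the shared list. Everything here is hypothetical (class name, method names, file names as plain strings) and only models the idea of a mergeFactor-limited merge; it does not reproduce the actual MergeManager logic.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: a shared list of on-disk outputs plus a merge step
// that consumes up to mergeFactor entries and adds one merged result.
class MergeFactorSketch {
    private final List<String> onDiskOutputs = new ArrayList<>();
    private int mergedCount = 0;

    void add(String name) {
        onDiskOutputs.add(name);
    }

    int size() {
        return onDiskOutputs.size();
    }

    // Snapshot getter: returns a copy, so edits to the array never reach the list.
    String[] snapshot() {
        return onDiskOutputs.toArray(new String[0]);
    }

    // Removes up to mergeFactor inputs from the shared list and appends the merged output.
    String mergeOnce(int mergeFactor) {
        int n = Math.min(mergeFactor, onDiskOutputs.size());
        onDiskOutputs.subList(0, n).clear();
        String merged = "merged_" + (mergedCount++) + ".out";
        onDiskOutputs.add(merged);
        return merged;
    }

    public static void main(String[] args) {
        MergeFactorSketch manager = new MergeFactorSketch();
        for (int i = 0; i < 5; i++) {
            manager.add("map_" + i + ".out");
        }
        String[] before = manager.snapshot();
        String merged = manager.mergeOnce(3);
        System.out.println("snapshot length: " + before.length);
        System.out.println("merged into " + merged + ", list size now: " + manager.size());
    }
}
```

The merge step needs write access to the list itself; a caller holding only the snapshot array could not remove the inputs it just merged.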

I know this code is ugly, but I can't think of a better way to fix it. Maybe we should use
List<Path> everywhere, but a lot of code already uses Path[].

> There are some bugs in implementation of MergeManager
> -----------------------------------------------------
>                 Key: MAPREDUCE-3685
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3685
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.1
>            Reporter: anty.rao
>            Assignee: anty
>            Priority: Critical
>         Attachments: MAPREDUCE-3685-branch-0.23.1.patch, MAPREDUCE-3685-branch-0.23.1.patch,
>                      MAPREDUCE-3685-branch-0.23.1.patch, MAPREDUCE-3685.branch-0.23.patch, MAPREDUCE-3685.branch-0.23.patch,
>                      MAPREDUCE-3685.patch, MAPREDUCE-3685.patch, MAPREDUCE-3685.patch, MAPREDUCE-3685.patch

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira