hadoop-mapreduce-issues mailing list archives

From "Mariappan Asokan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-3685) There are some bugs in implementation of MergeManager
Date Sat, 09 Mar 2013 14:09:15 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13597951#comment-13597951 ]

Mariappan Asokan commented on MAPREDUCE-3685:
---------------------------------------------

Hi Ravi,
  I looked at the {{Merger}} class a little deeper.  I think the optimization (for more parallelism)
I suggested is a bit aggressive in some cases.  For example, if you end up having only 101
files to merge (instead of 198), {{Merger}} will merge just 2 files in the first pass and then
merge 100 files in the final merge.  Now, if there were a genie that could tell us how many disk
files we will create during the course of shuffle/merge, we could either opt to wait or kick
off the merge as soon as the disk file count exceeds {{io.sort.factor}}.  This
is something that can be explored later.  For example, if we know that the number of mappers
is huge compared to {{io.sort.factor}} and we do not have enough memory for large in-memory
merges, we can opt for the optimization I suggested.
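
To make the arithmetic above concrete, here is a rough sketch of how the first-pass width appears to be chosen. It is modeled on my reading of the pass-factor rule in {{Merger}} and is not copied from it: the class and method names are hypothetical, and the factor of 100 is only inferred from the 101/100 numbers in the example. The idea is that the first pass is kept narrow so that every later pass can merge a full {{io.sort.factor}} segments.

```java
// Hypothetical sketch, not the Hadoop source: the names and the factor value are assumptions.
final class MergePassSketch {

    /** Number of segments merged in the first pass. Assumes numSegments >= 1. */
    static int firstPassWidth(int factor, int numSegments) {
        if (factor < 2) {
            throw new IllegalArgumentException("merge factor must be at least 2");
        }
        // Everything fits into a single pass.
        if (numSegments <= factor) {
            return numSegments;
        }
        // A full-width pass replaces `factor` segments with one, i.e. removes (factor - 1).
        // Size the first pass so that all later passes can be full width.
        int mod = (numSegments - 1) % (factor - 1);
        return (mod == 0) ? factor : mod + 1;
    }

    /** Segments left once the first pass has replaced its inputs with one output. */
    static int segmentsAfterFirstPass(int factor, int numSegments) {
        return numSegments - firstPassWidth(factor, numSegments) + 1;
    }

    public static void main(String[] args) {
        int factor = 100; // assumed value of io.sort.factor
        for (int n : new int[] {101, 198}) {
            System.out.println(n + " files: first pass merges "
                + firstPassWidth(factor, n) + ", leaving "
                + segmentsAfterFirstPass(factor, n));
        }
    }
}
```

With the assumed factor of 100, this gives a first pass of 2 files for 101 inputs and 99 files for 198 inputs; in both cases 100 segments remain for the final merge.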

-- Asokan
                
> There are some bugs in implementation of MergeManager
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-3685
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3685
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.1
>            Reporter: anty.rao
>            Assignee: anty
>            Priority: Critical
>             Fix For: 0.23.7, 2.0.5-beta
>
>         Attachments: MAPREDUCE-3685-branch-0.23.1.patch, MAPREDUCE-3685-branch-0.23.1.patch,
MAPREDUCE-3685-branch-0.23.1.patch, MAPREDUCE-3685.branch-0.23.patch, MAPREDUCE-3685.branch-0.23.patch,
MAPREDUCE-3685.branch-0.23.patch, MAPREDUCE-3685.branch-0.23.patch, MAPREDUCE-3685.branch-0.23.patch,
MAPREDUCE-3685.patch, MAPREDUCE-3685.patch, MAPREDUCE-3685.patch, MAPREDUCE-3685.patch, MAPREDUCE-3685.patch,
MAPREDUCE-3685.patch, MAPREDUCE-3685.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
