hadoop-mapreduce-issues mailing list archives

From "Jothi Padmanabhan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-956) Shuffle should be broken down to only two phases (copy/reduce) instead of three (copy/sort/reduce)
Date Wed, 09 Sep 2009 16:23:57 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753149#action_12753149 ]

Jothi Padmanabhan commented on MAPREDUCE-956:

True, we do have a final merge before feeding the reducer. However, assigning 33% of the
progress to this one final merge does not seem correct. When the number of files at that
point is less than io.sort.factor, this final merge does not occur at all; we start feeding
the reducer straight away. Also, since merges already happen during the shuffle phase,
I was proposing that we divide the progress as follows:
Shuffle (50%)
Final Merge + Reduce (50%)
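
To make the difference between the two weightings concrete, here is a minimal sketch in Python. It is not Hadoop code: the function names, the phase labels, and the dictionaries are hypothetical and exist only for illustration. The only facts taken from this thread are the current three equal phases (copy/sort/reduce), the proposed 50/50 split, and the statement that the final merge is skipped when the number of files is below io.sort.factor. The behavior when the count equals io.sort.factor is an assumption.

```python
# Illustrative sketch only; none of these names come from the Hadoop code base.

# Current scheme described in the issue: three equally weighted phases.
THREE_PHASE = {"copy": 1 / 3, "sort": 1 / 3, "reduce": 1 / 3}

# Proposed scheme from the comment: shuffle, then final merge + reduce.
TWO_PHASE = {"shuffle": 0.5, "merge_reduce": 0.5}


def overall_progress(phase_weights, phase_fractions):
    """Return the weighted sum of per-phase completion fractions.

    phase_fractions maps a phase name to a value in [0, 1];
    phases that are missing are treated as not started (0.0).
    """
    if abs(sum(phase_weights.values()) - 1.0) > 1e-9:
        raise ValueError("phase weights must sum to 1")
    return sum(
        weight * phase_fractions.get(name, 0.0)
        for name, weight in phase_weights.items()
    )


def final_merge_needed(num_files, io_sort_factor):
    """Return False when num_files < io_sort_factor, as stated in the comment.

    Treating num_files == io_sort_factor as needing a merge is an assumption.
    """
    return num_files >= io_sort_factor


def progress_after_copy(num_files, io_sort_factor):
    """Compare both schemes at the moment copying finishes and reduce has not started."""
    # Under the three-phase scheme, a skipped final merge completes "sort" instantly.
    sort_done = 0.0 if final_merge_needed(num_files, io_sort_factor) else 1.0
    three = overall_progress(THREE_PHASE, {"copy": 1.0, "sort": sort_done})
    two = overall_progress(TWO_PHASE, {"shuffle": 1.0})
    return three, two
```

With 5 files and a factor of 10, the three-phase scheme reports two thirds of the work as done even though no sort work happened, while the two-phase scheme reports one half. With 50 files and a factor of 10, the three-phase scheme reports one third and the two-phase scheme still reports one half.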

> Shuffle should be broken down to only two phases (copy/reduce) instead of three (copy/sort/reduce)
> --------------------------------------------------------------------------------------------------
>                 Key: MAPREDUCE-956
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-956
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: task
>    Affects Versions: 0.21.0
>            Reporter: Jothi Padmanabhan
> For progress calculation and display in the UI, shuffle, in its current form, is decomposed
> into three phases (copy/sort/reduce). In practice, the sort phase is no longer applicable.
> I think we should reduce the number of phases to two and assign a 50% weightage to each of
> the copy and reduce phases. Thoughts?

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
