hadoop-mapreduce-issues mailing list archives

From "Ravi Prakash (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4842) Shuffle race can hang reducer
Date Tue, 05 Mar 2013 23:52:14 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594110#comment-13594110 ]

Ravi Prakash commented on MAPREDUCE-4842:

Hi Mariappan,

bq. This is a tangent to point 1. The mergeFactor is set to the configured value for IntermediateMemoryToMemoryMerger
but to Integer.MAX_VALUE for InMemoryMerger and OnDiskMerger. We have to find out the rationale
behind these choices.

Thanks for all your work on the MergeManager. It is soooooo much cleaner now! Thanks much.

Anyway, since you have been working in this area of the code, could you please
review MAPREDUCE-3685? The mergeFactor for the OnDiskMerger was wrong. For the InMemoryMerger
it seems to be correct, because io.sort.factor is defined as "The number of streams to merge
at once while sorting files. This determines the number of open file handles." and in-memory
segments use no file handles. Besides, I wonder whether we really want to go down to the level
of counting fetched cache lines rather than simplifying by assuming constant-time access to
all memory. Please consider continuing the discussion there.
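
To make the role of the merge factor concrete, here is a minimal sketch of a multi-round merge that never opens more than a given number of streams at once. It is not the MergeManager code: the class MergeFactorSketch, its Result type, and the use of in-memory integer lists as stand-ins for segments are all assumptions made for illustration. With a bounded factor the merge needs several rounds but keeps few streams open; with Integer.MAX_VALUE everything is merged in one round, which only looks reasonable when the streams do not cost a file handle each.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; not part of Hadoop.
final class MergeFactorSketch {

    /** Outcome of a merge: the sorted data plus simple bookkeeping. */
    static final class Result {
        final List<Integer> merged;
        final int rounds;      // number of merge rounds performed
        final int maxStreams;  // most streams open in any single merge

        Result(List<Integer> merged, int rounds, int maxStreams) {
            this.merged = merged;
            this.rounds = rounds;
            this.maxStreams = maxStreams;
        }
    }

    /** Merges already-sorted runs in one pass using a min-heap of cursors. */
    static List<Integer> mergeOnce(List<List<Integer>> runs) {
        // Each heap entry is {value, runIndex, offsetInRun}.
        PriorityQueue<int[]> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int i = 0; i < runs.size(); i++) {
            if (!runs.get(i).isEmpty()) {
                heap.add(new int[] {runs.get(i).get(0), i, 0});
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            out.add(top[0]);
            List<Integer> run = runs.get(top[1]);
            int next = top[2] + 1;
            if (next < run.size()) {
                heap.add(new int[] {run.get(next), top[1], next});
            }
        }
        return out;
    }

    /** Merges sorted runs, combining at most `factor` runs per merge. */
    static Result merge(List<List<Integer>> runs, int factor) {
        if (factor < 2) {
            throw new IllegalArgumentException("factor must be at least 2");
        }
        List<List<Integer>> current = new ArrayList<>(runs);
        if (current.isEmpty()) {
            return new Result(new ArrayList<>(), 0, 0);
        }
        int rounds = 0;
        int maxStreams = 0;
        while (current.size() > 1) {
            List<List<Integer>> next = new ArrayList<>();
            for (int start = 0; start < current.size(); start += factor) {
                // Use long arithmetic so factor == Integer.MAX_VALUE cannot overflow.
                int end = (int) Math.min((long) start + factor, (long) current.size());
                maxStreams = Math.max(maxStreams, end - start);
                next.add(mergeOnce(current.subList(start, end)));
            }
            current = next;
            rounds++;
        }
        return new Result(current.get(0), rounds, maxStreams);
    }

    public static void main(String[] args) {
        List<List<Integer>> runs = Arrays.asList(
            Arrays.asList(1, 4, 9), Arrays.asList(2, 3), Arrays.asList(5),
            Arrays.asList(0, 8), Arrays.asList(6, 7));
        for (int factor : new int[] {2, Integer.MAX_VALUE}) {
            Result r = merge(runs, factor);
            System.out.println("factor=" + factor + " rounds=" + r.rounds
                + " maxStreams=" + r.maxStreams + " merged=" + r.merged);
        }
    }
}
```

In this sketch, five runs merged with a factor of 2 take three rounds with at most two streams open at a time, while Integer.MAX_VALUE merges all five in a single round.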


> Shuffle race can hang reducer
> -----------------------------
>                 Key: MAPREDUCE-4842
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 2.0.2-alpha, 0.23.5
>            Reporter: Jason Lowe
>            Assignee: Mariappan Asokan
>            Priority: Blocker
>             Fix For: 2.0.3-alpha, 0.23.6
>         Attachments: MAPREDUCE-4842-2.patch, mapreduce-4842.patch, mapreduce-4842.patch,
>                      mapreduce-4842.patch, mapreduce-4842.patch, mapreduce-4842.patch,
>                      mapreduce-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch,
>                      MAPREDUCE-4842.patch, MAPREDUCE-4842.patch
>
> Saw an instance where the shuffle caused multiple reducers in a job to hang. It looked
> similar to the problem described in MAPREDUCE-3721, where the fetchers were all being told
> to WAIT by the MergeManager but no merge was taking place.

