hadoop-mapreduce-issues mailing list archives

From "Siddharth Seth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4842) Shuffle race can hang reducer
Date Mon, 17 Dec 2012 23:38:14 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534432#comment-13534432 ]

Siddharth Seth commented on MAPREDUCE-4842:

Asokan, one issue I can see with the patch: while a merge is in progress, every completed
fetch will end up generating a single-element list for the merger, effectively getting written
out to its own file. Once the initial merge nears completion and the inputs are closed,
commitMemory will go back down and allow the next merge list to be larger. For bigger jobs
this will likely hurt performance. Controlling the number of files per merge list, as well as
potentially avoiding the last merge, seems to be required.
Also, there are a couple of exceptions from MergeThread.run during shutdown, which would need
to be addressed if this approach is taken.
Not sure what causes the slightly improved performance (I would expect it to be a little
worse in certain situations). The patch does remove some of the synchronized checks on
merger.isInProgress and in the individual mergers, but I don't think that explains it. Any
thoughts on what would explain the difference in performance?
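
To make the single-element-list concern concrete, here is a minimal toy model in Java. It is not Hadoop's MergeManager: the class MergeListSketch, its methods, and the minInputs parameter are all hypothetical names invented for illustration. It only counts how many merge lists (and hence output files) result when each completed fetch is handed over on its own, compared with holding fetches back until a minimum number is available, which is one possible way of controlling the number of files per merge list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model, not Hadoop's MergeManager. It counts merge lists
// produced for fetches that complete while another merge is running.
final class MergeListSketch {
    private MergeListSketch() {}

    /** Hands every completed fetch to the merger immediately, as its own list. */
    static List<List<String>> perFetchLists(List<String> completed) {
        List<List<String>> lists = new ArrayList<>();
        for (String output : completed) {
            List<String> single = new ArrayList<>();
            single.add(output);
            lists.add(single); // one list, and so one file, per fetch
        }
        return lists;
    }

    /**
     * Holds completed fetches back until minInputs are pending, then emits
     * them as one list. Leftovers form a final, smaller list.
     */
    static List<List<String>> batchedLists(List<String> completed, int minInputs) {
        if (minInputs < 1) {
            throw new IllegalArgumentException("minInputs must be at least 1");
        }
        List<List<String>> lists = new ArrayList<>();
        List<String> pending = new ArrayList<>();
        for (String output : completed) {
            pending.add(output);
            if (pending.size() >= minInputs) {
                lists.add(pending);
                pending = new ArrayList<>();
            }
        }
        if (!pending.isEmpty()) {
            lists.add(pending);
        }
        return lists;
    }
}
```

With five completed fetches and minInputs set to 2, the per-fetch variant yields five lists while the batched variant yields three. The sketch ignores memory limits entirely, so it says nothing about when holding fetches back would itself stall the fetchers.
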

bq.  One idea I wanted to try is to change the patch to only trigger a merge after a merge
completes if we're convinced there are no outstanding fetchers that would trigger it later
(e.g.: only trigger if merge conditions are met and commitMemory == usedMemory, IIRC).
That could also prevent a last merge being written to disk on completion of the last fetcher.
Right now, this seems to depend on the status of the merger and the occupied memory.
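
The condition quoted above can be sketched as a small predicate. This is only an illustration under stated assumptions: usedMemory and commitMemory are the counters named in the discussion, while the class name, the method name, and the reading of "merge conditions are met" as commitMemory >= mergeThreshold are assumptions, not code from the patch.

```java
// Hypothetical sketch: field names follow the discussion above. This is not
// the actual Hadoop MergeManager code or the attached patch.
final class PostMergeTriggerSketch {
    private PostMergeTriggerSketch() {}

    /**
     * Decides whether to start another merge right after one completes.
     *
     * @param usedMemory     bytes reserved by fetchers, including in-flight fetches
     * @param commitMemory   bytes of fully fetched map outputs awaiting merge
     * @param mergeThreshold assumed threshold at which a merge is normally started
     */
    static boolean shouldTriggerAfterMerge(long usedMemory, long commitMemory, long mergeThreshold) {
        // Assumption: "merge conditions are met" means the committed bytes
        // have reached the threshold.
        boolean mergeConditionsMet = commitMemory >= mergeThreshold;
        // Equal counters mean every reservation has been committed, so no
        // in-flight fetcher remains that could trigger the merge later.
        boolean noOutstandingFetches = commitMemory == usedMemory;
        return mergeConditionsMet && noOutstandingFetches;
    }
}
```

For example, shouldTriggerAfterMerge(120, 100, 80) is false because 20 bytes are still reserved by an in-flight fetch, whose completion would trigger the merge on its own. In real code the two counters would have to be read under the same lock that the fetchers use to update them.
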
> Shuffle race can hang reducer
> -----------------------------
>                 Key: MAPREDUCE-4842
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 2.0.2-alpha, 0.23.5
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Blocker
>         Attachments: mapreduce-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch
> Saw an instance where the shuffle caused multiple reducers in a job to hang.  It looked
> similar to the problem described in MAPREDUCE-3721, where the fetchers were all being told
> to WAIT by the MergeManager but no merge was taking place.

