hadoop-mapreduce-issues mailing list archives

From "Mariappan Asokan (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-4842) Shuffle race can hang reducer
Date Tue, 18 Dec 2012 20:20:13 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mariappan Asokan updated MAPREDUCE-4842:

    Attachment: mapreduce-4842.patch

I updated the patch.  All the changes are in {{MergeManager}}.  Here is an outline of the changes:
* Eliminated the line
commitMemory -= size;
in the {{unreserve()}} method.  Rationale: The complementary method {{reserve()}} only increments
{{usedMemory}}, not {{commitMemory}}.  Besides, {{commitMemory}} is used only to decide when
we have enough shuffled map outputs in memory to trigger an in-memory merge.
* In {{closeInMemoryFile()}}, once an in-memory merge is submitted, {{commitMemory}} is reset
to 0.  Rationale: If any fetcher thread sneaks in (past the in-memory merge's wait, because
the in-memory merge has not started yet), it will be allowed to shuffle data to memory if memory
was freed by the in-memory merger.  The value of {{commitMemory}} will be incremented from
0, so another merge will not be triggered unless the number of bytes shuffled by the
sneaked-in threads is greater than or equal to {{mergeThreshold}}.  This makes sure
that we do not start a merge prematurely.
* Added initialization of {{usedMemory}} and {{commitMemory}} in the constructor (though this
is not strictly needed, as Java initializes these fields to zero by default).
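
To make the accounting above easier to follow, here is a minimal sketch of the three rules.
It is not the patch itself and not the real {{MergeManager}} code: the class name
{{MergeAccountingSketch}}, the {{memoryLimit}} check in {{reserve()}}, the {{mergesSubmitted}}
counter, and the getters are all hypothetical, and threads, waits, and the merger itself are
left out.  Only the handling of {{usedMemory}} and {{commitMemory}} follows the outline.

```java
/** Hypothetical, simplified model of the accounting outlined above. */
final class MergeAccountingSketch {
    private final long memoryLimit;     // assumed cap on in-memory shuffle data
    private final long mergeThreshold;  // commitMemory level that triggers a merge
    private long usedMemory;            // bytes reserved by fetchers
    private long commitMemory;          // bytes shuffled since the last merge was submitted
    private int mergesSubmitted;        // illustration only: counts submitted merges

    MergeAccountingSketch(long memoryLimit, long mergeThreshold) {
        this.memoryLimit = memoryLimit;
        this.mergeThreshold = mergeThreshold;
        this.usedMemory = 0;    // explicit, although Java zeroes fields by default
        this.commitMemory = 0;
    }

    /** Returns false when the fetcher would be told to wait (assumed check). */
    synchronized boolean reserve(long size) {
        if (usedMemory + size > memoryLimit) {
            return false;
        }
        usedMemory += size;     // reserve() touches usedMemory only
        return true;
    }

    /** Mirror image of reserve(): commitMemory is deliberately left alone. */
    synchronized void unreserve(long size) {
        usedMemory -= size;
    }

    /** Called when a map output has been fully shuffled into memory. */
    synchronized void closeInMemoryFile(long size) {
        commitMemory += size;
        if (commitMemory >= mergeThreshold) {
            mergesSubmitted++;  // stands in for submitting the in-memory merge
            commitMemory = 0;   // later outputs start counting from zero
        }
    }

    synchronized long getUsedMemory() { return usedMemory; }
    synchronized long getCommitMemory() { return commitMemory; }
    synchronized int getMergesSubmitted() { return mergesSubmitted; }

    public static void main(String[] args) {
        MergeAccountingSketch m = new MergeAccountingSketch(100, 60);
        m.reserve(40);
        m.closeInMemoryFile(40);   // below the threshold, no merge yet
        m.reserve(30);
        m.closeInMemoryFile(30);   // threshold reached, merge submitted
        m.reserve(20);
        m.closeInMemoryFile(20);   // a "sneaked-in" output after submission
        m.unreserve(70);           // the merger frees the merged outputs
        System.out.println("usedMemory=" + m.getUsedMemory()
                + " commitMemory=" + m.getCommitMemory()
                + " mergesSubmitted=" + m.getMergesSubmitted());
    }
}
```

In this model the output that arrives after the merge is submitted is counted from zero, so
it cannot trigger a second merge until {{mergeThreshold}} bytes have accumulated again, and
freeing memory through {{unreserve()}} never changes {{commitMemory}}.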

Please test this patch for any performance regression.


-- Asokan

> Shuffle race can hang reducer
> -----------------------------
>                 Key: MAPREDUCE-4842
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 2.0.2-alpha, 0.23.5
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Blocker
>         Attachments: mapreduce-4842.patch, mapreduce-4842.patch, MAPREDUCE-4842.patch,
> MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch
>
> Saw an instance where the shuffle caused multiple reducers in a job to hang.  It looked
> similar to the problem described in MAPREDUCE-3721, where the fetchers were all being told
> to WAIT by the MergeManager but no merge was taking place.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
