hadoop-mapreduce-issues mailing list archives

From "Mariappan Asokan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4842) Shuffle race can hang reducer
Date Tue, 18 Dec 2012 17:46:14 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535094#comment-13535094 ]

Mariappan Asokan commented on MAPREDUCE-4842:

Hi Jason, Thomas, and Siddharth,
  Thanks for running the tests and reporting your findings.  My patch was intended to eliminate
the race condition due to the {{isInProgress()}} method in {{MergeThread}}.  One cannot check
the state of a thread and then take an action based on the state because the state might change
before the action is taken.  The state checking and action should be atomic.  So I came up
with a solution to get rid of that method.
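
To illustrate the point about atomicity, here is a minimal sketch of the check-then-act problem and one way to avoid it. It is not the code from the patch or from Hadoop's {{MergeThread}}; the class {{AtomicMergeTrigger}}, its methods, and the threshold parameter are all hypothetical names made up for this example. The idea is only that the "is a merge running?" check and the "start a merge" action happen under one lock, so no caller ever acts on a stale answer.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration only; this is NOT Hadoop's MergeThread.
 *
 * The racy pattern this avoids looks like:
 *     if (!merger.isInProgress()) {   // check
 *         merger.startMerge(inputs);  // act: the state may have changed by now
 *     }
 */
public class AtomicMergeTrigger {
    private final Object lock = new Object();
    private final List<String> pending = new ArrayList<>(); // guarded by lock
    private boolean inProgress = false;                     // guarded by lock
    private final int threshold;                            // assumed trigger size

    public AtomicMergeTrigger(int threshold) {
        this.threshold = threshold;
    }

    /**
     * Records a completed fetch and, in the same critical section, decides
     * whether a merge should start. Returns the batch to merge, or null.
     */
    public List<String> addAndMaybeStartMerge(String output) {
        synchronized (lock) {
            pending.add(output);
            return takeBatchIfReady();
        }
    }

    /**
     * Marks the current merge as done and re-checks the pending outputs in
     * the same critical section, so outputs that arrived during the merge
     * are not left waiting for another fetch to trigger them.
     */
    public List<String> mergeFinished() {
        synchronized (lock) {
            inProgress = false;
            return takeBatchIfReady();
        }
    }

    /** Caller must hold the lock. */
    private List<String> takeBatchIfReady() {
        if (inProgress || pending.size() < threshold) {
            return null;
        }
        inProgress = true;
        List<String> batch = new ArrayList<>(pending);
        pending.clear();
        return batch;
    }
}
```

In this sketch nothing outside the class can ask whether a merge is in progress; callers only receive a batch when they are the ones who must merge it. That is the same spirit as removing {{isInProgress()}}, although the actual patch may be structured quite differently.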

I was not intending to change the existing logic on when an in-memory merge is triggered.
Also, I was not expecting any performance improvement or degradation due to this change.
At most, there might be a slight improvement in overall performance from eliminating the
'synchronized' calls.  The main benefit is that it simplifies the code.

Now going to Siddharth's comment:
"Asokan, one issue I can see with the patch - while a merge is in progress, every completed
fetch will end up generating a single element list for the merger - effectively getting written
out to its own file."
You are right that such a scenario is possible.  However, the fetcher thread will be waiting
in {{waitForInMemoryMerge()}} or it may get stalled map output.  This may mitigate the problem.
 I have an idea on how to eliminate this problem completely.  I will verify that it will work
and post it as part of the patch later.  It will be simple, I promise:)

Siddharth, you state:
"Also, there's a couple of exceptions from MergeThread.run during shutdown, which would need
to be addressed, if this approach is being taken."
Can you describe a scenario when this might be a problem?  We can address that too.

Once again, thanks to all of you.

-- Asokan

> Shuffle race can hang reducer
> -----------------------------
>                 Key: MAPREDUCE-4842
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 2.0.2-alpha, 0.23.5
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Blocker
>         Attachments: mapreduce-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch,
>                      MAPREDUCE-4842.patch, MAPREDUCE-4842.patch
>
> Saw an instance where the shuffle caused multiple reducers in a job to hang.  It looked
> similar to the problem described in MAPREDUCE-3721, where the fetchers were all being told
> to WAIT by the MergeManager but no merge was taking place.
