hadoop-common-dev mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3366) Shuffle/Merge improvements
Date Fri, 06 Jun 2008 15:45:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12603080#action_12603080 ]

Arun C Murthy commented on HADOOP-3366:

bq. 0) The inMem merge thread needs to ignore the criteria when the shuffle thread notifies
it to do a forced merge.
The shuffle thread cannot _force_ a merge; the merge is still triggered only when the various
criteria are satisfied. The notification only wakes up the possibly waiting merge-thread.
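
To illustrate the point above, here is a minimal sketch of a notification that only wakes the merge-thread, which then re-checks the criteria itself. The class, method names and thresholds (MergeCriteria, wakeMerger, maxBytes, maxSegments) are hypothetical and are not taken from the patch; the actual criteria in the reducer may differ.

```java
// Hypothetical sketch, not the code in the HADOOP-3366 patches.
class MergeCriteria {
    private final long maxBytes;     // assumed threshold on bytes held in memory
    private final int maxSegments;   // assumed threshold on number of map outputs
    private long bytesInMemory;
    private int segments;

    MergeCriteria(long maxBytes, int maxSegments) {
        this.maxBytes = maxBytes;
        this.maxSegments = maxSegments;
    }

    // Shuffle thread: record a map output that was copied into memory.
    synchronized void addSegment(long bytes) {
        bytesInMemory += bytes;
        segments++;
    }

    // Shuffle thread: only wakes the merge-thread, it does not force a merge.
    synchronized void wakeMerger() {
        notifyAll();
    }

    synchronized boolean criteriaMet() {
        return bytesInMemory >= maxBytes || segments >= maxSegments;
    }

    // Merge-thread: wait (at most timeoutMillis, which must be > 0) to be woken,
    // then decide purely on the criteria whether a merge should run.
    synchronized boolean awaitWakeupThenCheck(long timeoutMillis) {
        if (!criteriaMet()) {
            try {
                wait(timeoutMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return criteriaMet();
    }
}
```

In this sketch a call to wakeMerger() with the thresholds not yet reached leaves awaitWakeupThenCheck() returning false, so no merge is started.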

bq. 1) A race condition exists in the interval between the ramManager.notify and mergePassComplete.wait()
calls in getMapOutput. What could happen is that the ramManager gets notified and it finishes
the merge before this thread calls mergePassComplete.wait(). If this happens the notification
from the merger is lost and this thread will just wait ...
Even if that notification is lost, the merge-thread always allows the shuffle threads to progress
by doing a 'mergeProgress.notifyAll' _before_ it sleeps - thereby preventing any deadlocks.
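
A rough sketch of why a missed notification need not block a shuffle thread forever: the merge-thread does a notifyAll on every pass before it goes to sleep, so a shuffle thread that started waiting too late is released on the next pass. This assumes the merge-thread's sleep is a timed wait, which is my assumption for the sketch and not something stated above; PeriodicMerger, awaitProgress and idleMillis are hypothetical names.

```java
// Hypothetical sketch, not the merge-thread from the patch.
class PeriodicMerger implements Runnable {
    private final Object mergeProgress = new Object(); // shuffle threads wait here
    private final Object work = new Object();          // merge-thread sleeps here
    private final long idleMillis;                     // assumed timed sleep
    private volatile boolean running = true;

    PeriodicMerger(long idleMillis) {
        this.idleMillis = idleMillis;
    }

    @Override
    public void run() {
        while (running) {
            // ... a merge pass would run here when the criteria are satisfied ...
            synchronized (mergeProgress) {
                mergeProgress.notifyAll(); // let shuffle threads progress before sleeping
            }
            synchronized (work) {
                try {
                    work.wait(idleMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
    }

    // Shuffle thread: wait for the next notification, returns elapsed milliseconds.
    long awaitProgress(long maxMillis) {
        long start = System.nanoTime();
        synchronized (mergeProgress) {
            try {
                mergeProgress.wait(maxMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    void stop() {
        running = false;
        synchronized (work) {
            work.notifyAll();
        }
    }
}
```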

bq. 2) The handshake between the merger, copier and the ramManager looks complex and there could
be more race conditions like the one I pointed out above. Sharad and I had a quick discussion
and we feel it can be simplified.
Have the ramManager.reserve lock the thread if the request cannot be satisfied
Have the ramManager.unreserve do a notifyAll (this the mergeThread does)
Have the shuffle thread notify the mergeThread (before it goes to wait) 

I agree, it is complicated. However, it has a couple of important points:
1. RamManager.reserve cannot lock the thread without closing the HTTP connection. Doing so
would leak the shuffle into the RamManager, where you'd have to pass the HTTP input-stream
to RamManager.reserve.
2. It is much better to do a 'notifyAll' on the shuffle threads when the _merge_ is complete,
so one is reasonably sure that _all_ shuffle threads can progress. Doing it in RamManager.unreserve
would let only one shuffle thread through at a time and the contention for the lock would
be very high - every thread will wake up, and get locked up inside the RamManager again.
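
The two points above can be sketched roughly as follows: a reserve that returns immediately instead of locking the caller (so the shuffle thread can close its own HTTP connection before it waits), and a single notifyAll when a merge completes rather than one per unreserve. RamBudget, tryReserve and mergeComplete are hypothetical names for illustration only, not the RamManager API in the patch.

```java
// Hypothetical sketch, not the RamManager from the patch.
class RamBudget {
    private final long capacity;
    private long used;

    RamBudget(long capacity) {
        this.capacity = capacity;
    }

    // Never blocks: on false the shuffle thread can close its HTTP connection
    // itself and then wait, so no input-stream has to be passed in here.
    synchronized boolean tryReserve(long bytes) {
        if (bytes < 0 || used + bytes > capacity) {
            return false;
        }
        used += bytes;
        return true;
    }

    // Frees memory without notifying, to avoid waking every waiter per segment.
    synchronized void unreserve(long bytes) {
        used = Math.max(0, used - bytes);
    }

    // Merge-thread: free everything the merge consumed, then wake all
    // waiting shuffle threads once.
    synchronized void mergeComplete(long freedBytes) {
        used = Math.max(0, used - freedBytes);
        notifyAll();
    }

    // Shuffle thread: wait (at most timeoutMillis, which must be > 0) for a merge.
    synchronized void awaitMerge(long timeoutMillis) {
        try {
            wait(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    synchronized long available() {
        return capacity - used;
    }
}
```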

> Shuffle/Merge improvements
> --------------------------
>                 Key: HADOOP-3366
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3366
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 0.18.0
>         Attachments: 3366.1.patch, 3366.1.patch, 3366.reducetask.patch, HADOOP-3366_0_20080605.patch,
> This is intended to be a meta-issue to track various improvements to shuffle/merge in
the reducer.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
