hadoop-common-dev mailing list archives

From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3366) Shuffle/Merge improvements
Date Fri, 06 Jun 2008 17:43:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12603113#action_12603113 ]

Devaraj Das commented on HADOOP-3366:

Take this scenario:
1) A merge is currently in progress.
2) Shuffle-thread-1 has set stallShuffle=true but gets preempted before it calls mergePassComplete.wait().
3) All other shuffle threads now block at shuffle.wait (the ramManager.notify has no effect since the merge is already in progress).
4) The merge completes and invokes mergePassComplete.notifyAll. It then goes back to ramManager.wait.
5) Since Shuffle-thread-1 hasn't yet called mergePassComplete.wait, the notification that the merge thread sent is lost, and all the shuffle threads will wait forever.

Unless I am missing something, isn't the above a possible scenario?
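
To make the race concrete, here is a minimal sketch of the usual guard against a lost notification: the waiter checks a counter under the same lock the notifier uses, so a notifyAll that fires before the wait is still observed. The class MergeSignal, the completedPasses counter, and the method names are hypothetical and are not taken from ReduceTask.java or any attached patch; only the monitor name mergePassComplete comes from the scenario above.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the shuffle/merge hand-off; not the ReduceTask code.
class MergeSignal {
    private final Object mergePassComplete = new Object();
    private long completedPasses = 0; // guarded by mergePassComplete

    // Shuffle side: snapshot the pass counter at the point of deciding to stall.
    long currentPass() {
        synchronized (mergePassComplete) {
            return completedPasses;
        }
    }

    // Shuffle side: block until a pass newer than seenPass has completed.
    // Returns true if such a pass completed, false on timeout or interrupt.
    boolean awaitPassAfter(long seenPass, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        synchronized (mergePassComplete) {
            // Re-check the condition in a loop; never rely on the notification alone.
            while (completedPasses == seenPass) {
                long remainingMillis =
                    TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMillis <= 0) {
                    return false;
                }
                try {
                    mergePassComplete.wait(remainingMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            return true;
        }
    }

    // Merge side: record the completed pass, then wake every waiter.
    void passComplete() {
        synchronized (mergePassComplete) {
            completedPasses++;
            mergePassComplete.notifyAll();
        }
    }

    public static void main(String[] args) {
        MergeSignal signal = new MergeSignal();
        long seen = signal.currentPass(); // step 2: about to stall, then "preempted"
        signal.passComplete();            // step 4: merge finishes and notifies first
        // Step 5 no longer hangs: the counter already moved past the snapshot.
        System.out.println(signal.awaitPassAfter(seen, 1000)); // prints true
    }
}
```

The same ordering with a bare wait() and no guarded state would block indefinitely, which is the hang described in step 5.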

I think that if a shuffle thread cannot get memory from the ram manager while the merge thread
is waiting, we should do a merge to free up space irrespective of the merge criteria.
But I also think it is unlikely that a merge isn't already in progress when
the memory request cannot be serviced.
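
The rule proposed above can be sketched as a small decision function. This is only an illustration under assumed names: RamBudget, Decision, and the byte accounting are hypothetical and do not reflect RamManager.java or the uploaded patch.

```java
// Hypothetical memory budget illustrating "force a merge if memory is
// unavailable and no merge is running"; not the actual RamManager.
class RamBudget {
    enum Decision { GRANTED, WAIT_FOR_MERGE, FORCE_MERGE }

    private final long capacity;
    private long used;
    private boolean mergeInProgress;

    RamBudget(long capacity) {
        this.capacity = capacity;
    }

    // Called by a shuffle thread that wants to keep a map output in memory.
    synchronized Decision request(long bytes) {
        if (used + bytes <= capacity) {
            used += bytes;
            return Decision.GRANTED;
        }
        // Not enough room: wait for the running merge if there is one,
        // otherwise trigger a merge regardless of the usual merge criteria.
        return mergeInProgress ? Decision.WAIT_FOR_MERGE : Decision.FORCE_MERGE;
    }

    synchronized void mergeStarted() {
        mergeInProgress = true;
    }

    // Called by the merge thread once it has flushed freedBytes out of memory.
    synchronized void mergeFinished(long freedBytes) {
        used -= Math.min(freedBytes, used);
        mergeInProgress = false;
        notifyAll(); // wake any threads waiting on this budget
    }

    public static void main(String[] args) {
        RamBudget budget = new RamBudget(100);
        System.out.println(budget.request(60)); // GRANTED
        System.out.println(budget.request(60)); // FORCE_MERGE
    }
}
```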

The patch I uploaded implements everything in ReduceTask.java (I forgot to mention that).
I didn't have to touch RamManager.java, and I still think it is _simpler_ and does exactly
what we need (though some cleanup may be required). If you remove the forceMerge condition,
it becomes simpler still.

> Shuffle/Merge improvements
> --------------------------
>                 Key: HADOOP-3366
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3366
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 0.18.0
>         Attachments: 3366.1.patch, 3366.1.patch, 3366.reducetask.patch, HADOOP-3366_0_20080605.patch,
> This is intended to be a meta-issue to track various improvements to shuffle/merge in the reducer.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
