hadoop-common-dev mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2639) Reducers stuck in shuffle
Date Mon, 28 Jan 2008 18:53:34 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12563245#action_12563245 ]

Arun C Murthy commented on HADOOP-2639:

Ok, looks like there is some disconnect...

While implementing HADOOP-2247, I intended the following code:

    float failureRate = (float)fetchFailures / runningReduceTasks;

to mean: _Check whether too many of the currently running reduce TIPs are complaining about
this map._

I had a discussion with Amar and he clarified that he was considering the counter as _currently
running reduce task-attempts_ and not _currently running reduce TIPs_ as was the original
intention... and hence this debate/disconnect.

Given that, I think the right fix is for us to figure out _why_ the *runningReduceTasks* counter
is wrongly turning negative and fix the wrong decrement...
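
To make the concern concrete, here is a minimal sketch of how a negative counter affects the ratio above. Only fetchFailures and runningReduceTasks come from the code quoted earlier; the class name, the helper methods, and the 0.5 threshold are assumptions made for illustration and are not taken from the JobTracker source.

```java
// Illustrative sketch only: FailureRateSketch, tooManyComplaints and
// ASSUMED_THRESHOLD are hypothetical names, not Hadoop identifiers.
class FailureRateSketch {
    // Assumed cut-off for "too many reducers are complaining".
    static final float ASSUMED_THRESHOLD = 0.5f;

    // Same arithmetic as the quoted line: complaints per running reduce.
    static float failureRate(int fetchFailures, int runningReduceTasks) {
        return (float) fetchFailures / runningReduceTasks;
    }

    // True when the fraction of complaining reduces reaches the threshold.
    static boolean tooManyComplaints(int fetchFailures, int runningReduceTasks) {
        return failureRate(fetchFailures, runningReduceTasks) >= ASSUMED_THRESHOLD;
    }

    public static void main(String[] args) {
        // 3 complaints out of 4 running reduces: ratio 0.75, check passes.
        System.out.println(tooManyComplaints(3, 4));
        // Counter wrongly decremented to -1: ratio -3.0, check can never pass.
        System.out.println(tooManyComplaints(3, -1));
    }
}
```

With a negative denominator the ratio is negative, so a comparison against any positive threshold fails no matter how many fetch failures are reported. That is why fixing the wrong decrement matters more than adjusting the threshold.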


> Reducers stuck in shuffle
> -------------------------
>                 Key: HADOOP-2639
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2639
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amareshwari Sri Ramadasu
>            Assignee: Amar Kamat
>            Priority: Blocker
>             Fix For: 0.16.0
>         Attachments: HADOOP-2639.patch
> I started sort benchmark on 500 nodes. It has 40000 maps and 900 reducers.
> There are 11 reducers stuck in shuffle with 33% progress. I could see a node down which
> ran 80 maps on it. And all these reducers are trying to fetch map output from that node.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
