hadoop-common-dev mailing list archives

From "Amar Kamat (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-2639) Reducers stuck in shuffle
Date Thu, 24 Jan 2008 19:42:35 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-2639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat updated HADOOP-2639:
-------------------------------

    Attachment: HADOOP-2639.patch

Finally found the bug that causes this. HADOOP-2247 introduced a new strategy for killing
maps, i.e. kill the map if {{(fetch-failure-notification/num-running-reduce-tasks) >
0.5}}. It seems that {{num-running-reduce-tasks}} can become negative, which breaks the
strategy: with a negative denominator the ratio can never exceed 0.5, so the maps are never
killed and the whole job stalls. This happens because the reduce count is incremented only
if the TIP is not already running, but decremented for every task in the TIP. The attached
patch addresses this by incrementing the counter for every task that gets scheduled.
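
To make the accounting problem concrete, here is a minimal, self-contained sketch. It is not the JobTracker code and not the attached patch; the class and method names ({{ReduceCounterSketch}}, {{buggyCount}}, {{fixedCount}}, {{shouldKillMap}}) are hypothetical, and it assumes a single reduce TIP whose attempts are all scheduled before any of them finishes.

```java
public class ReduceCounterSketch {

    // Accounting as described above: +1 only when the TIP is not yet
    // running, but -1 for every task attempt in the TIP.
    public static int buggyCount(int attempts) {
        int runningReduces = 0;
        boolean tipRunning = false;          // hypothetical stand-in for the TIP state
        for (int i = 0; i < attempts; i++) { // schedule each attempt
            if (!tipRunning) {
                runningReduces++;
                tipRunning = true;
            }
        }
        for (int i = 0; i < attempts; i++) { // each attempt finishes
            runningReduces--;
        }
        return runningReduces;
    }

    // Accounting after the change: +1 for every scheduled attempt,
    // -1 for every finished attempt.
    public static int fixedCount(int attempts) {
        int runningReduces = 0;
        for (int i = 0; i < attempts; i++) {
            runningReduces++;
        }
        for (int i = 0; i < attempts; i++) {
            runningReduces--;
        }
        return runningReduces;
    }

    // The kill rule quoted above, evaluated literally (assumed form).
    public static boolean shouldKillMap(int fetchFailureNotifications, int runningReduces) {
        return (double) fetchFailureNotifications / runningReduces > 0.5;
    }

    public static void main(String[] args) {
        System.out.println("buggy count, 2 attempts: " + buggyCount(2));
        System.out.println("fixed count, 2 attempts: " + fixedCount(2));
        // Illustrative numbers only: 11 notifications against a negative
        // counter versus a counter of 11.
        System.out.println("kill with counter -1: " + shouldKillMap(11, -1));
        System.out.println("kill with counter 11: " + shouldKillMap(11, 11));
    }
}
```

With two attempts on one TIP, the first scheme ends at -1 while the second returns to 0. Once the counter is negative the ratio is negative too, so the {{> 0.5}} check cannot pass regardless of how many fetch failures are reported.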

> Reducers stuck in shuffle
> -------------------------
>
>                 Key: HADOOP-2639
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2639
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amareshwari Sri Ramadasu
>            Assignee: Amar Kamat
>            Priority: Blocker
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-2639.patch
>
>
> I started sort benchmark on 500 nodes. It has 40000 maps and 900 reducers.
> There are 11 reducers stuck in shuffle with 33% progress. I could see a node down which
> ran 80 maps on it. And all these reducers are trying to fetch map output from that node.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

