hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1984) some reducer stuck at copy phase and progress extremely slowly
Date Thu, 15 Nov 2007 15:16:43 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12542794 ]

Owen O'Malley commented on HADOOP-1984:
---------------------------------------

Runping, it doesn't get to 10 minutes until it has failed 5 times. And it can easily take
10 minutes to clear a backlog off of a task tracker that is getting slammed. I have certainly
seen jobs that take longer than that to work off the backlog. I still maintain that a simple
exponential back off is the right approach, because there are a lot of things that could have
caused the slowdown.
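
For illustration only, a simple exponential back off of the kind discussed above could be sketched as follows. This is not the code from HADOOP-1984.patch: the class name, method name, base delay, and cap are all hypothetical. The base of 37.5 seconds is chosen purely so that, with doubling, the delay after the fifth failure lands on 10 minutes, which is one possible reading of the numbers above.

```java
// Hypothetical sketch of a doubling back off with a cap.
// None of these names or values come from the Hadoop source or the patch.
final class BackoffSketch {

    // Returns the wait before the next fetch attempt, given how many
    // consecutive failures have been seen for the same map output host.
    static long delayMillis(int failures, long baseMillis, long capMillis) {
        if (failures < 1) {
            return 0L; // no failure yet, so no wait
        }
        long delay = baseMillis;
        // Double once per failure after the first, stopping at the cap.
        for (int i = 1; i < failures && delay < capMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, capMillis);
    }

    public static void main(String[] args) {
        long base = 37_500L;      // assumed base delay: 37.5 seconds
        long cap = 10 * 60_000L;  // assumed cap: 10 minutes
        for (int failures = 1; failures <= 6; failures++) {
            System.out.println(failures + " failure(s): wait "
                    + delayMillis(failures, base, cap) / 1000.0 + " s");
        }
    }
}
```

With these assumed values the waits are 37.5, 75, 150, 300, and 600 seconds for failures one through five, and the cap holds the wait at 600 seconds from then on. The point of the shape is that a busy tracker gets progressively more room to work off its backlog without the reducer having to know why it was slow.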

Devaraj, please don't change the failure notification policy in this same bug. If it needs
to be changed, it should be a different issue. Just changing the default number of retries
in this issue is ok, but I don't think we should change the policy for *that* in this issue.
Furthermore, if we do change the policy, I'd argue for something much more direct and say
that if a tracker is blacklisted for a job, the number of retries should be cut in half or
something.
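
As a rough sketch of the "cut in half" idea above, and nothing more than that: the class and method names here are hypothetical, and the floor of one retry is an added assumption so that a blacklisted tracker is still tried at least once.

```java
// Hypothetical sketch of halving the retry budget for a blacklisted tracker.
// These names do not correspond to any existing Hadoop class or setting.
final class RetryPolicySketch {

    // Returns how many fetch retries to allow against a given tracker.
    static int maxRetries(int defaultRetries, boolean trackerBlacklistedForJob) {
        if (!trackerBlacklistedForJob) {
            return defaultRetries;
        }
        // Halve the budget, but keep at least one attempt (assumption).
        return Math.max(1, defaultRetries / 2);
    }
}
```

For example, with a default of 10 retries, a tracker that is blacklisted for the job would get 5.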

> some reducer stuck at copy phase and progress extremely slowly
> --------------------------------------------------------------
>
>                 Key: HADOOP-1984
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1984
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.16.0
>            Reporter: Runping Qi
>            Assignee: Amar Kamat
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-1984.patch
>
>
> In many cases, some reducers got stuck at copy phase, progressing extremely slowly.
> The entire cluster seems to be doing nothing. This causes very bad long tails for otherwise
> well tuned map/red jobs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

