hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1984) some reducer stuck at copy phase and progress extremely slowly
Date Mon, 12 Nov 2007 15:52:50 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541814 ]

Owen O'Malley commented on HADOOP-1984:
---------------------------------------

how about using:

4 ** n (n = 1..5) = 4, 16, 64, 256, 1024
sum = 1364s = ~22.7 minutes

That will allow a task tracker that is being pounded by a large cluster to catch up before
the reduce is killed.
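
As a quick check of the arithmetic above, here is a minimal Python sketch. The function name backoff_schedule and its parameters are hypothetical and are not taken from the Hadoop code or the attached patch; it assumes n runs from 1 to 5 and that the intervals are in seconds.

```python
def backoff_schedule(base=4, rounds=5):
    """Return the deterministic backoff intervals in seconds: base**n for n = 1..rounds.

    Hypothetical helper for illustration only; not part of Hadoop.
    """
    return [base ** n for n in range(1, rounds + 1)]


intervals = backoff_schedule()
total_seconds = sum(intervals)

print(intervals)                      # the five intervals: 4, 16, 64, 256, 1024
print(total_seconds)                  # 1364 seconds in total
print(round(total_seconds / 60, 1))   # about 22.7 minutes
```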

Those jumps are big enough, though, that it probably makes sense to add enough randomness
to make the intervals overlap. So how about:

4**n * rand(0.5, 2.0)

which would let each round of backoffs meet the round before and after it.
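
To illustrate why the rounds meet, here is a rough Python sketch of the jittered formula. It assumes rand(0.5, 2.0) means a uniform draw from that range; the names jittered_backoff and backoff_range are hypothetical and not taken from Hadoop. With a factor of 4 between rounds, the upper bound of round n (2.0 * 4**n) equals the lower bound of round n+1 (0.5 * 4**(n+1)).

```python
import random


def backoff_range(n, base=4, low=0.5, high=2.0):
    """Return the (min, max) delay in seconds that round n can produce."""
    return (low * base ** n, high * base ** n)


def jittered_backoff(n, base=4, low=0.5, high=2.0, rng=random):
    """Return a delay in seconds for round n: base**n scaled by a uniform random factor.

    Assumption: rand(0.5, 2.0) in the comment is a uniform draw; that is not
    stated in the original message.
    """
    return (base ** n) * rng.uniform(low, high)


for n in range(1, 6):
    shortest, longest = backoff_range(n)
    delay = jittered_backoff(n)
    print(f"round {n}: range {shortest:g}s to {longest:g}s, sampled {delay:.1f}s")
```

A narrower jitter range, such as rand(0.75, 1.5), would leave gaps between consecutive rounds, so retries from different rounds would never land in the same window.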

> some reducer stuck at copy phase and progress extremely slowly
> --------------------------------------------------------------
>
>                 Key: HADOOP-1984
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1984
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.16.0
>            Reporter: Runping Qi
>            Assignee: Amar Kamat
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-1984.patch
>
>
> In many cases, some reducers got stuck in the copy phase, progressing extremely slowly.
> The entire cluster seems to be doing nothing. This causes very long tails in otherwise
> well-tuned map/red jobs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

