hadoop-mapreduce-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-935) There's little to be gained by putting a host into the penaltybox at reduce time, if its the only host you have
Date Fri, 28 Aug 2009 21:21:32 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748984#action_12748984 ]

Steve Loughran commented on MAPREDUCE-935:
------------------------------------------

Looking at the thread dump, it also looks like the exponential backoff in {{Client.handleConnectFailure()}}
is interfering with heartbeats: a failure to connect to the server triggers the backoff,
which stops progress from being reported.

> There's little to be gained by putting a host into the penaltybox at reduce time, if its the only host you have
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-935
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-935
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 0.21.0
>            Reporter: Steve Loughran
>
> Exponential backoff may be good for dealing with troublesome hosts, but not if you only
> have one host in the entire system. From the log of {{TestNodeRefresh}}, which for some reason
> is blocking in the reduce phase, I can see it doesn't take much for the backoff to kick in
> so rapidly that the reducer ends up waiting longer than the test allows:
> {code}
> 2009-08-28 21:39:16,788 WARN  mapred.ReduceTask (ReduceTask.java:fetchOutputs(2192)) - attempt_20090828213826033_0001_r_000000_0 adding host localhost to penalty box, next contact in 150 seconds
> {code}
> The result of this backoff is that the reduce process appears to hang and ends up
> getting killed from above.
> Note that this isn't the root cause of the problem, but it certainly amplifies things.
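
To illustrate the point in the description, here is a minimal Java sketch of a penalty box whose exponential backoff is capped when there is only one host to fetch from. It is not the code in {{ReduceTask}}; the class name PenaltyBoxSketch, its methods, and all the numeric values are hypothetical and chosen only for the example.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch only: this is NOT the ReduceTask penalty box.
 * All names and parameters here are assumptions made for illustration.
 */
class PenaltyBoxSketch {
    private final long baseSeconds;          // assumed penalty after the first failure
    private final long maxSeconds;           // assumed upper bound for any penalty
    private final long singleHostCapSeconds; // assumed tighter bound when only one host exists
    private final Map<String, Integer> failures = new HashMap<>();

    PenaltyBoxSketch(long baseSeconds, long maxSeconds, long singleHostCapSeconds) {
        this.baseSeconds = baseSeconds;
        this.maxSeconds = maxSeconds;
        this.singleHostCapSeconds = singleHostCapSeconds;
    }

    /** Records a failed fetch from host and returns the seconds until the next contact. */
    long penalize(String host, int knownHosts) {
        int count = failures.merge(host, 1, Integer::sum);
        // Exponential backoff: base * 2^(count - 1), with the shift bounded to avoid overflow.
        int shift = Math.min(count - 1, 30);
        long penalty = Math.min(maxSeconds, baseSeconds << shift);
        // With a single host there is nowhere else to fetch from, so keep the wait short.
        if (knownHosts <= 1) {
            penalty = Math.min(penalty, singleHostCapSeconds);
        }
        return penalty;
    }

    /** Clears the failure history for host after a successful fetch. */
    void recordSuccess(String host) {
        failures.remove(host);
    }
}
```

With several hosts the penalty doubles on each failure up to the maximum; with one host it never exceeds the smaller cap, so the reducer keeps retrying instead of sitting idle. Whether a cap like this is the right fix for this issue is a separate question, since the backoff is not the root cause.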


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

