hadoop-common-dev mailing list archives

From "Christian Kunz (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-2220) Reduce tasks fail too easily because of repeated fetch failures
Date Fri, 21 Dec 2007 20:24:43 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christian Kunz updated HADOOP-2220:
-----------------------------------

    Fix Version/s:     (was: 0.15.2)
                   0.16.0

> Reduce tasks fail too easily because of repeated fetch failures
> ---------------------------------------------------------------
>
>                 Key: HADOOP-2220
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2220
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.16.0
>            Reporter: Christian Kunz
>            Assignee: Amar Kamat
>            Priority: Blocker
>             Fix For: 0.16.0
>
>
> Currently, a reduce task fails once it has more than MAX_FAILED_UNIQUE_FETCHES
> (= 5, hard-coded) failures to fetch output from different mappers (introduced,
> I believe, in HADOOP-1158).
> This causes problems for longer-running jobs with a large number of mappers
> running in multiple waves:
> Otherwise problem-free reduce tasks fail because of too many fetch failures
> caused by resource contention, and the new reduce tasks have to fetch all data
> again from the already successfully executed mappers, introducing a lot of
> additional IO overhead. Also, the job will fail when the same reducer exhausts
> its maximum number of attempts.
> The limit should be a function of the number of mappers and/or waves of
> mappers, and should be more conservative (e.g. there is no need to let reduce
> tasks fail when there are enough slots to start speculatively executed reducers
> and speculative execution is enabled). Also, we might consider not counting
> such a restart against the number of attempts.
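
To illustrate the proposal in the description, here is a minimal Java sketch of a fetch-failure limit that grows with the number of mappers instead of staying fixed at 5. It is not the patch for this issue and not Hadoop code: the class FetchFailureLimit, both methods, and the scaling constant ASSUMED_MAPS_PER_ALLOWED_FAILURE are hypothetical names and values chosen only for illustration. Only MAX_FAILED_UNIQUE_FETCHES = 5 and the "more than" comparison come from the report.

```java
public final class FetchFailureLimit {
    // Value quoted in the report as the current hard-coded limit.
    static final int MAX_FAILED_UNIQUE_FETCHES = 5;

    // Assumption for illustration only: allow one additional unique fetch
    // failure per 10 mappers. The report does not propose a specific ratio.
    static final int ASSUMED_MAPS_PER_ALLOWED_FAILURE = 10;

    // Returns a limit that scales with the number of mappers but never
    // drops below the existing fixed value.
    static int maxUniqueFetchFailures(int numMaps) {
        if (numMaps < 0) {
            throw new IllegalArgumentException("numMaps must be non-negative");
        }
        // Integer ceiling of numMaps / ASSUMED_MAPS_PER_ALLOWED_FAILURE.
        int scaled = (numMaps + ASSUMED_MAPS_PER_ALLOWED_FAILURE - 1)
                / ASSUMED_MAPS_PER_ALLOWED_FAILURE;
        return Math.max(MAX_FAILED_UNIQUE_FETCHES, scaled);
    }

    // Mirrors the behaviour described in the report: the reduce task fails
    // once it has MORE than the allowed number of unique fetch failures.
    static boolean shouldFailReduce(int uniqueFetchFailures, int numMaps) {
        return uniqueFetchFailures > maxUniqueFetchFailures(numMaps);
    }

    public static void main(String[] args) {
        for (int numMaps : new int[] {10, 100, 5000}) {
            System.out.println(numMaps + " maps -> limit "
                    + maxUniqueFetchFailures(numMaps));
        }
    }
}
```

Under these assumptions a small job keeps the existing limit of 5, while a job with 5000 mappers tolerates up to 500 unique fetch failures before the reduce task is failed. The other ideas in the description (taking free slots and speculative execution into account, and not counting such restarts as attempts) are not modelled here.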

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

