hadoop-common-dev mailing list archives

From "Christian Kunz (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-3333) job failing because of reassigning same tasktracker to failing tasks
Date Thu, 01 May 2008 17:26:55 GMT
job failing because of reassigning same tasktracker to failing tasks
--------------------------------------------------------------------

                 Key: HADOOP-3333
                 URL: https://issues.apache.org/jira/browse/HADOOP-3333
             Project: Hadoop Core
          Issue Type: Bug
          Components: mapred
    Affects Versions: 0.16.3
            Reporter: Christian Kunz
            Priority: Blocker


We have been running a job in a 2nd attempt for a long time. The previous job failed, and the
current job risks failing as well, because reduce tasks that fail on marginal TaskTrackers are
repeatedly reassigned to the same TaskTrackers (probably because those are the only available
slots), eventually running out of attempts.
Reduce tasks should be assigned to the same TaskTracker at most twice, or TaskTrackers need
some better smarts to detect failing hardware.
BTW, mapred.reduce.max.attempts=12, which is high, but it does not help in this case.
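For illustration, a minimal sketch of the per-tracker failure bookkeeping that the "at most
twice" policy would need. All class and method names here are hypothetical, not the actual
JobTracker API: the scheduler would record each failed attempt per (task, tracker) pair and
refuse to hand a retry back to a tracker where the task has already failed too often.

    import java.util.HashMap;
    import java.util.Map;

    /**
     * Hypothetical sketch, not actual Hadoop code: tracks how often each
     * task has failed on each TaskTracker so the scheduler can avoid
     * burning all retries on a single marginal machine.
     */
    public class FailedTrackerFilter {
        /** Maximum times a task may be attempted on the same tracker. */
        private final int maxAttemptsPerTracker;

        /** taskId -> (trackerName -> failure count on that tracker). */
        private final Map<String, Map<String, Integer>> failures =
            new HashMap<String, Map<String, Integer>>();

        public FailedTrackerFilter(int maxAttemptsPerTracker) {
            this.maxAttemptsPerTracker = maxAttemptsPerTracker;
        }

        /** Record that an attempt of the given task failed on the given tracker. */
        public synchronized void recordFailure(String taskId, String tracker) {
            Map<String, Integer> perTracker = failures.get(taskId);
            if (perTracker == null) {
                perTracker = new HashMap<String, Integer>();
                failures.put(taskId, perTracker);
            }
            Integer count = perTracker.get(tracker);
            perTracker.put(tracker, count == null ? 1 : count + 1);
        }

        /**
         * Called before assigning a retry: returns false if this tracker has
         * already seen too many failures of this task, so the slot should go
         * to another task rather than burning another attempt here.
         */
        public synchronized boolean mayAssign(String taskId, String tracker) {
            Map<String, Integer> perTracker = failures.get(taskId);
            if (perTracker == null) {
                return true;
            }
            Integer count = perTracker.get(tracker);
            return count == null || count < maxAttemptsPerTracker;
        }
    }

With maxAttemptsPerTracker=2, a retry would be offered to a tracker only while the task has
failed there fewer than twice, so the 12 allowed attempts would be spread across different
machines instead of being exhausted on one bad node.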

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

