hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-400) the job tracker re-runs failed tasks on the same node
Date Thu, 03 Aug 2006 19:10:15 GMT
     [ http://issues.apache.org/jira/browse/HADOOP-400?page=all ]

Owen O'Malley updated HADOOP-400:

           Status: Patch Available  (was: Open)
    Fix Version/s: 0.6.0
       Attachment: task-schedule.patch

This patch:
  1. Limits each TaskTracker to running
        min(tasksPerTracker, ceil(tasksLeftToRun/numTaskTrackers))
      tasks. This prevents the problem we saw where the last 2 reduces scheduled were put
on the same node rather than on different empty ones.
  2. Refactors obtainNewMapTask and obtainNewReduceTask to call a common utility function,
and replaces the two parallel loops with one.
  3. Only allows a task that has failed on this TaskTracker to run there again if we have
exhausted the cluster (that is, the task has already failed on every TaskTracker).
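
For illustration only, the cap in item 1 can be sketched as below. This is not code from
task-schedule.patch: the class name TaskCap and the method name maxTasksFor are hypothetical,
and only the formula itself comes from the description above. The ceiling is computed with
integer arithmetic so no floating point is involved.

```java
// Hypothetical helper; not taken from task-schedule.patch.
final class TaskCap {
    private TaskCap() {}

    /**
     * Computes min(tasksPerTracker, ceil(tasksLeftToRun / numTaskTrackers)).
     * Parameter names mirror the formula in the patch description.
     */
    static int maxTasksFor(int tasksPerTracker, int tasksLeftToRun, int numTaskTrackers) {
        if (numTaskTrackers <= 0) {
            throw new IllegalArgumentException("numTaskTrackers must be positive");
        }
        // Treat a negative remaining count as zero so the cap is never negative.
        int left = Math.max(0, tasksLeftToRun);
        // Integer ceiling division: ceil(left / numTaskTrackers).
        int fairShare = (left + numTaskTrackers - 1) / numTaskTrackers;
        return Math.min(tasksPerTracker, fairShare);
    }
}
```

With 2 reduces left, 10 TaskTrackers and tasksPerTracker = 2, the cap works out to 1, so the
two remaining reduces cannot both land on one node.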

> the job tracker re-runs failed tasks on the same node
> -----------------------------------------------------
>                 Key: HADOOP-400
>                 URL: http://issues.apache.org/jira/browse/HADOOP-400
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.4.0
>            Reporter: Owen O'Malley
>         Assigned To: Owen O'Malley
>             Fix For: 0.6.0
>         Attachments: task-schedule.patch
> The job tracker tries not to re-run a task on a node where it has previously failed,
> but it doesn't strictly prevent it.
> I propose to change the rule so that when pollForNewTask is called by a TaskTracker,
> the JobTracker will assign it a task that has already failed on that TaskTracker only
> if the task has already failed on every TaskTracker in the cluster. Thus, for "normal"
> clusters with more than 4 TaskTrackers, a task that keeps failing is guaranteed to run
> on 4 different TaskTrackers. For small clusters, it will run on every TaskTracker in
> the cluster at least once.
> Does that sound reasonable to everyone?
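
A minimal sketch of the proposed rule, assuming TaskTrackers are identified by name and that
the set of trackers a task has failed on is tracked per task. The class FailedTaskRule and
the method mayRunOn are hypothetical names used only for illustration; they are not the
JobTracker's actual API.

```java
import java.util.Set;

// Hypothetical illustration of the proposed assignment rule.
final class FailedTaskRule {
    private FailedTaskRule() {}

    /**
     * Returns true if the task may be assigned to the given tracker:
     * either it has never failed there, or it has already failed on
     * every tracker in the cluster.
     */
    static boolean mayRunOn(String tracker, Set<String> failedOn, Set<String> allTrackers) {
        if (!failedOn.contains(tracker)) {
            return true; // no earlier failure on this tracker
        }
        // Earlier failure here: allow a retry only once the cluster is exhausted.
        return failedOn.containsAll(allTrackers);
    }
}
```

Under this sketch a task that has failed on tt1 is still offered to tt2 and tt3, and is
offered to tt1 again only after it has also failed on tt2 and tt3.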

This message is automatically generated by JIRA.
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

