hadoop-common-dev mailing list archives

From "Amareshwari Sriramadasu (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-4305) repeatedly blacklisted tasktrackers should get declared dead
Date Thu, 13 Nov 2008 11:49:44 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-4305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HADOOP-4305:
--------------------------------------------

    Attachment: patch-4305-1.txt

Here is a patch with a proposed fix.
The patch does the following:
* Adds the job's blacklisted trackers to the potentially faulty list, in JobTracker.finalizeJob()
* Moves a tracker from the potentially faulty list to the blacklisted trackers (across jobs) iff
   ** its #blacklists exceeds mapred.max.tracker.blacklists (default value is 4),
   ** its #blacklists is 50% above the average #blacklists over the active and potentially faulty trackers, and
   ** less than 50% of the cluster is blacklisted so far
* Restarting the tracker makes it an active tracker again
* After a day, the tracker is given another chance to run tasks
* Adds #blacklisted_trackers to ClusterStatus
* Updates the web UI to show the blacklisted trackers.
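
For readers who want to see how the three "iff" conditions combine, here is a minimal sketch. It is not the code in patch-4305-1.txt: the class name BlacklistPolicySketch, the method shouldBlacklist, and all parameter names are hypothetical. It also assumes that "exceed" means strictly greater than, that "50% above the average" means more than 1.5 times the average, and that the cluster condition means fewer than half of the trackers are blacklisted already.

```java
/** Hypothetical illustration only; not taken from patch-4305-1.txt. */
class BlacklistPolicySketch {

    // Default of mapred.max.tracker.blacklists as stated in the message.
    static final int DEFAULT_MAX_TRACKER_BLACKLISTS = 4;

    /**
     * Decides whether a potentially faulty tracker should be blacklisted
     * across jobs.
     *
     * @param trackerBlacklists    number of jobs that blacklisted this tracker
     * @param allBlacklistCounts   #blacklists of every active and potentially
     *                             faulty tracker (including this one)
     * @param alreadyBlacklisted   trackers currently blacklisted across jobs
     * @param clusterSize          total number of trackers in the cluster
     * @param maxTrackerBlacklists configured threshold
     */
    static boolean shouldBlacklist(int trackerBlacklists,
                                   int[] allBlacklistCounts,
                                   int alreadyBlacklisted,
                                   int clusterSize,
                                   int maxTrackerBlacklists) {
        // Average #blacklists over the active and potentially faulty trackers.
        long sum = 0;
        for (int count : allBlacklistCounts) {
            sum += count;
        }
        double average = allBlacklistCounts.length == 0
                ? 0.0
                : (double) sum / allBlacklistCounts.length;

        // Condition 1 (assumed strict): more blacklists than the threshold.
        boolean exceedsThreshold = trackerBlacklists > maxTrackerBlacklists;
        // Condition 2 (assumed): more than 50% above the average.
        boolean aboveAverage = trackerBlacklists > 1.5 * average;
        // Condition 3 (assumed): fewer than half of the cluster blacklisted.
        boolean clusterHasHeadroom = 2L * alreadyBlacklisted < clusterSize;

        return exceedsThreshold && aboveAverage && clusterHasHeadroom;
    }
}
```

The third condition acts as a safety valve: if half of the cluster is blacklisted already, the failures are more likely caused by the jobs or the environment than by individual trackers, so no further trackers are removed.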


> repeatedly blacklisted tasktrackers should get declared dead
> ------------------------------------------------------------
>
>                 Key: HADOOP-4305
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4305
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Christian Kunz
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.20.0
>
>         Attachments: patch-4305-1.txt
>
>
> When running a batch of jobs it often happens that the same tasktrackers are blacklisted again and again. This can slow job execution considerably, in particular, when tasks fail because of timeout.
> It would make sense to no longer assign any tasks to such tasktrackers and to declare them dead.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

