hadoop-common-dev mailing list archives

From "Amareshwari Sriramadasu (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4305) repeatedly blacklisted tasktrackers should get declared dead
Date Thu, 30 Oct 2008 11:57:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643944#action_12643944 ]

Amareshwari Sriramadasu commented on HADOOP-4305:
-------------------------------------------------

The problem with doing it at the TaskTracker is that the TaskTracker does not know whether the
application is buggy.  So buggy code could bring down all the tasktrackers.

Another approach is:
* The JT blacklists TTs on a per-job basis, as it does today.
* When a successful job blacklists a tracker, we add the tracker to a potentially-faulty list.
For each tracker, the number of jobs that blacklisted it (#blacklists) will be maintained.
* The tracker is blacklisted across all jobs if its #blacklists is X% above the average #blacklists
over all the trackers.
For example, in a cluster with 100 trackers:
||Tracker||#BlackLists||
|TT1|10|
|TT2|10|
|TT3|10|
|TT4|7|
|TT5|5|
With X=25%, TT1, TT2, and TT3 will be blacklisted across all jobs.
* We can reconsider the tracker after some time T, or when it restarts.
* No more than 50% of the trackers can get blacklisted on the cluster.
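
The rule proposed above can be sketched roughly as follows. This is an illustration in Python, not JobTracker code: the function names, the dictionary of counts, and the sample numbers are all hypothetical. The comment does not say whether the average is taken over every tracker in the cluster or only over those on the potentially-faulty list; the sketch assumes every tracker, with unlisted trackers counting as zero. It also assumes that when the 50% cap applies, the trackers with the most blacklists are picked first.

```python
from typing import Dict, Iterable, List


def record_successful_job(
    blacklist_counts: Dict[str, int],
    trackers_blacklisted_by_job: Iterable[str],
) -> None:
    """Bump #blacklists for each tracker that a successful job blacklisted.

    Hypothetical helper: the dict plays the role of the potentially-faulty list.
    """
    for tracker in set(trackers_blacklisted_by_job):
        blacklist_counts[tracker] = blacklist_counts.get(tracker, 0) + 1


def cluster_wide_blacklist(
    blacklist_counts: Dict[str, int],
    total_trackers: int,
    x_percent: float = 25.0,
    max_fraction: float = 0.5,
) -> List[str]:
    """Return the trackers to blacklist across all jobs.

    Assumptions (not stated in the proposal):
      * the average is over all trackers; unlisted trackers count as 0
      * "X% above the average" means strictly greater than avg * (1 + X/100)
      * if the cap applies, trackers with the most blacklists are chosen first
    """
    if total_trackers <= 0:
        return []

    average = sum(blacklist_counts.values()) / total_trackers
    threshold = average * (1 + x_percent / 100.0)

    candidates = [t for t, n in blacklist_counts.items() if n > threshold]
    # Worst offenders first; tracker name breaks ties so the result is stable.
    candidates.sort(key=lambda t: (-blacklist_counts[t], t))

    # Never blacklist more than max_fraction of the cluster.
    cap = int(total_trackers * max_fraction)
    return candidates[:cap]


if __name__ == "__main__":
    counts: Dict[str, int] = {}
    # Eight successful jobs blacklist tt-a; two of them also blacklist tt-b.
    for job in range(8):
        record_successful_job(counts, ["tt-a", "tt-b"] if job < 2 else ["tt-a"])
    record_successful_job(counts, ["tt-c"])
    record_successful_job(counts, ["tt-d"])
    # Four trackers, average 3.0, threshold 3.75 with X=25.
    print(cluster_wide_blacklist(counts, total_trackers=4))  # ['tt-a']
```

Reconsidering a tracker after time T or on restart would amount to removing its entry from the counts; that part is left out of the sketch.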

Thoughts?


> repeatedly blacklisted tasktrackers should get declared dead
> ------------------------------------------------------------
>
>                 Key: HADOOP-4305
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4305
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Christian Kunz
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.20.0
>
>
> When running a batch of jobs it often happens that the same tasktrackers are blacklisted
> again and again. This can slow job execution considerably, in particular, when tasks fail
> because of timeout.
> It would make sense to no longer assign any tasks to such tasktrackers and to declare
> them dead.


