hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1278) Fix the per-job tasktracker 'blacklist'
Date Fri, 20 Apr 2007 06:38:15 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12490238 ]

Owen O'Malley commented on HADOOP-1278:


> Fix the per-job tasktracker 'blacklist'
> ---------------------------------------
>                 Key: HADOOP-1278
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1278
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.13.0
> Today, whenever a tracker is 'lost', all the jobs which ever ran on it are considered as
> failures and added to the blacklist, which automatically ensures that the particular TT is
> *never* considered for allocating new tasks unless *all* tasktrackers are on the list. This
> results in an ugly situation where a majority of nodes in the cluster are on the blacklist
> and hence idle, while the other TTs are maxed out.
> The proposal is two-fold:
> a) Don't count *all* tasks which ever ran on the TT; instead, count the loss as a 'single' task
> failure, which means that each 'lost' tracker results in a loss of 20% of the '5 failures
> == blacklisted' quota.
> b) Stop adding nodes to the blacklist when a certain percentage of the cluster, say 25%,
> is already on the blacklist. Adding more than that would just delay the inevitable, i.e.
> there is something horrendously wrong with the cluster, so we might as well fail the job early
> and noisily.
> Thoughts?
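
The two-part proposal quoted above could be sketched roughly as follows. This is only an illustration under stated assumptions, not Hadoop's actual JobTracker code: the class name PerJobBlacklistSketch, its methods, and the Outcome enum are all hypothetical, and the choice to report FAIL_JOB once the cap is reached and one more tracker qualifies is one possible reading of (b). The constants 5 and 0.25 come from the '5 failures == blacklisted' quota and the "say 25%" figure in the description.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the proposed per-job blacklist policy.
// None of these names exist in Hadoop; they are for illustration only.
class PerJobBlacklistSketch {
    enum Outcome { OK, BLACKLISTED, FAIL_JOB }

    // The '5 failures == blacklisted' quota from the issue description.
    static final int FAILURES_TO_BLACKLIST = 5;
    // The "say 25%" cap from part (b); the exact value is an assumption.
    static final double MAX_BLACKLISTED_FRACTION = 0.25;

    private final int clusterSize;
    private final Map<String, Integer> failures = new HashMap<>();
    private final Set<String> blacklist = new HashSet<>();

    PerJobBlacklistSketch(int clusterSize) {
        this.clusterSize = clusterSize;
    }

    // Part (a): a lost tracker counts as ONE failure for this job,
    // no matter how many of the job's tasks ever ran on it.
    Outcome trackerLost(String tracker) {
        return recordFailure(tracker);
    }

    // An ordinary task failure on a tracker also counts as one failure.
    Outcome taskFailed(String tracker) {
        return recordFailure(tracker);
    }

    boolean isBlacklisted(String tracker) {
        return blacklist.contains(tracker);
    }

    private Outcome recordFailure(String tracker) {
        int count = failures.merge(tracker, 1, Integer::sum);
        if (count < FAILURES_TO_BLACKLIST || blacklist.contains(tracker)) {
            return Outcome.OK;
        }
        // Part (b): once the cap is reached, stop growing the blacklist
        // and signal that the job should fail early and noisily.
        if (blacklist.size() >= MAX_BLACKLISTED_FRACTION * clusterSize) {
            return Outcome.FAIL_JOB;
        }
        blacklist.add(tracker);
        return Outcome.BLACKLISTED;
    }
}
```

With a cluster of 8 trackers the cap works out to 2 blacklisted trackers, so in this sketch a third tracker reaching 5 failures yields FAIL_JOB rather than a third blacklist entry. Each lost tracker uses up one fifth (20%) of that tracker's quota, matching part (a).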

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
