hadoop-common-dev mailing list archives

From "Amar Kamat (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2175) Blacklisted hosts may not be able to serve map outputs
Date Wed, 26 Mar 2008 15:51:25 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582325#action_12582325 ]

Amar Kamat commented on HADOOP-2175:

The only concern is the case where all the maps that are yet to be fetched ran on the same blacklisted
tracker, because each reducer fetches only one map per host at a time. Killing
all the maps will therefore take
{{5min * num-maps-on-tracker/num-reducers}} in the best case and {{5min * num-maps-on-tracker}}
in the worst case, assuming the default config.
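
To make the two bounds concrete, here is a minimal sketch of the arithmetic. The function name and the sample numbers (40 maps, 10 reducers) are hypothetical and chosen only for illustration; the 5 minute figure is the per-fetch delay from the formulas above.

```python
def kill_time_bounds_minutes(num_maps_on_tracker, num_reducers, timeout_minutes=5):
    """Return (best_case, worst_case) minutes needed to kill all maps on one tracker.

    Hypothetical helper: it only evaluates the two formulas from the comment,
    it does not model the real shuffle scheduler.
    """
    if num_maps_on_tracker < 0 or num_reducers <= 0:
        raise ValueError("need num_maps_on_tracker >= 0 and num_reducers > 0")
    # Worst case: the failures are discovered one map at a time.
    worst = timeout_minutes * num_maps_on_tracker
    # Best case: the reducers each hit a different map in parallel.
    best = worst / num_reducers
    return best, worst


# Illustrative numbers only: 40 maps stuck on the tracker, 10 reducers.
best, worst = kill_time_bounds_minutes(40, 10)
print(f"best case: {best} min, worst case: {worst} min")
```

Even the best case grows linearly with the number of maps on the tracker, which is why killing the maps one at a time is a concern.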
Following are some possible tweaks:
1) Keep track of the total failures registered against the tracker (per job) and kill all
the maps for a job if the total failures for that job exceed 25%.
2) Keep a set of unique hosts per job that have registered failures against a blacklisted tracker and
kill all the maps for a job once all the reducers have complained about the blacklisted tracker.
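
The two tweaks can be sketched as per-job, per-tracker bookkeeping. This is only an illustration and not the code in the attached patches: the class and method names are hypothetical, and since the comment does not say what the 25% is measured against, the sketch assumes it is the share of failed fetch attempts among all fetch attempts made against the tracker for that job.

```python
class BlacklistedTrackerStats:
    """Hypothetical per-job bookkeeping for one blacklisted tracker."""

    def __init__(self, all_reducers, failure_ratio_threshold=0.25):
        self.all_reducers = frozenset(all_reducers)
        # ASSUMPTION: the 25% is failed fetches / total fetch attempts.
        self.failure_ratio_threshold = failure_ratio_threshold
        self.attempts = 0
        self.failures = 0
        self.complaining_reducers = set()

    def record_fetch(self, reducer_id, failed):
        """Record one fetch attempt by a reducer against this tracker."""
        self.attempts += 1
        if failed:
            self.failures += 1
            self.complaining_reducers.add(reducer_id)

    def too_many_failures(self):
        """Tweak 1: total failures for the job exceed the threshold."""
        if self.attempts == 0:
            return False
        return self.failures / self.attempts > self.failure_ratio_threshold

    def all_reducers_complained(self):
        """Tweak 2: every reducer of the job has complained at least once."""
        return bool(self.all_reducers) and self.complaining_reducers >= self.all_reducers

    def should_kill_all_maps(self):
        """Kill (re-schedule) all maps of the job on this tracker if either tweak fires."""
        return self.too_many_failures() or self.all_reducers_complained()


stats = BlacklistedTrackerStats(all_reducers=["r0", "r1"])
stats.record_fetch("r0", failed=True)
stats.record_fetch("r1", failed=True)
print(stats.should_kill_all_maps())
```

The set in tweak 2 counts each complainer once, so a single reducer failing repeatedly cannot trigger the kill on its own, whereas the counter in tweak 1 can.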

Currently we do something similar when killing a map based on fetch failures. We should do the
same for trackers, i.e. re-schedule all the maps (per job, maybe) that ran on a blacklisted
tracker. In the future we may relax the condition that the tracker be blacklisted. Thoughts?

> Blacklisted hosts may not be able to serve map outputs
> ------------------------------------------------------
>                 Key: HADOOP-2175
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2175
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>            Assignee: Amar Kamat
>             Fix For: 0.17.0
>         Attachments: HADOOP-2175-v1.1.patch, HADOOP-2175-v1.patch, HADOOP-2175-v2.patch,
> After a node fails 4 mappers (tasks), it is added to the blacklist and thus will no longer accept tasks.
> But it will continue to serve the map outputs of any mappers that ran successfully there.
> However, the node may not be able to serve the map outputs either.
> This will cause the reducers to mark the corresponding map outputs as from slow hosts,
> but continue to try to get the map outputs from that node.
> This may lead to waiting forever.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
