hadoop-common-dev mailing list archives

From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2175) Blacklisted hosts may not be able to serve map outputs
Date Sat, 22 Mar 2008 10:35:27 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12581260#action_12581260 ]

Devaraj Das commented on HADOOP-2175:
-------------------------------------

I am not clear why the check in JobInProgress that calls lostTaskTracker sits outside
addTrackerTaskFailure. You could do the check inside the method, right?
Also, inside lostTaskTracker you check whether the task was already FAILED/KILLED. Do
you need the check for KILLED?
On the change to MiniMRCluster, I am not convinced that this is the right thing to do (waiting
for 10 seconds and then giving up).
On TestLostBlackListedTracker, I don't think you need to make it that complicated. A simple
dummy-split-based map should work. In that case you don't have to change TestRackAwareTaskPlacement.
The way you get events is also not very reliable w.r.t. timing. In the first call to getTaskCompletionEvents,
you might get events.length = 0. Isn't this a problem? I'd say that you should wait for the job to
complete and then get the events.
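
To illustrate the last point, here is a minimal sketch of "wait for completion, then fetch events". It does not use the Hadoop API: JobHandle and FakeJob are hypothetical stand-ins, loosely modeled on a job handle with isComplete() and getTaskCompletionEvents(int), and the one-event page size is an assumption made only to show the draining loop.

```java
import java.util.ArrayList;
import java.util.List;

public class CompletionEventsSketch {

    /** Hypothetical stand-in for a running-job handle; not the Hadoop interface. */
    interface JobHandle {
        boolean isComplete();
        /** Returns events starting at index 'from'; may be empty while the job runs. */
        List<String> getTaskCompletionEvents(int from);
    }

    /** Blocks until the job reports completion, then drains all events page by page. */
    static List<String> eventsAfterCompletion(JobHandle job, long pollMillis) {
        while (!job.isComplete()) {
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for job", e);
            }
        }
        List<String> all = new ArrayList<>();
        while (true) {
            List<String> page = job.getTaskCompletionEvents(all.size());
            if (page.isEmpty()) {
                break; // nothing more to fetch once the job is complete
            }
            all.addAll(page);
        }
        return all;
    }

    /** Toy job: reports "running" for a fixed number of polls, then exposes two events. */
    static class FakeJob implements JobHandle {
        private final List<String> events = new ArrayList<>();
        private int pollsLeft;

        FakeJob(int pollsBeforeDone) {
            this.pollsLeft = pollsBeforeDone;
            events.add("map_0 SUCCEEDED");
            events.add("reduce_0 SUCCEEDED");
        }

        @Override
        public boolean isComplete() {
            if (pollsLeft > 0) {
                pollsLeft--;
                return false;
            }
            return true;
        }

        @Override
        public List<String> getTaskCompletionEvents(int from) {
            // While the job is still running, an early caller sees no events at all.
            if (pollsLeft > 0 || from >= events.size()) {
                return new ArrayList<>();
            }
            // Assumed page size of one event per call.
            return new ArrayList<>(events.subList(from, from + 1));
        }
    }

    public static void main(String[] args) {
        FakeJob job = new FakeJob(3);
        System.out.println("early call: " + job.getTaskCompletionEvents(0).size() + " events");
        System.out.println("after completion: " + eventsAfterCompletion(job, 1L).size() + " events");
    }
}
```

A test written this way no longer depends on when the first event happens to be published, which is the timing problem with reading the events on the first call.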

> Blacklisted hosts may not be able to serve map outputs
> ------------------------------------------------------
>
>                 Key: HADOOP-2175
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2175
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>            Assignee: Amar Kamat
>         Attachments: HADOOP-2175-v1.1.patch, HADOOP-2175-v1.patch
>
>
> After a node fails 4 mappers (tasks), it is added to the blacklist and will no longer accept tasks.
> But it will continue to serve the map outputs of any mappers that ran successfully there.
> However, the node may not be able to serve the map outputs either.
> This will cause the reducers to mark the corresponding map outputs as from slow hosts,
> but continue to try to get the map outputs from that node.
> This may lead to waiting forever.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

