hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hong Tang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5985) A single slow (but not dead) map TaskTracker impedes MapReduce progress
Date Mon, 08 Jun 2009 20:34:07 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717396#action_12717396
] 

Hong Tang commented on HADOOP-5985:
-----------------------------------

I don't think it changes the semantics. You were saying that all reducers should either see
output from map(A) or map(B), but not a mixture of both. But this is not the case even without
what I am suggesting. Today, a reducer may gets map output from map(A), then the TT that hosts
the output of map(A) dies, and all maps on that TT gets re-executed, and other reducers that
have not yet fetched output from map(A) will fetch from map(B).

> A single slow (but not dead) map TaskTracker impedes MapReduce progress
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-5985
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5985
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.18.3
>            Reporter: Aaron Kimball
>
> We see cases where there may be a large number of mapper nodes running many tasks (e.g.,
a thousand). The reducers will pull 980 of the map task intermediate files down, but will
be unable to retrieve the final intermediate shards from the last node. The TaskTracker on
that node returns data to reducers either slowly or not at all, but its heartbeat messages
make it back to the JobTracker -- so the JobTracker doesn't mark the tasks as failed. Manually
stopping the offending TaskTracker works to migrate the tasks to other nodes, where the shuffling
process finishes very quickly. Left on its own, it can take hours to unjam itself otherwise.
> We need a mechanism for reducers to provide feedback to the JobTracker that one of the
mapper nodes should be regarded as lost.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message