hadoop-common-dev mailing list archives

From "Zheng Shao (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4360) Reducers hang in SHUFFLING phase due to duplicate completed tasks in TaskTracker.FetchStatus.allMapEvents
Date Tue, 07 Oct 2008 18:44:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637598#action_12637598 ]

Zheng Shao commented on HADOOP-4360:
------------------------------------

Arun, are there any changes to the getCompletionEvents interface from 0.17 to trunk? Or are
any changes planned?

If not, I will be glad to work on a diff to add the starting index to the JobTracker's reply.


> Reducers hang in SHUFFLING phase due to duplicate completed tasks in TaskTracker.FetchStatus.allMapEvents
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4360
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4360
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.17.2
>            Reporter: Zheng Shao
>
> On our cluster we have seen the JobTracker enter a weird state in which many TaskTrackers
> get duplicate entries in TaskTracker.FetchStatus.allMapEvents.
> Since the TaskTracker fetches newly completed map tasks using the size of allMapEvents
> as the starting index, the duplicates prevent the TaskTracker from getting all completed
> map tasks. As a result, the reducer just hangs in the shuffling state.
> The problem is not fixed by killing and restarting the TaskTracker, and when it happens
> many TaskTrackers show the same problem.
> It seems that something goes wrong in the communication between the JobTracker and the
> TaskTracker.
> An easy preventive fix would be to include the starting index of the list of completed
> map tasks in the JobTracker's reply to the TaskTracker, so that the TaskTracker can simply
> discard the data if the starting index does not match the current size of the array.
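
The preventive fix described in the issue could look roughly like the sketch below. It is only an illustration under stated assumptions: MapEventBuffer, accept, and nextIndex are hypothetical names, events are modeled as plain strings, and none of this is the actual TaskTracker.FetchStatus code or the real getCompletionEvents signature.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for TaskTracker.FetchStatus; not the real Hadoop class.
class MapEventBuffer {
    // Plays the role of allMapEvents; each event is modeled as a plain string.
    private final List<String> allMapEvents = new ArrayList<String>();

    // Index the next request should start from: the current size of the list.
    int nextIndex() {
        return allMapEvents.size();
    }

    // Applies a reply that carries the index of its first event (assumed to be
    // added to the JobTracker's reply). Returns false and leaves the list
    // untouched when that index does not match the current size, so a repeated
    // or misaligned batch cannot introduce duplicate entries.
    boolean accept(int startIndex, List<String> events) {
        if (startIndex != allMapEvents.size()) {
            return false;
        }
        allMapEvents.addAll(events);
        return true;
    }
}
```

With this check, a reply that repeats an already-applied batch is dropped rather than appended, so the list size remains a valid starting index for the next fetch.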

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

