hadoop-common-dev mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1036) task gets lost during assignment
Date Mon, 26 Feb 2007 18:28:05 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12475936 ]

Arun C Murthy commented on HADOOP-1036:

Two problems combine to produce this scenario:

a) TaskTracker.startNewTask() catches only IOException, not 'RuntimeException', so when a
RuntimeException is thrown the task is never killed via TaskInProgress.killAndCleanup()

b) TaskTracker.startNewTask() adds the taskid & tip to 'runningTasks' before calling localizeJob
(which fails, as described above), and thus the JobTracker gets the 'status' for the
non-existent task, removes it from ExpireLaunchingTasks's queue and is generally in a state
of bliss...

Fixing either a) or b) would solve this issue. I'd guess we want to fix the exception
handling, since it doesn't make sense to wait for the 10-minute timeout for a task we already
know has failed to init...
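
To make the two options concrete, here is a minimal sketch of the control flow described above. It is not the actual Hadoop code: TrackerSketch, Localizer, and the three startNewTask* variants are hypothetical stand-ins, and the real TaskTracker methods have different signatures. It also uses multi-catch, which is newer Java syntax than Hadoop used at the time.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the tracker; NOT the actual Hadoop TaskTracker. */
class TrackerSketch {
    /** Hypothetical hook standing in for job localization. */
    interface Localizer {
        void localizeJob(String taskId) throws IOException;
    }

    // Tasks the tracker would report in its next status update.
    final Map<String, String> runningTasks = new HashMap<>();
    // Tasks that were killed and cleaned up after a failed start.
    final List<String> killedTasks = new ArrayList<>();

    /** Shape of the bug: register first, catch only IOException. */
    void startNewTaskBuggy(String taskId, Localizer localizer) {
        runningTasks.put(taskId, "RUNNING");
        try {
            localizer.localizeJob(taskId);
        } catch (IOException e) {
            killAndCleanup(taskId);
        }
        // A RuntimeException escapes here and the task stays registered.
    }

    /** Option a): also catch RuntimeException so the task is always killed. */
    void startNewTaskCatchAll(String taskId, Localizer localizer) {
        runningTasks.put(taskId, "RUNNING");
        try {
            localizer.localizeJob(taskId);
        } catch (IOException | RuntimeException e) {
            killAndCleanup(taskId);
        }
    }

    /** Option b): register only after localization has succeeded. */
    void startNewTaskRegisterLate(String taskId, Localizer localizer)
            throws IOException {
        localizer.localizeJob(taskId);
        runningTasks.put(taskId, "RUNNING");
    }

    void killAndCleanup(String taskId) {
        runningTasks.remove(taskId);
        killedTasks.add(taskId);
    }
}
```

With a localizer that throws a RuntimeException, the buggy variant leaves the task in runningTasks without killing it, which corresponds to the bogus 'status' the JobTracker sees. Option a) kills the task immediately. Option b) never reports the task at all, so the failure would only surface once the launch timeout expires.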

> task gets lost during assignment
> --------------------------------
>                 Key: HADOOP-1036
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1036
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.11.2
>            Reporter: Owen O'Malley
>         Assigned To: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.12.0
> I ran a unit test (TestMRClassPath) that had a problem (likely in task initialization)
that caused one of the maps to get "lost". The job tracker had the task as "assigned" but the
task tracker did not know about it. It did not time out even after 30+ minutes.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.