hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4068) JobTracker might wrongly log a tip as failed
Date Thu, 04 Sep 2008 22:25:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628488#action_12628488 ]

Owen O'Malley commented on HADOOP-4068:
---------------------------------------

There used to be code that prevented this. TIPs should not fail unless all of the instances
have failed. At some point, we really should redesign the state tracking code in the JobTracker.
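A minimal Java sketch of the invariant stated above. It assumes a simplified per-TIP model: the class TipStateSketch, its AttemptState enum and its methods are hypothetical illustrations, not Hadoop's TaskInProgress API. It treats "the TIP is not complete and every one of its attempts has failed" as a necessary condition for failing a TIP, and it deliberately ignores retry limits.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical model of one TIP's attempts (not Hadoop's TaskInProgress).
 * Encodes only the rule: a TIP may be failed only if it is not complete
 * and every one of its attempts has failed. Retry limits are left out.
 */
class TipStateSketch {
    enum AttemptState { RUNNING, SUCCEEDED, FAILED }

    private final Map<String, AttemptState> attempts = new LinkedHashMap<>();
    private boolean complete = false;

    void attemptLaunched(String attemptId) {
        attempts.put(attemptId, AttemptState.RUNNING);
    }

    void attemptSucceeded(String attemptId) {
        attempts.put(attemptId, AttemptState.SUCCEEDED);
        complete = true;
    }

    // Reporting the same failed attempt twice (e.g. once directly and once
    // when its tracker is later declared lost) leaves the state unchanged.
    void attemptFailed(String attemptId) {
        attempts.put(attemptId, AttemptState.FAILED);
    }

    // Necessary condition for logging the TIP as failed.
    boolean canFailTip() {
        if (complete || attempts.isEmpty()) {
            return false;
        }
        return attempts.values().stream().allMatch(s -> s == AttemptState.FAILED);
    }
}
```

Replaying the quoted scenario against this model, the late failure report for _attempt_1_0_ arrives after _attempt_1_1_ has succeeded, so canFailTip() returns false and no TIP failure would be logged.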

> JobTracker might wrongly log a tip as failed
> --------------------------------------------
>
>                 Key: HADOOP-4068
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4068
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amar Kamat
>
> Consider the following case:
> 1) Attempt _attempt_1_0_ of TIP _tip_1_, running on tracker _tracker_1_, fails.
> 2) The JobTracker marks _attempt_1_0_ for removal under _tracker_1_. Marking means that the mapping _tracker_1_->_attempt_1_0_ is scheduled for removal.
> 3) Marked attempts are removed only on the next heartbeat from _tracker_1_ or when _tracker_1_ is declared lost.
> 4) Now suppose _tracker_1_ goes down before sending that heartbeat.
> 5) Meanwhile, attempt _attempt_1_1_ succeeds on _tracker_2_ and the JobTracker marks TIP _tip_1_ as complete.
> 6) The expiry-tracker thread then detects that _tracker_1_ is lost and fails all the attempts still mapped to _tracker_1_.
> 7) Here the JobTracker kills _attempt_1_0_ *again* and logs TIP _tip_1_ as failed in the history, although _tip_1_ has actually completed successfully.
> The events in the history file would look something like this:
> {noformat}
> tip_1 start
> ---------
> attempt_1_0 start
> attempt_1_0 failed
> ---------
> attempt_1_1 start
> attempt_1_1 finished
> tip_1 finished
> ---------
> tip_1 failed
> {noformat}
> Note that this is also true for tasks that expire. Tasks that are scheduled but never report back are killed by the {{ExpireLaunchingTasks}} thread, which also calls {{JobInProgress.failedTask()}}; that call fails the attempt and logs the TIP as failed.
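The race in steps 1 to 7 can be replayed with a toy model of the bookkeeping. Everything here (LostTrackerReplay, launch, fail, heartbeat, expireTracker and the guarded flag) is hypothetical and only mimics the tracker-to-attempt mapping, the marked-for-removal set and the history log described above; it is not the JobTracker code. With guarded set to false it reproduces the history above, ending in "tip_1 failed" after "tip_1 finished"; with guarded set to true the expiry path skips attempts that were already failed or whose TIP has already completed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical replay of the HADOOP-4068 race for a single TIP (tip_1).
 * Models only the tracker -> attempt mapping, the "marked for removal"
 * set and the history log; it is not the JobTracker implementation.
 */
class LostTrackerReplay {
    final List<String> history = new ArrayList<>();
    // Which attempts each tracker ran (the mapping from step 2).
    final Map<String, Set<String>> trackerToAttempts = new HashMap<>();
    // Failed attempts whose mapping has not been purged yet (steps 2-3).
    final Set<String> markedForRemoval = new HashSet<>();
    final boolean guarded;
    boolean tipComplete = false;

    LostTrackerReplay(boolean guarded) {
        this.guarded = guarded;
        history.add("tip_1 start");
    }

    void launch(String tracker, String attempt) {
        trackerToAttempts.computeIfAbsent(tracker, t -> new LinkedHashSet<>()).add(attempt);
        history.add(attempt + " start");
    }

    // Steps 1-2: log the failure but only mark the mapping for removal.
    void fail(String attempt) {
        history.add(attempt + " failed");
        markedForRemoval.add(attempt);
    }

    // Step 5: a successful attempt completes the TIP.
    void succeed(String attempt) {
        history.add(attempt + " finished");
        history.add("tip_1 finished");
        tipComplete = true;
    }

    // Step 3 (normal path): a heartbeat purges this tracker's marked attempts.
    void heartbeat(String tracker) {
        Set<String> attempts = trackerToAttempts.get(tracker);
        if (attempts != null) {
            // Drop the mapping and clear the mark in one pass.
            attempts.removeIf(a -> markedForRemoval.remove(a));
        }
    }

    // Steps 6-7: the expiry thread fails every attempt still mapped to the lost tracker.
    void expireTracker(String tracker) {
        Set<String> attempts = trackerToAttempts.remove(tracker);
        if (attempts == null) {
            return;
        }
        for (String attempt : attempts) {
            boolean alreadyFailed = markedForRemoval.remove(attempt);
            if (guarded && (alreadyFailed || tipComplete)) {
                continue; // Guard: no second failure, no failure for a completed TIP.
            }
            // Unguarded (simplified, no retry limits): the TIP is logged as failed.
            history.add("tip_1 failed");
        }
    }
}
```

If heartbeat("tracker_1") runs between the failure and the expiry, the marked mapping is purged first and even the unguarded model logs no spurious failure, which matches step 3: the bug needs the tracker to be lost before its next heartbeat.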


