hadoop-mapreduce-issues mailing list archives

From "Eli Collins (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-2960) A single TT disk failure can cause the job to fail
Date Tue, 29 Nov 2011 22:51:39 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159604#comment-13159604 ]

Eli Collins commented on MAPREDUCE-2960:

You're right: the last comment is bogus (the JT was on a read-only filesystem).

The earlier ones, however, are from TTs running on loop-back mounts with faults injected,
and the JT was fine. In the first one, it looks like the issue is that the JobClient doesn't
handle errors when getting task output, or when TT exceptions get plumbed back up to it. Though
perhaps, per MAPREDUCE-3473, this is expected behavior given that *.failures.maxpercent defaults to
> A single TT disk failure can cause the job to fail
> --------------------------------------------------
>                 Key: MAPREDUCE-2960
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2960
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: tasktracker
>    Affects Versions:
>            Reporter: Eli Collins
> TaskInProgress#kill in the JT fails because TaskStatus#setFinishTimes fails, since no
> start time was set. There's no start time because TaskTracker#run (DefaultTaskController#initializeJob)
> failed before it was set. The fix is to have TT#launchTask set the start time before it starts
> the task runner, so that there's a valid start time even if TT#run fails.
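The ordering problem described above can be sketched in a few lines. This is a simplified, hypothetical model, not Hadoop's actual code: SimpleTaskStatus, SimpleTaskTracker, launchTask(boolean) and the setStartTimeEarly flag are illustrative names, and the sketch assumes that "fails" means setFinishTime rejects the call (modeled here as an exception) when no start time was recorded.

```java
// Hypothetical, simplified model of the MAPREDUCE-2960 failure mode.
// These classes are illustrative only; they are NOT Hadoop's TaskStatus/TaskTracker.
class SimpleTaskStatus {
    private long startTime = 0L;  // 0 means "never set"
    private long finishTime = 0L;

    void setStartTime(long t) { startTime = t; }
    long getStartTime() { return startTime; }
    long getFinishTime() { return finishTime; }

    // Assumption: recording a finish time without a start time is rejected.
    void setFinishTime(long t) {
        if (startTime == 0L) {
            throw new IllegalStateException("finish time set before any start time");
        }
        finishTime = t;
    }
}

class SimpleTaskTracker {
    private final boolean setStartTimeEarly;  // true = apply the proposed fix

    SimpleTaskTracker(boolean setStartTimeEarly) {
        this.setStartTimeEarly = setStartTimeEarly;
    }

    // initFails simulates initializeJob failing because of a bad local disk.
    SimpleTaskStatus launchTask(boolean initFails) {
        SimpleTaskStatus status = new SimpleTaskStatus();
        if (setStartTimeEarly) {
            // Proposed fix: record the start time before the runner starts,
            // so it is valid even if the runner fails immediately.
            status.setStartTime(System.currentTimeMillis());
        }
        try {
            run(status, initFails);
        } catch (RuntimeException e) {
            // The runner failed; the status is returned so the caller can kill the task.
        }
        return status;
    }

    private void run(SimpleTaskStatus status, boolean initFails) {
        if (initFails) {
            // Without the fix, we exit before any start time is recorded.
            throw new RuntimeException("initializeJob failed (simulated disk fault)");
        }
        if (status.getStartTime() == 0L) {
            // Pre-fix behavior: the start time is only set once the runner gets this far.
            status.setStartTime(System.currentTimeMillis());
        }
        // ... task work would happen here ...
    }
}
```

With `new SimpleTaskTracker(false).launchTask(true)`, a later `setFinishTime` call (standing in for the JT's kill path) throws; with `setStartTimeEarly` set to `true`, the same call succeeds. The real Hadoop classes may report the missing start time differently (for example, by logging rather than throwing).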

This message is automatically generated by JIRA.

