hadoop-mapreduce-issues mailing list archives

From "Kang Xiao (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-2171) job recovery mechanism
Date Tue, 02 Nov 2010 09:36:26 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927341#action_12927341 ]

Kang Xiao commented on MAPREDUCE-2171:

The job recovery mechanism is targeted at three kinds of problems:

# If a long-running job fails, it has to be re-submitted as an entirely new job, and all tasks,
including succeeded ones, have to be re-executed.
# If we upgrade a cluster to a new Hadoop version, all running jobs need to be re-run.
# If we restart a tasktracker, all running tasks and succeeded maps need to be re-executed.

The RecoveryManager of the JobTracker solves part of problem 2. However, it only re-runs
all running jobs automatically; all succeeded tasks still need to be re-executed.
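
The difference between re-running everything and re-running only unfinished work can be
illustrated with a minimal sketch. Everything in it is hypothetical: the class
RecoverySketch, the TaskState enum, the task names, and the assumption that a succeeded
task's output is still available after the restart. None of it is taken from the
JobTracker or RecoveryManager code.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; none of these names come from the Hadoop code base.
final class RecoverySketch {

    // Simplified task states, as they might be read back from a job's recorded history.
    enum TaskState { SUCCEEDED, FAILED, RUNNING, PENDING }

    // Assumption: only succeeded tasks have output that can be reused after a restart.
    private static final Set<TaskState> REUSABLE = EnumSet.of(TaskState.SUCCEEDED);

    // Re-run everything: every task is rescheduled, whatever its recorded state.
    static List<String> rerunAll(Map<String, TaskState> recorded) {
        return new ArrayList<>(recorded.keySet());
    }

    // Selective recovery: reschedule only the tasks that did not succeed.
    static List<String> rerunUnfinished(Map<String, TaskState> recorded) {
        List<String> toRun = new ArrayList<>();
        for (Map.Entry<String, TaskState> entry : recorded.entrySet()) {
            if (!REUSABLE.contains(entry.getValue())) {
                toRun.add(entry.getKey());
            }
        }
        return toRun;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the printed lists are stable.
        Map<String, TaskState> recorded = new LinkedHashMap<>();
        recorded.put("map_0", TaskState.SUCCEEDED);
        recorded.put("map_1", TaskState.FAILED);
        recorded.put("map_2", TaskState.SUCCEEDED);
        recorded.put("reduce_0", TaskState.RUNNING);

        System.out.println("re-run all:        " + rerunAll(recorded));
        System.out.println("re-run unfinished: " + rerunUnfinished(recorded));
    }
}
```

In this sketch the selective variant skips map_0 and map_2. For problem 3 the reuse
assumption would not hold as written, since the succeeded maps on a restarted tasktracker
are exactly the ones listed as needing re-execution.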

> job recovery mechanism
> ----------------------
>                 Key: MAPREDUCE-2171
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2171
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker, tasktracker
>            Reporter: Kang Xiao
> A job recovery mechanism to enable a job to re-execute only failed tasks upon job failure
> or jobtracker/tasktracker restart.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
