hadoop-mapreduce-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-873) Simplify Job Recovery
Date Thu, 27 Aug 2009 03:06:59 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748244#action_12748244 ]

Hadoop QA commented on MAPREDUCE-873:

+1 overall.  Here are the results of testing the latest attachment 
  against trunk revision 808082.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 21 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/522/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/522/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/522/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/522/console

This message is automatically generated.

> Simplify Job Recovery
> ---------------------
>                 Key: MAPREDUCE-873
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-873
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 0.20.1
>            Reporter: Devaraj Das
>            Assignee: Sharad Agarwal
>             Fix For: 0.21.0
>         Attachments: 873_v1.patch, 873_v2.patch, 873_v3.patch
>
> On a couple of occasions we have seen the JobTracker fail to handle job recovery well,
> leading to cluster downtime after a restart. The current design for handling job
> recovery is complex and prone to corner cases that are not handled well enough. In retrospect,
> the transaction-log-based approach proposed in HADOOP-3245 (http://tinyurl.com/luh9hb)
> would have been a better, simpler model. However, that is a big project, and for the
> medium term, just handling job resubmissions after a restart seems a good tradeoff. That is,
> after being restarted, the JobTracker will resubmit all jobs that were running in its past
> life. They will all start from the beginning (the downside is that completed tasks will re-execute).
> In the long term, the transaction log model or some variant of it should be pursued.
> Thoughts/comments welcome.
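The resubmit-on-restart policy proposed above can be modeled in a few lines. This is a minimal Python sketch under stated assumptions, not the JobTracker code from the attached patches: the names `JobRecord` and `RestartableTracker` are hypothetical, and a plain dict stands in for whatever durable storage the JobTracker would use to remember which jobs were running before a restart.

```python
from dataclasses import dataclass, field


@dataclass
class JobRecord:
    """Hypothetical in-memory state for one job (not a real Hadoop class)."""
    job_id: str
    conf: dict
    completed_tasks: set = field(default_factory=set)


class RestartableTracker:
    """Toy model of 'resubmit every running job after a restart'.

    Assumption: each job's ID and configuration are written to durable
    storage at submission and deleted when the job finishes. Here the
    durable store is just a dict supplied by the caller.
    """

    def __init__(self, durable_store):
        self.store = durable_store  # survives restarts in this model
        self.running = {}           # in-memory state, lost on restart

    def submit(self, job_id, conf):
        self.store[job_id] = dict(conf)
        self.running[job_id] = JobRecord(job_id, dict(conf))

    def complete_task(self, job_id, task_id):
        self.running[job_id].completed_tasks.add(task_id)

    def finish(self, job_id):
        self.running.pop(job_id)
        self.store.pop(job_id)

    def recover(self):
        """Resubmit every job found in the durable store.

        No task-level progress is recovered, so each job restarts from the
        beginning and previously completed tasks will run again.
        """
        self.running = {}
        for job_id, conf in sorted(self.store.items()):
            self.running[job_id] = JobRecord(job_id, dict(conf))
        return sorted(self.running)


store = {}
jt = RestartableTracker(store)
jt.submit("job_1", {"name": "wordcount"})
jt.submit("job_2", {"name": "sort"})
jt.complete_task("job_1", "m_000000")
jt.finish("job_2")

# Simulate a restart: in-memory state is gone, only the store survives.
jt2 = RestartableTracker(store)
print(jt2.recover())                         # ['job_1']
print(jt2.running["job_1"].completed_tasks)  # set()
```

The design choice mirrors the trade-off stated in the description: recovery only needs the list of jobs that were running, which removes the corner cases of reconstructing task-level state, at the cost of re-executing tasks that had already completed.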

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
