hadoop-mapreduce-issues mailing list archives

From "Mayank Bansal (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
Date Fri, 02 Mar 2012 20:43:57 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221253#comment-13221253 ]

Mayank Bansal commented on MAPREDUCE-3837:

Hi Alejandro

Thanks for your help testing this patch. I am really sorry about the confusion; I missed one
function in the patch. I have attached a new patch, tested it, and it is working fine in
my local environment. I am not sure how I missed that before.

Please let me know if you find any more issues with that.


I believe the issues were with recovering the jobs from the point at which they crashed. What
I am doing here is a very simple approach: I read the job token file and resubmit the jobs
when the job tracker crashes and recovers. I am not trying to resume a job from the point
where its last run left off.

In this scenario it is a new run of the job, and it works well. The downside is that the
whole job will rerun; the upside is that users don't need to resubmit the jobs.
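
The approach described above can be sketched roughly as follows. This is only an
illustration in Python, not the code from the patch: the directory layout, the `job.token`
file name, the JSON token format, and the `recover_jobs` and `submit` names are all
assumptions made for the example.

```python
import json
import tempfile
from pathlib import Path


def recover_jobs(system_dir, submit, recover_enabled=True):
    """Resubmit every job that left a token file behind in system_dir.

    Each job is submitted as a brand-new run; no partial progress is restored.
    Returns the IDs of the jobs that were resubmitted.
    """
    # recover_enabled stands in for mapreduce.jobtracker.restart.recover.
    if not recover_enabled:
        return []

    resubmitted = []
    # Hypothetical layout: <system_dir>/<job_id>/job.token
    for token_file in sorted(Path(system_dir).glob("*/job.token")):
        job_id = token_file.parent.name
        try:
            credentials = json.loads(token_file.read_text())
        except (OSError, ValueError):
            # Skip an unreadable token file so that one bad job does not
            # stop recovery of the others or block new submissions.
            continue
        submit(job_id, credentials)
        resubmitted.append(job_id)
    return resubmitted


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as system_dir:
        job_dir = Path(system_dir) / "job_0001"
        job_dir.mkdir()
        (job_dir / "job.token").write_text(json.dumps({"user": "alice"}))

        submitted = []
        recover_jobs(system_dir, lambda job_id, creds: submitted.append(job_id))
        print(submitted)
```

The sketch skips a job whose token file cannot be read instead of aborting, which reflects
the goal that a failed recovery should not leave the job tracker unable to accept new jobs.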

Please let me know your thoughts.

> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user
> can submit job.
> --------------------------------------------------------------------------------------------------------
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 0.24.0, 0.22.1, 0.23.2
>         Attachments: PATCH-HADOOP-1-MAPREDUCE-3837-1.patch, PATCH-HADOOP-1-MAPREDUCE-3837.patch,
> If the job tracker crashes while jobs are running, and the job tracker's property
> mapreduce.jobtracker.restart.recover is true, then it should recover the jobs.
> However, the current behavior is as follows:
> the job tracker tries to restore the jobs but cannot. After that, the job tracker closes
> its handle to HDFS and nobody else can submit a job.
> Thanks,
> Mayank
