hadoop-mapreduce-issues mailing list archives

From "Amar Kamat (JIRA)" <j...@apache.org>
Subject [jira] Updated: (MAPREDUCE-11) Cleanup JobHistory file naming to do with job recovery
Date Thu, 09 Jul 2009 13:01:16 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-11?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat updated MAPREDUCE-11:
--------------------------------

    Attachment: MAPREDUCE-11-v1.8.patch

Attaching a patch that simplifies job history file naming and recovery. The changes are as follows:
# the job history filename has the format _hostname_jobid_username_jobname_
# conf filenames have the format _hostname_jobid_conf.xml_
# after every restart, all new updates are directed to _history-file.recover_
# once the job finishes, the _history-file.recover_ file is renamed to _history-file_
# note that the master file (_hostname_jobid_username_jobname_) exists throughout the lifecycle of the job
# if the jobtracker restarts again, the new updates will be lost
# no searching is involved in any case
# for now, the old jobhistory files are still supported via the web UI
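
The naming and recovery rules listed above can be sketched roughly as follows. This is only an illustration and not code from the patch: the class name HistoryNamingSketch, its method names, the restarted flag, and the use of java.nio.file in place of Hadoop's FileSystem API are all assumptions made for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper; none of these names come from the patch itself.
final class HistoryNamingSketch {
    static final String RECOVER_SUFFIX = ".recover";

    // Master history file: hostname_jobid_username_jobname
    static String historyFileName(String hostname, String jobId,
                                  String userName, String jobName) {
        return hostname + "_" + jobId + "_" + userName + "_" + jobName;
    }

    // Conf file: hostname_jobid_conf.xml
    static String confFileName(String hostname, String jobId) {
        return hostname + "_" + jobId + "_conf.xml";
    }

    // After a jobtracker restart, new updates go to <history-file>.recover;
    // otherwise they go to the master file. The name is computed directly,
    // so no directory search is needed.
    static Path fileForUpdates(Path master, boolean restarted) {
        if (!restarted) {
            return master;
        }
        return master.resolveSibling(master.getFileName() + RECOVER_SUFFIX);
    }

    // When the job finishes, rename <history-file>.recover to <history-file>.
    // Assumption: the rename replaces the master file, which has existed
    // throughout the job's lifecycle.
    static void finish(Path master) throws IOException {
        Path recover = fileForUpdates(master, true);
        if (Files.exists(recover)) {
            Files.move(recover, master, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

Because both filenames are derived purely from the hostname, job id, user name, and job name, a restarted jobtracker in this sketch can locate the files for a job without listing or pattern-matching the history directory.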

Tested the patch locally and found no issues so far. Result of test-patch:
     [exec] +1 overall.  
     [exec] 
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec] 
     [exec]     +1 tests included.  The patch appears to include 9 new or modified tests.
     [exec] 
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec] 
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec] 
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec] 
     [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.


Running the ant tests now; testing is in progress.

Things I tested:
# submitted a job and allowed it to complete. The new job files move to the done folder.
# submitted a job and killed the jobtracker while the job files were empty, then restarted the jobtracker; upon completion the files move to the done folder
# submitted a job and killed the jobtracker while the job files had been written to, then restarted the jobtracker; upon completion the files move to the done folder, and the job was also recovered
# checked the web UI
 ## history shows old and new files (there is no difference in the layout)
 ## history pages for old and new jobs have functional links (checked random links and conf links)
 ## search facility in history works across files

> Cleanup JobHistory file naming to do with job recovery
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-11
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-11
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Devaraj Das
>         Attachments: MAPREDUCE-11-v1.8.patch
>
>
> The JobTracker uses the job history files for doing job recovery upon startup. To handle
> cases where the JobTracker goes down again while the recovered job is running, there is some logic
> that plays with files, and it ends up having two history files for some window of time during
> the life of the job: the actual history file and the .recover file. The idea is that upon the next
> restart we should be able to recover the maximal number of events for the job. It led to performance
> problems in job submission / recovery (part of which got addressed in HADOOP-4372). It
> also looks pretty unlikely that a running job will survive multiple JT restarts. Even
> if it did, without the .recover file it would only mean that we lose some tasks that got completed
> in a subsequent restart. I propose that we remove the .recover file logic and base the recovery
> on only the original job history file.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

