hadoop-mapreduce-issues mailing list archives

From "Robert Kanter (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server
Date Fri, 14 Feb 2014 02:45:25 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Kanter updated MAPREDUCE-5641:

    Attachment: MAPREDUCE-5641.patch

I’ve attached a preliminary version of the patch.  Once we all agree on the specifics of
the design, I can add unit tests.  
The patch follows the design I outlined before, where the RM writes a file when it sees
an AM die, and the JHS sees that file and copies the jhist and similar files to the done_intermediate
dir.  I have tested this by running jobs and killing the AM.  This results in incomplete information,
as expected; however, in some cases some of the information doesn’t make sense or is
missing (e.g. no Finish Time if the AM didn’t actually finish).  I’ve put in some code
to take care of these situations.  I’ve also attached a preliminary YARN patch to YARN-1731.
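
To make the flow above concrete, here is a minimal sketch of the marker-file handoff. It is not the code in the attached patch: it uses the local filesystem in place of HDFS, and every name in it (rm_mark_am_failed, jhs_process_markers, the markers directory, the list of file suffixes) is hypothetical and chosen only for illustration.

```python
import shutil
import tempfile
from pathlib import Path

# Assumption: which files count as "jhist and similar" is a guess for this sketch.
HISTORY_SUFFIXES = (".jhist", ".summary")


def rm_mark_am_failed(marker_dir, job_id):
    """RM side (hypothetical): write an empty marker file when an AM is seen to die."""
    marker_dir.mkdir(parents=True, exist_ok=True)
    marker = marker_dir / job_id
    marker.touch()
    return marker


def jhs_process_markers(marker_dir, staging_root, done_intermediate):
    """JHS side (hypothetical): for each marker, copy history files out of the
    job's staging directory into done_intermediate, then remove the marker."""
    done_intermediate.mkdir(parents=True, exist_ok=True)
    copied = []
    for marker in sorted(marker_dir.iterdir()):
        job_staging = staging_root / marker.name
        if job_staging.is_dir():
            for f in sorted(job_staging.iterdir()):
                if f.name.endswith(HISTORY_SUFFIXES):
                    shutil.copy2(f, done_intermediate / f.name)
                    copied.append(f.name)
        marker.unlink()  # each failed AM is handled once
    return copied


def demo():
    """Simulate one killed AM and return (copied file names, done_intermediate path)."""
    root = Path(tempfile.mkdtemp())
    staging = root / "staging"
    (staging / "job_1").mkdir(parents=True)
    (staging / "job_1" / "job_1.jhist").write_text("events")
    (staging / "job_1" / "job.jar").write_text("not history")
    rm_mark_am_failed(root / "markers", "job_1")
    done = root / "done_intermediate"
    return jhs_process_markers(root / "markers", staging, done), done
```

Calling demo() copies only the jhist file and leaves the jar behind. A real implementation would also have to cope with partially written history, such as the missing Finish Time mentioned above.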

> How will the JHS copy the file to the intermediate directory? It likely won't have access
> to the staging directory containing the jhist file.
I modified the permissions from 0700 to 0701.
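
For readers unfamiliar with octal modes, the sketch below decodes the two values using Python's standard stat module. It assumes the modes are read as POSIX-style permission bits on a directory, and that the JHS user falls into the "other" class; neither assumption is stated in the comment above. The helper names are hypothetical.

```python
import stat


def describe_dir_mode(perm_bits):
    """Return an ls-style string for a directory with the given permission bits."""
    return stat.filemode(stat.S_IFDIR | perm_bits)


def others_can_traverse(perm_bits):
    """True if users outside the owner and group may access entries by name."""
    return bool(perm_bits & stat.S_IXOTH)


def others_can_list(perm_bits):
    """True if users outside the owner and group may list the directory."""
    return bool(perm_bits & stat.S_IROTH)


for mode in (0o700, 0o701):
    print(oct(mode), describe_dir_mode(mode),
          "traverse:", others_can_traverse(mode),
          "list:", others_can_list(mode))
```

Under these assumptions, 0701 lets other users pass through the directory to reach a file whose path they already know, but still does not let them list its contents.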

> History for failed Application Masters should be made available to the Job History Server
> -----------------------------------------------------------------------------------------
>                 Key: MAPREDUCE-5641
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5641
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: applicationmaster, jobhistoryserver
>    Affects Versions: 2.2.0
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>         Attachments: MAPREDUCE-5641.patch
> Currently, the JHS has no information about jobs whose AMs have failed.  This is because
the History is written by the AM to the intermediate folder just before finishing, so when
it fails for any reason, this information isn't copied there.  However, it is not lost as
it's in the AM's staging directory.  To make the History available in the JHS, all we need
to do is have another mechanism to move the History from the staging directory to the intermediate
directory.  The AM also writes a "Summary" file before exiting normally, which is also unavailable
when the AM fails.  

This message was sent by Atlassian JIRA
