hadoop-mapreduce-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-6258) add support to back up JHS files from application master
Date Fri, 13 Feb 2015 01:31:11 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319387#comment-14319387 ]

Hadoop QA commented on MAPREDUCE-6258:

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment against trunk revision 99f6bd4.

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 1 new or modified
test files.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of
javac compiler warnings.

    {color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

    {color:red}-1 eclipse:eclipse{color}.  The patch failed to build with eclipse:eclipse.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version
2.0.3) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number
of release audit warnings.

    {color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5191//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5191//console

This message is automatically generated.

> add support to back up JHS files from application master
> --------------------------------------------------------
>                 Key: MAPREDUCE-6258
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6258
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: applicationmaster
>    Affects Versions: 2.4.1
>            Reporter: Jian Fang
>         Attachments: MAPREDUCE-6258.patch
> In Hadoop 2, job history files are stored on HDFS with a default retention period of
> one week. In a cloud environment, these HDFS files actually reside on the disks of ephemeral
> instances and are lost once those instances are terminated. Users may want to back up
> the job history files so that they remain available for issue investigation and performance
> analysis both before and after the cluster is terminated.
> A centralized backup mechanism could run into scalability problems on big, busy Hadoop
> clusters that may run tens of thousands of jobs every day. A distributed way of backing up
> the job history files is therefore preferable. To achieve this, we could add a new feature
> that backs up the job history files in the application master. More specifically, the
> application master could copy the job history files to a backup path when it moves them
> from the temporary staging directory to the intermediate_done path. Since application
> masters can run on any slave node of a Hadoop cluster, backing up the files this way
> spreads the work across the cluster and scales better.
> Note that the backup path should be managed by Hadoop users according to their
> needs. For example, some users may copy the job history files directly to cloud storage
> and keep them there forever, while others may want to store the files
> on local disks and clean them up from time to time.
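
The copy-on-move step proposed in the description could look roughly like the following. This is a minimal sketch, not the attached patch: it uses java.nio.file on a local filesystem as a stand-in for Hadoop's FileSystem API, and the class name HistoryBackupSketch, the method moveWithBackup, and the directory and file names are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Illustrative only: a local-filesystem stand-in for the proposed backup step. */
class HistoryBackupSketch {

    /**
     * Copies a finished history file to backupDir, then moves it from the
     * staging location into doneDir. Returns the file's new path under doneDir.
     */
    static Path moveWithBackup(Path stagedFile, Path doneDir, Path backupDir)
            throws IOException {
        Files.createDirectories(doneDir);
        Files.createDirectories(backupDir);
        Path name = stagedFile.getFileName();

        // Back up first, while the staged copy is still in place.
        Files.copy(stagedFile, backupDir.resolve(name),
                StandardCopyOption.REPLACE_EXISTING);

        // Then perform the normal staging -> intermediate_done move.
        return Files.move(stagedFile, doneDir.resolve(name),
                StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical layout under a temporary directory.
        Path root = Files.createTempDirectory("jhs-sketch");
        Path staging = Files.createDirectories(root.resolve("staging"));
        Path staged = staging.resolve("job_0001.jhist");
        Files.write(staged, "job history events".getBytes());

        Path done = moveWithBackup(staged,
                root.resolve("intermediate_done"), root.resolve("backup"));
        System.out.println("moved to " + done);
    }
}
```

Because each application master would run this step for its own job only, the backup work is spread across whichever nodes host the application masters. Where the backup directory points (cloud storage, local disk) and how it is cleaned up stays with the user, as the description says.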

This message was sent by Atlassian JIRA
