hadoop-mapreduce-issues mailing list archives

From "Zhang Wei (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-6283) MRHistoryServer log files management optimization
Date Mon, 23 Mar 2015 07:19:11 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-6283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhang Wei updated MAPREDUCE-6283:
---------------------------------
    Description: 
In some computation-heavy clusters, users continually submit large numbers of jobs; in our
scenario there are 240k jobs per day, and on average 5 nodes participate in running each job.
All of these jobs' log files are aggregated onto HDFS, which is a heavy load for the NameNode.
The total number of log files generated within the default cleaning period (1 week) can be
calculated as follows:
AM logs per week: 7 days * 240,000 jobs/day * 2 files/job = 3,360,000 files
App logs per week: 7 days * 240,000 jobs/day * 5 nodes/job * 1 file/node = 8,400,000 files
More than 10 million log files are thus generated in one week. Even worse, some environments
have to keep the logs for a longer time to allow tracking of potential issues. Overall, these
small log files occupy about 12 GB of NameNode heap and degrade the NameNode's response time.
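
For reference, a minimal sketch of the arithmetic above; the per-job figures are the ones
observed in our cluster, and the assumption of 2 AM files per job (presumably the .jhist file
plus the job configuration) reflects what the history server writes today:
{code:java}
public class LogFileEstimate {
    public static void main(String[] args) {
        final long days = 7;              // default cleaning period
        final long jobsPerDay = 240000;   // observed cluster load
        final long amFilesPerJob = 2;     // presumably .jhist + job conf per job
        final long nodesPerJob = 5;       // average nodes participating per job

        long amLogs = days * jobsPerDay * amFilesPerJob;  // 3,360,000
        long appLogs = days * jobsPerDay * nodesPerJob;   // 8,400,000

        System.out.printf("AM logs/week:  %,d%n", amLogs);
        System.out.printf("App logs/week: %,d%n", appLogs);
        System.out.printf("Total:         %,d%n", amLogs + appLogs);  // 11,760,000
    }
}
{code}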

To optimize the history server's log management, the main goals are:
1) Reduce the total number of files in HDFS.
2) Stay compatible with the existing history server operation.

From these goals, the detailed requirements can be derived as follows:
1) Periodically merge log files into bigger ones in HDFS (see the sketch after this list).
2) The optimized design should inherit from the original architecture, so that the merged
logs can be browsed transparently.
3) Merged logs should be aged periodically, just like the ordinary logs.
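
A minimal sketch of requirement 1) using the stock Hadoop Archive (HAR) tool; the directory
layout and archive name are hypothetical, and a real implementation would run inside the
history server rather than in a standalone main():
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.tools.HadoopArchives;
import org.apache.hadoop.util.ToolRunner;

public class HistoryLogArchiver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Pack one day's done-dir logs into a single HAR. With no explicit
        // source argument, the whole parent directory is archived.
        String[] harArgs = {
            "-archiveName", "2015-03-22.har",     // hypothetical name
            "-p", "/mr-history/done/2015/03/22",  // parent of the small files
            "/mr-history/archived/2015/03"        // destination of the .har
        };
        System.exit(ToolRunner.run(conf, new HadoopArchives(conf), harArgs));
    }
}
{code}
Requirement 2) then largely falls out of HarFileSystem: HAR contents stay readable through
the ordinary FileSystem API via har:// URIs, so the history server's browsing code path can
remain mostly unchanged.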

The whole life cycle of the AM logs (a read-back sketch follows this list):
1. Created by the Application Master in intermediate-done-dir.
2. Moved to done-dir after the job is done.
3. Archived to archived-dir periodically.
4. Cleaned when all the logs in the HAR archive have expired.
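
To illustrate the transparent browsing of archived logs, a sketch of reading a .jhist file
back through the har:// scheme; both the archive path and the job file name are made up:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ArchivedHistoryReader {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // har:///path-to-har/path-inside-har resolves against the default fs.
        // Both the archive and the job file below are hypothetical.
        Path jhist = new Path("har:///mr-history/archived/2015/03/2015-03-22.har"
                + "/job_1427078223456_0001.jhist");
        FileSystem fs = jhist.getFileSystem(conf);
        try (FSDataInputStream in = fs.open(jhist)) {
            // The history server could hand this stream to the same .jhist
            // parser it uses for plain done-dir files today.
            byte[] buf = new byte[4096];
            int n = in.read(buf);
            System.out.println("read " + n + " bytes of job history");
        }
    }
}
{code}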

The whole life cycle of the App logs (a cleanup sketch follows this list):
1. Created by the applications in local-dirs.
2. Aggregated to remote-app-log-dir after the job is done.
3. Archived to archived-dir periodically.
4. Cleaned when all the logs in the HAR archive have expired.
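
And a minimal sketch of step 4 for either log type, assuming (as in the archiver sketch
above) that each HAR packs exactly one day of logs, so the archive's own age bounds the age
of every log inside it:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ArchiveCleaner {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // One-week retention, matching the default cleaning period above.
        long retentionMs = 7L * 24 * 60 * 60 * 1000;
        long cutoff = System.currentTimeMillis() - retentionMs;

        FileSystem fs = FileSystem.get(conf);
        Path archivedDir = new Path("/mr-history/archived/2015/03"); // hypothetical layout
        for (FileStatus har : fs.listStatus(archivedDir)) {
            // A .har created on day N can only contain logs from day N or
            // earlier, so an archive older than the retention period holds
            // only expired logs and the whole ball can be dropped at once.
            if (har.getPath().getName().endsWith(".har")
                    && har.getModificationTime() < cutoff) {
                fs.delete(har.getPath(), true); // a .har is a directory; delete recursively
            }
        }
    }
}
{code}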



> MRHistoryServer log files management optimization
> -------------------------------------------------
>
>                 Key: MAPREDUCE-6283
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6283
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobhistoryserver
>            Reporter: Zhang Wei
>            Assignee: Zhang Wei
>            Priority: Minor
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
