hadoop-yarn-issues mailing list archives

From "Zhijie Shen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2942) Aggregated Log Files should be compacted
Date Wed, 10 Dec 2014 00:33:13 GMT

    [ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240398#comment-14240398 ]

Zhijie Shen commented on YARN-2942:

Thanks for the proposal, Robert! It seems that this proposal is related to YARN-2548, which
is nice given HDFS's limitations with many small files. Some random thoughts about the proposal:

1. We need to take care of long-running services too. For them, there may never be a moment
when all the logs have been uploaded; the logs keep being uploaded continuously.

2. While compaction is in progress, we need to be careful to keep the logs continuously
available to users.

3. The goal is to compact the log files in HDFS, not the NM's local log files. So would the
RM be a better place to issue the compaction command, since it has a better idea of the app's lifecycle?

4. There may be more problems with long-running services. One issue I can think of now is
that some of the older logs may hit the retention threshold and need to be deleted. How do
we delete part of a per-application log file? Will that corrupt the index?

> Aggregated Log Files should be compacted
> ----------------------------------------
>                 Key: YARN-2942
>                 URL: https://issues.apache.org/jira/browse/YARN-2942
>             Project: Hadoop YARN
>          Issue Type: New Feature
>    Affects Versions: 2.6.0
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>         Attachments: CompactedAggregatedLogsProposal_v1.pdf, YARN-2942-preliminary.001.patch
>
> Turning on log aggregation allows users to easily store container logs in HDFS and subsequently
> view them in the YARN web UIs from a central place.  Currently, there is a separate log file
> for each Node Manager.  This can be a problem for HDFS if you have a cluster with many nodes
> as you’ll slowly start accumulating many (possibly small) files per YARN application.  The
> current “solution” for this problem is to configure YARN (actually the JHS) to automatically
> delete these files after some amount of time.
> We should improve this by compacting the per-node aggregated log files into one log file
> per application.

This message was sent by Atlassian JIRA
