hadoop-yarn-issues mailing list archives

From "Karthik Kambatla (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2942) Aggregated Log Files should be combined
Date Thu, 02 Apr 2015 23:16:55 GMT

    [ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393700#comment-14393700 ]

Karthik Kambatla commented on YARN-2942:

(Canceled the patch to stop Jenkins from evaluating the design doc :) ) 

[~rkanter] - thanks for updating the design doc. A couple of comments:
# If there is an NM X actively concatenating its logs and NM Y can't acquire the lock, what does NM Y do?
## Does it do a blocking-wait? If yes, this should likely be in a separate thread.
## I would like for it to be non-blocking. How about a LogConcatenationService in the NM?
This service is brought up if you enable log concatenation. This service would periodically
go through all of its past aggregated logs and concatenate those that it can acquire a lock
for. Delayed concatenation should be okay because we are doing this primarily to handle the
problem HDFS has with small files. Also, this way, we don't have to do anything different for
NM restart. Forward looking, this concat service could potentially take input on how busy
HDFS is. 
# I didn't completely understand the point about a config to specify the format. Are you suggesting
we have two different on/off configs - one to turn on concatenation and one to specify the
format JHS should be reading? I think we need just one config, documented to clearly state that
turning this on for an NM (writer) requires that the JHS (reader) already has it enabled. In case
of rolling upgrades, this translates to requiring a JHS upgrade prior to the NM upgrade.
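
To make the non-blocking suggestion in the first comment concrete, here is a minimal sketch of what a periodic concatenation pass could look like. It is not taken from any patch on this JIRA: the class name LogConcatenationSketch, its methods, and the in-memory set standing in for the shared lock are all hypothetical, and the actual concatenation is reduced to a placeholder. The only point it illustrates is that an NM that cannot get the lock skips that application and retries on a later pass instead of waiting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: none of these names exist in YARN.
final class LogConcatenationSketch {
    // Stand-in for the shared lock: an app ID is in the set while some NM holds its lock.
    // Assumed to be a thread-safe set, e.g. ConcurrentHashMap.newKeySet().
    private final Set<String> heldLocks;
    // Apps whose aggregated logs this NM has not yet concatenated.
    private final Queue<String> pending = new ConcurrentLinkedQueue<>();
    // Records what was "concatenated"; a real service would append to the combined file.
    private final List<String> concatenated = new ArrayList<>();

    LogConcatenationSketch(Set<String> sharedLocks) {
        this.heldLocks = sharedLocks;
    }

    void enqueue(String appId) {
        pending.add(appId);
    }

    int pendingCount() {
        return pending.size();
    }

    // One pass over the current backlog. It never waits: an app whose lock is
    // already taken is put back on the queue and retried on a later pass.
    synchronized int runOnce() {
        int done = 0;
        int backlog = pending.size();
        for (int i = 0; i < backlog; i++) {
            String appId = pending.poll();
            if (appId == null) {
                break;
            }
            if (heldLocks.add(appId)) {          // non-blocking acquire
                try {
                    concatenated.add(appId);     // placeholder for the real concatenation
                    done++;
                } finally {
                    heldLocks.remove(appId);     // release
                }
            } else {
                pending.add(appId);              // lock busy, try again next pass
            }
        }
        return done;
    }

    // Runs runOnce() on its own thread at a fixed delay, so callers are never blocked.
    ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(this::runOnce, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

In this sketch, if NM X holds the lock for an application while NM Y runs a pass, NM Y simply leaves that application in its backlog and picks it up on a later pass once the lock is free. Rebuilding the backlog from the past aggregated logs at startup would be one way to get the restart behavior described above, though that part is not shown here.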

> Aggregated Log Files should be combined
> ---------------------------------------
>                 Key: YARN-2942
>                 URL: https://issues.apache.org/jira/browse/YARN-2942
>             Project: Hadoop YARN
>          Issue Type: New Feature
>    Affects Versions: 2.6.0
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>         Attachments: CombinedAggregatedLogsProposal_v3.pdf, CompactedAggregatedLogsProposal_v1.pdf,
CompactedAggregatedLogsProposal_v2.pdf, ConcatableAggregatedLogsProposal_v4.pdf, YARN-2942-preliminary.001.patch,
YARN-2942-preliminary.002.patch, YARN-2942.001.patch, YARN-2942.002.patch, YARN-2942.003.patch
> Turning on log aggregation allows users to easily store container logs in HDFS and subsequently
view them in the YARN web UIs from a central place.  Currently, there is a separate log file
for each Node Manager.  This can be a problem for HDFS if you have a cluster with many nodes
as you’ll slowly start accumulating many (possibly small) files per YARN application.  The
current “solution” for this problem is to configure YARN (actually the JHS) to automatically
delete these files after some amount of time.  
> We should improve this by compacting the per-node aggregated log files into one log file
per application.

This message was sent by Atlassian JIRA