hadoop-yarn-issues mailing list archives

From "ledion bitincka (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1440) Yarn aggregated logs are difficult for external tools to understand
Date Tue, 26 Nov 2013 23:52:36 GMT

    [ https://issues.apache.org/jira/browse/YARN-1440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13833237#comment-13833237 ]

ledion bitincka commented on YARN-1440:
---------------------------------------

How about making AppLogAggregator pluggable? This would most likely be a pretty
simple patch, since there is only one place where AppLogAggregatorImpl is instantiated,
in LogAggregationService.java:

{code}
// New application
final AppLogAggregator appLogAggregator =
    new AppLogAggregatorImpl(this.dispatcher, this.deletionService,
        getConfig(), appId, userUgi, dirsHandler,
        getRemoteNodeLogFileForApp(appId, user), logRetentionPolicy,
        appAcls);
if (this.appLogAggregators.putIfAbsent(appId, appLogAggregator) != null) {
  throw new YarnRuntimeException("Duplicate initApp for " + appId);
}
{code}
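
A rough sketch of what "pluggable" could look like: the implementation class is read from configuration and instantiated reflectively, falling back to the current implementation when nothing is configured. Everything below is an assumption made for illustration, not the actual patch: the config key is hypothetical, java.util.Properties stands in for Hadoop's Configuration, the interface and classes are simplified stand-ins for the NodeManager types, and RawFileLogAggregator is an invented alternative. In the real code the constructor arguments shown above would have to be passed through in some way as well.

```java
import java.util.Properties;

// Simplified stand-in for the NodeManager's AppLogAggregator (illustration only).
interface AppLogAggregator extends Runnable {
    void init(String appId, String remoteLogDir);
    String describe();
}

// Stand-in for the existing TFile-based default implementation.
class AppLogAggregatorImpl implements AppLogAggregator {
    protected String appId;
    protected String remoteLogDir;

    public void init(String appId, String remoteLogDir) {
        this.appId = appId;
        this.remoteLogDir = remoteLogDir;
    }

    public void run() {
        // The real aggregation work would happen here.
    }

    public String describe() {
        return "tfile:" + remoteLogDir + "/" + appId;
    }
}

// Hypothetical alternative that would write raw log files instead of a TFile.
class RawFileLogAggregator extends AppLogAggregatorImpl {
    @Override
    public String describe() {
        return "raw:" + remoteLogDir + "/" + appId;
    }
}

class AggregatorFactory {
    // Hypothetical key; no such property exists in YARN today.
    static final String AGGREGATOR_CLASS_KEY =
        "yarn.nodemanager.log-aggregation.aggregator-class";

    static AppLogAggregator create(Properties conf, String appId, String remoteLogDir) {
        // Fall back to the current implementation when the key is not set.
        String className =
            conf.getProperty(AGGREGATOR_CLASS_KEY, AppLogAggregatorImpl.class.getName());
        try {
            Class<? extends AppLogAggregator> clazz =
                Class.forName(className).asSubclass(AppLogAggregator.class);
            AppLogAggregator aggregator = clazz.getDeclaredConstructor().newInstance();
            aggregator.init(appId, remoteLogDir);
            return aggregator;
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalStateException(
                "Cannot instantiate log aggregator " + className, e);
        }
    }
}
```

With something like this in place, the instantiation in LogAggregationService would become a single factory call, and the putIfAbsent check that follows it would stay unchanged.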

> Yarn aggregated logs are difficult for external tools to understand
> -------------------------------------------------------------------
>
>                 Key: YARN-1440
>                 URL: https://issues.apache.org/jira/browse/YARN-1440
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: ledion bitincka
>              Labels: log-aggregation, logs, tfile, yarn
>
> The log aggregation feature in Yarn is awesome! However, the file format in which the
> log files are aggregated (TFile) should either be much simpler or be made pluggable.
> The current TFile format forces anyone who wants to see the files to either
> a) use the web UI,
> b) use the CLI tools (yarn logs), or
> c) write custom code to read the files.
> My suggestion would be to simplify the log collection by collecting and writing the raw
> log files into a directory structure as follows:
> {noformat}
> /{log-collection-dir}/{app-id}/{container-id}/{log-file-name} 
> {noformat}
> This way the application developers can (re)use a much wider array of tools to process
> the logs.
> For readers who are not familiar with the logs and their format, you can find more info
> in the following two blog posts:
> http://hortonworks.com/blog/simplifying-user-logs-management-and-access-in-yarn/
> http://blogs.splunk.com/2013/11/18/hadoop-2-0-rant/
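
To make the directory layout proposed in the issue description concrete, here is a small sketch of how a writer could build such a path and how an external tool could recover the application id, container id and file name from it using only java.nio. The class and method names are hypothetical, and the ids used in the usage note are made-up examples.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class RawLogLayout {
    // Builds /{log-collection-dir}/{app-id}/{container-id}/{log-file-name}.
    static Path logFile(String logCollectionDir, String appId,
                        String containerId, String logFileName) {
        return Paths.get(logCollectionDir, appId, containerId, logFileName);
    }

    // Returns { appId, containerId, logFileName } for a file under the given root.
    static String[] parse(Path root, Path file) {
        Path relative = root.relativize(file);
        if (relative.getNameCount() != 3) {
            throw new IllegalArgumentException("Unexpected log path: " + file);
        }
        return new String[] {
            relative.getName(0).toString(),
            relative.getName(1).toString(),
            relative.getName(2).toString()
        };
    }
}
```

For example, RawLogLayout.logFile("/app-logs", "application_1385500000000_0001", "container_1385500000000_0001_01_000001", "stderr") yields a path that any file-based tool can read directly, without going through the TFile reader.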



--
This message was sent by Atlassian JIRA
(v6.1#6144)
