hadoop-yarn-issues mailing list archives

From "Wangda Tan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6875) New aggregated log file format for YARN log aggregation.
Date Fri, 28 Jul 2017 23:49:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-6875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105871#comment-16105871 ]

Wangda Tan commented on YARN-6875:

Thanks for the comments from [~jlowe]/[~xgong].

I think I misled Jason earlier: we didn't plan to add the separate index design at the beginning,
but we figured out that it is required for recovery.

I agree with these points from Jason:
- Log files are rarely read after they are written.
- Creating a separate index file during write means 2x the workload on the Namenode.

However, if we don't write the (temp) index file, the approach listed in Jason's comment
will make reads very slow, since the reader needs to repeatedly find where the last successful
write is. And the worst part is, we only need to read logs when an app fails or is slow, so it is
likely that we will read such an app's logs a couple of times. I don't think it will be a good
user experience to do this every time.
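
To make the read-side cost concrete, here is a minimal sketch of the trade-off. It is not the
actual YARN code and not necessarily the layout Jason proposed: the length+CRC record framing,
the 8-byte "index" that stores only the last good offset, and all function names are
hypothetical, chosen only for illustration.

```python
import io
import struct
import zlib

# Hypothetical record framing: 4-byte payload length + 4-byte CRC32, then the payload.
HEADER = struct.Struct(">II")


def append_record(buf, payload):
    """Append one framed record to an in-memory 'aggregated log'."""
    buf.write(HEADER.pack(len(payload), zlib.crc32(payload)))
    buf.write(payload)


def scan_for_last_good_offset(data):
    """No index: walk every record from the start of the file.

    Returns (end offset of the last complete record, records examined).
    The work grows with the number of records already written.
    """
    offset = 0
    examined = 0
    while offset + HEADER.size <= len(data):
        length, crc = HEADER.unpack_from(data, offset)
        start = offset + HEADER.size
        payload = data[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # truncated or corrupt record: the previous one was the last good write
        offset = start + length
        examined += 1
    return offset, examined


def read_indexed_offset(index_bytes):
    """With a (temp) index: one small read gives the last good offset directly."""
    (offset,) = struct.unpack(">Q", index_bytes)
    return offset


def build_demo():
    """Three good records, a side index, then a simulated failed partial write."""
    log = io.BytesIO()
    for chunk in (b"container_1 stdout", b"container_1 stderr", b"container_2 stdout"):
        append_record(log, chunk)
    good_end = log.tell()
    index = struct.pack(">Q", good_end)
    # Header promises 100 bytes but only 4 were written before the failure.
    log.write(HEADER.pack(100, 0) + b"oops")
    return log.getvalue(), index, good_end
```

In this sketch both readers arrive at the same offset, but the scanning reader has to touch
every record on every read, while the indexed reader does a single fixed-size lookup.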

I agree with the comments from Xuan: if partial log aggregation is not enabled, this design doesn't
increase any workload. [~jlowe], what percentage of the apps running in your cluster have
partial log aggregation enabled?

For the partial log aggregation case, an alternative solution is to write log+index to a separate
file every time, which makes write perf exactly the same as TFile, but read performance can be
much better. Jason, could you share your thoughts here?
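
A rough sketch of that alternative, under stated assumptions: each upload cycle produces one
self-contained file made of the log bytes, then an index, then a fixed-size trailer holding the
index length. The JSON index, the 8-byte trailer, and the function names are all hypothetical
and only illustrate the idea; the real format is described in the attached design doc.

```python
import json
import struct

# Hypothetical trailer: the last 8 bytes of each file hold the index length.
TRAILER = struct.Struct(">Q")


def write_cycle_file(logs):
    """Build one self-contained file for a single upload cycle.

    logs maps a log name (e.g. "stdout") to the bytes collected in this cycle.
    Layout: [log bytes ...][index as JSON][8-byte index length].
    """
    body = bytearray()
    index = {}
    for name, content in logs.items():
        index[name] = [len(body), len(content)]  # offset and length within the body
        body += content
    index_bytes = json.dumps(index).encode("utf-8")
    return bytes(body) + index_bytes + TRAILER.pack(len(index_bytes))


def read_log(file_bytes, name):
    """Read one log from one cycle file: trailer first, then index, then the log bytes."""
    (index_len,) = TRAILER.unpack_from(file_bytes, len(file_bytes) - TRAILER.size)
    index_start = len(file_bytes) - TRAILER.size - index_len
    index = json.loads(file_bytes[index_start:index_start + index_len].decode("utf-8"))
    if name not in index:
        return None
    offset, length = index[name]
    return file_bytes[offset:offset + length]


def read_across_cycles(cycle_files, name):
    """Concatenate one log's pieces across all cycle files, in upload order."""
    parts = [read_log(f, name) for f in cycle_files]
    return b"".join(p for p in parts if p is not None)
```

Because every file carries its own index, the writer creates one file per cycle, and the
reader never needs to scan log bytes to locate a given log.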

> New aggregated log file format for YARN log aggregation.
> --------------------------------------------------------
>                 Key: YARN-6875
>                 URL: https://issues.apache.org/jira/browse/YARN-6875
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Xuan Gong
>            Assignee: Xuan Gong
>         Attachments: YARN-6875-NewLogAggregationFormat-design-doc.pdf
> T-file is the underlying log format for the aggregated logs in YARN. We have seen several
> performance issues, especially for very large log files.
> We will introduce a new log format that has better performance for large log files.
