hadoop-yarn-issues mailing list archives

From "Xuan Gong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6875) New aggregated log file format for YARN log aggregation.
Date Fri, 28 Jul 2017 23:34:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-6875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105859#comment-16105859 ]

Xuan Gong commented on YARN-6875:

Thanks for the comments, [~jlowe]. I fully understand your concern. But,

bq.  I'm not a big fan of having a separate file, even temporarily, because log aggregation
can already be a large portion of the namenode's write load on large clusters. Having that
separate file will increase the namenode write load significantly (approximately 2x per log
aggregation cycle if I understand it correctly).

I agree with this. But the proposed solution will not be worse than the current solution (TFile).
Also, the index file will be created only when partial log aggregation is enabled.
If we enable partial log aggregation:
* For the TFile solution (currently used), we create a new file every time we do log
aggregation. If we have done log aggregation three times, we would have three TFiles.
* For the proposed solution, we would have at most two files: the log file and the index file.
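To make the comparison above concrete, here is a rough sketch of the file counts in each case. It is a toy model rather than YARN code: the function names are made up for illustration, and it assumes partial log aggregation is enabled and that each aggregation cycle writes at least some log data.

```python
def tfile_file_count(cycles: int) -> int:
    """TFile case as described above: one new aggregated file per cycle."""
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return cycles


def indexed_file_count(cycles: int) -> int:
    """Proposed case as described above: at most a log file plus an index file."""
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    # Assumption: nothing exists before the first cycle; afterwards the same
    # two files are reused, so the count stays at two.
    return 0 if cycles == 0 else 2


if __name__ == "__main__":
    for cycles in (1, 3, 10):
        print(cycles, tfile_file_count(cycles), indexed_file_count(cycles))
```

This only counts files. It does not model the per-cycle namenode write operations raised in the quoted concern.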

bq. Note that the separate index file doesn't solve all the race conditions for the reader.

Yes, this corner case is valid, but I think it is acceptable. The reader would fail in this case,
but we can always retry the read later.
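The retry idea can be sketched roughly as follows. This is a generic illustration, not the actual log reader: `read_with_retry`, its parameters, and the use of `IOError` to signal a failed read are all assumptions made for the example.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def read_with_retry(read: Callable[[], T], attempts: int = 3,
                    delay_seconds: float = 0.0) -> T:
    """Call read() until it succeeds or the attempts are used up.

    Hypothetical helper: `read` stands in for whatever opens and parses the
    aggregated log; it is assumed to raise IOError when it races with a writer.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[IOError] = None
    for attempt in range(attempts):
        try:
            return read()
        except IOError as err:
            last_error = err
            # Wait before the next try, but not after the final failure.
            if attempt < attempts - 1:
                time.sleep(delay_seconds)
    assert last_error is not None
    raise last_error
```

A caller would wrap the read in a zero-argument callable, e.g. `read_with_retry(lambda: load_logs(path), attempts=5, delay_seconds=1.0)`, where `load_logs` is likewise a placeholder name.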

> New aggregated log file format for YARN log aggregation.
> --------------------------------------------------------
>                 Key: YARN-6875
>                 URL: https://issues.apache.org/jira/browse/YARN-6875
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Xuan Gong
>            Assignee: Xuan Gong
>         Attachments: YARN-6875-NewLogAggregationFormat-design-doc.pdf
> T-file is the underlying log format for the aggregated logs in YARN. We have seen several
> performance issues, especially for very large log files.
> We will introduce a new log format which has better performance for large log files.