hadoop-yarn-issues mailing list archives

From "Robert Kanter (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2942) Aggregated Log Files should be combined
Date Tue, 12 May 2015 17:29:02 GMT

    [ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540288#comment-14540288 ]

Robert Kanter commented on YARN-2942:

Thanks [~jlowe] for your feedback.  It's good to get more views on this.

{quote} If I understand them correctly they both propose that the NMs upload the original
per-node aggregated log to HDFS and then something (either the NMs or the RM) later comes
along and creates the aggregate-of-aggregates log{quote}
Yes.  That's correct.  
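To make the "aggregate-of-aggregates" idea concrete, here is a minimal Python sketch of combining per-node logs into one file plus an index of byte offsets, which is the kind of index a reader would consult. The function names (`combine_node_logs`, `read_node_log`) and the simple `node -> (offset, length)` index are hypothetical illustrations, not the file layout used in the attached proposals or patches.

```python
def combine_node_logs(node_logs):
    """Hypothetical sketch: concatenate per-node aggregated logs into one blob.

    node_logs: dict of node_id -> bytes (the per-node aggregated log contents).
    Returns (combined_bytes, index), where index maps node_id -> (offset, length).
    """
    parts = []
    index = {}
    offset = 0
    # Sort for a deterministic layout; a real appender would use arrival order.
    for node_id in sorted(node_logs):
        data = node_logs[node_id]
        index[node_id] = (offset, len(data))
        parts.append(data)
        offset += len(data)
    return b"".join(parts), index


def read_node_log(combined, index, node_id):
    """Return one node's log by slicing the combined blob using the index."""
    offset, length = index[node_id]
    return combined[offset:offset + length]
```

For example, combining `{"nm2": b"bbb", "nm1": b"aa"}` yields `b"aabbb"` with `nm1` at `(0, 2)` and `nm2` at `(2, 3)`, so a reader can fetch one node's log without scanning the others.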

{quote}However I didn't see details on solving the race condition where a log reader comes
along, sees from the index file that the desired log isn't in the aggregate-of-aggregates,
then opens the log and reads from it just as the log is deleted by the entity appending to
the aggregate-of-aggregates.{quote}
That's a good point; I hadn't thought of that issue.  Thinking about it now, I see a few
options here:
- We could simply have the reader retry if the per-node log disappears while it's reading it
- We could have the last NM to append delete all of the per-node aggregated log files, so that
this situation is less likely to occur
- Each NM could wait some amount of time (e.g. a few mins) after appending its log file before
deleting the original file, so that this situation is less likely to occur
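The first option could look roughly like the Python sketch below. `LogStore` is a hypothetical in-memory stand-in for HDFS, the `/app-logs/<app>/...` paths are made up, and `RacyStore` simulates the appender moving a node's log into the combined file and deleting the original just as the reader opens it. The point is that the reader re-checks the index after a `FileNotFoundError` instead of failing outright.

```python
import time


class LogStore:
    """Hypothetical in-memory stand-in for HDFS: maps path -> bytes."""

    def __init__(self):
        self.files = {}

    def read(self, path):
        if path not in self.files:
            raise FileNotFoundError(path)
        return self.files[path]


class RacyStore(LogStore):
    """Simulates the race: just as the reader opens nm1's per-node file, the
    appender copies it into the combined file, records it in the index, and
    deletes the original."""

    def __init__(self, index):
        super().__init__()
        self.index = index

    def read(self, path):
        if path.endswith("/nm1") and path in self.files:
            data = self.files.pop(path)
            combined = path.rsplit("/", 1)[0] + "/combined"
            old = self.files.get(combined, b"")
            self.files[combined] = old + data
            self.index["nm1"] = (len(old), len(data))
        return super().read(path)


def read_container_log(store, index, app_id, node_id, retries=3, backoff_s=0.0):
    """Read one node's log, retrying if the per-node file is deleted between
    the index check and the read. `index` maps node_id -> (offset, length)
    in the combined file and is re-checked on every attempt."""
    combined = f"/app-logs/{app_id}/combined"  # hypothetical layout
    per_node = f"/app-logs/{app_id}/{node_id}"
    for attempt in range(retries):
        entry = index.get(node_id)
        if entry is not None:
            offset, length = entry
            return store.read(combined)[offset:offset + length]
        try:
            return store.read(per_node)
        except FileNotFoundError:
            # Probably appended and deleted meanwhile; back off, re-check index.
            time.sleep(backoff_s * (attempt + 1))
    raise FileNotFoundError(f"no log for {node_id} after {retries} attempts")
```

With `RacyStore`, the first attempt fails because the per-node file vanishes mid-open; the second attempt finds `nm1` in the index and reads it from the combined file. The other two options only shrink the window, so a retry like this would likely still be wanted alongside them.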

{quote}We have an internal solution where we create per-application har files of the logs{quote}
Can you give some more details on this?  Is it something you can share?  If you've already
solved this issue, then perhaps we can just use that.  That said, doesn't creating har files
require running an MR job?

{quote}Another issue from log aggregation we've seen in practice is that the proposals don't
address the significant write load the per-node aggregate files place on the namenode.{quote}
That's a good point.  Shortly after a job finishes, all of the involved NMs upload their
log files at around the same time, which puts stress on the NN.  Support for the NM reporting
its current aggregation progress to the RM was recently added by YARN-1376 and related JIRAs.
Having the RM coordinate the aggregation is similar to my design with ZK, but instead of a ZK
lock, the RM orchestrates things.  I like the idea of getting rid of the original aggregation
and having each NM write to HDFS only once, directly into the combined file.  We'd have to
implement your last bullet point so that the NMs serve the logs in the meantime, as I don't
think that exists today.
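One way the RM could orchestrate the writes is to hand out one turn per application at a time, since HDFS permits only a single writer per file. The sketch below is hypothetical: `AggregationCoordinator`, `request_turn`, and `finish_turn` are illustrative names, not YARN APIs. In the real design the requests and completions would presumably piggyback on the NM's aggregation-progress reports to the RM, and a waiting NM would keep serving its logs from local disk.

```python
from collections import deque


class AggregationCoordinator:
    """Hypothetical RM-side coordinator: NMs ask for a turn to append their
    logs to the application's single combined file. Because HDFS allows only
    one writer per file, turns are granted strictly one at a time per app."""

    def __init__(self):
        self.queues = {}  # app_id -> deque of waiting node ids
        self.writer = {}  # app_id -> node currently holding the turn (or None)

    def request_turn(self, app_id, node_id):
        """Called when an NM reports its logs are ready. Returns True if the
        NM may append immediately, False if it has been queued."""
        if self.writer.get(app_id) is None:
            self.writer[app_id] = node_id
            return True
        self.queues.setdefault(app_id, deque()).append(node_id)
        return False

    def finish_turn(self, app_id, node_id):
        """Called when an NM finishes appending. Returns the next node that
        may write, or None if nobody is waiting."""
        if self.writer.get(app_id) != node_id:
            raise ValueError(f"{node_id} does not hold the turn for {app_id}")
        queue = self.queues.get(app_id)
        nxt = queue.popleft() if queue else None
        self.writer[app_id] = nxt
        return nxt
```

For instance, if `nm1`, `nm2`, and `nm3` report in that order, only `nm1` writes at first; each `finish_turn` then hands the file to the next waiting NM, so the NN sees one file being appended rather than one new file per node.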

I'll try to flesh this design out a bit more and see where it goes, unless we'd rather use
har files, though that adds an MR dependency.

> Aggregated Log Files should be combined
> ---------------------------------------
>                 Key: YARN-2942
>                 URL: https://issues.apache.org/jira/browse/YARN-2942
>             Project: Hadoop YARN
>          Issue Type: New Feature
>    Affects Versions: 2.6.0
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>         Attachments: CombinedAggregatedLogsProposal_v3.pdf, CombinedAggregatedLogsProposal_v6.pdf,
> CombinedAggregatedLogsProposal_v7.pdf, CompactedAggregatedLogsProposal_v1.pdf, CompactedAggregatedLogsProposal_v2.pdf,
> ConcatableAggregatedLogsProposal_v4.pdf, ConcatableAggregatedLogsProposal_v5.pdf, YARN-2942-preliminary.001.patch,
> YARN-2942-preliminary.002.patch, YARN-2942.001.patch, YARN-2942.002.patch, YARN-2942.003.patch
> Turning on log aggregation allows users to easily store container logs in HDFS and subsequently
> view them in the YARN web UIs from a central place.  Currently, there is a separate log file
> for each Node Manager.  This can be a problem for HDFS if you have a cluster with many nodes
> as you’ll slowly start accumulating many (possibly small) files per YARN application.  The
> current “solution” for this problem is to configure YARN (actually the JHS) to automatically
> delete these files after some amount of time.
> We should improve this by compacting the per-node aggregated log files into one log file
> per application.

This message was sent by Atlassian JIRA
