hadoop-yarn-issues mailing list archives

From "Robert Kanter (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2942) Aggregated Log Files should be compacted
Date Fri, 12 Dec 2014 00:06:14 GMT

https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243413#comment-14243413

Robert Kanter commented on YARN-2942:

Thanks for taking a look at the proposal, Zhijie.

Ya, it looks like YARN-2548 is related.  That one looks to be more about long running jobs,
which I hadn't really considered for this one; this design only kicks in after the job finishes.

1. That's true.  This design doesn't currently address that.  However, the format used by
the compacted files isn't anything special; the data is just "dumped" into the file and an
index written to the index file for each container.  As far as this format is concerned, we
should be able to append more logs and indices to it.  We would just need to figure out a
good way to manage when they're appended and how this compaction process is triggered.  
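To make the append-friendly layout concrete, here is a minimal sketch of the idea: container logs are "dumped" one after another into a single data stream, and a separate index records where each container's chunk landed. The class and field names (`CompactedLogWriter`, `IndexEntry`) are illustrative, not from the patch.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the compacted-log format: container logs are
// appended into one data stream, and the index records
// (containerId, offset, length) per container.  Because both the data
// and the index are append-only, a later compaction run can add more
// logs and index entries without rewriting what is already there.
public class CompactedLogWriter {
    public static class IndexEntry {
        public final String containerId;
        public final long offset;
        public final long length;
        public IndexEntry(String containerId, long offset, long length) {
            this.containerId = containerId;
            this.offset = offset;
            this.length = length;
        }
    }

    private final ByteArrayOutputStream data = new ByteArrayOutputStream();
    private final List<IndexEntry> index = new ArrayList<>();

    // Append one container's log bytes and record where they landed.
    public void append(String containerId, byte[] log) throws IOException {
        long offset = data.size();
        data.write(log);
        index.add(new IndexEntry(containerId, offset, log.length));
    }

    public List<IndexEntry> getIndex() { return index; }
    public byte[] getData()            { return data.toByteArray(); }
}
```

Reading a single container back is then just a seek to `offset` followed by a read of `length` bytes, which is what lets the JHS serve per-container logs out of one big file.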

2. Yes.  We'd leave the original aggregated logs until the compacted log is available.  The
JHS would continue using the aggregated log files until the compacted log file is ready. 

3. I might not have been clear about that in the design.  The RM would be the one to figure
out when the app is done and the aggregated logs can be compacted.  We'd run the actual compacting
code in one of the NMs, so that the RM isn't spending cycles doing that, and so that we don't
end up with a replica of each compacted log on one datanode (in other words, the RM would
chose, at random or round-robin, an NM to do each app's compaction; this will cause the replicas
to be spread around the cluster).
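The selection logic the RM would need is simple; a round-robin sketch might look like the following (class name and the use of plain `String` node IDs are assumptions for illustration):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the RM-side choice described above: pick an NM
// round-robin so that each app's compaction (and the first HDFS replica
// of its compacted log) lands on a different node, spreading replicas
// around the cluster instead of piling them on one datanode.
public class CompactionNodeSelector {
    private final List<String> nodes;           // live NMs, stubbed as Strings
    private final AtomicLong counter = new AtomicLong();

    public CompactionNodeSelector(List<String> nodes) {
        this.nodes = nodes;
    }

    // Successive apps get successive nodes; wraps around at the end.
    public String nextNode() {
        int i = (int) (counter.getAndIncrement() % nodes.size());
        return nodes.get(i);
    }
}
```

A random pick would work equally well for spreading replicas; round-robin just makes the distribution deterministic.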

4. That's a good question; though I don't think the index is the problem here.  It's small
enough that we could always just rewrite a new index to replace the stale one.  I think the
problem would be with the compacted log file itself because we can't simply delete a chunk
of it on HDFS; and it's big enough that there would be a lot of overhead to rewriting it.
 One solution here is to write a new compacted log file every N containers or every so many
bytes, and do cleanup by deleting an earlier compacted log file and updating the index.  The
downside is that containers in a compacted log file wouldn't all have the same retention
lifetime, but that's probably okay.
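The rollover-and-delete scheme can be sketched as follows; `RollingCompactedLog` and its methods are hypothetical names, and the inner lists stand in for compacted log files on HDFS:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the rollover scheme: a new compacted log file
// is started every N containers, so cleanup can delete a whole earlier
// file (HDFS can't cut a chunk out of the middle of a file) and only
// the small index needs rewriting.
public class RollingCompactedLog {
    private final int containersPerFile;
    // Each inner list models one compacted log file and the container
    // IDs whose logs it holds.
    private final List<List<String>> files = new ArrayList<>();

    public RollingCompactedLog(int containersPerFile) {
        this.containersPerFile = containersPerFile;
    }

    // Append a container's logs, rolling over to a new file when the
    // current one has reached N containers.
    public void addContainer(String containerId) {
        if (files.isEmpty()
            || files.get(files.size() - 1).size() >= containersPerFile) {
            files.add(new ArrayList<>());
        }
        files.get(files.size() - 1).add(containerId);
    }

    // Cleanup deletes the oldest whole file; every container in it
    // expires together, which is the unequal-lifetime trade-off noted
    // above.  Returns the deleted container IDs, or null if empty.
    public List<String> deleteOldestFile() {
        return files.isEmpty() ? null : files.remove(0);
    }

    public int fileCount() { return files.size(); }
}
```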

Perhaps we can start out with this design, and then modify it so that long running jobs
(YARN-2548) have some other way of:
- Triggering/Managing the compaction process (#1)
- Deleting old logs (#4)

Perhaps we can use this JIRA for normal jobs and then use YARN-2548 to add support to it for
long running jobs?  What do you think [~zjshen] and [~xgong]?

> Aggregated Log Files should be compacted
> ----------------------------------------
>                 Key: YARN-2942
>                 URL: https://issues.apache.org/jira/browse/YARN-2942
>             Project: Hadoop YARN
>          Issue Type: New Feature
>    Affects Versions: 2.6.0
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>         Attachments: CompactedAggregatedLogsProposal_v1.pdf, YARN-2942-preliminary.001.patch
> Turning on log aggregation allows users to easily store container logs in HDFS and subsequently
view them in the YARN web UIs from a central place.  Currently, there is a separate log file
for each Node Manager.  This can be a problem for HDFS if you have a cluster with many nodes
as you’ll slowly start accumulating many (possibly small) files per YARN application.  The
current “solution” for this problem is to configure YARN (actually the JHS) to automatically
delete these files after some amount of time.  
> We should improve this by compacting the per-node aggregated log files into one log file
per application.

This message was sent by Atlassian JIRA
