Date: Fri, 12 Dec 2014 00:06:14 +0000 (UTC)
From: "Robert Kanter (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-2942) Aggregated Log Files should be compacted

    [ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243413#comment-14243413 ]

Robert Kanter commented on YARN-2942:
-------------------------------------

Thanks for taking a look at the proposal, Zhijie.

Ya, it looks like YARN-2548 is related. That one looks to be more about long running jobs, and for this one I hadn't really considered those; this only works after the job finishes.

1. That's true. This design doesn't currently address that.
However, the format used by the compacted files isn't anything special; the data is just "dumped" into the file and an index is written to the index file for each container. As far as this format is concerned, we should be able to append more logs and indices to it. We would just need to figure out a good way to manage when they're appended and how this compaction process is triggered.

2. Yes. We'd leave the original aggregated logs until the compacted log is available. The JHS would continue using the aggregated log files until the compacted log file is ready.

3. I might not have been clear about that in the design. The RM would be the one to figure out when the app is done and the aggregated logs can be compacted. We'd run the actual compacting code in one of the NMs, so that the RM isn't spending cycles doing that, and so that we don't end up with a replica of every compacted log on the same datanode (in other words, the RM would choose, at random or round-robin, an NM to do each app's compaction; this will cause the replicas to be spread around the cluster).

4. That's a good question, though I don't think the index is the problem here. It's small enough that we could always just rewrite a new index to replace the stale one. I think the problem would be with the compacted log file itself, because we can't simply delete a chunk of it on HDFS, and it's big enough that there would be a lot of overhead to rewriting it. One solution here is to start a new compacted log file every N containers or once the file reaches some size, and we can do cleanup by deleting an earlier compacted log file and updating the index. The downside to this is that the containers in a compacted log file would not all be retained for the same length of time, but that's probably okay.
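
To make #1 and #4 a bit more concrete, here is a rough sketch of the "dump the data, write an index entry, roll to a new file every N containers" idea. It is only an illustration under my own assumptions: it is plain Python working on in-memory buffers rather than HDFS, and every name in it (CompactedLog, IndexEntry, max_containers_per_segment, delete_oldest_segment) is hypothetical and not taken from the proposal or the preliminary patch.

```python
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    """Where one container's logs live inside a compacted log file."""
    segment: int  # which compacted log file holds the data
    offset: int   # byte offset of the container's logs in that file
    length: int   # number of bytes that belong to the container


@dataclass
class CompactedLog:
    """Hypothetical in-memory stand-in for the compacted files plus index."""
    max_containers_per_segment: int = 2           # the "N" from point 4
    segments: dict = field(default_factory=dict)  # segment id -> bytearray
    index: dict = field(default_factory=dict)     # container id -> IndexEntry
    _current: int = 0
    _count_in_current: int = 0

    def append(self, container_id: str, data: bytes) -> None:
        # Roll over to a new file once the current one holds N containers.
        if self._count_in_current >= self.max_containers_per_segment:
            self._current += 1
            self._count_in_current = 0
        seg = self.segments.setdefault(self._current, bytearray())
        # The data is simply appended; the index records where it landed.
        self.index[container_id] = IndexEntry(self._current, len(seg), len(data))
        seg.extend(data)
        self._count_in_current += 1

    def read(self, container_id: str) -> bytes:
        e = self.index[container_id]
        return bytes(self.segments[e.segment][e.offset:e.offset + e.length])

    def delete_oldest_segment(self) -> None:
        # Cleanup never edits a file in place: drop the whole file,
        # then rewrite the (small) index without the stale entries.
        if not self.segments:
            return
        oldest = min(self.segments)
        del self.segments[oldest]
        self.index = {c: e for c, e in self.index.items() if e.segment != oldest}


if __name__ == "__main__":
    log = CompactedLog(max_containers_per_segment=2)
    for cid in ("c1", "c2", "c3"):
        log.append(cid, f"logs of {cid}\n".encode())
    log.delete_oldest_segment()  # c1 and c2 share a file, so both go together
    print(sorted(log.index))     # ['c3']
```

Note how c1 and c2 are removed together even though c1 was written first; that is the "not all retained for the same length of time" downside from #4, and the reason only whole files are ever deleted.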

Perhaps we can start out with this design, and then modify it for long running jobs that support YARN-2468 to have some other way of:
- Triggering/Managing the compaction process (#1)
- Deleting old logs (#4)

Perhaps we can use this JIRA for normal jobs and then use YARN-2548 to add support to it for long running jobs? What do you think, [~zjshen] and [~xgong]?

> Aggregated Log Files should be compacted
> ----------------------------------------
>
>                 Key: YARN-2942
>                 URL: https://issues.apache.org/jira/browse/YARN-2942
>             Project: Hadoop YARN
>          Issue Type: New Feature
>    Affects Versions: 2.6.0
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>         Attachments: CompactedAggregatedLogsProposal_v1.pdf, YARN-2942-preliminary.001.patch
>
>
> Turning on log aggregation allows users to easily store container logs in HDFS and subsequently view them in the YARN web UIs from a central place. Currently, there is a separate log file for each Node Manager. This can be a problem for HDFS if you have a cluster with many nodes as you’ll slowly start accumulating many (possibly small) files per YARN application. The current “solution” for this problem is to configure YARN (actually the JHS) to automatically delete these files after some amount of time.
> We should improve this by compacting the per-node aggregated log files into one log file per application.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)