Mailing-List: contact yarn-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: yarn-issues@hadoop.apache.org
Date: Fri, 12 Dec 2014 00:06:14 +0000 (UTC)
From: "Robert Kanter (JIRA)" <jira@apache.org>
To: yarn-issues@hadoop.apache.org
Message-ID: <JIRA.12760597.1418167637000.5453.1418342774224@Atlassian.JIRA>
In-Reply-To: <JIRA.12760597.1418167637000@Atlassian.JIRA>
References: <JIRA.12760597.1418167637000@Atlassian.JIRA>
 <JIRA.12760597.1418167637024@arcas>
Subject: [jira] [Commented] (YARN-2942) Aggregated Log Files should be
 compacted
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable


    [ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243413#comment-14243413 ]

Robert Kanter commented on YARN-2942:
-------------------------------------

Thanks for taking a look at the proposal, Zhijie.

Yes, it looks like YARN-2548 is related.  That one looks to be more about long-running jobs, which I hadn't really considered for this one; this design only works after the job finishes.

1. That's true.  This design doesn't currently address that.  However, the format used by the compacted files isn't anything special; the data is just "dumped" into the file and an index entry is written to the index file for each container.  As far as this format is concerned, we should be able to append more logs and indices to it.  We would just need to figure out a good way to manage when they're appended and how this compaction process is triggered.
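
A minimal sketch of the append-friendly layout described in point 1, written in Python for brevity rather than taken from the patch. The function names and the (offset, length) index entry are assumptions for illustration only; the actual index format is whatever the attached proposal defines. An in-memory buffer stands in for the compacted file on HDFS.

```python
import io

def append_container_log(data_file, index, container_id, log_bytes):
    """Dump one container's log at the end of the data file and record where it landed."""
    data_file.seek(0, io.SEEK_END)
    offset = data_file.tell()
    data_file.write(log_bytes)
    # Hypothetical index entry: container id -> (offset, length).
    index[container_id] = (offset, len(log_bytes))

def read_container_log(data_file, index, container_id):
    """Use the index to seek straight to one container's log."""
    offset, length = index[container_id]
    data_file.seek(offset)
    return data_file.read(length)

data = io.BytesIO()   # stand-in for the compacted log file
index = {}            # stand-in for the index file

append_container_log(data, index, "container_01", b"stdout of container 1\n")
append_container_log(data, index, "container_02", b"stdout of container 2\n")
# A later append only adds bytes and one more index entry; nothing is rewritten.
append_container_log(data, index, "container_03", b"late log\n")
```

Because earlier entries are never touched, appending more logs later is cheap as far as the format goes; what remains open is when those appends happen and what triggers them.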

2. Yes.  We'd leave the original aggregated logs until the compacted log is available.  The JHS would continue using the aggregated log files until the compacted log file is ready.
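
The fallback in point 2 could look roughly like the following Python sketch. It is not the JHS code: the file names `compacted.log`, `compacted.index`, and the `node_` prefix are hypothetical, a local temporary directory stands in for HDFS, and treating the index file as the "ready" marker is an assumption rather than something the design states.

```python
import tempfile
from pathlib import Path

def pick_log_source(app_dir):
    """Return the compacted file once it is ready, else the per-node aggregated files."""
    app_dir = Path(app_dir)
    compacted = app_dir / "compacted.log"    # hypothetical file name
    index = app_dir / "compacted.index"      # hypothetical file name
    # Assumption: the index is written last, so its presence means "ready".
    if compacted.exists() and index.exists():
        return "compacted", [compacted]
    per_node = sorted(p for p in app_dir.iterdir() if p.name.startswith("node_"))
    return "aggregated", per_node

with tempfile.TemporaryDirectory() as tmp:
    app = Path(tmp)
    (app / "node_a").write_text("logs from node a\n")
    (app / "node_b").write_text("logs from node b\n")
    before = pick_log_source(app)            # compaction has not run yet

    (app / "compacted.log").write_text("compacted data\n")
    (app / "compacted.index").write_text("index data\n")
    after = pick_log_source(app)             # compacted file is now available
```

Readers never see a half-written compacted file under this scheme, and the original aggregated files can be removed only after the switch.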

3. I might not have been clear about that in the design.  The RM would be the one to figure out when the app is done and the aggregated logs can be compacted.  We'd run the actual compacting code in one of the NMs, so that the RM isn't spending cycles doing that, and so that we don't end up with a replica of each compacted log on one datanode (in other words, the RM would choose, at random or round-robin, an NM to do each app's compaction; this will cause the replicas to be spread around the cluster).
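
The round-robin variant of the selection in point 3 can be sketched as below. This is an illustration in Python, not RM code; `assign_compactions`, the application ids, and the NM names are all made up for the example.

```python
from collections import Counter
from itertools import cycle

def assign_compactions(finished_apps, node_managers):
    """Pair each finished app with the next NM in round-robin order."""
    nm_cycle = cycle(node_managers)
    return {app: next(nm_cycle) for app in finished_apps}

# Hypothetical application ids and NM names.
apps = [f"application_{i:04d}" for i in range(1, 7)]
nms = ["nm1", "nm2", "nm3"]

assignment = assign_compactions(apps, nms)
load = Counter(assignment.values())   # how many compactions each NM received
```

With six apps and three NMs, each NM ends up writing two compacted logs, so no single node does all the writing and the resulting replicas are not concentrated on one datanode.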

4. That's a good question, though I don't think the index is the problem here.  It's small enough that we could always just write a new index to replace the stale one.  I think the problem would be with the compacted log file itself, because we can't simply delete a chunk of it on HDFS, and it's big enough that there would be a lot of overhead to rewriting it.  One solution here is to start a new compacted log file every N containers or once a file-size threshold is reached; we can then do cleanup by deleting an earlier compacted log file and updating the index.  The downside to this is that the containers in a compacted log file would not all be retained for the same length of time, but that's probably okay.
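
A rough Python sketch of the rolling scheme in point 4, under stated assumptions: the class name, the per-container threshold, and the segment numbering are hypothetical, and each list stands in for one compacted log file on HDFS. A size-based threshold would work the same way.

```python
class RollingCompactedLog:
    """Start a new segment every `max_containers` containers (hypothetical policy)."""

    def __init__(self, max_containers):
        self.max_containers = max_containers
        self.segments = {0: []}   # segment number -> container ids stored in it
        self.current = 0          # segment currently being written
        self.index = {}           # container id -> segment number

    def add(self, container_id):
        if len(self.segments[self.current]) >= self.max_containers:
            self.current += 1
            self.segments[self.current] = []
        self.segments[self.current].append(container_id)
        self.index[container_id] = self.current

    def delete_oldest_segment(self):
        """Cleanup: drop a whole earlier file, then fix up the index."""
        oldest = min(self.segments)
        if oldest == self.current:
            raise ValueError("refusing to delete the segment still being written")
        for container_id in self.segments.pop(oldest):
            del self.index[container_id]
        return oldest

log = RollingCompactedLog(max_containers=2)
for cid in ["c1", "c2", "c3", "c4", "c5"]:
    log.add(cid)
removed = log.delete_oldest_segment()
```

Deleting segment 0 removes `c1` and `c2` together even though `c1` was written first, which is the uneven retention mentioned above.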

Perhaps we can start out with this design, and then modify it for long-running jobs that support YARN-2468 to have some other way of:
- Triggering/Managing the compaction process (#1)
- Deleting old logs (#4)

Perhaps we can use this JIRA for normal jobs and then use YARN-2548 to add support to it for long-running jobs?  What do you think, [~zjshen] and [~xgong]?

> Aggregated Log Files should be compacted
> ----------------------------------------
>
>                 Key: YARN-2942
>                 URL: https://issues.apache.org/jira/browse/YARN-2942
>             Project: Hadoop YARN
>          Issue Type: New Feature
>    Affects Versions: 2.6.0
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>         Attachments: CompactedAggregatedLogsProposal_v1.pdf, YARN-2942-preliminary.001.patch
>
>
> Turning on log aggregation allows users to easily store container logs in HDFS and subsequently view them in the YARN web UIs from a central place.  Currently, there is a separate log file for each Node Manager.  This can be a problem for HDFS if you have a cluster with many nodes as you’ll slowly start accumulating many (possibly small) files per YARN application.  The current “solution” for this problem is to configure YARN (actually the JHS) to automatically delete these files after some amount of time.
> We should improve this by compacting the per-node aggregated log files into one log file per application.


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)