Mailing-List: contact yarn-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: yarn-issues@hadoop.apache.org
Date: Tue, 12 May 2015 18:26:02 +0000 (UTC)
From: "Jason Lowe (JIRA)" <jira@apache.org>
To: yarn-issues@hadoop.apache.org
Message-ID: <JIRA.12760597.1418167637000.90607.1431455162868@Atlassian.JIRA>
In-Reply-To: <JIRA.12760597.1418167637000@Atlassian.JIRA>
References: <JIRA.12760597.1418167637000@Atlassian.JIRA>
 <JIRA.12760597.1418167637024@arcas>
Subject: [jira] [Commented] (YARN-2942) Aggregated Log Files should be
 combined
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable


    [ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540407#comment-14540407 ]

Jason Lowe commented on YARN-2942:
----------------------------------

bq. Can you give some more details on this? Is it something you can share?

It's a hack to help mitigate the log aggregation namespace scaling issues on our large clusters.  Essentially it's a periodic process that runs an Oozie workflow that does the following:

# determines which applications are good candidates for log archiving (i.e., lots of files but a total size that is not that big)
# runs a streaming job with a shell script that takes the list of applications to aggregate as input
# for each application, runs a local-mode archive job to archive the log contents
# once the archive has been created, swaps out the application directory with a symlink into the har archive
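
A minimal sketch of the candidate selection in the first step, for illustration only.  This is not the actual script; the AppLogStats type, the function name, and both thresholds are hypothetical, and it assumes the per-application file count and total size have already been collected.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class AppLogStats:
    app_id: str       # application identifier (illustrative value)
    file_count: int   # number of aggregated log files for the app
    total_bytes: int  # combined size of those files in bytes


def pick_archive_candidates(
    apps: Iterable[AppLogStats],
    min_files: int = 50,                        # hypothetical threshold
    max_total_bytes: int = 256 * 1024 * 1024,   # hypothetical threshold
) -> List[str]:
    """Return IDs of apps with many log files but a small total size."""
    return [
        a.app_id
        for a in apps
        if a.file_count >= min_files and a.total_bytes <= max_total_bytes
    ]
```

Apps with few files gain little from archiving, and very large ones are expensive to rewrite, so the filter keeps only the apps that are both file-heavy and small.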

The symlink makes the archive transparent to the readers.  Both the JHS and the "yarn logs" command use FileContext and "just worked" with the symlink into the har without modifications.

So yes, we are running a MapReduce job to archive the logs, which itself will create more logs.  However, each archiving job processes the logs of many applications.  If there is sufficient interest we can look into how to share it, but the script is specific to how we configure our nodes and clusters and relies on unsupported symlinks.  I'm hoping the outcome of this JIRA allows us to move away from the need for it.

bq. We'd have to implement your last bullet point to have the NMs serve the logs in the meantime, as I don't think that's there today.

That feature is indeed there today.  Links to the app logs on the NM will try to serve the local app logs first, then redirect to the log server if the local logs are unavailable.  See NMController and ContainerLogsPage.  It only becomes an issue when something links directly to the aggregated log server before the NM has finished aggregating the logs.
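
The local-first behavior described above can be sketched roughly as follows.  This is only an illustration of the decision order, not the NMController or ContainerLogsPage code; the function name, the has_local_logs callback, and the redirect URL layout are all hypothetical.

```python
from typing import Callable, Tuple


def resolve_container_log(
    container_id: str,
    has_local_logs: Callable[[str], bool],  # hypothetical local-disk check
    log_server_url: str,                    # base URL of the log server
) -> Tuple[str, str]:
    """Decide where a container's logs should be served from."""
    if has_local_logs(container_id):
        # Local logs still exist on this node: serve them directly.
        return ("local", container_id)
    # Local logs are unavailable: redirect to the aggregated log server.
    # The URL layout here is made up for the example.
    return ("redirect", f"{log_server_url.rstrip('/')}/{container_id}")
```

A caller that skips this check and links straight to the log server gets no local fallback, which is the case where logs can appear to be missing while aggregation is still in progress.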

> Aggregated Log Files should be combined
> ---------------------------------------
>
>                 Key: YARN-2942
>                 URL: https://issues.apache.org/jira/browse/YARN-2942
>             Project: Hadoop YARN
>          Issue Type: New Feature
>    Affects Versions: 2.6.0
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>         Attachments: CombinedAggregatedLogsProposal_v3.pdf, CombinedAggregatedLogsProposal_v6.pdf, CombinedAggregatedLogsProposal_v7.pdf, CompactedAggregatedLogsProposal_v1.pdf, CompactedAggregatedLogsProposal_v2.pdf, ConcatableAggregatedLogsProposal_v4.pdf, ConcatableAggregatedLogsProposal_v5.pdf, YARN-2942-preliminary.001.patch, YARN-2942-preliminary.002.patch, YARN-2942.001.patch, YARN-2942.002.patch, YARN-2942.003.patch
>
>
> Turning on log aggregation allows users to easily store container logs in HDFS and subsequently view them in the YARN web UIs from a central place.  Currently, there is a separate log file for each Node Manager.  This can be a problem for HDFS if you have a cluster with many nodes as you’ll slowly start accumulating many (possibly small) files per YARN application.  The current “solution” for this problem is to configure YARN (actually the JHS) to automatically delete these files after some amount of time.
> We should improve this by compacting the per-node aggregated log files into one log file per application.


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)