hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2985) YARN should support to delete the aggregated logs for Non-MapReduce applications
Date Thu, 13 Apr 2017 15:33:41 GMT

    [ https://issues.apache.org/jira/browse/YARN-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967748#comment-15967748 ]

Jason Lowe commented on YARN-2985:
----------------------------------

Based on the description of this JIRA, I think there's some confusion here.  Aggregated logs
are deleted for non-MapReduce applications as long as the deletion service is running, whether
that deletion service is hosted by the MapReduce job history server (JHS) or somewhere else.  That's
why the proposed patch is so small: it simply reuses the same code the JHS is already
running.  The log deletion service looks at the remote log directory in HDFS.  It doesn't
filter the list of application logs it finds there based on whether it thinks the app is MapReduce
or not; it just treats them all as generic applications.  It happens to run in the MapReduce
history server, but it is _not_ MapReduce-specific.  Users who don't run MapReduce
applications but want log aggregation just need to run the MapReduce history
server.  They won't use it for MapReduce job history since there are no MapReduce jobs, but
that server will perform aggregated log retention for *all* applications.
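
To make the point about framework-agnostic retention concrete, here is a minimal sketch in plain
Java.  It is not the AggregatedLogDeletionService code: it models each aggregated log directory
as an in-memory record instead of an HDFS path, and the class names and the "framework" field
are invented purely to show that the field is never consulted when deciding what expires.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RetentionSweepSketch {

    /** One aggregated-log directory, modeled in memory (assumption: real entries live in HDFS). */
    static final class AppLogDir {
        final String appId;
        final String framework;      // hypothetical label; present only to show it is ignored
        final Instant lastModified;

        AppLogDir(String appId, String framework, Instant lastModified) {
            this.appId = appId;
            this.framework = framework;
            this.lastModified = lastModified;
        }
    }

    /** Returns the IDs of all directories older than the retention period, whatever the framework. */
    static List<String> selectExpired(List<AppLogDir> dirs, Duration retention, Instant now) {
        Instant cutoff = now.minus(retention);
        List<String> expired = new ArrayList<>();
        for (AppLogDir dir : dirs) {
            // Age is the only criterion; dir.framework is deliberately not read.
            if (dir.lastModified.isBefore(cutoff)) {
                expired.add(dir.appId);
            }
        }
        return expired;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2017-04-13T15:33:41Z");
        List<AppLogDir> dirs = Arrays.asList(
            new AppLogDir("application_1_0001", "MAPREDUCE", now.minus(Duration.ofDays(10))),
            new AppLogDir("application_1_0002", "SPARK", now.minus(Duration.ofDays(10))),
            new AppLogDir("application_1_0003", "SPARK", now.minus(Duration.ofDays(1))));
        // prints [application_1_0001, application_1_0002]
        System.out.println(selectExpired(dirs, Duration.ofDays(7), now));
    }
}
```

Both ten-day-old directories are selected under a seven-day retention, MapReduce or not, which is
the behavior described above.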

Therefore this JIRA is really about adding the ability to relocate the aggregated log deletion
service from the MapReduce job history server to the YARN timeline server.  We don't want
two of these things running in the cluster if someone has deployed the MapReduce history server
and the YARN timeline server.  That could lead to error messages in the logs as one of them
goes to traverse/delete the logs just as the other is already deleting them.  However, we also
don't want to just rip it out of the MapReduce history server and move it to the timeline
server, because the timeline server is still an optional server in YARN.

So we need one of two things.  Either the user gets a way to specify where the deletion service
runs, whether that's the legacy location in the MapReduce history server (for those who aren't going
to run a timeline server, which is still an optional YARN server) or the timeline server.
Or we declare the timeline server a mandatory server to run (at least for log
aggregation support) and move the deletion service from the JHS to the timeline server.
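
The first option could look roughly like the sketch below.  This is only an illustration under
stated assumptions: the property name is hypothetical (no such YARN key exists as far as this
comment is concerned), java.util.Properties stands in for the Hadoop Configuration, and the class
and enum names are invented.  The point is that each server asks the same question at startup
and exactly one of them answers yes, with the JHS as the default to preserve legacy behavior.

```java
import java.util.Locale;
import java.util.Properties;

final class DeletionServiceHostSketch {

    /** The two candidate hosts for the deletion service. */
    enum Host { JOB_HISTORY_SERVER, TIMELINE_SERVER }

    // HYPOTHETICAL property name, invented for this sketch; not a real YARN key.
    static final String HOST_KEY = "example.log-aggregation.deletion-service.host";

    /** Resolves the configured host, defaulting to the legacy location in the JHS. */
    static Host configuredHost(Properties conf) {
        String value = conf.getProperty(HOST_KEY, Host.JOB_HISTORY_SERVER.name());
        // valueOf throws IllegalArgumentException on an unknown value, so a typo fails fast.
        return Host.valueOf(value.trim().toUpperCase(Locale.ROOT));
    }

    /** Each server calls this with its own identity and starts the service only on true. */
    static boolean shouldStartDeletionService(Properties conf, Host self) {
        return configuredHost(conf) == self;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println("default, JHS starts it: "
            + shouldStartDeletionService(conf, Host.JOB_HISTORY_SERVER));
        conf.setProperty(HOST_KEY, "timeline_server");
        System.out.println("relocated, timeline server starts it: "
            + shouldStartDeletionService(conf, Host.TIMELINE_SERVER));
    }
}
```

Because both servers derive the answer from one setting, a cluster running both never ends up
with two deletion services racing over the same directories.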

In addition, the MapReduce history server supports dynamic refresh of the log deletion service
configs, and it would be nice not to lose that ability when the service is hosted in the timeline server.
That could be a separate JIRA unless we're ripping the service out of the JHS.  If it can only run
in the timeline server then we would lose refresh functionality unless that JIRA was completed.
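
For reference, the kind of refresh behavior at stake can be sketched as follows.  This is a
stand-in written against java.util.concurrent, not the JHS implementation: the class name, the
method names, and the choice of ScheduledExecutorService are all assumptions made for
illustration.  A refresh cancels the currently scheduled sweep and reschedules it with the new
check interval, without restarting the hosting server.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

final class RefreshableSweepSketch {
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "log-deletion-sketch");
            t.setDaemon(true);   // do not keep the JVM alive for the sketch
            return t;
        });
    private final Runnable sweep;
    private ScheduledFuture<?> current;
    private long checkIntervalSeconds;

    RefreshableSweepSketch(Runnable sweep) {
        this.sweep = sweep;
    }

    /** Starts the periodic sweep, or restarts it with a new interval if one is already scheduled. */
    synchronized ScheduledFuture<?> refresh(long newIntervalSeconds) {
        if (newIntervalSeconds <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        if (current != null) {
            current.cancel(false);   // let an in-flight sweep finish, but schedule no more runs
        }
        checkIntervalSeconds = newIntervalSeconds;
        current = scheduler.scheduleAtFixedRate(sweep, 0, newIntervalSeconds, TimeUnit.SECONDS);
        return current;
    }

    synchronized long checkIntervalSeconds() {
        return checkIntervalSeconds;
    }

    void stop() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) {
        RefreshableSweepSketch service = new RefreshableSweepSketch(() -> { });
        service.refresh(3600);
        service.refresh(60);     // picked up without restarting the host process
        System.out.println("interval now " + service.checkIntervalSeconds() + "s");
        service.stop();
    }
}
```

Whichever server hosts the deletion service would need an admin entry point that ends up calling
something like refresh() above; that entry point is the piece that would be lost in a straight move.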

As for unit tests, I agree the existing tests for the deletion service cover the correctness
of the service itself, so we just need unit tests for the timeline server and MapReduce JHS
to verify each is starting the deletion service or not starting the service based on how the
cluster is configured.

> YARN should support to delete the aggregated logs for Non-MapReduce applications
> --------------------------------------------------------------------------------
>
>                 Key: YARN-2985
>                 URL: https://issues.apache.org/jira/browse/YARN-2985
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: log-aggregation, nodemanager
>    Affects Versions: 2.8.0
>            Reporter: Xu Yang
>            Assignee: Steven Rand
>         Attachments: YARN-2985-branch-2-001.patch
>
>
> Before Hadoop 2.6, the LogAggregationService is started in the NodeManager, but the AggregatedLogDeletionService
> is started in MapReduce's JobHistoryServer. Therefore, non-MapReduce applications can aggregate
> their logs to HDFS but cannot delete those logs. The NodeManager needs to take over the function
> of aggregated log deletion.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

