hadoop-yarn-issues mailing list archives

From "Zhijie Shen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-321) Generic application history service
Date Fri, 10 Jan 2014 17:57:09 GMT

    [ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868043#comment-13868043 ]

Zhijie Shen commented on YARN-321:

bq. 1. Does it provide a function to set the maximum number of files and the maximum retention period of ApplicationHistory to store in HDFS?

No. Currently the FS implementation does not discard the historic data of applications
that completed before some point in time; it answers users' requests based on all the stored
applications. However, via the REST API, users can filter out the applications that fall
outside a start/finish time window.
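
As a rough illustration of what such a time-window filter amounts to, here is a minimal sketch.
The class and field names (AppRecord, TimeWindowFilter, startTime, finishTime) are made up for
this example and are not the AHS data model or REST parameters; the window semantics (started
at or after the window start, finished at or before the window end) are also an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical record of one finished application; not the real AHS data model.
final class AppRecord {
    final String id;
    final long startTime;   // epoch millis
    final long finishTime;  // epoch millis

    AppRecord(String id, long startTime, long finishTime) {
        this.id = id;
        this.startTime = startTime;
        this.finishTime = finishTime;
    }
}

final class TimeWindowFilter {
    // Keep only applications that started at or after windowStart
    // and finished at or before windowEnd (assumed semantics).
    static List<AppRecord> filter(List<AppRecord> all, long windowStart, long windowEnd) {
        List<AppRecord> kept = new ArrayList<>();
        for (AppRecord app : all) {
            if (app.startTime >= windowStart && app.finishTime <= windowEnd) {
                kept.add(app);
            }
        }
        return kept;
    }
}
```

Note that the stored data itself is untouched here; the filter only narrows what is returned.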

bq. 2. When there are many ApplicationHistory entries in HDFS, is there no limit on the number of ApplicationHistory entries that are read?

As for the REST API, users can limit the number of applications that the AHS returns.
As for HDFS access, the current implementation loads all the stored applications and
filters them one by one, which is not efficient for a large collection of applications.
YARN-925 has been reopened to discuss pushing the filtering into the implementation of the
history store, so that we can avoid loading all the applications. Meanwhile, caching
(YARN-1322) is another way to reduce I/O.
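
To make the difference concrete, here is a small sketch contrasting the two approaches. It is
not the actual history store code: StoredApp, CountingScan and HistoryScanSketch are
hypothetical names, and the store is modelled as a plain Iterator so that the number of
entries read can be counted.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stored entry; the real history store types are not shown in this thread.
final class StoredApp {
    final String id;
    final long finishTime;

    StoredApp(String id, long finishTime) {
        this.id = id;
        this.finishTime = finishTime;
    }
}

// Wraps an iterator and counts how many entries were actually read from the store.
final class CountingScan implements Iterator<StoredApp> {
    private final Iterator<StoredApp> inner;
    int reads = 0;

    CountingScan(Iterator<StoredApp> inner) {
        this.inner = inner;
    }

    @Override
    public boolean hasNext() {
        return inner.hasNext();
    }

    @Override
    public StoredApp next() {
        reads++;
        return inner.next();
    }
}

final class HistoryScanSketch {
    // Load every entry first, then filter and truncate to the limit.
    static List<StoredApp> loadAllThenFilter(Iterator<StoredApp> scan,
                                             Predicate<StoredApp> keep, int limit) {
        List<StoredApp> all = new ArrayList<>();
        while (scan.hasNext()) {
            all.add(scan.next());
        }
        List<StoredApp> out = new ArrayList<>();
        for (StoredApp app : all) {
            if (out.size() >= limit) {
                break;
            }
            if (keep.test(app)) {
                out.add(app);
            }
        }
        return out;
    }

    // Filter while scanning and stop reading as soon as the limit is reached.
    static List<StoredApp> filterInStore(Iterator<StoredApp> scan,
                                         Predicate<StoredApp> keep, int limit) {
        List<StoredApp> out = new ArrayList<>();
        while (out.size() < limit && scan.hasNext()) {
            StoredApp app = scan.next();
            if (keep.test(app)) {
                out.add(app);
            }
        }
        return out;
    }
}
```

Both methods return the same applications, but the second one stops reading once enough matches
are found, which is the kind of saving that pushing the filter into the store is meant to give.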

> Generic application history service
> -----------------------------------
>                 Key: YARN-321
>                 URL: https://issues.apache.org/jira/browse/YARN-321
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Luke Lu
>         Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java
> The mapreduce job history server currently needs to be deployed as a trusted server in
> sync with the mapreduce runtime. Every new application would need a similar application history
> server. Having to deploy O(T*V) (where T is number of type of application, V is number of
> version of application) trusted servers is clearly not scalable.
> Job history storage handling itself is pretty generic: move the logs and history data
> into a particular directory for later serving. Job history data is already stored as json
> (or binary avro). I propose that we create only one trusted application history server, which
> can have a generic UI (display json as a tree of strings) as well. Specific application/version
> can deploy untrusted webapps (a la AMs) to query the application history server and interpret
> the json for its specific UI and/or analytics.

This message was sent by Atlassian JIRA
