hadoop-yarn-issues mailing list archives

From "Robert Joseph Evans (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-321) Generic application history service
Date Mon, 23 Dec 2013 16:02:08 GMT

    [ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13855707#comment-13855707 ]

Robert Joseph Evans commented on YARN-321:

The way it currently works is based on group permissions on a directory (this is from
memory from a while ago, so I could be off on a few things).  In HDFS, when you create a file,
the group of the file is the group of the directory the file is created in, similar to the
setgid bit on a directory in Linux.  When an MR job completes, it copies its history log
file, along with a few other files, to a drop-box-like location called intermediate done and
atomically renames each one from a temp name to its final name.  The directory is world writable
but readable only by a special group that the history server belongs to and general users
do not.  The history server wakes up periodically and scans that directory for new
files.  When it sees new files, it moves them to a final location that is owned by the headless
history server user.  If a query comes in for a job that the history server is not aware of,
it also scans the intermediate done directory before failing.
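
To make the flow above concrete, here is a minimal sketch of the same drop-box pattern on a
local file system rather than HDFS.  It is not the history server's actual code: the ".tmp"
suffix, the function names, and the directory arguments are all hypothetical, and the group
permission scheme is not modelled at all.  It only shows the write-then-rename step, the
periodic scan, and the rescan on a lookup miss.

```python
import os
import shutil
from pathlib import Path
from typing import List, Optional

TMP_SUFFIX = ".tmp"  # hypothetical marker for copies that are still in progress


def publish_history(src: Path, intermediate_dir: Path) -> Path:
    """Copy a finished job's history file into the drop box under a temp name,
    then rename it to its final name in one step."""
    tmp_path = intermediate_dir / (src.name + TMP_SUFFIX)
    final_path = intermediate_dir / src.name
    shutil.copyfile(src, tmp_path)    # a scanner skips this partial file
    os.replace(tmp_path, final_path)  # atomic within one POSIX file system
    return final_path


def scan_intermediate(intermediate_dir: Path, done_dir: Path) -> List[str]:
    """Move every fully written file to the final location; return the names moved.
    Assumes both directories live on the same file system."""
    moved = []
    for entry in sorted(intermediate_dir.iterdir()):
        if entry.is_file() and not entry.name.endswith(TMP_SUFFIX):
            os.replace(entry, done_dir / entry.name)
            moved.append(entry.name)
    return moved


def find_job(file_name: str, intermediate_dir: Path, done_dir: Path) -> Optional[Path]:
    """Look up a history file; on a miss, rescan the drop box once before failing."""
    target = done_dir / file_name
    if not target.exists():
        scan_intermediate(intermediate_dir, done_dir)
    return target if target.exists() else None
```

Because a file only gets its final name once it is completely written, the scanner never
picks up a half-copied history file, and the rescan in find_job covers jobs that finished
after the last periodic scan.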

Reading history data is done through RPC to the history server, or through the web interface,
including RESTful APIs.  There is no supported way for an app to read history data directly
through the file system.  I hope this helps.
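
As an illustration of the REST route, the sketch below builds and fetches a per-job URL.  The
path follows the MapReduce History Server REST API as I understand it from the Hadoop
documentation (/ws/v1/history/mapreduce/jobs/{jobid}); the host, port, and job ID in the usage
note are placeholders, and the helper names are hypothetical, so check the documentation for
your Hadoop version before relying on it.

```python
import json
from urllib.request import Request, urlopen


def job_url(history_server: str, job_id: str) -> str:
    """Build the per-job URL; history_server is 'host:port' of the web interface."""
    return f"http://{history_server}/ws/v1/history/mapreduce/jobs/{job_id}"


def fetch_job(history_server: str, job_id: str, timeout: float = 10.0) -> dict:
    """GET the job resource and parse the JSON body (needs a reachable server)."""
    request = Request(job_url(history_server, job_id),
                      headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, fetch_job("historyserver.example.com:19888", "job_1326381300833_2_2") would
request that job's summary from a server listening at that (made-up) address.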

> Generic application history service
> -----------------------------------
>                 Key: YARN-321
>                 URL: https://issues.apache.org/jira/browse/YARN-321
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Luke Lu
>            Assignee: Vinod Kumar Vavilapalli
>         Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf,
>                      Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java
>
> The MapReduce job history server currently needs to be deployed as a trusted server in
> sync with the MapReduce runtime. Every new application would need a similar application history
> server. Having to deploy O(T*V) trusted servers (where T is the number of application types and
> V is the number of application versions) is clearly not scalable.
>
> Job history storage handling itself is pretty generic: move the logs and history data
> into a particular directory for later serving. Job history data is already stored as JSON
> (or binary Avro). I propose that we create only one trusted application history server, which
> can have a generic UI (display JSON as a tree of strings) as well. Specific applications/versions
> can deploy untrusted webapps (a la AMs) to query the application history server and interpret
> the JSON for their specific UI and/or analytics.

This message was sent by Atlassian JIRA
