hadoop-yarn-issues mailing list archives

From "Patrick Wendell (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1530) [Umbrella] Store, manage and serve per-framework application-timeline data
Date Mon, 17 Feb 2014 22:58:30 GMT

    [ https://issues.apache.org/jira/browse/YARN-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903589#comment-13903589 ]

Patrick Wendell commented on YARN-1530:


Thanks for the explanation! To make sure I understand how this would all work, let me walk
through an example.

For the Spark UI we are currently implementing the ability to serialize and write events to
HDFS, then load them later from a history server that can render the UI for jobs that are
finished. AFAIK this is basically how MapReduce works as well (?)
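
The write-then-replay pattern described above could be sketched roughly as follows. This is
only an illustration: a local temporary directory stands in for HDFS, and the event types and
fields (`JobStart`, `JobEnd`, `job_id`, `result`) are hypothetical, not Spark's actual event
schema.

```python
import json
import os
import tempfile


def append_event(log_path, event):
    """Append one event to the log as a single JSON line."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(event, sort_keys=True) + "\n")


def replay_events(log_path):
    """Yield events back in the order they were written."""
    with open(log_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def summarize_jobs(events):
    """Rebuild a small job-status view, as a history server might after the app ends."""
    status = {}
    for ev in events:
        if ev["type"] == "JobStart":
            status[ev["job_id"]] = "RUNNING"
        elif ev["type"] == "JobEnd":
            status[ev["job_id"]] = ev["result"]
    return status


def demo():
    # Assumption: a local temp directory stands in for an HDFS directory.
    with tempfile.TemporaryDirectory() as d:
        log_path = os.path.join(d, "app-0001.events")
        append_event(log_path, {"type": "JobStart", "job_id": 0, "ts": 100})
        append_event(log_path, {"type": "JobStart", "job_id": 1, "ts": 110})
        append_event(log_path, {"type": "JobEnd", "job_id": 0, "ts": 150,
                                "result": "SUCCEEDED"})
        return summarize_jobs(replay_events(log_path))
```

The point of the sketch is that the UI state is derived entirely from the replayed log, so the
application does not need to be running when the history is rendered.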

Suppose users have set up a YARN cluster and configured event ingestion into this shared store.
Spark would then need two things to integrate with it:

1. Be able to represent our events in JSON and hook into whatever source the user has set
up for ingestion (Flume, HDFS, etc.).
2. Be able to render our history timeline UI by reading event data from this store.
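
The two integration points above might look something like the sketch below. Everything here is
an assumption made for illustration: the envelope fields (`entity`, `framework`, `eventtype`,
`timestamp`, `info`) and the `InMemoryTimelineStore` class are hypothetical and are not the API
proposed in YARN-1530. A real deployment would put events through whatever ingestion path the
user configured rather than an in-process object.

```python
import json
from collections import defaultdict


def to_shared_event(app_id, event_type, timestamp, info):
    """Wrap a framework-specific event in a generic envelope (hypothetical fields)."""
    return {
        "entity": app_id,
        "framework": "spark",
        "eventtype": event_type,
        "timestamp": timestamp,
        "info": info,  # opaque to the store; only the framework's UI interprets it
    }


class InMemoryTimelineStore:
    """Stand-in for the shared store: accepts JSON strings and indexes them by entity."""

    def __init__(self):
        self._by_entity = defaultdict(list)

    def put(self, event_json):
        event = json.loads(event_json)
        self._by_entity[event["entity"]].append(event)

    def events_for(self, entity):
        """Return one entity's events ordered by timestamp."""
        return sorted(self._by_entity.get(entity, []), key=lambda e: e["timestamp"])


def render_timeline(store, app_id):
    """Step 2: the framework reads its own events back and renders them."""
    return [f'{e["timestamp"]} {e["eventtype"]}' for e in store.events_for(app_id)]
```

Note that the store only understands the envelope; the `info` payload stays framework-specific,
which matches the idea that each application keeps its own event model and UI.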


The benefit would be that if users set up something fancy like Flume, they could leverage the
same infrastructure for Spark as for other applications, since there is a shared event model.
They would also benefit from the faster indexed serving this application offers when rendering
the "history" UI.

Is that the main idea? I'm just trying to figure out what redundant work is saved by having
a generic framework, since each application writes its own UI and has its own event model.
From what I can tell, the benefit is that a shared ingestion and serving infrastructure can
be used.

> [Umbrella] Store, manage and serve per-framework application-timeline data
> --------------------------------------------------------------------------
>                 Key: YARN-1530
>                 URL: https://issues.apache.org/jira/browse/YARN-1530
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Vinod Kumar Vavilapalli
>         Attachments: application timeline design-20140108.pdf, application timeline design-20140116.pdf, application timeline design-20140130.pdf, application timeline design-20140210.pdf
> This is a sibling JIRA for YARN-321.
> Today, each application/framework has to store and serve per-framework data all by
> itself, as YARN doesn't have a common solution. This JIRA attempts to solve the storage,
> management and serving of per-framework data from various applications, both running and
> finished. The aim is to change YARN to collect and store data in a generic manner, with
> plugin points for frameworks to do their own thing w.r.t. interpretation and serving.

This message was sent by Atlassian JIRA