hadoop-yarn-issues mailing list archives

From "Vrushali C (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3817) [Aggregation] Flow and User level aggregation on Application States table
Date Tue, 07 Jul 2015 22:25:04 GMT

    [ https://issues.apache.org/jira/browse/YARN-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617548#comment-14617548 ]

Vrushali C commented on YARN-3817:

bq. This said, we may want to try to implement the offline aggregations as map-reduce jobs as our first attempt

+1 to this. 
We haven't yet worked out the aggregation at the timeseries level for a flow. 

I also did some size estimates, and the actual size will be much higher than what you have
above, since there is also time series data emitted by the app master as well as individual

I will share an Excel spreadsheet that I have created so that we can think about how we want
to emit the timeseries metrics.

> [Aggregation] Flow and User level aggregation on Application States table
> -------------------------------------------------------------------------
>                 Key: YARN-3817
>                 URL: https://issues.apache.org/jira/browse/YARN-3817
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: Detail Design for Flow and User Level Aggregation.pdf
> We need flow- and user-level aggregation to present flow- and user-related states to end users.
> Flow-level aggregation involves three levels of aggregation:
> - The first level is the Flow_run level, which represents one execution of a flow and shows
> the exact aggregated data for a single run of the flow.
> - The second level is the Flow_version level, which represents summary info for a version of a flow.
> - The third level is the Flow level, which represents summary info for a specific flow.
> User-level aggregation represents summary info for a specific user. It should include
> accumulated totals and statistical means (at two levels: application and flow), such as the
> number of flows and applications, resource consumption, and mean resource usage per app or flow.
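
To make the levels in the issue description concrete, here is a minimal in-memory sketch of the roll-up, not the timeline server's actual implementation. Everything in it is an assumption made for illustration: the `AppRecord` row shape, the `memory_mb` metric name, the idea that a flow is scoped to a user, and the use of a plain sum as the aggregation operation. The real design may use different keys, storage, and operations.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class AppRecord:
    """Final metrics of one application (hypothetical row shape)."""
    user: str
    flow: str
    flow_version: str
    flow_run: str
    app_id: str
    metrics: dict = field(default_factory=dict)


def aggregate(apps, key_fn):
    """Sum every metric and count applications per grouping key."""
    totals = defaultdict(lambda: defaultdict(int))
    for app in apps:
        bucket = totals[key_fn(app)]
        bucket["app_count"] += 1
        for name, value in app.metrics.items():
            bucket[name] += value
    return {key: dict(bucket) for key, bucket in totals.items()}


def flow_run_level(apps):
    # First level: one execution of one version of a flow.
    return aggregate(apps, lambda a: (a.user, a.flow, a.flow_version, a.flow_run))


def flow_version_level(apps):
    # Second level: all runs of one version of a flow.
    return aggregate(apps, lambda a: (a.user, a.flow, a.flow_version))


def flow_level(apps):
    # Third level: all versions and runs of a flow.
    return aggregate(apps, lambda a: (a.user, a.flow))


def user_level(apps, metric="memory_mb"):
    """Per-user totals plus means per application and per flow for one metric."""
    summary = aggregate(apps, lambda a: a.user)
    flows_by_user = defaultdict(set)
    for app in apps:
        flows_by_user[app.user].add(app.flow)
    for user, stats in summary.items():
        total = stats.get(metric, 0)
        stats["flow_count"] = len(flows_by_user[user])
        stats["mean_per_app"] = total / stats["app_count"]
        stats["mean_per_flow"] = total / stats["flow_count"]
    return summary
```

Each level only changes the grouping key, so an offline job could compute all of them in one pass over the application rows. The sketch ignores time series data entirely, which the comment above notes has not been worked out yet.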

This message was sent by Atlassian JIRA
