hadoop-yarn-issues mailing list archives

From "Li Lu (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics
Date Tue, 12 Apr 2016 21:45:25 GMT

     [ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Lu updated YARN-3816:
------------------------
    Attachment: YARN-3816-YARN-2928-v6.patch

OK, here is the v6 version of the patch. It addresses most of Sangjin's comments and removes some unnecessary
code. In particular, I handled the following points differently from Sangjin's suggestions:

- I did not move the aggregation logic completely into the app-level collector. Instead, I left
the aggregation infrastructure in TimelineCollector and moved only the logic that launches the
aggregation into the app-level collector. This keeps the aggregation infrastructure general enough
for future collectors (like the rack-level collector Vinod proposed a while ago) while still
allowing designs specific to app-level aggregation.
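To illustrate the split between "general infrastructure in the base collector" and "the app-level collector decides when to launch aggregation", here is a minimal sketch. All class and method names (CollectorSketch, AppLevelCollectorSketch, putMetric, aggregate) are hypothetical and are not the patch's code; the sketch also assumes a simple SUM over the latest value reported by each entity and a fixed-rate schedule.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical base collector: owns the general aggregation machinery
// (the role the message assigns to TimelineCollector).
abstract class CollectorSketch {
    // Latest value per metric id, per reporting entity.
    private final Map<String, Map<String, Long>> latestByMetric = new ConcurrentHashMap<>();
    // Result of the most recent aggregation pass, keyed by metric id.
    private volatile Map<String, Long> aggregated = Collections.emptyMap();

    void putMetric(String entityId, String metricId, long value) {
        latestByMetric
            .computeIfAbsent(metricId, k -> new ConcurrentHashMap<>())
            .put(entityId, value);
    }

    // Generic aggregation pass: sums the latest value of each metric across entities.
    void aggregate() {
        Map<String, Long> result = new HashMap<>();
        latestByMetric.forEach((metricId, perEntity) -> {
            long sum = 0L;
            for (long v : perEntity.values()) {
                sum += v;
            }
            result.put(metricId, sum);
        });
        aggregated = Collections.unmodifiableMap(result);
    }

    Map<String, Long> aggregated() {
        return aggregated;
    }
}

// Hypothetical app-level collector: it only decides when aggregation is launched.
final class AppLevelCollectorSketch extends CollectorSketch {
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();

    void start(long periodMillis) {
        scheduler.scheduleAtFixedRate(
            this::aggregate, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    void stop() {
        scheduler.shutdown();
        // One last pass so the result covers everything reported before stop().
        aggregate();
    }
}
```

Under this split, a future rack-level collector could extend the same base class and launch aggregation on its own schedule without touching the aggregation code itself.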
- The results of the aggregations are stored in the application entity, i.e. the entity whose
id equals the application id. The id of each aggregated metric is the original metric id plus
the aggregation group. I think we need to keep the "aggregation group" information in the metric
id because multiple types of entities may post the same metric name (especially when the
application itself posts user-defined metrics), and we may not want to aggregate them together.
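The naming scheme above can be sketched as follows. This sketch assumes the aggregation group is the type of the posting entity, and that the id is formed by joining the metric id and the group with an underscore. Both the separator and the helper class AggregatedMetricIds are illustrative assumptions, not taken from the patch. It shows how a metric name posted by two entity types stays in two separate aggregates.

```java
import java.util.Map;

// Hypothetical helper, not the patch's code: builds the id of an aggregated
// metric from the original metric id and its aggregation group.
final class AggregatedMetricIds {
    // Separator chosen for illustration only.
    static final String SEPARATOR = "_";

    private AggregatedMetricIds() {}

    static String aggregatedId(String metricId, String aggregationGroup) {
        return metricId + SEPARATOR + aggregationGroup;
    }

    // Adds one reported value into the per-application totals. Values that
    // share a metric name but belong to different groups are kept apart.
    static void add(Map<String, Long> totals, String metricId,
                    String aggregationGroup, long value) {
        totals.merge(aggregatedId(metricId, aggregationGroup), value, Long::sum);
    }
}
```

For example, MEMORY posted by YARN_CONTAINER entities and MEMORY posted by a hypothetical user-defined MY_TASK entity type end up under MEMORY_YARN_CONTAINER and MEMORY_MY_TASK rather than in one total.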
- I refactored RealTimeAggregationOperation into TimelineMetricOperations. The idea is to provide
a basic framework for defining operations between timeline metrics, whether an operation is an
aggregation or an accumulation. Right now the inputs of a timeline metric operation are the
incoming metric, the existing metric, and the previous state. The output is a new timeline metric,
and any side effect is reflected in the state. This lets us model aggregation operations like SUM
and AVG (not supported yet) as well as accumulation operations like REPLACE and MAX.
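The operation model described above (inputs are the incoming metric, the existing metric, and a state; output is a new metric, with side effects recorded in the state) can be sketched as an enum. This is a hypothetical illustration, not the patch's TimelineMetricOperations. It simplifies a metric to a single long value. It also assumes one plausible use of the state for SUM: remembering each source's previous value, so that a source re-posting an updated value replaces its earlier contribution instead of being counted twice.

```java
import java.util.Map;

// Hypothetical sketch, not the patch's code: each operation takes the incoming
// value, the existing value (null when nothing exists yet) and a mutable
// per-source state map, and returns the new value.
enum SketchMetricOperation {
    // Accumulation: the latest incoming value wins.
    REPLACE {
        @Override
        long exec(long incoming, Long existing, Map<String, Long> state) {
            return incoming;
        }
    },
    // Accumulation: keep the largest value seen so far.
    MAX {
        @Override
        long exec(long incoming, Long existing, Map<String, Long> state) {
            return existing == null ? incoming : Math.max(incoming, existing);
        }
    },
    // Aggregation: sum across sources. The state holds the last value this
    // source reported; that old contribution is subtracted before adding the new one.
    SUM {
        @Override
        long exec(long incoming, Long existing, Map<String, Long> state) {
            long base = existing == null ? 0L : existing;
            Long previous = state.put(PREV_KEY, incoming);
            return base - (previous == null ? 0L : previous) + incoming;
        }
    };

    // Hypothetical key under which SUM stores a source's previous value.
    static final String PREV_KEY = "prev";

    abstract long exec(long incoming, Long existing, Map<String, Long> state);
}
```

With SUM, container_1 posting 100, then container_2 posting 50, then container_1 posting an updated 120 yields 150 - 100 + 120 = 170 rather than 270. This illustrates why the state is an explicit input rather than something each operation recomputes.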
- I changed the code so that we no longer store the metric aggregation operation. For offline
aggregation I'll rebuild the operations through a config, which I will address in YARN-3817. For
now, this patch works well with the new filter mechanism.

Please do let me know if there are other concerns, thanks! 

> [Aggregation] App-level aggregation and accumulation for YARN system metrics
> ----------------------------------------------------------------------------
>
>                 Key: YARN-3816
>                 URL: https://issues.apache.org/jira/browse/YARN-3816
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Junping Du
>            Assignee: Li Lu
>              Labels: yarn-2928-1st-milestone
>         Attachments: Application Level Aggregation of Timeline Data.pdf, YARN-3816-YARN-2928-v1.patch,
YARN-3816-YARN-2928-v2.1.patch, YARN-3816-YARN-2928-v2.2.patch, YARN-3816-YARN-2928-v2.3.patch,
YARN-3816-YARN-2928-v2.patch, YARN-3816-YARN-2928-v3.1.patch, YARN-3816-YARN-2928-v3.patch,
YARN-3816-YARN-2928-v4.patch, YARN-3816-YARN-2928-v5.patch, YARN-3816-YARN-2928-v6.patch,
YARN-3816-feature-YARN-2928.v4.1.patch, YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch
>
>
> We need application level aggregation of Timeline data:
> - To present end users with the aggregated state of each application, including resource (CPU,
Memory) consumption across all containers, the number of containers launched/completed/failed,
etc. We need this for apps both while they are running and after they are done.
> - Also, framework-specific metrics, e.g. HDFS_BYTES_READ, should be aggregated to show
framework-level details.
> - Aggregation at other levels (Flow/User/Queue) can be done more efficiently on top of application-level
aggregations than on raw entity-level data, since far fewer rows need to be scanned (non-aggregated
entities such as events, configurations, etc. are filtered out).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
