hadoop-yarn-issues mailing list archives

From "Varun Saxena (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics
Date Tue, 18 Oct 2016 10:50:58 GMT

    [ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15585160#comment-15585160 ]

Varun Saxena commented on YARN-3816:

bq. What you've mentioned here, IIUC, is something closer to the concept of "accumulation" as we
discussed before. Accumulation will apply an accumulative method on the same metric for the
same timeline entity across time.
Sort of. From the earlier code on this JIRA, accumulation meant a time-based integral, i.e. generating
the area under the curve using the trapezoidal rule. It should be fine to address this use case when
we do accumulation.
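
To make the accumulation idea concrete, here is a minimal sketch of a trapezoidal time integral over
metric samples. It is not the code from the earlier patches on this JIRA; the function name and the
sample values are hypothetical and only illustrate the "area under the curve" calculation.

```python
def trapezoidal_accumulation(samples):
    """Integrate (timestamp_seconds, value) samples with the trapezoidal rule.

    Hypothetical helper for illustration only; not taken from any YARN patch.
    """
    ordered = sorted(samples)  # order by timestamp
    total = 0.0
    # Each pair of neighbouring samples contributes one trapezoid.
    for (t0, v0), (t1, v1) in zip(ordered, ordered[1:]):
        total += (t1 - t0) * (v0 + v1) / 2.0
    return total


# Assumed CPU samples, 5 seconds apart.
cpu_samples = [(0, 40), (5, 30), (10, 7)]
cpu_area = trapezoidal_accumulation(cpu_samples)
```

With these assumed samples the result is in units of "metric value x seconds" (for example,
CPU-percent-seconds), which is what distinguishes accumulation from a point-in-time aggregate.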

bq. We also had a discussion on how often node managers should publish container metrics (YARN-4712
and YARN-4821). Currently they emit them every 3 seconds, but I think we should do a time
average on the NMTimelinePublisher and emit them less often. It may help in this regard.
Yes, this should largely address the concern I had, depending on what the configured interval is.
Assume the aggregation interval is 15 seconds and the config we add in YARN-4821 is set to
5 seconds; then we can potentially have 3 CPU values for a container reported to the Collector.
Assume these values are (t1, 40), (t2, 30) and (t3, 7), where t1, t2 and t3 are 5 seconds apart.
Currently we will pick up only 7 as the value used for aggregation. My point is: should it
be ((5*40) + (5*30) + (5*7)) / 15 = 25.67 (about 26) as the value for aggregation instead?
Because if this value were 70 instead of 7, it would be reported as 70, whereas the time average
would have been around 46.7.
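
The difference between the two approaches can be sketched as follows. This is only an illustration of
the arithmetic in the example; the function names and the (timestamp, value) tuple layout are
assumptions, not the Collector's actual API.

```python
def latest_value(samples):
    """Return the value of the most recent (timestamp_seconds, value) sample."""
    return max(samples, key=lambda s: s[0])[1]


def time_weighted_average(samples, report_interval, aggregation_interval):
    """Weight each value by the reporting interval it covers.

    Assumes every sample covers exactly `report_interval` seconds and that
    the samples together span `aggregation_interval` seconds.
    """
    weighted_sum = sum(value * report_interval for _, value in samples)
    return weighted_sum / aggregation_interval


# Three CPU readings reported 5 seconds apart within a 15 second window.
readings = [(5, 40), (10, 30), (15, 7)]
snapshot = latest_value(readings)                  # what is picked up today
averaged = time_weighted_average(readings, 5, 15)  # the proposed alternative
```

Here `snapshot` is 7 while `averaged` is roughly 25.67; replacing the last reading with 70 gives 70
versus roughly 46.67, which is the gap described in the example.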

We can, however, treat aggregation as just the latest value at a particular time (a sort of snapshot
of the system) and handle the above use case during accumulation, as Li suggested.

> [Aggregation] App-level aggregation and accumulation for YARN system metrics
> ----------------------------------------------------------------------------
>                 Key: YARN-3816
>                 URL: https://issues.apache.org/jira/browse/YARN-3816
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Junping Du
>            Assignee: Li Lu
>              Labels: yarn-2928-1st-milestone
>             Fix For: 3.0.0-alpha1
>         Attachments: Application Level Aggregation of Timeline Data.pdf, YARN-3816-YARN-2928-v1.patch,
YARN-3816-YARN-2928-v2.1.patch, YARN-3816-YARN-2928-v2.2.patch, YARN-3816-YARN-2928-v2.3.patch,
YARN-3816-YARN-2928-v2.patch, YARN-3816-YARN-2928-v3.1.patch, YARN-3816-YARN-2928-v3.patch,
YARN-3816-YARN-2928-v4.patch, YARN-3816-YARN-2928-v5.patch, YARN-3816-YARN-2928-v6.patch,
YARN-3816-YARN-2928-v7.patch, YARN-3816-YARN-2928-v8.patch, YARN-3816-YARN-2928-v9.patch,
YARN-3816-feature-YARN-2928.v4.1.patch, YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch
> We need application level aggregation of Timeline data:
> - To present end users with aggregated states for each application, including: resource (CPU,
Memory) consumption across all containers, number of containers launched/completed/failed,
etc. We need this for apps while they are running as well as when they are done.
> - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be aggregated to show
details of states at the framework level.
> - Aggregation at other levels (Flow/User/Queue) can be more efficient if based on Application-level
aggregations rather than raw entity-level data, as far fewer rows need to be scanned (after filtering
out non-aggregated entities such as events, configurations, etc.).

This message was sent by Atlassian JIRA