hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Li Lu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3816) [Aggregation] App-level Aggregation for YARN system metrics
Date Wed, 16 Sep 2015 05:51:46 GMT

    [ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14746925#comment-14746925

Li Lu commented on YARN-3816:

Hi [~djp], sorry about the delay. I've gone through the patch and here are some quick feedback/questions.
I may come up with more comments tomorrow daytime, but here are some quick ones so that I'm
not blocking you to refresh the patch:

- TimelineAggregationBasis.java
Shall we differentiate realtime and offline aggregations? IIUC, {{APPLICATION}} type represents
realtime aggregation while the other two represents offline aggregation. 

- TimelineMetric.java
  -- setToAggregate(), why do we need a final here? 
  -- I'm a little bit confused by the following line's comments:
  +        // set delta to 0 instead of null.
  +        Number delta = TimelineMetricCalculator.getZero(n2);
  +        // t1 = null means this is the first value get aggregated, delta should
  +        // be null.
  So in conclusion, if t1 is null, we set delta to 0 since it's the first value to be aggregated.
Or else, we aggregate the delta? 
  -- I'm fine to keep the else part of the latestMetric.getType().equals(Type.SINGLE_VALUE)
for now. Maybe we'd like to update this part when implementing TODO
  -- l.183, I noticed we're copying the whole list to rebuild the valueList, then only used
one element there? Are we sure the values list is small enough all the time? 
  -- TimelineMetric and TimelineCollector have two definitions of "_AREA". Is it possible
to only have one of them? 

- TimelineCollector.java
  -- So we put intermediate aggregation status into the collector as some hash maps. Will
there be challenges to rebuild these status? We may need to face the situations like nm restart
(we're an aux service inside nm). 
  -- We need a method similar to appendAggregatedMetricsToEntities in YARN-3817. Actually
I believe this is a very helpful method in normal aggregation workflow. It would be great
if we can find a more visible place to put it. 
  -- I have one question about {{doAccumulation}}: in offline aggregation we're using a two-staged
aggregation: we perform a flow run level aggregation in the mapper, and then aggregate entities
in the same flow again in the reducer for entities from different mappers. In this case, we
need to set the {{doAccumulation}} parameter into false for the aggregation in the reducer,

I think we can fine tune the logic inside the aggregations (SUM/REP/etc,.) later. Right now
the basic structure of this in memory solution LGTM. 

> [Aggregation] App-level Aggregation for YARN system metrics
> -----------------------------------------------------------
>                 Key: YARN-3816
>                 URL: https://issues.apache.org/jira/browse/YARN-3816
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: Application Level Aggregation of Timeline Data.pdf, YARN-3816-YARN-2928-v1.patch,
YARN-3816-YARN-2928-v2.1.patch, YARN-3816-YARN-2928-v2.2.patch, YARN-3816-YARN-2928-v2.3.patch,
YARN-3816-YARN-2928-v2.patch, YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch
> We need application level aggregation of Timeline data:
> - To present end user aggregated states for each application, include: resource (CPU,
Memory) consumption across all containers, number of containers launched/completed/failed,
etc. We need this for apps while they are running as well as when they are done.
> - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be aggregated to show
details of states in framework level.
> - Other level (Flow/User/Queue) aggregation can be more efficient to be based on Application-level
aggregations rather than raw entity-level data as much less raws need to scan (with filter
out non-aggregated entities, like: events, configurations, etc.).

This message was sent by Atlassian JIRA

View raw message