hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Linlin Zhou (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (YARN-7109) Extend aggregation operation for new ATS design
Date Fri, 13 Oct 2017 06:43:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16203115#comment-16203115
] 

Linlin Zhou edited comment on YARN-7109 at 10/13/17 6:42 AM:
-------------------------------------------------------------

Totally 7 aggregation functions are added in the patch, they are:
Min, Avg, Count, Median, TopKMin, TopKMax, MaxFreq.
Some comments and questions on the patch:
1. What's the expected functionality of the third parameter 'state'
in below method of TimelineMetricOperation?
abstract TimelineMetric exec(TimelineMetric incoming, TimelineMetric base,
   Map<Object, Object> state);

All the newly added functions except for Min are 'stateful',
e.g. Avg needs to record both the current average value
and the number of metrics which produced the value,
Median needs to store all the metrics fed to it.
But I don’t prefer to put the extra data structures into ‘state’ object.
In the patch, I only use 'state' to hold the k value of TopKMin and TopKMax and the capacity
param of Median and MaxFreq.

2. Introduction of TimelineMetricAggregator
This is an abstract base class which encapsulatesthe data and operations needed by an aggregated
function. It offers two methods: update(TimelineMetric incoming) and List<Number> getAggregatedValues().
Each aggregation function has a corresponding subclass, e.g. TimelineMetricAggregatorMedian
maps to the Median function. 
When we need to add a new aggregation function, we add a new subclass of TimelineMetricAggregator.


3. Does aggregation only apply to SINGEL_VALUE type? Do we need to support TIME_SERIES?
Implementation of ‘Max’ and ‘Sum’ function treat the incoming metric as SINGLE_VALUE.
In the modified version, I do not make such assumption. Each function will aggregate all the
value in values(TreeMap), that is I take TIME_SERIES as multiple SINGLE_VALUE metrics and
aggregate them one by one.

4. MaxFreq and Median store all the metrics feeded bounded with or without capacity limit?
To avoid out of memory, user is allowed to set a capacity to restrict the max number of metrics
stored.  When metrics reach the capacity, the oldest metrics will be removed.
If user do not specified the capacity, we will store all the metrics.

5. The capacity and k parameters can only be set at the initial time.
For simplicity and elegance, currently the two parameters can only set once.

6. ‘Max’ and ‘Sum’ function will change the first incoming metric, is it ok?
Originally Max and Sum will take the first incoming metric as the aggregated metric, this
metric will be changed after more aggregations. In the new version, the aggregated metric
is always a new instance and won’t change the first incoming metric.

7. TimelineMetric offers getAggregatedValues() method.
I assume we only care value in an aggregated metric, timestamp can be omitted. TopKMax, TopKMin
and MaxFreq may return multiple values, for convenience and meaningful I added the method
getAggregatedValues(). For compatibility, we can still use getSingelDateValue and getValues()
to get the aggregated results.

8. The result of TopK function does not guarantee any particular order.

9. Do I need to set aggregator as xmlElement?
“TimelineMetricAggregator aggregator;” is a new field in TimelineMetric, I see other fields
like type, id, values all have annotation of ‘@XmlElement’, should aggregator do the same
setting?

10. TimelineMetricCalculator do not support comparasion between Integer/Long and Float/Double,
is this by design?

11. Do we really need the replace aggregation function?
Currently I do not modify it as ‘Max’ and ‘Sum’, and I can’t think of a scenario
where this function is needed. 

11. Is there any other aggregation function need to add?


was (Author: littlestone00):
Totally 7 aggregation functions are added in the patch, they are:
Min, Avg, Count, Median, TopKMin, TopKMax, MaxFreq.
Some comments and questions on the patch:
1. What's the expected functionality of the third parameter 'state'
in below method of TimelineMetricOperation?
abstract TimelineMetric exec(TimelineMetric incoming, TimelineMetric base,
   Map<Object, Object> state);

All the newly added functions except for Min are 'stateful',
e.g. Avg needs to record both the current average value
and the number of metrics which produced the value,
Median needs to store all the metrics fed to it.
But I don’t prefer to put the extra data structures into ‘state’ object.
In the patch, I only use 'state' to hold the k value of TopKMin and TopKMax and the capacity
param of Median and MaxFreq.

2. Introduction of TimelineMetricAggregator
This is an abstract base class which encapsulatesthe data and operations needed by an aggregated
function. It offers two methods: update(TimelineMetric incoming) and List<Number> getAggregatedValues().
Each aggregation function has a corresponding subclass, e.g. TimelineMetricAggregatorMedian
maps to the Median function. 
When we need to add a new aggregation function, we add a new subclass of TimelineMetricAggregator.


3. Does aggregation only apply to SINGEL_VALUE type? Do we need to support TIME_SERIES?
Implementation of ‘Max’ and ‘Sum’ function treat the incoming metric as SINGLE_VALUE.
In the modified version, I do not make such assumption. Each function will aggregate all the
value in values(TreeMap), that is I take TIME_SERIES as multiple SINGLE_VALUE metrics and
aggregate them one by one.

4. MaxFreq and Median store all the metrics feeded bounded with or without capacity limit?
To avoid out of memory, user is allowed to set a capacity to restrict the max number of metrics
stored.  When metrics reach the capacity, the oldest metrics will be removed.
If user do not specified the capacity, we will store all the metrics.

5. The capacity and k parameters can only be set at the initial time.
For simplicity and elegance, currently the two parameters can only set once.

6. ‘Max’ and ‘Sum’ function will change the first incoming metric, is it ok?
Originally Max and Sum will take the first incoming metric as the aggregated metric, this
metric will be changed after more aggregations. In the new version, the aggregated metric
is always a new instance and won’t change the first incoming metric.

7. TimelineMetric offers getAggregatedValues() method.
I assume we only care value in an aggregated metric, timestamp can be omitted. TopKMax, TopKMin
and MaxFreq may return multiple values, for convenience and meaningful I added the method
getAggregatedValues(). For compatibility, we can still use getSingelDateValue and getValues()
to get the aggregated results.

8. The result of TopK function does not guarantee any particular order.

9. Do I need to set aggregator as xmlElement?
“TimelineMetricAggregator aggregator;” is a new field in TimelineMetric, I see other fields
like type, id, values all have annotation of ‘@XmlElement’, should aggregator do the same
setting?

10. TimelineMetricCalculator do not support comparasion between Integer/Long and Float/Double,
is this by design?

11. Do we really need the replace aggregation function?
Currently I do not modify it as ‘Max’ and ‘Sum’, and I can’t think of a scenario
where this function is needed. 

11. Is there any other aggregation function need to add?[^attachment-name.zip]

> Extend aggregation operation for new ATS design
> -----------------------------------------------
>
>                 Key: YARN-7109
>                 URL: https://issues.apache.org/jira/browse/YARN-7109
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Zian Chen
>            Assignee: Linlin Zhou
>              Labels: patch
>         Attachments: yarn-7109.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org


Mime
View raw message