hadoop-yarn-issues mailing list archives

From "Joep Rottinghuis (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3815) [Aggregation] Application/Flow/User/Queue Level Aggregations
Date Fri, 19 Jun 2015 19:54:01 GMT

    [ https://issues.apache.org/jira/browse/YARN-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593867#comment-14593867 ]

Joep Rottinghuis commented on YARN-3815:
----------------------------------------

Very much agree with the separation into two categories, "online" versus "periodic". I think
this will be a natural split between the native HBase tables for the former and the Phoenix
approach for the latter, letting each play to its relative strengths.

A few thoughts around time-based aggregations:
- If the aggregation period is shorter than the runtime of apps/flows, we need to consider
what that means for an aggregate. As an extreme example, consider hourly aggregates for
applications that take hours to complete. What do we actually count in that one hour? Do we
attribute to that hour only the total metric value that came in at that time, or do we try
to apportion part of the increment to what happened in that one hour alone? The same goes
for daily aggregates when we have long-running jobs. In hRaven we simply don't deal with
this at all: we make the simplifying assumption that all metrics and usage happen in the
instant the job completes. With ATSv2 being (near) real-time that will simply not work, so
we need to consider what this means. Are we requiring apps to write at least once within
each aggregation period?
- If we store aggregates in columns (hourly columns, daily columns), we need to limit the
growth of the number of columns by making the next-level aggregate part of the rowkey. This
would limit 24 hourly columns to a single day row. Similarly, we'd have 7 dailies in a week,
or perhaps just up to 31 dailies in a month (see the sketch after this list). All of these
considerations come from a strong need to limit the range over which we scan in order to
get reasonable performance in the face of lots of data.
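
To make the rowkey idea concrete, here is a minimal sketch of what folding the next
aggregation level into the rowkey could look like. All class names, key layouts, and
separators below are hypothetical illustrations, not ATSv2 code; the point is just that a
day row caps out at 24 hourly columns.

{code:java}
// Hypothetical sketch only -- names and key layout are illustrative, not ATSv2.
import java.nio.charset.StandardCharsets;
import java.util.concurrent.TimeUnit;

public class AggRowKeySketch {
  private static final long MS_PER_DAY = TimeUnit.DAYS.toMillis(1);
  private static final long MS_PER_HOUR = TimeUnit.HOURS.toMillis(1);

  /**
   * One row per (cluster, flow, day); assumes '!' never appears in the parts.
   * Epoch millis are a fixed 13 digits for the foreseeable future, so the
   * string form sorts lexicographically in time order.
   */
  static byte[] dayRowKey(String cluster, String flowName, long eventTimeMs) {
    long dayStart = (eventTimeMs / MS_PER_DAY) * MS_PER_DAY;
    return (cluster + "!" + flowName + "!" + dayStart)
        .getBytes(StandardCharsets.UTF_8);
  }

  /** One column per hour of the day: "m!<metric>!<hour 0-23>" -- 24 max. */
  static byte[] hourColumn(String metricName, long eventTimeMs) {
    long hourOfDay = (eventTimeMs % MS_PER_DAY) / MS_PER_HOUR;
    return ("m!" + metricName + "!" + hourOfDay)
        .getBytes(StandardCharsets.UTF_8);
  }
}
{code}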

{quote}
Flow level:
○ expect return: aggregated stats for a flow_run, flow_version and flow
{quote}
I think "flow" level aggregations should really only mean flow-run level aggregation in the
sense of the separation that [~sjlee0] mentioned above for HBase native online aggregations.
I'm not sure that flow_version rollups even make sense. Flow_version are important to be able
to pass in as a filter: give me stats for this flow only matching this version. This is useful
for cases such as reducer estimation where a job can make effective use only of previous run
data if the version of the flow hasn't changed. The fact that there were three version of
a Hive query is good to now. Knowing when each version first appeared is good to know. Knowing
the total cost for version 2 is probably less useful.
Flow level aggregates are useful only with a particular time range in mind. What was the
cost of the DailyActiveUsers job (no matter the version) for the last week? How many bytes
did the SearchJob read from HDFS in the last month?
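
Continuing the hypothetical key layout from the sketch above, such a time-ranged query
becomes a bounded range scan over day rows, and a flow_version restriction could ride along
as a column filter rather than needing its own rollup. Every name here is again an
assumption for illustration:

{code:java}
// Hypothetical sketch, reusing the cluster!flow!dayStart key layout above.
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;

public class FlowRangeScanSketch {
  private static final long MS_PER_DAY = 24L * 60 * 60 * 1000;

  /** Scan only the day rows for the last 'days' days of one flow. */
  static Scan lastNDaysScan(String cluster, String flowName, long nowMs,
      int days, String flowVersion) {
    long firstDay = ((nowMs / MS_PER_DAY) - (days - 1)) * MS_PER_DAY;
    long stopDay = ((nowMs / MS_PER_DAY) + 1) * MS_PER_DAY;  // exclusive
    Scan scan = new Scan();
    scan.setStartRow(key(cluster, flowName, firstDay));
    scan.setStopRow(key(cluster, flowName, stopDay));
    if (flowVersion != null) {
      // Keep only rows whose (hypothetical) i:flow_version column matches.
      scan.setFilter(new SingleColumnValueFilter(
          "i".getBytes(StandardCharsets.UTF_8),
          "flow_version".getBytes(StandardCharsets.UTF_8),
          CompareOp.EQUAL,
          flowVersion.getBytes(StandardCharsets.UTF_8)));
    }
    return scan;
  }

  private static byte[] key(String cluster, String flow, long dayStart) {
    return (cluster + "!" + flow + "!" + dayStart)
        .getBytes(StandardCharsets.UTF_8);
  }
}
{code}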

Thoughts around queue level aggregation (in addition to Sangjin's comments that these
should be time-based):
Queue level aggregates have additional complexities. First, queues can come and go very
quickly, and apps can be moved from queue to queue. For normal, shorter-lived applications
it might be tempting to use the final queue that a job ran in (this is the assumption we
make in hRaven). With long-running apps this assumption breaks down.
Suppose an app runs for an hour and accumulates some value X for a metric Y; that gets
recorded in the original queue's aggregate. The application then gets moved, and the
cumulative value of metric Y grows to Z. Are we going to aggregate Z-X in the new queue, or
simply all of Z? If the latter, the sums of the metrics across the queues will not equal
the sums across all apps or flows.
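
One way to keep those sums consistent is to credit each queue only with the delta
accumulated since the last report, whichever queue that report arrives in. A minimal
in-memory sketch of that bookkeeping (all names hypothetical, nothing here is ATSv2 API):

{code:java}
// Hypothetical sketch of the "Z - X" choice: each queue is credited only with
// the delta since the previous report, so per-queue sums still add up to the
// per-app/per-flow totals even when apps move between queues.
import java.util.HashMap;
import java.util.Map;

public class QueueDeltaSketch {
  // Last cumulative value already credited, per (appId, metric).
  private final Map<String, Long> lastCredited = new HashMap<>();
  // Running aggregate per (queue, metric).
  private final Map<String, Long> queueAgg = new HashMap<>();

  /** Called on every report; 'cumulative' is the app-lifetime total (Z). */
  void report(String appId, String queue, String metric, long cumulative) {
    String appKey = appId + "!" + metric;
    long previous = lastCredited.getOrDefault(appKey, 0L);  // X
    lastCredited.put(appKey, cumulative);
    queueAgg.merge(queue + "!" + metric, cumulative - previous, Long::sum);
  }
}
{code}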

In addition, queues can grow and shrink on the fly. Are we going to record that? At the
very least we need to prefix the cluster in the rowkey so that we can differentiate queues
on different clusters.

And then there are hierarchical queues. Are we thinking of rolling stats up to each level,
or just to the individual leaf queue? Will we structure the rowkeys so that we can do
prefix scans for queues called /cluster/parent/childa and /cluster/parent/childb?
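
If the rowkey were the slash-delimited queue path (again an assumption for illustration), a
prefix scan would pick up a parent and all of its children in one bounded pass:

{code:java}
// Hypothetical sketch: with rows keyed "/cluster/parent/child", one prefix
// scan over "/cluster/parent/" covers every child queue of that parent.
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.hbase.client.Scan;

public class QueuePrefixScanSketch {
  static Scan childQueuesScan(String cluster, String parentQueue) {
    byte[] prefix = ("/" + cluster + "/" + parentQueue + "/")
        .getBytes(StandardCharsets.UTF_8);
    // Sets both the start row and the computed stop row for the prefix.
    return new Scan().setRowPrefixFilter(prefix);
  }
}
{code}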



> [Aggregation] Application/Flow/User/Queue Level Aggregations
> ------------------------------------------------------------
>
>                 Key: YARN-3815
>                 URL: https://issues.apache.org/jira/browse/YARN-3815
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: Timeline Service Nextgen Flow, User, Queue Level Aggregations (v1).pdf
>
>
> Per previous discussions in some design documents for YARN-2928, the basic scenario is
> that the query for stats can happen at:
> - Application level, expect return: an application with aggregated stats
> - Flow level, expect return: aggregated stats for a flow_run, flow_version and flow 
> - User level, expect return: aggregated stats for applications submitted by user
> - Queue level, expect return: aggregated stats for applications within the Queue
> Application states are the basic building block for all other aggregation levels. We can
> provide Flow/User/Queue level aggregated statistics based on application states (a
> dedicated table for application states is needed, which is missing from previous design
> documents such as the HBase/Phoenix schema design).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
