hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sangjin Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3815) [Aggregation] Application/Flow/User/Queue Level Aggregations
Date Fri, 19 Jun 2015 17:12:01 GMT

    [ https://issues.apache.org/jira/browse/YARN-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593649#comment-14593649

Sangjin Lee commented on YARN-3815:

Thanks [~djp] for putting this together. I added comments in the offline doc, but I'll move
the main one (high level comments) over here.

(0) on “aggregation”
Like you mentioned, I think it is helpful to make distinction on different types of aggregation
we’re talking about here. These are somewhat separate functionalities. My sense of the types
of aggregation is similar to yours, but not exactly the same. It would be good if we can converge
on their definitions.

I see 4 types of aggregation:
- app-level aggregation
- app-to-flow aggregation (“online” or “real time”)
- time-based flow aggregation (“batch” or “periodic”)
- user/queue aggregation

I’ll explain my definitions in more detail below.

(1) app-level aggregation
This is aggregating metrics from sub-app entities (e.g. containers) to the YARN application.
This can include both framework-specific metrics (e.g. HDFS bytes written for mapreduce) and
YARN-system metrics (e.g. container CPU %).

It would be ideal for app entities to have values for these metrics aggregated from sub-app
entities. How we do that is going to be different between framework-specific metrics and YARN-system

For framework-specific metrics, I would say this falls on the individual frameworks. The framework
AM usually already aggregates them in memory (consider MR job counters for example). So for
them it is straightforward to write them out directly onto the YARN app entities. Furthermore,
it is problematic to add them to the sub-app YARN entities and ask YARN to aggregate them
to the application. Framework’s sub-app entities may not even align with YARN’s sub-app
entities. For example, in case of MR, there is a reasonable one-to-one mapping between a mapper/reducer
task attempt and a container, but for other applications that may not be true. Forcing all
frameworks to hang values at containers may not be practical. I think it’s far easier for
frameworks to write aggregated values to the YARN app entities.

For YARN-system metrics, this would need to be done by YARN. I think we can have the timeline
collector aggregate the values in memory and write them out periodically. The details need
to be worked out, but that is definitely one way to go. The only tricky thing is then the
container metrics should flow through the per-app timeline collector, and cannot come from
the RM timeline collector (Junping pointed that out already).

(2) app-to-flow online aggregation
This is more or less live aggregated metrics at the flow level. This will still be based on
the native HBase schema.

Actually doing the above for the app-level integration makes app-to-flow online aggregation
simpler. It now only has to look at app entities to collect the data.

Initially we were thinking of leveraging a HBase co-processor, but there are some technical
challenges with that. We had a discussion on possible ways of doing this, and [~jrottinghuis]
has a proposal for this. I’ll let Joep chime in on this.

(3) time-based flow aggregation
This is different than the online aggregation in the sense that it is aggregated along the
time boundary (e.g. “daily”, “weekly”, etc.).

This can be based on the Phoenix schema. This can be populated in an offline fashion (e.g.
running a mapreduce job).

(4) user/queue aggregation
This is another “offline” aggregation type. Also, I believe we’re talking about only
time-based aggregation. In other words, we would aggregate values for users only with a well-defined
time window. There won’t be a “real-time” aggregation of values, similar to the flow

> [Aggregation] Application/Flow/User/Queue Level Aggregations
> ------------------------------------------------------------
>                 Key: YARN-3815
>                 URL: https://issues.apache.org/jira/browse/YARN-3815
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: Timeline Service Nextgen Flow, User, Queue Level Aggregations (v1).pdf
> Per previous discussions in some design documents for YARN-2928, the basic scenario is
the query for stats can happen on:
> - Application level, expect return: an application with aggregated stats
> - Flow level, expect return: aggregated stats for a flow_run, flow_version and flow 
> - User level, expect return: aggregated stats for applications submitted by user
> - Queue level, expect return: aggregated stats for applications within the Queue
> Application states is the basic building block for all other level aggregations. We can
provide Flow/User/Queue level aggregated statistics info based on application states (a dedicated
table for application states is needed which is missing from previous design documents like
HBase/Phoenix schema design). 

This message was sent by Atlassian JIRA

View raw message