hadoop-yarn-issues mailing list archives

From "Sangjin Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics
Date Tue, 22 Dec 2015 00:22:46 GMT

    [ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067321#comment-15067321 ]

Sangjin Lee commented on YARN-3816:
-----------------------------------

It seems the latest patch (v.4.1) is mostly a rebase change, so I'll wait for an updated patch
that addresses the comments. To respond to some of the questions and comments:

{quote}
That sounds like a reasonable concern. I agree that we should keep system metrics and
applications' metrics from getting mixed up. However, I think our goal here is not just to
aggregate/accumulate container metrics, but also to provide an aggregation service for applications'
metrics (other than MR), isn't it? If so, maybe a better way is to aggregate metrics not only
by metric name but also by original entity type (so memory metrics for ContainerEntity
won't be aggregated against memory metrics from Application Entity). Sangjin Lee, what do
you think?
{quote}
If I understood your suggestion correctly, you're talking about qualifying (or scoping) the
metric with the entity type so that they don't get mixed up, right?

I still see that this can be problematic. Let me illustrate an example. Suppose there is an
app framework called "Foo". Let's suppose Foo has a notion of "jobs" (entity type = "FooJob"),
"tasks" (entity type = "FooTask") and "subtasks" (entity type = "FooSubTask"), so that a job
is made up of a bunch of tasks, and each task can be made up of subtasks. Furthermore, suppose
all of them emit metrics called "MEMORY" where the sum of all subtasks' memory is the same
as the parent task's memory, and the sum of all tasks' memory is the same as the parent job's
memory.

With the idea of qualifying metrics with the entity type, all these types will still contribute
MEMORY to aggregation (FooJob-to-application, FooTask-to-application, and FooSubTask-to-application),
in addition to the YARN-generic container-to-application aggregation. But given their nature,
things like FooSubTask-to-application and FooTask-to-application aggregation are very much
redundant and thus wasteful. It's basically doing the same summation multiple times.
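
To make the redundancy concrete, here is a rough Python sketch. It is illustrative only: the entity ids and memory values are made up, and `aggregate_by_entity_type` is a hypothetical helper, not anything in the patch. It sums MEMORY per (entity type, metric name) for a tiny Foo hierarchy in which children sum to their parent.

```python
from collections import defaultdict

# Hypothetical Foo entities: (entity_type, entity_id, MEMORY value).
# One job with two tasks, each with two subtasks; children sum to their parent.
entities = [
    ("FooJob", "job_1", 400),
    ("FooTask", "task_1", 100),
    ("FooTask", "task_2", 300),
    ("FooSubTask", "subtask_1", 40),
    ("FooSubTask", "subtask_2", 60),
    ("FooSubTask", "subtask_3", 120),
    ("FooSubTask", "subtask_4", 180),
]


def aggregate_by_entity_type(entities, metric="MEMORY"):
    """Sum one metric per (entity type, metric name) key (hypothetical helper)."""
    totals = defaultdict(int)
    for entity_type, _entity_id, value in entities:
        totals[(entity_type, metric)] += value
    return dict(totals)


app_level = aggregate_by_entity_type(entities)
for key, total in sorted(app_level.items()):
    print(key, total)
```

All three keys end up with the same total (400 here), so two of the three summations add no information at the application level.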

As you suggested later, we could utilize the "toAggregate" flag for applications to exclude
certain metrics from aggregation (in this case Foo would need to set toAggregate = false for
all its types). But I think we need to determine how valuable it is to open this up to app-specific
metrics.
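
As a sketch of what that opt-out could look like (Python, illustrative only: the `Metric` class, its `to_aggregate` field, and `aggregate_to_application` are hypothetical stand-ins for the "toAggregate" flag, not the actual timeline service API, and the values are made up):

```python
from dataclasses import dataclass


@dataclass
class Metric:
    entity_type: str
    name: str
    value: int
    to_aggregate: bool = True  # hypothetical stand-in for the "toAggregate" flag


def aggregate_to_application(metrics):
    """Sum metrics by name, skipping any metric that opted out of aggregation."""
    totals = {}
    for m in metrics:
        if not m.to_aggregate:
            continue
        totals[m.name] = totals.get(m.name, 0) + m.value
    return totals


metrics = [
    Metric("ContainerEntity", "MEMORY", 512),
    Metric("ContainerEntity", "MEMORY", 1024),
    # Foo opts its own entity types out so they are not summed again.
    Metric("FooTask", "MEMORY", 1536, to_aggregate=False),
    Metric("FooSubTask", "MEMORY", 1536, to_aggregate=False),
]

app_totals = aggregate_to_application(metrics)
print(app_totals)
```

Only the container metrics contribute here, but this relies on every framework remembering to set the flag correctly on each of its entity types.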

Also, if we were to qualify the metric names with the entity type, another complicating factor
is the HBase column names for metrics. Now the aggregated metric names in the application
table would need to be prefixed (or encoded in some form) with the entity type. We need to
think about the implications for queries, filters, etc.
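
A rough sketch of the query-side implication (Python, illustrative only: the `!` separator, the helper names, and the dict standing in for an application-table row are all assumptions, not the actual HBase schema):

```python
SEPARATOR = "!"  # hypothetical separator; the real encoding is undecided


def encode_column(entity_type, metric_name):
    """Build an entity-type-qualified column name (hypothetical encoding)."""
    return f"{entity_type}{SEPARATOR}{metric_name}"


def decode_column(column):
    """Split a qualified column name back into (entity type, metric name)."""
    entity_type, metric_name = column.split(SEPARATOR, 1)
    return entity_type, metric_name


# A plain dict standing in for one row of aggregated metrics; values are made up.
row = {
    encode_column("ContainerEntity", "MEMORY"): 1536,
    encode_column("FooTask", "MEMORY"): 1536,
    encode_column("FooTask", "HDFS_BYTES_READ"): 2048,
}


def find_metric(row, metric_name):
    """A filter on the bare metric name has to decode every column to match."""
    return {c: v for c, v in row.items() if decode_column(c)[1] == metric_name}


matches = find_metric(row, "MEMORY")
print(matches)
```

A filter on "MEMORY" alone now matches more than one column, so callers would need to either name the entity type up front or decide how to combine the matches.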

To me, the most important thing we need to get right is the *YARN-generic container-to-application
aggregation*. That needs to be correct and perform well in all cases. Supporting \*-to-application
aggregation for app-specific metrics is somewhat secondary IMO. How about keeping it simple,
and focusing on the container-to-application aggregation?


> [Aggregation] App-level aggregation and accumulation for YARN system metrics
> ----------------------------------------------------------------------------
>
>                 Key: YARN-3816
>                 URL: https://issues.apache.org/jira/browse/YARN-3816
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Junping Du
>            Assignee: Junping Du
>              Labels: yarn-2928-1st-milestone
>         Attachments: Application Level Aggregation of Timeline Data.pdf, YARN-3816-YARN-2928-v1.patch,
YARN-3816-YARN-2928-v2.1.patch, YARN-3816-YARN-2928-v2.2.patch, YARN-3816-YARN-2928-v2.3.patch,
YARN-3816-YARN-2928-v2.patch, YARN-3816-YARN-2928-v3.1.patch, YARN-3816-YARN-2928-v3.patch,
YARN-3816-YARN-2928-v4.patch, YARN-3816-feature-YARN-2928.v4.1.patch, YARN-3816-poc-v1.patch,
YARN-3816-poc-v2.patch
>
>
> We need application level aggregation of Timeline data:
> - To present the end user with aggregated states for each application, including resource (CPU,
Memory) consumption across all containers, number of containers launched/completed/failed,
etc. We need this for apps while they are running as well as when they are done.
> - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be aggregated to show
details of states in framework level.
> - Other levels of aggregation (Flow/User/Queue) can be done more efficiently on top of Application-level
aggregations rather than raw entity-level data, as far fewer rows need to be scanned (after
filtering out non-aggregated entities such as events, configurations, etc.).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
