hadoop-yarn-issues mailing list archives

From "Joep Rottinghuis (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3815) [Aggregation] Application/Flow/User/Queue Level Aggregations
Date Fri, 19 Jun 2015 20:10:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593890#comment-14593890 ]

Joep Rottinghuis commented on YARN-3815:

For flow-level aggregates I'll separately write up ideas about how to do that.
In short, we need to focus on write performance, plus the fact that we have to aggregate
increments from running applications into the aggregates. This makes it tricky to do
correctly, specifically when apps (and ATS writers) can crash and need to restart: we'll
have to keep track of the last values written. Initially I thought that using a coprocessor
to do this server-side solves the problem. The challenge is that it will be invoked in the
write path of individual stats, so slow writes to a second region server (hosting the agg
table/row) can have a rippling effect on many writes. Even worse, we can end up in a deadlock
situation under load when the agg table/row happens to be hosted on the same region
server and the current write is blocked on the completion of the coprocessor, which needs
to write but is itself blocked on a full queue on its own region server.
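The deadlock scenario above can be modeled abstractly as a wait-for cycle (nothing below is HBase API; the names are illustrative, and the point is only that the cycle appears exactly when the aggregate row lands on the same region server):

```python
# Toy wait-for graph: a handler thread doing a put waits on the coprocessor's
# write to the aggregate row; if that write in turn needs a handler slot on the
# same region server, held by the original put, nobody can make progress.

def has_cycle(wait_for):
    """Detect a cycle in a wait-for graph given as {waiter: waited_on}."""
    for start in wait_for:
        seen = set()
        node = start
        while node in wait_for:
            if node in seen:
                return True
            seen.add(node)
            node = wait_for[node]
    return False

# Agg row on the SAME region server: the put handler waits for the coprocessor
# write, which waits for a free handler slot held by that same put handler.
same_rs = {"put_handler@rs1": "coproc_write->rs1",
           "coproc_write->rs1": "put_handler@rs1"}
# Agg row on a DIFFERENT region server: slow ripple, but no cycle.
cross_rs = {"put_handler@rs1": "coproc_write->rs2"}
```

The cross-region-server case still ripples latency into every put (the first problem noted above), but only the same-server case closes the loop into an actual deadlock.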

I think the solution will be to do something in the spirit of the readless increments used
in Tephra. Similarly, we'd collapse values only when flushes or compactions happen, so
aggregation is restricted to a single row, which can be locked without issues. On reads we
collapse the pre-aggregated values plus the values from currently running jobs. The significant
difference is that we can compact only when jobs are complete. I'll try to write up a more
detailed design for this.
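The read-time collapse can be sketched with plain Python structures standing in for HBase cells (every name here is hypothetical, not Tephra or HBase API): each increment is kept as its own cell instead of doing a read-modify-write, readers sum everything, and compaction folds in only the cells from completed jobs.

```python
from dataclasses import dataclass

# Illustrative model of readless increments: writes append cells; collapsing
# happens at read or compaction time, never in the individual write path.

@dataclass
class Cell:
    value: int
    job_id: str
    running: bool  # increments from running jobs must survive compaction

def read_aggregate(cells):
    """Read-time collapse: sum the pre-aggregated value plus live increments."""
    return sum(c.value for c in cells)

def compact(cells):
    """Fold completed-job cells into one pre-aggregated cell; keep increments
    from still-running jobs as-is (the key difference noted in the text)."""
    done = [c for c in cells if not c.running]
    live = [c for c in cells if c.running]
    base = Cell(sum(c.value for c in done), job_id="<aggregated>", running=False)
    return [base] + live

cells = [Cell(10, "job1", False), Cell(5, "job2", False), Cell(7, "job3", True)]
before = read_aggregate(cells)     # 22
compacted = compact(cells)         # job1/job2 folded, job3 kept separate
after = read_aggregate(compacted)  # still 22, from only 2 cells
```

Because compaction never touches cells from running jobs, a crashed-and-restarted writer can keep appending increments without the aggregate ever double-counting or losing its base value.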

If we follow [~sjlee0]'s suggestion to make all the other aggregates periodic, then we can
use mapreduce for those. The big advantage is that we can then use control records, like we
do in hRaven, to efficiently keep track of what we have already aggregated. The tricky ones
will be the long-running apps we have to keep coming back to. Ideally we should be able to
read the raw values once and then "spray" them out to the various aggregate tables (cluster,
queue, user) per time period. Otherwise we end up scanning over the raw values over and over.
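A single-pass "spray" over the raw table might look like the sketch below (the control record and table layout are hypothetical, loosely modeled on the hRaven approach mentioned above):

```python
from collections import defaultdict

def spray_aggregate(raw_records, last_processed):
    """One scan over raw app records, fanned out to cluster/queue/user
    aggregates per time period. `last_processed` plays the role of a
    control record: only records newer than it are aggregated."""
    aggs = {
        "cluster": defaultdict(int),
        "queue": defaultdict(int),
        "user": defaultdict(int),
    }
    max_ts = last_processed
    for rec in raw_records:
        if rec["ts"] <= last_processed:
            continue  # already covered by a previous periodic run
        period = rec["ts"] // 3600  # hourly buckets, for illustration
        aggs["cluster"][(rec["cluster"], period)] += rec["value"]
        aggs["queue"][(rec["cluster"], rec["queue"], period)] += rec["value"]
        aggs["user"][(rec["cluster"], rec["user"], period)] += rec["value"]
        max_ts = max(max_ts, rec["ts"])
    return aggs, max_ts  # max_ts becomes the new control record value

raw = [
    {"ts": 100, "cluster": "c1", "queue": "q1", "user": "u1", "value": 4},
    {"ts": 200, "cluster": "c1", "queue": "q1", "user": "u2", "value": 6},
    {"ts": 50,  "cluster": "c1", "queue": "q2", "user": "u1", "value": 9},
]
aggs, new_ctl = spray_aggregate(raw, last_processed=60)
# the ts=50 record is skipped; q1's hourly bucket sums to 10
```

Each raw record is read once and contributes to every aggregate level in the same pass, which is the point of the "spray": the number of scans stays constant no matter how many aggregate tables are fed.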

> [Aggregation] Application/Flow/User/Queue Level Aggregations
> ------------------------------------------------------------
>                 Key: YARN-3815
>                 URL: https://issues.apache.org/jira/browse/YARN-3815
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: Timeline Service Nextgen Flow, User, Queue Level Aggregations (v1).pdf
> Per previous discussions in some design documents for YARN-2928, the basic scenario is
> that the query for stats can happen at:
> - Application level, expected return: an application with aggregated stats
> - Flow level, expected return: aggregated stats for a flow_run, flow_version and flow
> - User level, expected return: aggregated stats for applications submitted by the user
> - Queue level, expected return: aggregated stats for applications within the queue
> Application state is the basic building block for all other levels of aggregation. We can
> provide Flow/User/Queue level aggregated statistics based on application states (a dedicated
> table for application states is needed, which is missing from previous design documents like
> the HBase/Phoenix schema design).

This message was sent by Atlassian JIRA
