hadoop-yarn-issues mailing list archives

From "Zhijie Shen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers
Date Thu, 04 Jun 2015 23:47:39 GMT

    [ https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573830#comment-14573830 ]

Zhijie Shen commented on YARN-3051:

[~varun_saxena], thanks for working on the new patch. It seems to be a complete reader-side
prototype, which is nice. I still need some time to take a thorough look, but I'd like to share
my thoughts about the reader APIs.

IMHO, we may want to have, or at least start with, two sets of APIs: 1) the APIs to query
the raw data and 2) the APIs to query the aggregation data.

1) APIs to query the raw data:

We would like to have APIs that let users zoom into the details of their jobs and give users
the freedom to fetch the raw data and do customized processing that ATS will not do. For
example, Hive/Pig on Tez need this set of APIs to get framework-specific data, process it and
render it on their own web UI. We basically need two such APIs.

a. Get a single entity given an ID that uniquely locates the entity in the backend (We assume
the uniqueness is assured somehow). 
* This API can be extended or split into multiple sub-APIs to get a single element of the
entity, such as events, metrics and configuration.

b. Search for a set of entities that match the given predicates.
* We can start from the predicates that we used in ATS v1 (also for the compatibility purpose),
but some of them may no longer apply.
* We may want to add more predicates to check the elements newly added in v2.
* With more predefined semantics, we can even query entities that belong to some container/attempt/application
and so on.
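To make the two raw-data APIs concrete, here is a minimal Java sketch. Every name in it (RawEntityReader, EntityId, Entity, Query, InMemoryReader) is hypothetical and not taken from the YARN-3051 patches. It assumes an entity is uniquely located by (cluster, app ID, entity type, entity ID), drops events and keeps only metrics and configs, and models the predicates loosely on v1-style time-window and limit filters plus a parent filter for the container/attempt/application case. The in-memory backend exists only to exercise the interface.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch of the raw-data reader APIs; none of these names come from YARN.
final class RawReaderSketch {

  // Assumed composite key that uniquely locates an entity in the backend.
  record EntityId(String cluster, String appId, String type, String id) {}

  // Stripped-down entity: events are omitted, metrics and configs kept as maps.
  record Entity(EntityId id, long createdTime, String parentId,
                Map<String, Long> metrics, Map<String, String> configs) {}

  // Hypothetical search predicates; a null field means "do not filter on it".
  record Query(String cluster, String appId, String type,
               Long createdTimeBegin, Long createdTimeEnd,
               String parentId, int limit) {}

  interface RawEntityReader {
    // a. Fetch a single entity by its unique ID.
    Optional<Entity> getEntity(EntityId id);

    // a. (sub-API) Fetch just one element of the entity, here its metrics.
    default Map<String, Long> getMetrics(EntityId id) {
      return getEntity(id).map(Entity::metrics).orElse(Map.of());
    }

    // b. Search for the entities that match the given predicates.
    List<Entity> getEntities(Query q);
  }

  // In-memory backend used only to exercise the interface.
  static final class InMemoryReader implements RawEntityReader {
    private final Map<EntityId, Entity> store = new HashMap<>();

    void put(Entity e) {
      store.put(e.id(), e);
    }

    @Override
    public Optional<Entity> getEntity(EntityId id) {
      return Optional.ofNullable(store.get(id));
    }

    @Override
    public List<Entity> getEntities(Query q) {
      Predicate<Entity> match = e ->
          e.id().cluster().equals(q.cluster())
              && e.id().appId().equals(q.appId())
              && e.id().type().equals(q.type())
              && (q.createdTimeBegin() == null || e.createdTime() >= q.createdTimeBegin())
              && (q.createdTimeEnd() == null || e.createdTime() <= q.createdTimeEnd())
              && (q.parentId() == null || q.parentId().equals(e.parentId()));
      // Newest first, then truncate to the requested limit.
      return store.values().stream()
          .filter(match)
          .sorted(Comparator.comparingLong(Entity::createdTime).reversed())
          .limit(q.limit())
          .collect(Collectors.toList());
    }
  }
}
```

A real backing storage would implement the same two methods. getMetrics illustrates one way to split the single-entity call into per-element sub-APIs, as a default method layered on getEntity.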

2) APIs to query the aggregation data

These are completely new in v2 and are its main advantage. With aggregation, we can answer
statistical questions about the job, the user, the queue, the flow and the cluster. Rather than
directing users to the individual entities put by the application, these APIs return
statistical data (carried by Application|User|Queue|Flow|ClusterEntity).
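The following minimal Java sketch illustrates what "statistical data" could mean here. It rolls per-application metrics up to the user, queue, flow or cluster level by summing them. The AppSummary record, the Level enum and rollUp are all hypothetical. Summation is only one possible aggregation; averages or maxima would be others. The sketch also says nothing about where or when the aggregation would actually run.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of rolling application-level metrics up to coarser levels.
final class AggregationSketch {

  // The levels named in the comment; the enum itself is illustrative.
  enum Level { APPLICATION, USER, QUEUE, FLOW, CLUSTER }

  // Assumed per-application summary carrying the IDs of its enclosing concepts.
  record AppSummary(String appId, String user, String queue, String flow,
                    String cluster, Map<String, Long> metrics) {

    // ID of the concept this application belongs to at the given level.
    String keyAt(Level level) {
      return switch (level) {
        case APPLICATION -> appId;
        case USER -> user;
        case QUEUE -> queue;
        case FLOW -> flow;
        case CLUSTER -> cluster;
      };
    }
  }

  // Sums each metric over all applications that share an ID at the given level.
  // Returns: level ID -> (metric name -> total).
  static Map<String, Map<String, Long>> rollUp(List<AppSummary> apps, Level level) {
    Map<String, Map<String, Long>> result = new TreeMap<>();
    for (AppSummary app : apps) {
      Map<String, Long> totals = result.computeIfAbsent(app.keyAt(level), k -> new TreeMap<>());
      app.metrics().forEach((name, value) -> totals.merge(name, value, Long::sum));
    }
    return result;
  }
}
```

For example, two applications run by the same user with cpu metrics of 10 and 5 roll up to a single USER entry with cpu = 15.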

a. Get the aggregation data at a certain level given the ID of the concept at that level, i.e.,
the job, the user, the queue, the flow or the cluster.

b. Search for the jobs, users, queues, flows and clusters that match the given predicates.
* For the predicates, we could learn from the examples in hRaven.
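The two aggregation APIs could likewise be sketched as one small Java interface. Again, all names are hypothetical. The Aggregate record stands in for the Application|User|Queue|Flow|ClusterEntity types. Predicates are passed as plain java.util.function.Predicate objects, which is a simplification and not the hRaven-style predicates suggested above.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch of the aggregation-data reader APIs.
final class AggregateReaderSketch {

  enum Level { APPLICATION, USER, QUEUE, FLOW, CLUSTER }

  // Stand-in for an aggregated entity: one concept at one level plus its statistics.
  record Aggregate(Level level, String id, Map<String, Long> stats) {
    long stat(String name) {
      return stats.getOrDefault(name, 0L);
    }
  }

  interface AggregationReader {
    // 2a. Aggregation data for one concept, e.g. (FLOW, "etl").
    Optional<Aggregate> getAggregate(Level level, String id);

    // 2b. Concepts at one level whose statistics satisfy the predicate.
    List<Aggregate> searchAggregates(Level level, Predicate<Aggregate> predicate, int limit);
  }

  // In-memory backend used only to exercise the interface; results come back in ID order.
  static final class InMemoryAggregationReader implements AggregationReader {
    private final Map<Level, Map<String, Aggregate>> byLevel = new EnumMap<>(Level.class);

    void put(Aggregate a) {
      byLevel.computeIfAbsent(a.level(), l -> new TreeMap<>()).put(a.id(), a);
    }

    @Override
    public Optional<Aggregate> getAggregate(Level level, String id) {
      return Optional.ofNullable(byLevel.getOrDefault(level, Map.of()).get(id));
    }

    @Override
    public List<Aggregate> searchAggregates(Level level, Predicate<Aggregate> predicate, int limit) {
      return byLevel.getOrDefault(level, Map.of()).values().stream()
          .filter(predicate)
          .limit(limit)
          .collect(Collectors.toList());
    }
  }
}
```

For instance, searchAggregates(Level.FLOW, a -> a.stat("cpu") >= 10, 10) returns the flows whose aggregated cpu statistic is at least 10.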

> [Storage abstraction] Create backing storage read interface for ATS readers
> ---------------------------------------------------------------------------
>                 Key: YARN-3051
>                 URL: https://issues.apache.org/jira/browse/YARN-3051
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Sangjin Lee
>            Assignee: Varun Saxena
>         Attachments: YARN-3051-YARN-2928.003.patch, YARN-3051-YARN-2928.03.patch, YARN-3051.wip.02.YARN-2928.patch, YARN-3051.wip.patch, YARN-3051_temp.patch
> Per design in YARN-2928, create backing storage read interface that can be implemented by multiple backing storage implementations.

This message was sent by Atlassian JIRA
