hadoop-yarn-issues mailing list archives

From "Sangjin Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers
Date Tue, 16 Jun 2015 23:51:04 GMT

    [ https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589036#comment-14589036 ]

Sangjin Lee commented on YARN-3051:
-----------------------------------

Sorry it has taken me a while to chime in on this JIRA. I've just gone over the recent comments
and skimmed through the latest patch. BTW, the latest patch doesn't seem to apply cleanly
(conflicts on {{yarn.cmd}}). [~varun_saxena], could you please check whether the latest patch
needs to be updated?

I agree with most of the ideas put forward in the comments. In particular, I agree with [~zjshen]
that it'd be desirable to have more specific APIs on the user-oriented side of the code and
somewhat more generic (for lack of a better term) APIs on the storage-interaction side
(namely the {{TimelineReader}} interface in its current form).
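To make the layering concrete, here is a minimal sketch in Java. All names in it ({{GenericReader}}, {{UserFacingReader}}, the "YARN_CONTAINER" type string and the "state" key) are hypothetical and not taken from the current patch, and entities are modeled as plain string maps purely for brevity. The point is only that each specific, user-facing method is a thin translation onto one generic storage call.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical generic, storage-facing interface: one flexible entity query.
interface GenericReader {
  List<Map<String, String>> getEntities(String clusterId, String appId,
      String entityType, Predicate<Map<String, String>> filter);
}

// Hypothetical user-facing layer: specific methods that callers can read at a
// glance, each translated into a call on the generic reader.
final class UserFacingReader {
  private final GenericReader storage;

  UserFacingReader(GenericReader storage) {
    this.storage = storage;
  }

  // "All containers of this app" is a generic query with a pass-through filter.
  List<Map<String, String>> getContainers(String clusterId, String appId) {
    return storage.getEntities(clusterId, appId, "YARN_CONTAINER", e -> true);
  }

  // "Failed containers of this app" reuses the same storage call plus a filter.
  List<Map<String, String>> getFailedContainers(String clusterId, String appId) {
    return storage.getEntities(clusterId, appId, "YARN_CONTAINER",
        e -> "FAILED".equals(e.get("state")));
  }
}
```

With this split, adding a new user-facing query usually means adding a method on the user-facing side, not a new method that every storage implementation has to support.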

The {{TimelineReader}} API has two goals. First, it should be generic and flexible enough to
accommodate a wide range of queries, both the current ones and possible future ones. Second,
it should help the storage implementations translate those queries into efficient queries
against the storage itself.

One idea that may help in this regard is to create more coarse-grained concepts and use
them in the {{TimelineReader}} API. It's already doing that to some extent, and we should
push it further. For instance, it might be helpful to create a *{{Context}}* object. The unique
context for most queries would consist of the cluster id and the app id. So we can make the
cluster id and the app id part of the {{Context}} object and have {{TimelineReader}} take a
{{Context}} instead of listing things like the cluster id explicitly in each of its methods.
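A minimal sketch of what such a {{Context}} might look like, assuming it carries only the cluster id and the app id. {{ContextualReader}} and its {{getEntity}} method are hypothetical, and the String return value is just a stand-in for a real entity type. Giving {{Context}} value semantics ({{equals}}/{{hashCode}}) lets it be used as a map key or cache key on the storage side.

```java
import java.util.Objects;

// Hypothetical Context: bundles what uniquely scopes most reader queries.
final class Context {
  private final String clusterId;
  private final String appId;

  Context(String clusterId, String appId) {
    this.clusterId = Objects.requireNonNull(clusterId, "clusterId");
    this.appId = Objects.requireNonNull(appId, "appId");
  }

  String getClusterId() { return clusterId; }
  String getAppId() { return appId; }

  // Value semantics so two contexts for the same app compare equal.
  @Override public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof Context)) return false;
    Context other = (Context) o;
    return clusterId.equals(other.clusterId) && appId.equals(other.appId);
  }

  @Override public int hashCode() { return Objects.hash(clusterId, appId); }

  @Override public String toString() { return clusterId + "/" + appId; }
}

// Hypothetical reader signature: methods take a Context instead of listing
// the cluster id, app id, etc. as separate parameters.
interface ContextualReader {
  String getEntity(Context context, String entityType, String entityId);
}
```

If the context later needs another field, only {{Context}} changes; the reader method signatures stay the same.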

Similarly, we might want to define *predicates and/or filters* and use them in the {{TimelineReader}}
API. In essence, one way to look at it is that a query against the storage is really (context)
+ (predicates/filters) + (contents to retrieve). We could then consolidate the method arguments
into these coarse-grained objects.

Also, for the context, I don't think we need to require things like the flow id or flow run id.
The storage should be able to define the context and locate entities using only the cluster id
and the app id.
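One way a storage implementation could satisfy this is sketched below, under the assumption (not something the current patch does) that the write path records which flow run each app belongs to. The reader then derives the flow-level part of its lookup key from the cluster id and app id alone. {{FlowKey}}, {{AppToFlowIndex}} and the "!"-separated key layout are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical flow coordinates that a storage layout might key rows by.
final class FlowKey {
  final String userId;
  final String flowId;
  final long flowRunId;

  FlowKey(String userId, String flowId, long flowRunId) {
    this.userId = userId;
    this.flowId = flowId;
    this.flowRunId = flowRunId;
  }
}

// Hypothetical storage-side index: the storage remembers each app's flow
// context, so readers only need to supply the cluster id and the app id.
final class AppToFlowIndex {
  private final Map<String, FlowKey> index = new HashMap<>();

  private static String key(String clusterId, String appId) {
    return clusterId + "!" + appId;
  }

  // Write path (assumption): record which flow run the app belongs to.
  void record(String clusterId, String appId, FlowKey flow) {
    index.put(key(clusterId, appId), flow);
  }

  // Read path: expand the minimal context into a full row-key prefix,
  // or return empty if the app is unknown to this storage.
  Optional<String> rowKeyPrefix(String clusterId, String appId) {
    FlowKey f = index.get(key(clusterId, appId));
    if (f == null) {
      return Optional.empty();
    }
    return Optional.of(String.join("!", clusterId, f.userId, f.flowId,
        Long.toString(f.flowRunId), appId));
  }
}
```

The trade-off is an extra lookup on the read path in exchange for keeping flow details out of the reader API.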

> [Storage abstraction] Create backing storage read interface for ATS readers
> ---------------------------------------------------------------------------
>
>                 Key: YARN-3051
>                 URL: https://issues.apache.org/jira/browse/YARN-3051
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Sangjin Lee
>            Assignee: Varun Saxena
>         Attachments: YARN-3051-YARN-2928.003.patch, YARN-3051-YARN-2928.03.patch, YARN-3051-YARN-2928.04.patch,
>                      YARN-3051.wip.02.YARN-2928.patch, YARN-3051.wip.patch, YARN-3051_temp.patch
>
>
> Per design in YARN-2928, create backing storage read interface that can be implemented
> by multiple backing storage implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
