hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sangjin Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
Date Wed, 16 Sep 2015 19:01:47 GMT

    [ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790948#comment-14790948

Sangjin Lee commented on YARN-4074:

Thanks for your comments [~varun_saxena]!

bq. This I guess is to retrieve entities by a certain time range. I havent added this in REST
related JIRA and dont see support even here. Will handle it after PoC I guess. Correct ?

The flow activity table is a time-based set of data. The timestamp (day marker really) is
there to order the activity in time. It is feasible to query the flow activity table based
on time (e.g. "give me all the activity in the past 3 days"). I didn't get around to it, but
it should be pretty straightforward to support that after the POC. I'll file a JIRA for adding
that support.

bq. Also I had added user as an optional query param in REST API code. I think querying by
user wont really be a good idea looking at the row key. Will remove it.

Yes, the way the data is laid out, cluster + user will not be an efficient query, as time
is the component that gets in before the user.

The major code added in this patch is about the different readers based on table and a factory.
It looks fine. However, number of parameters to the methods are quite a lot just like Reader
API. As you mentioned elsewhere, maybe I can refactor this later. We should club together
some things into logical things like context, filters, etc. That will reduce number of params.

That is spot on. I really didn't like having to repeat the long list of arguments. But since
you're looking into a better way of capturing the filters and predicates, I'm not really changing
things as part of this JIRA. Hope that is consistent with your understanding.

In the factory class, we have a sequence of if-else statements. Although its a matter of perspective,
sequence of if-else look a little inelegant. But we may not have too many great options here.
Thought of enums i.e. having create methods with implementation tied to each enum but entity
type enum is not HBase specific. Any other option ? I guess if-else should be fine for now
because not too many tables should be added in future, if any.

I agree. I wanted to use the switch-case statements, but the main issue was that the input
is string, not enums. If it were enums, it could have been trivial...

In xxxEntityReader classes, maybe createTable should be renamed to getTable because we are
not really creating any table here. We are just getting/creating a table object. 
Also if I am not wrong, no need to create this object again and again as well. All it really
holds is static information such as table name and conf.

Those are good suggestions. Yes, the {{BaseTable}} instances are thread-safe, and I think
they can be reused. I'll update the patch to make those changes.

> [timeline reader] implement support for querying for flows and flow runs
> ------------------------------------------------------------------------
>                 Key: YARN-4074
>                 URL: https://issues.apache.org/jira/browse/YARN-4074
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>         Attachments: YARN-4074-YARN-2928.POC.001.patch, YARN-4074-YARN-2928.POC.002.patch,
YARN-4074-YARN-2928.POC.003.patch, YARN-4074-YARN-2928.POC.004.patch, YARN-4074-YARN-2928.POC.005.patch
> Implement support for querying for flows and flow runs.
> We should be able to query for the most recent N flows, etc.
> This includes changes to the {{TimelineReader}} API if necessary, as well as implementation
of the API.

This message was sent by Atlassian JIRA

View raw message