hadoop-yarn-issues mailing list archives

From "Varun Saxena (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints
Date Thu, 22 Sep 2016 08:59:20 GMT

    [ https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15512673#comment-15512673 ]

Varun Saxena commented on YARN-5585:

Just to clarify, what I meant by having another index table was not to store data in it. It
only stores the entity ID for cluster!user!flow!run!app!entitytype and the inverted created time.
This table will be written to only when the created time is reported, i.e. when the application
reports the created time on the start event (most probably).
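
To make the idea concrete, here is a minimal sketch of how such an index row key could be built. It is not the actual ATSv2 schema: the helper names (invert, index_row_key), the exact key layout, the 8-byte big-endian encoding, and appending the entity ID to the key to keep rows unique are all assumptions made for illustration.

```python
import struct

LONG_MAX = 2**63 - 1  # same value as Java's Long.MAX_VALUE


def invert(created_time_ms: int) -> bytes:
    """Encode a timestamp so that newer values sort first in byte order."""
    # Big-endian encoding of a non-negative value keeps numeric order
    # under lexicographic byte comparison.
    return struct.pack(">q", LONG_MAX - created_time_ms)


def index_row_key(cluster, user, flow, run, app, entity_type,
                  created_time_ms, entity_id) -> bytes:
    """Hypothetical index row key: scope prefix, inverted time, entity ID."""
    prefix = "!".join([cluster, user, flow, str(run), app, entity_type])
    return (prefix.encode("utf-8") + b"!" + invert(created_time_ms)
            + b"!" + entity_id.encode("utf-8"))


older = index_row_key("c1", "u1", "f1", 1, "app_1", "TEZ_DAG", 1000, "e1")
newer = index_row_key("c1", "u1", "f1", 1, "app_1", "TEZ_DAG", 2000, "e2")
# A forward scan over sorted keys sees the most recently created entity first.
print(sorted([older, newer]) == [newer, older])
```

With keys laid out like this, a plain forward scan within one entity-type scope would return entity IDs in descending created-time order without touching the entity data.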

As part of the interface, we claim that entities will be returned sorted in descending order
of created time, so I felt this is a use case we should definitely support,
whether or not we support sorting by some other parameter.
Currently we iterate over all the entities within the scope of an entity type to arrive at the
sorted set of entities. So this, IMO, should definitely be fixed by providing some sort of
index table.

With the 2nd point in my [comment above | https://issues.apache.org/jira/browse/YARN-5585?focusedCommentId=15494251&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15494251]
we can query entity-ID-specific entities directly from the entity table.
One more suggestion was to open up an interface which can be used to provide encoding and
decoding for specific entity IDs (based on entity type) as part of the row key.
This would not require any extra write or read. However, Li and Rohith seemed a little
reluctant about that solution, as Tez or Spark would have to add code for it, albeit only a little.
However, as [~vrushalic] suggested, we can also create an auxiliary table and specify the
key in the timeline entity. The issue with this is that we are sort of exposing the internal
implementation. It can, however, be useful if we want to sort by something else as well, as
pointed out, and not merely by created time. The problem, though, can be the double write. How
about having this auxiliary table as an index table, with just one write to make an entry in it?
On the read side, we can refer to this index table as per Vrushali's suggestion,
i.e. specify the index table and start row key, and then use MultiRowRangeFilter to get records
from the entity table.
Thoughts?
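
The two-step read described above can be sketched with an in-memory stand-in. This is only an illustration under stated assumptions: InMemoryStore, write and read_page are hypothetical names, a sorted Python list stands in for the index table, a dict stands in for the entity table, and the per-ID dict lookup stands in for the MultiRowRangeFilter fetch against HBase.

```python
import bisect

LONG_MAX = 2**63 - 1  # same value as Java's Long.MAX_VALUE


class InMemoryStore:
    """Toy model: one index 'table' plus one entity 'table'."""

    def __init__(self):
        self.index = []     # sorted list of (inverted_created_time, entity_id)
        self.entities = {}  # entity_id -> entity data

    def write(self, entity_id, created_time, data):
        # One write to the entity table, one small write to the index.
        self.entities[entity_id] = data
        bisect.insort(self.index, (LONG_MAX - created_time, entity_id))

    def read_page(self, limit, start_key=None):
        # Step 1: scan the index from the start row key.
        start = 0
        if start_key is not None:
            start = bisect.bisect_left(self.index, start_key)
        page = self.index[start:start + limit]
        # Step 2: fetch only the selected rows from the entity table.
        return [(eid, self.entities[eid]) for _, eid in page]


store = InMemoryStore()
store.write("e1", 100, {"name": "first"})
store.write("e2", 300, {"name": "third"})
store.write("e3", 200, {"name": "second"})
print([eid for eid, _ in store.read_page(limit=2)])
```

The point of the sketch is that the read touches only `limit` rows of the entity table, instead of iterating over every entity of the type to sort them.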

However, I do feel we inherently need to support the created-time-based sorting scenario (i.e.
have the created-time-based index table as a mandatory table, without the user needing to specify
it in REST), as we promise in the interface that entities will be sorted in that fashion.

Probably we can discuss this further in the call today.

> [Atsv2] Add a new filter fromId in REST endpoints
> -------------------------------------------------
>                 Key: YARN-5585
>                 URL: https://issues.apache.org/jira/browse/YARN-5585
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelinereader
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>            Priority: Critical
>         Attachments: YARN-5585.v0.patch
> The TimelineReader REST APIs provide a lot of filters to retrieve applications. Along with those, it would be good to add a new filter, fromId, so that entities can be retrieved after the fromId.
> Current behavior: the default limit is set to 100. If there are 1000 entities, the REST call gives the first/last 100 entities. How do we retrieve the next set of 100 entities, i.e. 101 to 200 or 900 to 801?
> Example: suppose applications app-1, app-2 ... app-10 are stored in the database.
> *getApps?limit=5* gives app-1 to app-5. But there is no way to retrieve the next 5 apps.
> So the proposal is to have fromId in the filter, like *getApps?limit=5&fromId=app-5*, which gives the list of apps from app-6 to app-10.
> Since ATS is targeting storage of a large number of entities, getting the next set of entities using fromId, rather than querying all the entities, is a very common use case. This is very useful for pagination in the web UI.
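
For illustration only, the paging behaviour proposed in the description can be modelled as below. The function name get_apps and the in-memory list are hypothetical stand-ins for the REST endpoint and the backing store; the exclusive treatment of fromId follows the app-5 to app-6 example in the description.

```python
def get_apps(app_ids, limit=100, from_id=None):
    """Return up to `limit` apps, starting just after `from_id` if given."""
    start = 0
    if from_id is not None:
        # Raises ValueError if from_id is unknown; a real endpoint would
        # need to decide how to report that case.
        start = app_ids.index(from_id) + 1
    return app_ids[start:start + limit]


apps = [f"app-{i}" for i in range(1, 11)]
first_page = get_apps(apps, limit=5)
# The client passes the last ID it saw to fetch the following page.
second_page = get_apps(apps, limit=5, from_id=first_page[-1])
print(first_page, second_page)
```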

This message was sent by Atlassian JIRA
