hadoop-yarn-issues mailing list archives

From "Varun Saxena (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints
Date Thu, 15 Sep 2016 19:05:20 GMT

    [ https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15494251#comment-15494251

Varun Saxena commented on YARN-5585:

To summarise the suggestions made so far, for everyone's reference:

* Applications (like Tez) know best how to interpret their entity IDs and how they
can be sorted in descending order. Most entity IDs seem to contain some sort of monotonically
increasing sequence, as app IDs do. We can hence open up a PUBLIC interface which ATSv2 users like Tez can
implement to decide how to encode and decode a particular entity type so that it is stored
in descending order (based on creation time) in ATSv2. The encoding and decoding would be similar
to the AppIDConverter written in our code. If the row keys themselves can be sorted, this will
be the best possible solution performance-wise. Refer to [comment | https://issues.apache.org/jira/browse/YARN-5585?focusedCommentId=15470803&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15470803]
** _Pros of the approach:_ 
**# Lookup will be fast.
** _Cons of the approach:_ 
**# We depend on the application to provide some code for this to work. The corresponding JAR
will have to be placed on the classpath. Folks in other projects may not be pleased that ATS
has no inbuilt support for this.
**# Entity IDs may not always have a monotonically increasing sequence like app IDs.
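
To make the first suggestion concrete, here is a minimal sketch of what such a pluggable converter could look like. The names {{EntityIdConverter}}, {{SequenceSuffixConverter}} and {{DescendingIdSketch}} are hypothetical and are not part of ATSv2, and the ID format {{<prefix>_<sequence>}} is an assumption made only for illustration. The sequence is inverted so that a plain unsigned byte comparison, which is how HBase orders row keys, returns the newest ID first.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical pluggable converter; not an existing ATSv2 interface. */
interface EntityIdConverter {
  byte[] encode(String entityId);
  String decode(byte[] rowKeyPart);
}

/** Assumes IDs of the form "<prefix>_<non-negative sequence>", e.g. "dag_42". */
final class SequenceSuffixConverter implements EntityIdConverter {
  private final String prefix;

  SequenceSuffixConverter(String prefix) {
    this.prefix = prefix + "_";
  }

  @Override
  public byte[] encode(String entityId) {
    long seq = Long.parseLong(entityId.substring(prefix.length()));
    // Invert the sequence: a larger sequence yields a smaller non-negative
    // value, so its big-endian bytes sort earlier.
    return ByteBuffer.allocate(Long.BYTES).putLong(Long.MAX_VALUE - seq).array();
  }

  @Override
  public String decode(byte[] rowKeyPart) {
    long inverted = ByteBuffer.wrap(rowKeyPart).getLong();
    return prefix + (Long.MAX_VALUE - inverted);
  }
}

final class DescendingIdSketch {
  /** Unsigned lexicographic byte comparison, the ordering HBase applies to row keys. */
  static int compareBytes(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int d = (a[i] & 0xff) - (b[i] & 0xff);
      if (d != 0) {
        return d;
      }
    }
    return a.length - b.length;
  }

  /** Encodes the IDs, sorts the encoded keys as the store would, and decodes them back. */
  static List<String> sortedAsStored(EntityIdConverter converter, List<String> ids) {
    List<byte[]> keys = new ArrayList<>();
    for (String id : ids) {
      keys.add(converter.encode(id));
    }
    keys.sort(DescendingIdSketch::compareBytes);
    List<String> out = new ArrayList<>();
    for (byte[] key : keys) {
      out.add(converter.decode(key));
    }
    return out;
  }

  public static void main(String[] args) {
    EntityIdConverter converter = new SequenceSuffixConverter("dag");
    // prints [dag_10, dag_2, dag_1]
    System.out.println(sortedAsStored(converter, Arrays.asList("dag_2", "dag_10", "dag_1")));
  }
}
```

The second con above shows up directly in this sketch: an ID without a parseable sequence makes {{encode}} fail, so every entity type would need its own converter.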

* We can keep another table, say EntityCreationTable or EntityIndexTable, with row key {{cluster!user!flow!flowrun!app!entitytype!reverse
entity creation time!entityid}}. We will make an entry in this table whenever the created time
is reported for the entity. The real data would still reside in the main entity table. Entities
in this table will be sorted in descending order. On the read side, we can first peek into this table
to get the relevant records in descending order (based on limit and/or fromId) and then use
this info to query the entity table. We can do this in two ways: get the created times by
querying this index table and apply a created time range filter, or alternatively
try out MultiRowRangeFilter, which according to the HBase javadoc seems to be efficient. We will have
to do some processing to determine these multiple row key ranges.  Refer to [comment | https://issues.apache.org/jira/browse/YARN-5585?focusedCommentId=15472669&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15472669]
** _Note:_  The client should not send different created times for the same entity, otherwise that
will lead to an additional row.  If different created times are reported for the same entity,
we will have to consider the latest one.
** _Pros of the approach:_ 
**# Solution provided within ATS.
**# Extra write only when created time is reported.
** _Cons of the approach:_ 
**# Extra peek into the index table on the read side. A single entity read can still be served
directly from the entity table though.
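
The index table idea can be sketched as follows. This is only an illustration: a {{TreeMap}} stands in for HBase's sorted row keys, the class {{EntityIndexSketch}} and its methods are hypothetical, and the reverse creation time is written as a zero-padded decimal string purely for readability (a real row key would more likely use a byte encoding). It also assumes the reader already knows the created time of the fromId entity when asking for the next page.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Locale;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for the proposed index table. */
final class EntityIndexSketch {
  // Sorted map plays the role of HBase's lexicographically sorted row keys.
  private final TreeMap<String, String> index = new TreeMap<>();
  // Stands for cluster!user!flow!flowrun!app!entitytype.
  private final String prefix;

  EntityIndexSketch(String prefix) {
    this.prefix = prefix;
  }

  /** Reverse creation time, zero-padded so string order matches numeric order. */
  static String invert(long createdTime) {
    return String.format(Locale.ROOT, "%019d", Long.MAX_VALUE - createdTime);
  }

  String rowKey(long createdTime, String entityId) {
    return prefix + "!" + invert(createdTime) + "!" + entityId;
  }

  /** Called whenever a created time is reported for an entity. */
  void put(long createdTime, String entityId) {
    index.put(rowKey(createdTime, entityId), entityId);
  }

  /** Newest entities first, at most {@code limit} of them. */
  List<String> firstPage(int limit) {
    return take(index.values(), limit);
  }

  /** Entities strictly after the given (createdTime, fromId) row, newest first. */
  List<String> pageAfter(long fromCreatedTime, String fromId, int limit) {
    return take(index.tailMap(rowKey(fromCreatedTime, fromId), false).values(), limit);
  }

  private static List<String> take(Collection<String> ids, int limit) {
    List<String> out = new ArrayList<>();
    for (String id : ids) {
      if (out.size() == limit) {
        break;
      }
      out.add(id);
    }
    return out;
  }

  public static void main(String[] args) {
    EntityIndexSketch idx = new EntityIndexSketch("c1!u1!f1!1!app_1!DAG");
    idx.put(1000L, "e1");
    idx.put(2000L, "e2");
    idx.put(3000L, "e3");
    idx.put(4000L, "e4");
    System.out.println(idx.firstPage(2));             // prints [e4, e3]
    System.out.println(idx.pageAfter(3000L, "e3", 2)); // prints [e2, e1]
  }
}
```

The IDs returned by each page would then be used to query the main entity table, either through a created time range filter or through multiple row ranges as described above.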

* Another option would be to change the row key of the entity table to {{cluster!user!flow!flowrun!app!entitytype!reverse
entity creation time!entityid}} and have another table that maps {{cluster!user!flow!flowrun!app!entitytype!entityid}}
to the entity created time.
So for a single entity call (HBase Get) we will have to first peek into the new table and
then get the records from the entity table.
** _Cons of the approach:_ 
**# On the write side, either we first look up the entity created time in the index table,
or the client supplies the entity created time on every write. The former would impact
write performance and the latter may not be feasible for the client.
**# What should the row key be if the client does not supply the created time on the first write
but supplies it on a subsequent write?
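
A rough sketch of the third option, again with in-memory maps standing in for the two HBase tables. The class {{TimeKeyedEntityStoreSketch}} is hypothetical, the row key prefix is omitted for brevity, and keeping the first reported created time is an assumption of this sketch rather than something decided in the discussion.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Hypothetical stand-in for an entity table keyed by reverse created time. */
final class TimeKeyedEntityStoreSketch {
  // Mapping table: entity ID -> created time.
  private final Map<String, Long> createdTimeById = new HashMap<>();
  // Entity table: row key with reverse created time -> entity data.
  private final TreeMap<String, String> entityTable = new TreeMap<>();

  private static String rowKey(long createdTime, String entityId) {
    return String.format(Locale.ROOT, "%019d!%s", Long.MAX_VALUE - createdTime, entityId);
  }

  /**
   * Writes entity data. If the caller passes no created time (null), it is
   * looked up in the mapping table first; if it was never reported, the row
   * key cannot be built and the write fails.
   */
  void write(String entityId, Long createdTime, String data) {
    Long known = (createdTime != null) ? createdTime : createdTimeById.get(entityId);
    if (known == null) {
      throw new IllegalStateException("no created time known for " + entityId);
    }
    // Assumption of this sketch: the first reported created time wins.
    createdTimeById.putIfAbsent(entityId, known);
    entityTable.put(rowKey(createdTimeById.get(entityId), entityId), data);
  }

  /** Single entity read: peek into the mapping table, then get from the entity table. */
  Optional<String> get(String entityId) {
    Long createdTime = createdTimeById.get(entityId);
    if (createdTime == null) {
      return Optional.empty();
    }
    return Optional.ofNullable(entityTable.get(rowKey(createdTime, entityId)));
  }
}
```

The {{IllegalStateException}} branch is exactly the open question in the second con: without a created time on the first write there is no row key to write to.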

cc [~sjlee0], [~vrushalic], [~rohithsharma], [~gtCarrera9]

> [Atsv2] Add a new filter fromId in REST endpoints
> -------------------------------------------------
>                 Key: YARN-5585
>                 URL: https://issues.apache.org/jira/browse/YARN-5585
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelinereader
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>            Priority: Critical
>         Attachments: YARN-5585.v0.patch
> TimelineReader REST APIs provide a lot of filters to retrieve the applications. Along
with those, it would be good to add a new filter, i.e. fromId, so that entities can be retrieved
after the fromId. 
> Current Behavior : Default limit is set to 100. If there are 1000 entities then the REST
call gives the first/last 100 entities. How to retrieve the next set of 100 entities, i.e. 101 to 200
OR 900 to 801?
> Example : Suppose applications are stored in the database as app-1 app-2 ... app-10.
> *getApps?limit=5* gives app-1 to app-5. But there is no way to retrieve the next 5 apps.
> So the proposal is to have fromId in the filter, like *getApps?limit=5&fromId=app-5*,
which gives the list of apps from app-6 to app-10. 
> Since ATS is targeting storage of a large number of entities, it is a very common use case to
get the next set of entities using fromId rather than querying all the entities. This is very useful
for pagination in the web UI.

