hadoop-yarn-issues mailing list archives

From "Rohith Sharma K S (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints
Date Wed, 07 Sep 2016 07:04:20 GMT

    [ https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15469802#comment-15469802 ]

Rohith Sharma K S commented on YARN-5585:

bq. Are we selecting entities whose ID is less than start value, or we're filtering them out?
According to your description fromId = app-5 should return something like app-6 to 10, right?
I think it's very important to clearly define the exact meaning of "fromId"?
*fromId* is a query parameter that users pass in the REST URL, similar to limit.  When
entities are retrieved from storage (i.e. HBase), only the entities whose ID is less than the
start value are given to the HBase client. The HBase client then processes this ResultScanner
and returns the entities.

Ex: Assume that *entity-1, entity-2 ... entity-10* are stored in HBase in a row.
Current behavior without fromId:
# When a REST call is made to obtain entities, the output is *entity-10, entity-9 ...
entity-2, entity-1*.
# When a REST call is made with the filter {{limit=5}}, the output is *entity-10,
entity-9 ... entity-6*.  Note that limit is not applied at the storage level.  Rather, limit is
applied to the scanned rows, i.e. the HBase ResultScanner gives *ALL* the rows (entity-1 to
entity-10), and {{TimelineEntityReader#readEntities}} limits the number of rows given to the user.
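
To make the current behavior concrete, here is a minimal Python sketch. It is only a simulation under stated assumptions: {{STORED}}, {{seq}}, {{scan_all}} and {{read_entities}} are hypothetical stand-ins for the HBase table, the ID ordering, the ResultScanner and {{TimelineEntityReader#readEntities}}. None of them is the real HBase or timeline reader API. It shows that all 10 rows are scanned even when only 5 are returned.

```python
# Hypothetical in-memory stand-in for the entity table (not the HBase client API).
STORED = [f"entity-{i}" for i in range(1, 11)]


def seq(entity_id):
    """Numeric suffix of an ID such as 'entity-7' (illustrative ordering key)."""
    return int(entity_id.rsplit("-", 1)[1])


def scan_all():
    """Models the ResultScanner handing back every stored row."""
    return list(STORED)


def read_entities(limit=None):
    """Models the reader: order the scanned rows, then trim to `limit` client-side."""
    rows = scan_all()  # every row is fetched, whatever the limit is
    rows.sort(key=seq, reverse=True)
    return rows if limit is None else rows[:limit]


print(len(scan_all()))         # 10 rows scanned
print(read_entities(limit=5))  # only 5 of them reach the user
```

The limit only trims the result after the full scan, which is the cost the fromId filter is meant to avoid.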

After the patch, i.e. with fromId as a filter:
# When a REST call is made with the filters {{limit=5}} and {{fromId=entity-6}}, *HBase
itself gives only the rows which are less than entity-6*, i.e. entity-5 to entity-1. This is much
more efficient than processing all the rows in the HBase client, i.e. in {{TimelineEntityReader#readEntities}}.

Basically, to the user, fromId is nothing but the starting point for the next set of entities.
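
As a rough illustration of fromId as a paging cursor, the following Python sketch simulates a storage-side filter plus a client that passes the last ID it saw as the next fromId. All names here ({{STORED}}, {{seq}}, {{scan}}, {{read_entities}}, {{pages}}) are hypothetical and only model the behavior described above. They are not the patch's code or the HBase API.

```python
# Hypothetical in-memory stand-in for the entity table (not the HBase client API).
STORED = [f"entity-{i}" for i in range(1, 11)]


def seq(entity_id):
    """Numeric suffix of an ID such as 'entity-7' (illustrative ordering key)."""
    return int(entity_id.rsplit("-", 1)[1])


def scan(from_id=None):
    """Models a storage-side filter: only rows with an ID below from_id come back."""
    if from_id is None:
        return list(STORED)
    return [e for e in STORED if seq(e) < seq(from_id)]


def read_entities(limit, from_id=None):
    """Models the reader: order the (already filtered) rows and apply the limit."""
    rows = sorted(scan(from_id), key=seq, reverse=True)
    return rows[:limit]


def pages(limit):
    """Walks all entities page by page, using the last ID seen as the next fromId."""
    from_id = None
    while True:
        page = read_entities(limit, from_id)
        if not page:
            return
        yield page
        from_id = page[-1]


for page in pages(limit=5):
    print(page)
```

With 10 stored entities and {{limit=5}} this prints two pages, entity-10 to entity-6 and then entity-5 to entity-1. The second call only receives 5 rows from the simulated storage.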

bq. Because we're selecting entities starting from a given ID, can we directly pass in the
fromID's key when creating the scan? In this way seems like we saved one filter? For example,
if fromId is not provided, we may want to scan from cluster!user!flow!flowrun!appId!type,
but if fromId is provided, we can start from cluster!user!flow!flowrun!appId!type!fromId (or
the next available entity)?
This is a good point. But as you said in an earlier comment, entities are not stored in order.
They can be stored like entity-9, entity-5, entity-6, entity-2 ... entity-10. So, IIUC, this
cannot be achieved.
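
A small Python sketch of this argument, assuming (as stated above) that storage order does not follow entity ID order. {{STORAGE_ORDER}} is a hypothetical ordering: only its first four entries and its last entry come from the example above, the rest is made up for illustration. {{scan_from_start_row}} and {{scan_with_filter}} are likewise illustrative names, not HBase calls.

```python
def seq(entity_id):
    """Numeric suffix of an ID such as 'entity-7' (illustrative ordering key)."""
    return int(entity_id.rsplit("-", 1)[1])


# Hypothetical storage order; the middle entries are invented for this sketch.
STORAGE_ORDER = [f"entity-{i}" for i in (9, 5, 6, 2, 1, 8, 3, 7, 4, 10)]


def scan_from_start_row(from_id):
    """Start reading right after from_id's position and return everything that follows."""
    start = STORAGE_ORDER.index(from_id)
    return STORAGE_ORDER[start + 1:]


def scan_with_filter(from_id):
    """Visit every row and keep only those whose ID is below from_id."""
    return [e for e in STORAGE_ORDER if seq(e) < seq(from_id)]


by_start_row = scan_from_start_row("entity-6")
by_filter = scan_with_filter("entity-6")

print(by_start_row)  # misses entity-5 and includes IDs above entity-6
print(by_filter)     # exactly the IDs below entity-6, in storage order
```

Under this assumption, starting the scan at fromId's position skips entity-5 and lets entity-8, entity-7 and entity-10 through, whereas the filter returns exactly entity-1 to entity-5.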

bq. For pagination on containers, why do we need to care about actual creation time when the
entity ids have already been sorted? This said, supporting paginations for generic timeline
entities should not be blocked by YARN-5094?
Any entities with creationTime set come back in descending order of entityId. If creationTime
is not set, the result is in reverse order, i.e. ascending order of entityId. This is because
of the implementation of {{TimelineEntity#compareTo}}. So, say {{limit=2 and fromId=entity-6}}:
the rows retrieved from storage are entity-5 to entity-1, but the REST output given to the user
is entity-1 and entity-2 rather than entity-5 and entity-4.  This is because of the
{{TimelineEntityReader#readEntities}} implementation.  YARN-5094 blocks testing of YARN-CONTAINER
entities because most of the events have a creation time of -1, so the result is always the first
N containers when fromId is used. I have tested with a TEZ application, where fromId
works the right way.
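
The ordering effect can be sketched in Python as follows. The sort rule in {{sort_key}} is an assumption based on the description above (newer creation time first, otherwise ID ascending). It is not a copy of {{TimelineEntity#compareTo}}, and the creation times are made-up values.

```python
def seq(entity_id):
    """Numeric suffix of an ID such as 'entity-7' (illustrative ordering key)."""
    return int(entity_id.rsplit("-", 1)[1])


def sort_key(entity):
    """Assumed rule: newer creation time first; with no creation time, ID ascending."""
    entity_id, created = entity
    return (-(created or 0), seq(entity_id))


def read_entities(rows, limit):
    """Models the reader: order the rows from storage, then apply the limit."""
    return [entity_id for entity_id, _ in sorted(rows, key=sort_key)[:limit]]


# Rows that storage returns for fromId=entity-6, i.e. entity-5 down to entity-1.
ids = [f"entity-{i}" for i in range(5, 0, -1)]
with_time = [(eid, 1000 + seq(eid)) for eid in ids]  # hypothetical creation times
without_time = [(eid, None) for eid in ids]          # creation time not set

print(read_entities(with_time, 2))     # ['entity-5', 'entity-4']
print(read_entities(without_time, 2))  # ['entity-1', 'entity-2']
```

Without creation times the limit picks the first N entities in ascending ID order, so every page starts again from the lowest IDs.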

> [Atsv2] Add a new filter fromId in REST endpoints
> -------------------------------------------------
>                 Key: YARN-5585
>                 URL: https://issues.apache.org/jira/browse/YARN-5585
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelinereader
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>         Attachments: YARN-5585.v0.patch
> TimelineReader REST APIs provide a lot of filters to retrieve the applications. Along
> with those, it would be good to add a new filter, i.e. fromId, so that entities can be
> retrieved after the fromId.
> Example: If applications are stored in the database as app-1, app-2 ... app-10,
> *getApps?limit=5* gives app-1 to app-5. But retrieving the next 5 apps is difficult.
> So the proposal is to have fromId in the filter, like *getApps?limit=5&fromId=app-5*,
> which gives the list of apps from app-6 to app-10.
> This is very useful for pagination in the web UI.

