hadoop-yarn-issues mailing list archives

From "Rohith Sharma K S (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (YARN-5585) [Atsv2] Reader side changes for entity prefix and support for pagination via additional filters
Date Thu, 22 Dec 2016 03:01:58 GMT

    [ https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768842#comment-15768842 ]

Rohith Sharma K S edited comment on YARN-5585 at 12/22/16 3:01 AM:
-------------------------------------------------------------------

Thanks [~sjlee0] for the review comments.

bq. I don't think we should set the info from the fromId to entity id prefix and entity id.
The entity id prefix and the entity id should be used for a true single-entity query context.
It would be confusing to "reuse" them to indicate the fromId. I would prefer an explicit fromId
fields in the context so it's crystal clear what they are.
I am not sure why we need an extra fromId field in the context. The entity id prefix and
entity id are already part of the existing context and can be reused. Most importantly,
entityIdPrefix and entityId are also used in the multiple-entity query context, where they can
set the start row of the range scan. Let's take an example with multiple rows: {code}
rohithsharmaks!yarn_cluster!SleepJob!12345!application_1482156550070_0001!YARN_CONTAINER!1!entityId-1
rohithsharmaks!yarn_cluster!SleepJob!12345!application_1482156550070_0001!YARN_CONTAINER!2!entityId-2
rohithsharmaks!yarn_cluster!SleepJob!12345!application_1482156550070_0001!YARN_CONTAINER!3!entityId-3
{code}
# When no fromId is specified, the range scan uses the range below, so it scans all rows of
the given entityType.{code}
startRow: rohithsharmaks!yarn_cluster!SleepJob!12345!application_1482156550070_0001!YARN_CONTAINER!
stopRow: rohithsharmaks!yarn_cluster!SleepJob!12345!application_1482156550070_0001!YARN_CONTAINER"{code}
# When fromId=2:entityId-2, the scan starts from the 2nd row. {code}
startRow: rohithsharmaks!yarn_cluster!SleepJob!12345!application_1482156550070_0001!YARN_CONTAINER!2!entityId-2
stopRow: rohithsharmaks!yarn_cluster!SleepJob!12345!application_1482156550070_0001!YARN_CONTAINER"{code}
# When fromId=2, the scan also starts from the 2nd row. Note the difference from the 2nd point:
the start row is built from the entityIdPrefix only.{code}
startRow: rohithsharmaks!yarn_cluster!SleepJob!12345!application_1482156550070_0001!YARN_CONTAINER!2!
stopRow: rohithsharmaks!yarn_cluster!SleepJob!12345!application_1482156550070_0001!YARN_CONTAINER"{code}
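
The three cases above can be sketched in Java as plain string manipulation. This is only an
illustration of the ranges shown in the example, not the actual timeline reader code: the class
EntityScanRange and its methods are hypothetical, and the real row keys are encoded bytes rather
than readable strings. The stop row is assumed to be the entity-type prefix with its trailing
'!' replaced by the next character ('"'), which matches the stopRow values listed above.

```java
// Hypothetical helper that mirrors the startRow/stopRow values from the example.
class EntityScanRange {
    static final char SEP = '!';

    // Prefix shared by every row of one entity type, ending with the separator.
    static String typePrefix(String user, String cluster, String flow,
                             long flowRunId, String appId, String entityType) {
        return String.join("!", user, cluster, flow,
                Long.toString(flowRunId), appId, entityType) + SEP;
    }

    // idPrefix == null            -> no fromId: start at the entity-type prefix.
    // idPrefix set, entityId null -> fromId carries only the entity id prefix.
    // both set                    -> fromId carries the prefix and the entity id.
    static String startRow(String typePrefix, Long idPrefix, String entityId) {
        if (idPrefix == null) {
            return typePrefix;
        }
        String row = typePrefix + idPrefix + SEP;
        return entityId == null ? row : row + entityId;
    }

    // Same for all three cases: bump the trailing separator to the next character
    // so that the scan stops after the last row of this entity type.
    static String stopRow(String typePrefix) {
        int lastIndex = typePrefix.length() - 1;
        char next = (char) (typePrefix.charAt(lastIndex) + 1);
        return typePrefix.substring(0, lastIndex) + next;
    }

    public static void main(String[] args) {
        String prefix = typePrefix("rohithsharmaks", "yarn_cluster", "SleepJob",
                12345L, "application_1482156550070_0001", "YARN_CONTAINER");
        System.out.println("no fromId         : " + startRow(prefix, null, null));
        System.out.println("prefix and id     : " + startRow(prefix, 2L, "entityId-2"));
        System.out.println("prefix only       : " + startRow(prefix, 2L, null));
        System.out.println("stopRow (all cases): " + stopRow(prefix));
    }
}
```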

bq. Long story short, I think we can support (2) with Varun's suggestion:
Right, I have incorporated this in the patch. We need not scan from 0 to Long.MAX_VALUE; rather,
we can range scan the given entity type with a filter. The scan range is the same as in the 1st
point of the example above.

bq. Finally, I know it's no longer directly used, but I think TimelineEntity.compareTo() needs
updating. It does not use the entity id prefix at all, and it's using the creation time which
is not very consistent with what we're doing. Can we update that method as part of this JIRA?

Sure, I will handle it in this JIRA itself.
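
As a rough idea of what such an update could look like, here is a minimal sketch of one
possible ordering: entity type first, then entity id prefix, then entity id, with creation time
not used at all. The class EntityKey is hypothetical and stands in for the relevant fields of
TimelineEntity; the ordering that the patch finally adopts may differ.

```java
import java.util.Comparator;
import java.util.Objects;

// Hypothetical stand-in for the fields of TimelineEntity that matter for ordering.
final class EntityKey implements Comparable<EntityKey> {
    // Assumed ordering: type, then id prefix, then id (the same order as in the row key).
    private static final Comparator<EntityKey> ORDER =
            Comparator.comparing((EntityKey k) -> k.type)
                    .thenComparingLong(k -> k.idPrefix)
                    .thenComparing(k -> k.id);

    final String type;
    final long idPrefix;
    final String id;

    EntityKey(String type, long idPrefix, String id) {
        this.type = type;
        this.idPrefix = idPrefix;
        this.id = id;
    }

    @Override
    public int compareTo(EntityKey other) {
        return ORDER.compare(this, other);
    }

    // equals and hashCode use the same three fields so they stay consistent with compareTo.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof EntityKey)) {
            return false;
        }
        EntityKey k = (EntityKey) o;
        return idPrefix == k.idPrefix && type.equals(k.type) && id.equals(k.id);
    }

    @Override
    public int hashCode() {
        return Objects.hash(type, idPrefix, id);
    }
}
```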

bq. I am leaning slightly towards the former with the assumption that it should be truly rare
that there are multiple rows for the same entity id (otherwise it would be a bug in the write
path) and also for performance reasons.
Right, the reader will throw an error if it finds more than one row. 


was (Author: rohithsharma):
Thanks [~sjlee0] for review comments

bq. I don't think we should set the info from the fromId to entity id prefix and entity id.
The entity id prefix and the entity id should be used for a true single-entity query context.
It would be confusing to "reuse" them to indicate the fromId. I would prefer an explicit fromId
fields in the context so it's crystal clear what they are.
I am not sure why do we need an extra field fromId in context. However, these are part of
existing context which can be re used. At most importantly, entityIdPrefix and entityId are
used for multiple-entity query context also which can be used for setting start row in range
scan. Lets take example for multiple rows, {code}
rohithsharmaks!yarn_cluster!SleepJob!12345!application_1482156550070_0001!YARN_CONTAINER!1!entityId-1
rohithsharmaks!yarn_cluster!SleepJob!12345!application_1482156550070_0001!YARN_CONTAINER!2!entityId-2
rohithsharmaks!yarn_cluster!SleepJob!12345!application_1482156550070_0001!YARN_CONTAINER!3!entityId-3
{code}
## When NO fromId is specified, then range scan start with below range. So, basically it scan
for all rows of given entityType like below.{code}
startRow: rohithsharmaks!yarn_cluster!SleepJob!12345!application_1482156550070_0001!YARN_CONTAINER!
stopRow: rohithsharmaks!yarn_cluster!SleepJob!12345!application_1482156550070_0001!YARN_CONTAINER"{code}
## When fromId=2:entity-2. Here scan start from 2nd row. {code}
startRow: rohithsharmaks!yarn_cluster!SleepJob!12345!application_1482156550070_0001!YARN_CONTAINER!2!entityId-2
stopRow: rohithsharmaks!yarn_cluster!SleepJob!12345!application_1482156550070_0001!YARN_CONTAINER"{code}
## When fromId=2. Here scan start from 2nd row. {code}
startRow: rohithsharmaks!yarn_cluster!SleepJob!12345!application_1482156550070_0001!YARN_CONTAINER!2!entityId-2
stopRow: rohithsharmaks!yarn_cluster!SleepJob!12345!application_1482156550070_0001!YARN_CONTAINER"{code}

bq. Long story short, I think we can support (2) with Varun's suggestion:
Right, I have incorporated in the patch. Here need not scan from 0 to Long.Max rather we can
range scan for given entity type with filter. Scan range is like 1st point in above example.

bq. Finally, I know it's no longer directly used, but I think TimelineEntity.compareTo() needs
updating. It does not use the entity id prefix at all, and it's using the creation time which
is not very consistent with what we're doing. Can we update that method as part of this JIRA?

Sure, will handle in this JIRA only. 

bq. I am leaning slightly towards the former with the assumption that it should be truly rare
that there are multiple rows for the same entity id (otherwise it would be a bug in the write
path) and also for performance reasons.
Right, the reader will throw an error if it found more than one row. 



> [Atsv2] Reader side changes for entity prefix and support for pagination via additional filters
> -----------------------------------------------------------------------------------------------
>
>                 Key: YARN-5585
>                 URL: https://issues.apache.org/jira/browse/YARN-5585
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelinereader
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>            Priority: Critical
>              Labels: yarn-5355-merge-blocker
>         Attachments: 0001-YARN-5585.patch, YARN-5585-YARN-5355.0001.patch, YARN-5585-YARN-5355.0002.patch, YARN-5585-YARN-5355.0003.patch, YARN-5585-workaround.patch, YARN-5585.v0.patch
>
>
> The TimelineReader REST APIs provide a lot of filters to retrieve applications. Along with those, it would be good to add a new filter, i.e. fromId, so that entities can be retrieved after the fromId.
> Current behavior: the default limit is set to 100. If there are 1000 entities, then the REST call gives the first/last 100 entities. How do we retrieve the next set of 100 entities, i.e. 101 to 200 or 900 to 801?
> Example: suppose the applications app-1, app-2, ... app-10 are stored in the database.
> *getApps?limit=5* gives app-1 to app-5, but there is no way to retrieve the next 5 apps.
> So the proposal is to have fromId in the filter, like *getApps?limit=5&fromId=app-5*, which gives the list of apps from app-6 to app-10.
> Since ATS is targeting storage of a large number of entities, it is a very common use case to get the next set of entities using fromId rather than querying all the entities. This is very useful for pagination in the web UI.
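
The paging behavior described in the issue can be sketched over an in-memory list as below.
This is only an illustration under stated assumptions: the class FromIdPager and its page method
are hypothetical, the list stands in for rows already sorted by the store, fromId is treated as
exclusive (as in the app-5 example), and rejecting an unknown fromId is a choice made for this
sketch rather than documented reader behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory model of limit + fromId paging.
class FromIdPager {
    // Returns at most 'limit' ids that come after 'fromId' in stored order.
    // A null fromId means "start from the first stored id".
    static List<String> page(List<String> stored, int limit, String fromId) {
        int start = 0;
        if (fromId != null) {
            int index = stored.indexOf(fromId);
            if (index < 0) {
                throw new IllegalArgumentException("Unknown fromId: " + fromId);
            }
            start = index + 1; // exclusive: the page begins after fromId
        }
        int end = Math.min(stored.size(), start + limit);
        return new ArrayList<>(stored.subList(start, end));
    }

    public static void main(String[] args) {
        List<String> apps = Arrays.asList("app-1", "app-2", "app-3", "app-4", "app-5",
                "app-6", "app-7", "app-8", "app-9", "app-10");
        List<String> first = page(apps, 5, null);
        // The last id of one page is passed as fromId to fetch the next page.
        List<String> second = page(apps, 5, first.get(first.size() - 1));
        System.out.println(first);
        System.out.println(second);
    }
}
```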



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org
