hadoop-yarn-issues mailing list archives

From "Varun Saxena (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5585) [Atsv2] Reader side changes for entity prefix and support for pagination via additional filters
Date Mon, 19 Dec 2016 17:21:58 GMT

    [ https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15761743#comment-15761743 ]

Varun Saxena commented on YARN-5585:
------------------------------------

Thanks [~rohithsharma] for the patch.

bq. For single entity retrieval, when IdPrefix is not known, need to match column value for
entityType by doing range scan. Any other way this can achieve this?
I am wondering whether we can set the start and stop rows in the Scan for this, the reason
being that we know idprefix can range from 0 to the max value of long. Thus, our start row can
be {{cluster!user!flow!runid!appid!entitytype!0!entityid}} and, as the stop row is not inclusive,
we can call TimelineStorageUtils#calculateTheClosestNextRowKeyForPrefix for {{cluster!user!flow!runid!appid!entitytype!LONG_MAX!entityid}}.
This would mean that typically only one row will be scanned. We can anyway break out of the
loop as soon as the first row is found (which will be true for almost all cases). We can use
a PageFilter of 1 to keep the Scan, and the result retrieved via it, small. Thoughts?
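
A minimal sketch of how the two scan bounds could be computed, using only the JDK. Everything
here is an assumption for illustration: the class and method names are hypothetical, the plain
{{!}}-joined layout is not the real ATSv2 row key encoding, and closestNextRowKeyForPrefix is only
a stand-in for the TimelineStorageUtils method mentioned above. The idprefix is written as 8
big-endian bytes so that non-negative values sort in numeric order.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class EntityRowRange {

  // Hypothetical layout: <scope>!<8-byte big-endian idprefix>!<entityid>.
  // The real ATSv2 row key encoding differs; this is for illustration only.
  static byte[] rowKey(String scope, long idPrefix, String entityId) {
    byte[] head = (scope + "!").getBytes(StandardCharsets.UTF_8);
    byte[] tail = ("!" + entityId).getBytes(StandardCharsets.UTF_8);
    return ByteBuffer.allocate(head.length + Long.BYTES + tail.length)
        .put(head).putLong(idPrefix).put(tail).array();
  }

  // Stand-in for calculateTheClosestNextRowKeyForPrefix: returns the smallest
  // key that sorts after every key starting with the given prefix.
  static byte[] closestNextRowKeyForPrefix(byte[] prefix) {
    int offset = prefix.length;
    // Trailing 0xFF bytes cannot be incremented, so drop them first.
    while (offset > 0 && prefix[offset - 1] == (byte) 0xFF) {
      offset--;
    }
    if (offset == 0) {
      return new byte[0]; // prefix was all 0xFF (or empty): no upper bound
    }
    byte[] next = Arrays.copyOf(prefix, offset);
    next[offset - 1]++;
    return next;
  }

  // Unsigned lexicographic comparison, the order in which row keys sort.
  static int compare(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int d = (a[i] & 0xFF) - (b[i] & 0xFF);
      if (d != 0) {
        return d;
      }
    }
    return a.length - b.length;
  }
}
```

The start row would be rowKey(scope, 0, entityId) and the exclusive stop row
closestNextRowKeyForPrefix(rowKey(scope, Long.MAX_VALUE, entityId)). The sketch covers the
bounds only; matching the entity id inside the range is left to the scan's filters.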

bq.  FromId can be passed as filter where in fromId=idPrefix!entityId
As idPrefix is numeric, any separator should be fine, as we won't have to encode it. I would
prefer separators which do not require URL encoding.
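
To illustrate why a numeric idPrefix makes the separator choice easy, here is a small sketch of
parsing such a fromId. The FromId class and its parse method are hypothetical, and the separator
is left as a parameter because none has been agreed on. Splitting at the first occurrence is
enough: the numeric part can never contain the separator, even if the entity id does.

```java
final class FromId {
  final long idPrefix;
  final String entityId;

  private FromId(long idPrefix, String entityId) {
    this.idPrefix = idPrefix;
    this.entityId = entityId;
  }

  // Splits "<idPrefix><separator><entityId>" at the FIRST separator only,
  // so entity ids that contain the separator themselves stay intact.
  static FromId parse(String fromId, char separator) {
    int idx = fromId.indexOf(separator);
    if (idx <= 0 || idx == fromId.length() - 1) {
      throw new IllegalArgumentException("Malformed fromId: " + fromId);
    }
    // Long.parseLong throws NumberFormatException for a non-numeric prefix.
    long prefix = Long.parseLong(fromId.substring(0, idx));
    return new FromId(prefix, fromId.substring(idx + 1));
  }
}
```

For example, FromId.parse("42_container_1_0001", '_') yields idPrefix 42 and entity id
container_1_0001. The underscore is only an example of a character that is unreserved in URLs.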

bq.  If we plan to reuse same API's.
I think we can reuse the same APIs. We can add a new query param, say idprefix, and document
that retrieval will be slightly faster if idprefix is provided. I would like to know what
others think about this, though.

bq. we need to handle one scenario where same entityId is published with 2 entityIdPrefix.
entityIdPrefix is mandatorily written even though user do not provide any idPrefix while publishing
entities. So, if case of idPrefix is not known, should we use default idPrefix to get a row?
This will be tricky. We can follow what I mentioned in point 1 (if feasible) and break out
of the loop on the first row.
If we just use 0 (the default idprefix), we won't be able to support direct queries by the user
based on, say, container id, task id, etc., where the user may not know the corresponding prefix.
Another option could be that if more than one row is encountered for a single entity read,
we send some sort of error message indicating multiple idprefixes in the backend, which can
alert the user/application to some issue on the write side.
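
The second option could look roughly like the sketch below. It is not reader code: the class,
the method and the in-memory map standing in for the rows returned by a scan are all
hypothetical. It shows only the decision logic, where a second matching row is treated as an
error instead of being silently ignored.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.NoSuchElementException;
import java.util.Set;

final class SingleEntityRead {

  // rowsByIdPrefix is a hypothetical stand-in for scan results:
  // idprefix -> entity ids stored under that idprefix.
  static long findIdPrefix(NavigableMap<Long, Set<String>> rowsByIdPrefix,
      String entityId) {
    Long found = null;
    for (Map.Entry<Long, Set<String>> row : rowsByIdPrefix.entrySet()) {
      if (!row.getValue().contains(entityId)) {
        continue;
      }
      if (found != null) {
        // Same entity id written under two idprefixes: report it so the
        // user/application can look at the write side.
        throw new IllegalStateException("Multiple idprefixes for entity "
            + entityId + ": " + found + " and " + row.getKey());
      }
      found = row.getKey();
    }
    if (found == null) {
      throw new NoSuchElementException("Entity not found: " + entityId);
    }
    return found;
  }
}
```

Unlike breaking out on the first row, this variant has to keep scanning after the first match,
which is the cost of being able to detect the duplicate.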

> [Atsv2] Reader side changes for entity prefix and support for pagination via additional
filters
> -----------------------------------------------------------------------------------------------
>
>                 Key: YARN-5585
>                 URL: https://issues.apache.org/jira/browse/YARN-5585
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelinereader
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>            Priority: Critical
>              Labels: oct16-hard
>         Attachments: 0001-YARN-5585.patch, YARN-5585-YARN-5355.0001.patch, YARN-5585-YARN-5355.0002.patch,
YARN-5585-workaround.patch, YARN-5585.v0.patch
>
>
> The TimelineReader REST APIs provide a lot of filters to retrieve the applications. Along
with those, it would be good to add a new filter, i.e. fromId, so that entities can be retrieved
after the fromId.
> Current Behavior : Default limit is set to 100. If there are 1000 entities then the REST
call gives the first/last 100 entities. How to retrieve the next set of 100 entities, i.e. 101 to 200
OR 900 to 801?
> Example : If applications are stored in the database as app-1 app-2 ... app-10.
> *getApps?limit=5* gives app-1 to app-5. But there is no way to retrieve the next 5 apps.
> So the proposal is to have fromId in the filter, like *getApps?limit=5&fromId=app-5*,
which gives the list of apps from app-6 to app-10.
> Since ATS is targeting storage of a large number of entities, it is a very common use case to
get the next set of entities using fromId rather than querying all the entities. This is very useful
for pagination in web UI.
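
The getApps example in the description above can be sketched as follows. This is an in-memory
illustration only: FromIdPager and page are hypothetical names, and a plain ordered list stands
in for the stored applications.

```java
import java.util.ArrayList;
import java.util.List;

final class FromIdPager {

  // Returns up to `limit` ids that come strictly after `fromId` in storage
  // order, or the first `limit` ids when fromId is null.
  static List<String> page(List<String> ordered, String fromId, int limit) {
    if (limit < 0) {
      throw new IllegalArgumentException("limit must be >= 0");
    }
    int start = 0;
    if (fromId != null) {
      int idx = ordered.indexOf(fromId);
      if (idx < 0) {
        throw new IllegalArgumentException("Unknown fromId: " + fromId);
      }
      start = idx + 1; // fromId itself is excluded, as in the example
    }
    // Use long arithmetic so a very large limit cannot overflow.
    int end = (int) Math.min((long) ordered.size(), (long) start + limit);
    return new ArrayList<>(ordered.subList(start, end));
  }
}
```

With app-1 to app-10 stored in that order, page(apps, null, 5) gives app-1 to app-5 and
page(apps, "app-5", 5) gives app-6 to app-10.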



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


