hadoop-yarn-issues mailing list archives

From "Sangjin Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5585) [Atsv2] Reader side changes for entity prefix and support for pagination via additional filters
Date Wed, 21 Dec 2016 19:35:58 GMT

    [ https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767944#comment-15767944 ]

Sangjin Lee commented on YARN-5585:

Sorry for chiming in late on the discussion. I haven't reviewed the patch yet, but here is
my opinion.

I'm fine with passing {{fromId}} with the prefix and id concatenated with a colon (":") for
multi-entity queries. I'm also OK with using only the prefix portion for such queries although
I don't expect this to be an important use case.

As for specifying only the entity id for {{fromId}}, I don't think this is important
at all. Pagination requests would come mostly from non-human clients (e.g. UI, scripted
REST clients, etc.), and as such they always have both pieces of information. It would be
strange for them not to provide the id prefix. I am comfortable with just throwing an exception
if the id prefix is missing in {{fromId}}.
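
A minimal sketch of the {{fromId}} handling described above, not the code in the patch. The class name {{FromId}}, its fields, and the choice of IllegalArgumentException are hypothetical. It assumes the id prefix is a non-negative long, accepts either "prefix:id" or the prefix alone, and rejects a value with no prefix.

```java
/** Hypothetical holder for the parts of a fromId value; not part of the patch. */
final class FromId {
    final long idPrefix;
    final String entityId; // null when only the prefix was supplied

    private FromId(long idPrefix, String entityId) {
        this.idPrefix = idPrefix;
        this.entityId = entityId;
    }

    /**
     * Parses "<idPrefix>:<entityId>" or "<idPrefix>" alone.
     * Splits on the first colon only, so an entity id may itself contain colons.
     * Throws IllegalArgumentException when the id prefix is missing or not a
     * non-negative long.
     */
    static FromId parse(String fromId) {
        if (fromId == null || fromId.isEmpty()) {
            throw new IllegalArgumentException("fromId must not be empty");
        }
        int sep = fromId.indexOf(':');
        String prefixPart = sep < 0 ? fromId : fromId.substring(0, sep);
        String idPart = sep < 0 ? "" : fromId.substring(sep + 1);

        long prefix;
        try {
            prefix = Long.parseLong(prefixPart);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                "fromId is missing a numeric id prefix: " + fromId, e);
        }
        if (prefix < 0) {
            throw new IllegalArgumentException(
                "id prefix must be non-negative: " + fromId);
        }
        return new FromId(prefix, idPart.isEmpty() ? null : idPart);
    }
}
```

With this sketch, {{FromId.parse("42:entity_1")}} yields prefix 42 and id "entity_1", while {{FromId.parse("entity_1")}} throws because no id prefix can be read from it.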

For queries by entity id (i.e. single entity queries), as noted there are really 2 distinct
use cases: (1) queries with both id prefix and entity id (which would be mostly coming from
non-human clients), and (2) queries with only entity id. (1) is not ambiguous at all.

(2) can be further divided into 2 cases: (2-1) there was no id prefix written to the storage
(i.e. default prefix = 0), and (2-2) the client (most likely human) simply does not know the
id prefix.

Long story short, I think we can support (2) with Varun's suggestion:
I am wondering whether we can utilize setting the start and stop row in Scan for this. The reason
is that we know idprefix can range from 0 to the max value of long. Thus, our start row can
be cluster!user!flow!runid!appid!entitytype!0!entityid and, as the stop row is not inclusive, we
can call TimelineStorageUtils#calculateTheClosestNextRowKeyForPrefix for cluster!user!flow!runid!appid!entitytype!LONG_MAX!entityid.
This would mean that typically only one row will be scanned. We can anyway break out of the
loop as soon as the first row (which will be true for almost all cases) is found. We can use
a PageFilter of 1 to keep the Scan and the result retrieved via it small. Thoughts?

If entity prefix was not specified, we could do this range scan. The only point to clarify
then is whether to stop at the first result or detect the case where there are multiple rows
and return an error. I am leaning slightly towards the former with the assumption that it
should be truly rare that there are multiple rows for the same entity id (otherwise it would
be a bug in the write path) and also for performance reasons.
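
The range scan and the "stop at the first result" choice can be sketched without HBase as follows. Everything here is an assumption for illustration: a sorted in-memory map stands in for the table, row keys are plain strings with the id prefix zero-padded to 19 digits (the real row key encodes it as bytes), and {{closestNextKey}} is a simplified stand-in for TimelineStorageUtils#calculateTheClosestNextRowKeyForPrefix. Because the range covers every id prefix, the sketch also checks the entity id on each row; that check is part of the sketch, not something stated in the discussion.

```java
import java.util.Locale;
import java.util.NavigableMap;
import java.util.Optional;
import java.util.TreeMap;

/** Illustrative stand-in for the HBase range scan; all names are hypothetical. */
final class EntityRangeScanSketch {
    private static final char SEP = '!';
    private static final int PREFIX_WIDTH = 19; // digits in Long.MAX_VALUE

    /** Builds context!idPrefix!entityId with a zero-padded, sortable prefix. */
    static String rowKey(String context, long idPrefix, String entityId) {
        return context + SEP
            + String.format(Locale.ROOT, "%0" + PREFIX_WIDTH + "d", idPrefix)
            + SEP + entityId;
    }

    /** Smallest string that sorts after every string starting with key. */
    static String closestNextKey(String key) {
        int end = key.length();
        // A trailing max-value char cannot be incremented, so drop it first.
        while (end > 0 && key.charAt(end - 1) == Character.MAX_VALUE) {
            end--;
        }
        if (end == 0) {
            throw new IllegalArgumentException("no next key exists");
        }
        return key.substring(0, end - 1) + (char) (key.charAt(end - 1) + 1);
    }

    /** Scans [prefix 0, prefix LONG_MAX] and returns the first row for entityId. */
    static Optional<String> findFirst(NavigableMap<String, String> table,
                                      String context, String entityId) {
        String startRow = rowKey(context, 0L, entityId);
        String stopRow = closestNextKey(rowKey(context, Long.MAX_VALUE, entityId));
        int idOffset = context.length() + 1 + PREFIX_WIDTH + 1;
        // The stop row is exclusive, mirroring the Scan stop row semantics.
        for (String key : table.subMap(startRow, true, stopRow, false).keySet()) {
            if (key.length() == idOffset + entityId.length()
                    && key.endsWith(entityId)) {
                return Optional.of(key); // stop at the first match
            }
        }
        return Optional.empty();
    }
}
```

Returning on the first match corresponds to the former option above; detecting multiple rows would instead require continuing the loop and raising an error on a second match.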

For those cases where there was no id prefix (i.e. default) written, clients should still
set the id prefix (to 0) so that it becomes the first use case (1).

I'll go over the patch and post my feedback today. Thanks.

> [Atsv2] Reader side changes for entity prefix and support for pagination via additional filters
> -----------------------------------------------------------------------------------------------
>                 Key: YARN-5585
>                 URL: https://issues.apache.org/jira/browse/YARN-5585
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelinereader
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>            Priority: Critical
>              Labels: yarn-5355-merge-blocker
>         Attachments: 0001-YARN-5585.patch, YARN-5585-YARN-5355.0001.patch, YARN-5585-YARN-5355.0002.patch,
YARN-5585-YARN-5355.0003.patch, YARN-5585-workaround.patch, YARN-5585.v0.patch
> TimelineReader REST APIs provide a lot of filters to retrieve the applications. Along
with those, it would be good to add a new filter, i.e. fromId, so that entities can be retrieved
after the fromId.
> Current behavior: the default limit is set to 100. If there are 1000 entities, then the REST
call gives the first/last 100 entities. How do we retrieve the next set of 100 entities, i.e. 101 to 200
or 900 to 801?
> Example: if applications are stored in the database as app-1 app-2 ... app-10,
> *getApps?limit=5* gives app-1 to app-5. But there is no way to retrieve the next 5 apps.
> So the proposal is to have fromId in the filter, like *getApps?limit=5&fromId=app-5*,
which gives the list of apps from app-6 to app-10.
> Since ATS is targeting storage of a large number of entities, it is a very common use case to
get the next set of entities using fromId rather than querying all the entities. This is very useful
for pagination in the web UI.
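
The paging behavior in the example above can be sketched in a few lines. This is only an illustration under stated assumptions: the class and method names are hypothetical, an in-memory list in storage order stands in for the backing store, and fromId is treated as exclusive, as in the description (fromId=app-5 returns app-6 onward).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of limit + fromId paging over an ordered list. */
final class FromIdPagingSketch {
    /**
     * Returns up to limit ids that come after fromId in the given order.
     * A null fromId means "start from the beginning".
     */
    static List<String> page(List<String> ordered, int limit, String fromId) {
        if (limit < 0) {
            throw new IllegalArgumentException("limit must be non-negative");
        }
        int start = 0;
        if (fromId != null) {
            int idx = ordered.indexOf(fromId);
            if (idx < 0) {
                throw new IllegalArgumentException("unknown fromId: " + fromId);
            }
            start = idx + 1; // exclusive: resume after fromId
        }
        // Compute the end index in long arithmetic to avoid int overflow.
        int end = (int) Math.min((long) ordered.size(), (long) start + limit);
        return new ArrayList<>(ordered.subList(start, end));
    }
}
```

A client would pass the last id of the previous page as fromId to fetch the next page, and an empty result signals that no entities remain.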

This message was sent by Atlassian JIRA
