Mailing-List: contact yarn-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Date: Wed, 21 Dec 2016 19:35:58 +0000 (UTC)
From: "Sangjin Lee (JIRA)" <jira@apache.org>
To: yarn-issues@hadoop.apache.org
Message-ID: <JIRA.13001293.1472562035000.574518.1482348958446@Atlassian.JIRA>
In-Reply-To: <JIRA.13001293.1472562035000@Atlassian.JIRA>
References: <JIRA.13001293.1472562035000@Atlassian.JIRA> <JIRA.13001293.1472562035465@arcas>
Subject: [jira] [Commented] (YARN-5585) [Atsv2] Reader side changes for
 entity prefix and support for pagination via additional filters
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
archived-at: Wed, 21 Dec 2016 19:36:00 -0000


    [ https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767944#comment-15767944 ] 

Sangjin Lee commented on YARN-5585:
-----------------------------------

Sorry for chiming in late on the discussion. I haven't reviewed the patch yet, but here is my opinion.

I'm fine with passing {{fromId}} with the prefix and id concatenated with a colon (":") for multi-entity queries. I'm also OK with using only the prefix portion for such queries although I don't expect this to be an important use case.

As for specifying only the entity id for {{fromId}}, I don't know that this is important at all. Pagination requests would be coming mostly from non-human clients (e.g. UI, scripted REST clients, etc.), and as such they always have both pieces of information. It would be strange for them not to provide the id prefix. I am comfortable with just throwing an exception if the id prefix is missing in {{fromId}}.
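To make the {{fromId}} handling above concrete, here is a minimal Python sketch (not the patch itself). The function name {{parse_from_id}} and the decimal encoding of the prefix are assumptions for illustration only; the sketch requires both parts and does not cover the prefix-only variant.

```python
LONG_MAX = 2**63 - 1  # upper bound of a signed 64-bit id prefix


def parse_from_id(from_id: str) -> tuple[int, str]:
    """Split 'idprefix:entityid' into (id_prefix, entity_id).

    Hypothetical helper: raises ValueError when the id prefix is missing,
    mirroring the "just throw an exception" behavior discussed above.
    """
    # Split on the first colon only, so entity ids containing ':' survive.
    prefix_text, sep, entity_id = from_id.partition(":")
    if not sep or not entity_id:
        raise ValueError(f"fromId must be 'idprefix:entityid', got {from_id!r}")
    if not (prefix_text.isascii() and prefix_text.isdigit()):
        raise ValueError(f"id prefix must be a non-negative integer, got {prefix_text!r}")
    id_prefix = int(prefix_text)
    if id_prefix > LONG_MAX:
        raise ValueError(f"id prefix exceeds the long range: {id_prefix}")
    return id_prefix, entity_id
```

With this sketch, {{parse_from_id("12:app-5")}} yields the prefix 12 and the id app-5, while a bare {{app-5}} is rejected because no prefix is present.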

For queries by entity id (i.e. single entity queries), as noted there are really 2 distinct use cases: (1) queries with both id prefix and entity id (which would be mostly coming from non-human clients), and (2) queries with only entity id. (1) is not ambiguous at all.

(2) can be further divided into 2 cases: (2-1) there was no id prefix written to the storage (i.e. default prefix = 0), and (2-2) the client (most likely human) simply does not know the id prefix.

Long story short, I think we can support (2) with Varun's suggestion:
{quote}
I am wondering whether we can utilize setting the start and stop row in Scan for this. Reason being we know idprefix can have a range of 0 to max value of long. Thus, our start row can be cluster!user!flow!runid!appid!entitytype!0!entityid and, as the stop row is not inclusive, we can call TimelineStorageUtils#calculateTheClosestNextRowKeyForPrefix for cluster!user!flow!runid!appid!entitytype!LONG_MAX!entityid. This would mean that typically only one row will be scanned. We can anyway break out of the loop as soon as the first row (which will be true for almost all the cases) is found. We can use a PageFilter of 1 to keep the Scan and the result retrieved via it small. Thoughts?
{quote}

If entity prefix was not specified, we could do this range scan. The only point to clarify then is whether to stop at the first result or detect the case where there are multiple rows and return an error. I am leaning slightly towards the former with the assumption that it should be truly rare that there are multiple rows for the same entity id (otherwise it would be a bug in the write path) and also for performance reasons.
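The two options above (stop at the first result, or detect multiple rows and return an error) can be sketched with a simplified in-memory model in Python. This is not the HBase Scan API: rows are modeled as sorted (id prefix, entity id) tuples within one entity type, and {{find_entity}} and its {{strict}} flag are hypothetical names. In this model rows for other entity ids can fall inside the range, so the loop checks the id explicitly.

```python
from bisect import bisect_left, bisect_right

LONG_MAX = 2**63 - 1  # largest possible id prefix


def find_entity(sorted_rows, entity_id, strict=False):
    """Look up an entity by id alone over rows sorted by (id_prefix, entity_id).

    strict=False: return the first matching row (the option favored above).
    strict=True: raise LookupError if more than one row matches.
    Returns None when no row matches.
    """
    # Range covering every possible prefix for this entity id.
    start = bisect_left(sorted_rows, (0, entity_id))
    stop = bisect_right(sorted_rows, (LONG_MAX, entity_id))
    matches = []
    for row in sorted_rows[start:stop]:
        if row[1] != entity_id:
            continue  # a different entity that happens to sort inside the range
        if not strict:
            return row  # break out as soon as the first row is found
        matches.append(row)
    if len(matches) > 1:
        raise LookupError(f"multiple rows for entity id {entity_id!r}: {matches}")
    return matches[0] if matches else None
```

The non-strict path stops reading as soon as one row matches, which is the performance argument for the first option; the strict path has to walk the whole range before it can report a duplicate.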

For those cases where there was no id prefix (i.e. default) written, clients should still set the id prefix (to 0) so that it becomes the first use case (1).

I'll go over the patch and post my feedback today. Thanks.

> [Atsv2] Reader side changes for entity prefix and support for pagination via additional filters
> -----------------------------------------------------------------------------------------------
>
>                 Key: YARN-5585
>                 URL: https://issues.apache.org/jira/browse/YARN-5585
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelinereader
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>            Priority: Critical
>              Labels: yarn-5355-merge-blocker
>         Attachments: 0001-YARN-5585.patch, YARN-5585-YARN-5355.0001.patch, YARN-5585-YARN-5355.0002.patch, YARN-5585-YARN-5355.0003.patch, YARN-5585-workaround.patch, YARN-5585.v0.patch
>
>
> TimelineReader REST APIs provide a lot of filters to retrieve the applications. Along with those, it would be good to add a new filter, i.e. fromId, so that entities can be retrieved after the fromId.
> Current Behavior: The default limit is set to 100. If there are 1000 entities, then the REST call gives the first/last 100 entities. How can the next set of 100 entities, i.e. 101 to 200 or 900 to 801, be retrieved?
> Example: If applications are stored in the database as app-1, app-2 ... app-10,
> *getApps?limit=5* gives app-1 to app-5. But there is no way to retrieve the next 5 apps.
> So the proposal is to have fromId in the filter, like *getApps?limit=5&fromId=app-5*, which gives the list of apps from app-6 to app-10.
> Since ATS is targeting storage of a large number of entities, getting the next set of entities using fromId rather than querying all the entities is a very common use case. This is very useful for pagination in the web UI.

