hadoop-yarn-issues mailing list archives

From "Varun Saxena (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints
Date Thu, 01 Sep 2016 12:40:21 GMT

    [ https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15455257#comment-15455257 ]

Varun Saxena commented on YARN-5585:

So I thought a little bit about it and I think there is a possible solution for fetching apps
within a cluster without much performance impact, since this seems to be your use case.

What we can do is first get the required app IDs from the app to flow table, as app IDs
in this table are sorted, and extract the applicable flows from there. We can then get data
from the application table using these unique flows to get more specific information about
the apps, say by passing a flow to app IDs map. HBase has something called MultiRowRangeFilter
which can help us specify multiple row key ranges.
We would return only those apps which we found in the app to flow table.
And from a performance viewpoint, we can assume there will always be a reasonable limit specified.
Assume that in a cluster we have applications from application_1111111_0001 to
application_1111111_0034 (running or completed).
These apps will be stored in descending order in the app to flow table.
Let us say you want to get the latest 10 apps (i.e. the limit in your query is 10).
What we can do is get the first 10 apps from the app to flow table, i.e. application_1111111_0034
to application_1111111_0025. We can use PageFilter to return only the first 10 records. This is
the result set we can return.
Assume the application IDs ending with _0034, _0031 and _0027 belong to flow1 and the rest to flow2.
We can then use this info to query the application table.
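
The first lookup described above can be sketched in plain Python. This is only an illustration under stated assumptions: the `app_to_flow` dict is a hypothetical in-memory stand-in for the app to flow table, slicing the sorted rows plays the role of PageFilter, and all apps outside flow1 are assumed to belong to flow2. It is not HBase client code.

```python
from collections import defaultdict

# Hypothetical stand-in for the app to flow table: app ID -> flow name.
# Assumption: every app not listed for flow1 belongs to flow2.
FLOW1_SUFFIXES = {34, 31, 27}
app_to_flow = {
    f"application_1111111_{n:04d}": ("flow1" if n in FLOW1_SUFFIXES else "flow2")
    for n in range(1, 35)
}


def latest_apps(table, limit):
    """Return the first `limit` (app_id, flow) rows in descending app ID order.

    Sorting in reverse mimics the descending storage order; the slice mimics
    a PageFilter that stops after `limit` rows.
    """
    rows = sorted(table.items(), reverse=True)
    return rows[:limit]


def group_by_flow(rows):
    """Build the flow -> app IDs map used for the second lookup."""
    flows = defaultdict(list)
    for app_id, flow in rows:
        flows[flow].append(app_id)
    return dict(flows)


page = latest_apps(app_to_flow, 10)
flow_to_apps = group_by_flow(page)
```

Here `page` holds the 10 rows that form the result set, and `flow_to_apps` is the flow to app IDs map that the next step works from.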

So to get detailed info for these 10 apps in a single shot from the application table, what
we can do is as under:
* Create a MultiRowRangeFilter.
* For flow1, add start row as {{cluster!user!flow1!application_1111111_0034}} and stop row
as {{cluster!user!flow1!application_1111111_0027}}. We can make the stop row inclusive. We can
then add this start/stop row pair to the multi row range filter created.
* For flow2, start row can be {{cluster!user!flow2!application_1111111_0033}} and stop
row {{cluster!user!flow2!application_1111111_0024}}. We can then add this start/stop row
pair to the multi row range filter created.
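
A rough Python sketch of the range construction follows. It is a simulation, not HBase code: `row_key`, `build_ranges` and `scan` are hypothetical helpers, the application table is a plain dict, both ends of each range are treated as inclusive, and keys are compared as plain strings with the start row being the highest app ID. In HBase each pair would instead become one row range added to the MultiRowRangeFilter.

```python
# Hypothetical result of the app to flow lookup: flow -> app IDs on the page.
flow_to_apps = {
    "flow1": [f"application_1111111_{n:04d}" for n in (34, 31, 27)],
    "flow2": [f"application_1111111_{n:04d}" for n in (33, 32, 30, 29, 28, 26, 25)],
}


def row_key(flow, app_id, cluster="cluster", user="user"):
    """Compose a row key in the cluster!user!flow!app form used above."""
    return "!".join((cluster, user, flow, app_id))


def build_ranges(flow_to_apps):
    """One (start_row, stop_row) pair per flow, both ends inclusive.

    The start row carries the highest app ID because apps are assumed to be
    stored in descending order.
    """
    return [
        (row_key(flow, max(apps)), row_key(flow, min(apps)))
        for flow, apps in flow_to_apps.items()
    ]


def scan(app_table, ranges, wanted):
    """Keep rows inside any range whose app ID came from the first lookup."""
    hits = []
    for key in sorted(app_table, reverse=True):
        in_range = any(stop <= key <= start for start, stop in ranges)
        if in_range and key.rsplit("!", 1)[1] in wanted:
            hits.append(key)
    return hits


# Simulated application table holding all 34 apps.
flow1_ids = set(flow_to_apps["flow1"])
app_table = {}
for n in range(1, 35):
    app_id = f"application_1111111_{n:04d}"
    flow = "flow1" if app_id in flow1_ids else "flow2"
    app_table[row_key(flow, app_id)] = {"id": app_id}

wanted = {a for apps in flow_to_apps.values() for a in apps}
ranges = build_ranges(flow_to_apps)
hits = scan(app_table, ranges, wanted)
```

The final membership check in `scan` reflects the point that only apps found in the app to flow table are returned, even if a range happens to cover other rows.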

This would be slower than getting all apps when a flow or flow run is specified, but would be
faster than doing a full table scan of the application table, especially when it grows large.

Maybe I can raise a separate JIRA for this and handle it there if this is a real use case.

> [Atsv2] Add a new filter fromId in REST endpoints
> -------------------------------------------------
>                 Key: YARN-5585
>                 URL: https://issues.apache.org/jira/browse/YARN-5585
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelinereader
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
> TimelineReader REST APIs provide a lot of filters to retrieve the applications. Along
> with those, it would be good to add a new filter, i.e. fromId, so that entities can be
> retrieved after the fromId.
> Example: if applications are stored in the database as app-1, app-2 ... app-10,
> *getApps?limit=5* gives app-1 to app-5. But retrieving the next 5 apps is difficult.
> So the proposal is to have fromId in the filter, like *getApps?limit=5&fromId=app-5*,
> which gives the list of apps from app-6 to app-10.
> This is very useful for pagination in the web UI.
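
The paging behaviour proposed in the description can be illustrated with a small Python sketch. The `get_apps` function and the in-memory list are hypothetical and only mirror the example; this is not the TimelineReader REST API, and it assumes fromId is exclusive, as the example implies.

```python
def get_apps(app_ids, limit, from_id=None):
    """Return up to `limit` apps, starting just after `from_id` if given."""
    start = 0
    if from_id is not None:
        # fromId is treated as exclusive: resume with the entry after it.
        start = app_ids.index(from_id) + 1
    return app_ids[start:start + limit]


# Apps as they would be stored in the example: app-1 ... app-10.
apps = [f"app-{n}" for n in range(1, 11)]

first_page = get_apps(apps, limit=5)
next_page = get_apps(apps, limit=5, from_id="app-5")
```

A web UI would pass the last ID of the page it currently shows as `from_id` to fetch the following page.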
