Date: Thu, 15 Sep 2016 19:05:20 +0000 (UTC)
From: "Varun Saxena (JIRA)" <jira@apache.org>
To: yarn-issues@hadoop.apache.org
Message-ID: <JIRA.13001293.1472562035000.583255.1473966320726@Atlassian.JIRA>
In-Reply-To: <JIRA.13001293.1472562035000@Atlassian.JIRA>
References: <JIRA.13001293.1472562035000@Atlassian.JIRA> <JIRA.13001293.1472562035465@arcas>
Subject: [jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in
 REST endpoints
archived-at: Thu, 15 Sep 2016 19:05:22 -0000


    [ https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15494251#comment-15494251 ]

Varun Saxena commented on YARN-5585:
------------------------------------

To summarise the suggestions made so far, for everyone's reference:

* Applications (like Tez) know best how to interpret their entity IDs and how those IDs can be sorted in descending order. Most entity IDs seem to contain some sort of monotonically increasing sequence, as app IDs do. We could therefore open up a PUBLIC interface which ATSv2 users like Tez can implement to decide how to encode and decode a particular entity type, so that its entities are stored in descending sorted order (based on creation time) in ATSv2. The encoding and decoding would be similar to the AppIDConverter in our code. If the row keys themselves are sorted, this is the best possible solution performance-wise. Refer to [comment|https://issues.apache.org/jira/browse/YARN-5585?focusedCommentId=15470803&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15470803]
** _Pros of the approach:_
**# Lookup will be fast.
** _Cons of the approach:_
**# We depend on the application to provide some code for this to work, and the corresponding JAR will have to be placed in the classpath. Folks in other projects may not be pleased that ATS has no built-in support for this.
**# Entity IDs may not always contain a monotonically increasing sequence the way app IDs do.
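As a rough illustration of the first option, the Python sketch below shows what an application-supplied converter could do for IDs shaped like {{<prefix>_<clusterTimestamp>_<sequence>}} (the shape of YARN application IDs). The function names and the ID format are assumptions, not an existing ATSv2 interface. It also assumes the converter follows the same idea as AppIDConverter: invert the numeric parts so that newer IDs encode to smaller byte strings and therefore come first in HBase's ascending row-key order.

```python
import struct

# Largest signed 64-bit and 32-bit values, used to invert numbers so that
# larger (newer) values map to smaller byte strings.
LONG_MAX = (1 << 63) - 1
INT_MAX = (1 << 31) - 1


def encode_entity_id(entity_id: str) -> bytes:
    """Hypothetical encoder for IDs shaped like <prefix>_<clusterTs>_<seq>.

    Both numeric parts are inverted and packed big-endian, so an ascending
    byte-wise sort (HBase row-key order) lists the newest IDs first.
    """
    _prefix, cluster_ts, seq = entity_id.rsplit("_", 2)
    return struct.pack(">qi", LONG_MAX - int(cluster_ts), INT_MAX - int(seq))


def decode_entity_id(key: bytes, prefix: str = "application") -> str:
    """Inverse of encode_entity_id; the prefix is not stored in the key."""
    inv_ts, inv_seq = struct.unpack(">qi", key)
    return f"{prefix}_{LONG_MAX - inv_ts}_{INT_MAX - inv_seq:04d}"


if __name__ == "__main__":
    ids = [
        "application_1473966320726_0001",
        "application_1473966320726_0010",
        "application_1473966320726_0002",
    ]
    # Sorting by the encoded key mimics the order HBase would return rows in.
    for key in sorted(encode_entity_id(i) for i in ids):
        print(decode_entity_id(key))
```

Running it prints the three IDs newest first ({{_0010}}, {{_0002}}, {{_0001}}). The cluster timestamp is packed first so that IDs from a restarted cluster still sort ahead of older ones; an entity type without such a sequence (the second con above) has no natural encoding of this kind.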

* We can keep another table, say EntityCreationTable or EntityIndexTable, with row key {{cluster!user!flow!flowrun!app!entitytype!reverse entity creation time!entityid}}. We will make an entry in this table whenever the created time is reported for an entity. The real data would still reside in the main entity table. Rows in this table will be sorted in descending order of creation time. On the read side, we can first peek into this table to get the relevant records in descending order (based on limit and/or fromId) and then use this information to query the entity table. We can do this in two ways. We can get the created times by querying this index table and apply a created time range filter. Alternatively, we can try out MultiRowRangeFilter, which, going by the HBase javadoc, seems to be efficient. We will have to do some processing to determine these multiple row key ranges. Refer to [comment|https://issues.apache.org/jira/browse/YARN-5585?focusedCommentId=15472669&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15472669]
** _Note:_ The client should not send different created times for the same entity, otherwise that will lead to an additional row. If different created times are reported for the same entity, we will have to consider the latest one.
** _Pros of the approach:_
**# The solution is provided within ATS itself.
**# An extra write happens only when the created time is reported.
** _Cons of the approach:_
**# An extra peek into the index table on the read side. A single-entity read can still be served directly from the entity table, though.
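A minimal in-memory sketch of the index-table option, in Python, with a sorted list standing in for the HBase table. The helper names, the unescaped {{!}} separator and the 8-byte inverted timestamp are assumptions for illustration, and it assumes the reader already knows the created time of the fromId entity (for example from a Get on the entity table). {{peek}} returns up to {{limit}} entity IDs newest first, starting strictly after fromId; {{entity_row_ranges}} turns them into single-row ranges of the kind that could be handed to MultiRowRangeFilter.

```python
import bisect
import struct

LONG_MAX = (1 << 63) - 1
SEP = b"!"
TS_LEN = 8  # the inverted created time is a fixed-width 8-byte long


def key_prefix(cluster, user, flow, flowrun, app, entity_type):
    # Hypothetical cluster!user!flow!flowrun!app!entitytype! prefix, shared by
    # the index table and the main entity table (no separator escaping here).
    parts = (cluster, user, flow, str(flowrun), app, entity_type)
    return SEP.join(p.encode() for p in parts) + SEP


def index_row_key(prefix, created_time, entity_id):
    # prefix + <reverse entity creation time> + ! + entityid
    inverted = struct.pack(">q", LONG_MAX - created_time)
    return prefix + inverted + SEP + entity_id.encode()


class EntityIndexTable:
    """In-memory stand-in for the proposed index table; rows are kept sorted."""

    def __init__(self):
        self.rows = []

    def record(self, prefix, entity_id, created_time=None):
        # Extra write only when a created time is reported. Reporting a
        # different created time later would add a second row for the entity.
        if created_time is None:
            return
        key = index_row_key(prefix, created_time, entity_id)
        i = bisect.bisect_left(self.rows, key)
        if i == len(self.rows) or self.rows[i] != key:
            self.rows.insert(i, key)

    def peek(self, prefix, limit, from_id=None, from_created_time=None):
        # Newest first; with fromId, start strictly after fromId's index row.
        if from_id is not None:
            from_key = index_row_key(prefix, from_created_time, from_id)
            start = bisect.bisect_right(self.rows, from_key)
        else:
            start = bisect.bisect_left(self.rows, prefix)
        ids = []
        for key in self.rows[start:]:
            if len(ids) == limit or not key.startswith(prefix):
                break
            ids.append(key[len(prefix) + TS_LEN + len(SEP):].decode())
        return ids


def entity_row_ranges(prefix, entity_ids):
    # One [start, stop) range per row of the main entity table, whose row key
    # is assumed to be prefix + entityid. Appending b"\x00" gives the smallest
    # key greater than the row itself, so each range covers exactly one row.
    return [(prefix + e.encode(), prefix + e.encode() + b"\x00") for e in entity_ids]
```

For example, with created times 100, 300, 200 and 400 recorded for {{dag_0}} to {{dag_3}}, {{peek(p, limit=2)}} returns {{dag_3}} and {{dag_1}}, and passing {{dag_1}} (created at 300) as fromId returns {{dag_2}} and {{dag_0}}.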

* Another option would be to change the row key of the entity table to {{cluster!user!flow!flowrun!app!entitytype!reverse entity creation time!entityid}} and have another table to map {{cluster!user!flow!flowrun!app!entitytype!entityid}} to the entity created time.
So for a single-entity read (an HBase Get) we would first have to peek into the new table and then get the record from the entity table.
** _Cons of the approach:_
**# On the write side, either we will have to first look up the entity created time in the index table, or the client will have to supply the entity created time on every write. The former would impact write performance and the latter may not be feasible for the client.
**# What should the row key be if the client does not supply the created time on the first write but supplies it on a subsequent write?
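For comparison, a Python sketch of the third option with dicts standing in for the two HBase tables. The class and method names are hypothetical, and {{prefix}} is assumed to be the already-encoded {{cluster!user!flow!flowrun!app!entitytype!}} bytes. It shows the costs listed above: every single-entity read, and every write without a created time, needs an extra lookup in the mapping table, and a first write without a created time has no row key to go to.

```python
import struct

LONG_MAX = (1 << 63) - 1


class TimeOrderedEntityStore:
    """Hypothetical sketch of option 3; dicts stand in for the two HBase tables."""

    def __init__(self):
        # Entity table: prefix + <reverse created time> + ! + entityid -> columns
        self.entity_table = {}
        # Mapping table: prefix + entityid -> entity created time
        self.created_times = {}

    @staticmethod
    def _entity_key(prefix, created_time, entity_id):
        inverted = struct.pack(">q", LONG_MAX - created_time)
        return prefix + inverted + b"!" + entity_id.encode()

    def write(self, prefix, entity_id, columns, created_time=None):
        map_key = prefix + entity_id.encode()
        if created_time is None:
            # Client did not send the created time: extra lookup on the write path.
            created_time = self.created_times.get(map_key)
            if created_time is None:
                # The open question above: there is no row key to write to yet.
                raise ValueError(f"created time unknown for {entity_id}")
        else:
            # A different created time for an existing entity would leave its
            # old row behind under the old key.
            self.created_times[map_key] = created_time
        key = self._entity_key(prefix, created_time, entity_id)
        self.entity_table.setdefault(key, {}).update(columns)

    def get(self, prefix, entity_id):
        # Single-entity read: peek into the mapping table first, then "Get".
        created_time = self.created_times.get(prefix + entity_id.encode())
        if created_time is None:
            return None
        return self.entity_table.get(self._entity_key(prefix, created_time, entity_id))
```

For example, writing {{dag_1}} with created time 300 and later updating it without a created time succeeds via the mapping lookup, while a first write of {{dag_2}} without a created time raises {{ValueError}}.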

cc [~sjlee0], [~vrushalic], [~rohithsharma], [~gtCarrera9]

> [Atsv2] Add a new filter fromId in REST endpoints
> -------------------------------------------------
>
>                 Key: YARN-5585
>                 URL: https://issues.apache.org/jira/browse/YARN-5585
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelinereader
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>            Priority: Critical
>         Attachments: YARN-5585.v0.patch
>
>
> TimelineReader REST APIs provide a lot of filters to retrieve applications. Along with those, it would be good to add a new filter, i.e. fromId, so that entities can be retrieved after the fromId.
> Current behavior: The default limit is set to 100. If there are 1000 entities, the REST call gives the first/last 100 entities. How do we retrieve the next set of 100 entities, i.e. 101 to 200 OR 900 to 801?
> Example: If applications app-1, app-2 ... app-10 are stored in the database,
> *getApps?limit=5* gives app-1 to app-5. But there is no way to retrieve the next 5 apps.
> So the proposal is to have fromId in the filter, like *getApps?limit=5&fromId=app-5*, which gives the list of apps from app-6 to app-10.
> Since ATS targets storage of a large number of entities, getting the next set of entities using fromId rather than querying all the entities is a very common use case. This is very useful for pagination in the web UI.
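The fromId/limit semantics proposed in the description can be sketched in Python over a sorted list of row keys, with a UI-style loop that passes the last ID of each page back as fromId. The function names are hypothetical, fromId is treated as exclusive (as in the app-5 example returning app-6 onwards), and the IDs are zero-padded because row keys compare byte-wise, so a plain {{app-10}} would sort before {{app-2}}.

```python
import bisect


def get_entities(sorted_keys, limit, from_id=None):
    """Hypothetical reader-side handling of limit and fromId.

    Returns up to `limit` IDs that sort strictly after from_id.
    """
    start = 0 if from_id is None else bisect.bisect_right(sorted_keys, from_id)
    return sorted_keys[start:start + limit]


def fetch_all(sorted_keys, limit):
    # A UI paging through the results, passing the last ID seen as fromId
    # until an empty page comes back.
    pages, from_id = [], None
    while True:
        page = get_entities(sorted_keys, limit, from_id)
        if not page:
            return pages
        pages.append(page)
        from_id = page[-1]
```

With ten IDs {{app-01}} to {{app-10}} and limit=3, the loop yields four pages, the last containing only {{app-10}}.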


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
