From: "Varun Saxena (JIRA)"
To: yarn-issues@hadoop.apache.org
Date: Thu, 15 Sep 2016 19:05:20 +0000 (UTC)
Subject: [jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

[ https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15494251#comment-15494251 ]

Varun Saxena commented on YARN-5585:
------------------------------------

Just to summarise the suggestions given so far, for folks to refer to.
* Applications (like Tez) would know best how to interpret their entity IDs and how they can be sorted in descending order. Most entity IDs seem to have some sort of monotonically increasing sequence, like the app ID. We can hence open up a PUBLIC interface which ATSv2 users like Tez can implement to decide how to encode and decode a particular entity type so that it is stored in descending order (based on creation time) in ATSv2. The encoding and decoding would be similar to the AppIDConverter written in our code. If the row keys themselves can be sorted, this will be the best possible solution performance-wise. Refer to [comment | https://issues.apache.org/jira/browse/YARN-5585?focusedCommentId=15470803&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15470803]
** _Pros of the approach:_
**# Lookup will be fast.
** _Cons of the approach:_
**# We are depending on the application to provide some code for this to work. The corresponding JAR will have to be placed on the classpath. Folks in other projects may not be pleased that ATS has no inbuilt support for this.
**# Entity IDs may not always have a monotonically increasing sequence like app IDs.
* We can keep another table, say EntityCreationTable or EntityIndexTable, with row key {{cluster!user!flow!flowrun!app!entitytype!reverse entity creation time!entityid}}. We will make an entry in this table whenever a created time is reported for the entity. The real data would still reside in the main entity table. Entities in this table will be sorted in descending order. On the read side, we can first peek into this table to get the relevant records in descending order (based on limit and/or fromId) and then use this info to query the entity table. We can do this in two ways. We can get the created times by querying this index table and then apply a created time range filter. Alternatively, we can try out MultiRowRangeFilter, which seems to be efficient going by the HBase javadoc.
We will have to do some processing to determine these multiple row key ranges. Refer to [comment | https://issues.apache.org/jira/browse/YARN-5585?focusedCommentId=15472669&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15472669]
** _Note:_ The client should not send different created times for the same entity, otherwise that will lead to an additional row. If different created times are reported for the same entity, we will have to consider the latest one.
** _Pros of the approach:_
**# Solution provided within ATS.
**# Extra write only when created time is reported.
** _Cons of the approach:_
**# Extra peek into the index table on the read side. A single entity read can still be served directly from the entity table, though.
* Another option would be to change the row key of the entity table to {{cluster!user!flow!flowrun!app!entitytype!reverse entity creation time!entityid}} and have another table to map {{cluster!user!flow!flowrun!app!entitytype!entityid}} to the entity created time. So for a single entity call (HBase Get) we will have to first peek into the new table and then get records from the entity table.
** _Cons of the approach:_
**# On the write side, we will have to first look up the index table which has the entity created time, or the client should supply the entity created time on every write. The former would impact write performance and the latter may not be feasible for the client.
**# What should the row key be if the client does not supply the created time on the first write but supplies it on a subsequent write?
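The second and third options both depend on a "reverse entity creation time" component in the row key. Below is a minimal sketch of that idea, not code from ATSv2: the class `EntityIndexSketch`, its methods `reverseTime`, `rowKey` and `page`, and the sample key prefix are all hypothetical. It assumes non-negative created times, uses string keys instead of byte arrays, and lets a `TreeMap` stand in for the sorted HBase index table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Illustrative sketch only; none of these names exist in ATSv2.
final class EntityIndexSketch {
  // Hypothetical prefix: cluster!user!flow!flowrun!app!entitytype
  static final String PREFIX = "c1!alice!flowA!1!app_1!TEZ_DAG";

  // Inverts the created time so that newer entities get smaller keys.
  // Zero-padding to 19 digits keeps string order equal to numeric order.
  static String reverseTime(long createdTime) {
    return String.format(Locale.ROOT, "%019d", Long.MAX_VALUE - createdTime);
  }

  static String rowKey(String prefix, long createdTime, String entityId) {
    return prefix + "!" + reverseTime(createdTime) + "!" + entityId;
  }

  // Returns up to `limit` entity IDs, newest first, strictly after
  // `fromRowKey` (or from the start of the prefix when it is null).
  static List<String> page(NavigableMap<String, String> index, String prefix,
      String fromRowKey, int limit) {
    NavigableMap<String, String> tail = (fromRowKey == null)
        ? index.tailMap(prefix + "!", true)
        : index.tailMap(fromRowKey, false);
    List<String> ids = new ArrayList<>();
    for (Map.Entry<String, String> e : tail.entrySet()) {
      if (ids.size() >= limit || !e.getKey().startsWith(prefix + "!")) {
        break;
      }
      ids.add(e.getValue());
    }
    return ids;
  }

  public static void main(String[] args) {
    NavigableMap<String, String> index = new TreeMap<>();
    for (int i = 1; i <= 5; i++) {
      // Entity dag_i is created at time 1000 * i.
      index.put(rowKey(PREFIX, 1000L * i, "dag_" + i), "dag_" + i);
    }
    // First page: the two newest entities.
    System.out.println(page(index, PREFIX, null, 2)); // [dag_5, dag_4]
    // Next page: resume after the last entity of the previous page.
    String from = rowKey(PREFIX, 4000L, "dag_4");
    System.out.println(page(index, PREFIX, from, 2)); // [dag_3, dag_2]
  }
}
```

Note that resuming a page in this sketch needs both the created time and the ID of the last entity seen, since both are part of the index row key. How a fromId alone would be resolved to a start key is left open here.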
cc [~sjlee0], [~vrushalic], [~rohithsharma], [~gtCarrera9]

> [Atsv2] Add a new filter fromId in REST endpoints
> -------------------------------------------------
>
>                 Key: YARN-5585
>                 URL: https://issues.apache.org/jira/browse/YARN-5585
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelinereader
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>            Priority: Critical
>         Attachments: YARN-5585.v0.patch
>
>
> The TimelineReader REST APIs provide a lot of filters to retrieve applications. Along with those, it would be good to add a new filter, i.e. fromId, so that entities can be retrieved after the fromId.
> Current behavior: The default limit is set to 100. If there are 1000 entities, then the REST call gives the first/last 100 entities. How do we retrieve the next set of 100 entities, i.e. 101 to 200 OR 900 to 801?
> Example: If applications are stored in the database as app-1 app-2 ... app-10.
> *getApps?limit=5* gives app-1 to app-5. But there is no way to retrieve the next 5 apps.
> So the proposal is to have fromId in the filter, like *getApps?limit=5&&fromId=app-5*, which gives the list of apps from app-6 to app-10.
> Since ATS is targeting storage of a large number of entities, it is a very common use case to get the next set of entities using fromId rather than querying all the entities. This is very useful for pagination in the web UI.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)