hadoop-yarn-issues mailing list archives

From "Li Lu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints
Date Wed, 07 Sep 2016 23:05:20 GMT

    [ https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15472083#comment-15472083 ]

Li Lu commented on YARN-5585:

I think we're overcomplicating the problem here... I believe the general use case of this
JIRA is mostly pagination: given a uniquely defined type of entities in one application,
if the total number of entities is greater than the given limit, can we provide an API that
allows fetching the data in multiple batches? Say we have <entity_001>, <entity_002>,
..., <entity_100>, and limit = 10. What we want is to first fetch <entity_001>
to <entity_010>, then, given fromId = entity_010, fetch <entity_011> to <entity_020>,
and so on. According to Rohith's use case, I think it's totally fine to say that
all entities are ordered by their ids lexicographically (especially for entities with proper
padding on numbers, like container ids). Actually, any consistent order will work for
pagination; the only problem is how to make it make sense to the users.
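
To make the batching concrete, here is a minimal sketch in Python, assuming the ids
are already available in lexicographic order. The function name fetch_page and the
in-memory list are hypothetical stand-ins for the reader and the store, not the actual
TimelineReader API.

```python
from bisect import bisect_right

def fetch_page(sorted_ids, limit, from_id=None):
    """Return up to `limit` ids that sort strictly after `from_id`.

    `sorted_ids` stands in for a scan over ids in lexicographic order.
    """
    # With no from_id we start at the beginning; otherwise we seek just past it.
    start = 0 if from_id is None else bisect_right(sorted_ids, from_id)
    return sorted_ids[start:start + limit]

# Zero-padded ids, so lexicographic order matches numeric order.
entity_ids = [f"entity_{i:03d}" for i in range(1, 101)]

first = fetch_page(entity_ids, 10)                       # entity_001 .. entity_010
second = fetch_page(entity_ids, 10, from_id=first[-1])   # entity_011 .. entity_020

# Without padding, lexicographic order no longer matches numeric order.
unpadded = sorted(["entity_2", "entity_10"])             # entity_10 sorts first
```

The caller only has to remember the last id of the previous batch, which is why any
consistent order is enough for this scheme.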

The real problem here is that we need to return everything sorted by creation
time, which seems to be quite hard in our current data model. This was pretty easy in ATS
v1, where creation time is baked into the row key for each entity. I remember there were some
discussions about this a while ago, but the general conclusion was that we mainly rely on
the use cases themselves to guarantee consistency between creation time and entity id. To
me, the potential problem with sorting entities by creation time to implement
pagination is that we first have to fetch _all_ of them from HBase to form the order, which
defeats most of the advantage of pagination.
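
A rough illustration of that cost, assuming rows come back ordered by entity id and
carry their creation time as an ordinary value. The function page_by_creation_time
and the tuple layout are hypothetical and only model the access pattern, not the
HBase schema.

```python
def page_by_creation_time(rows, limit, offset=0):
    """Return one page ordered by creation time, plus the number of rows read.

    `rows` is an iterable of (entity_id, created_time) in row-key (id) order.
    """
    # The key order says nothing about creation time, so every row must be read
    # before the first page can be produced.
    everything = list(rows)
    everything.sort(key=lambda row: row[1])
    return everything[offset:offset + limit], len(everything)

# Ids and creation times deliberately disagree here.
rows = [("entity_001", 300), ("entity_002", 200), ("entity_003", 100)]

page, rows_read = page_by_creation_time(rows, limit=1)
```

Here a page of one entity still reads all three rows, and the same holds for every
later page unless the sorted result is cached somewhere.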

An ID encoder/decoder would be very helpful for this use case. However, having the application
write the encode/decode logic seems to put more load on application programmers.
It also introduces extra work for deployments, since cluster operators need to handle third-party
plugins. Can we provide several "SORT BY" options for timeline entity types, so that we store
their ids accordingly?
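
One way such a built-in option could look, purely as a sketch: a "sort by creation
time" choice that stores each id behind a zero-padded timestamp prefix, so that
lexicographic key order matches creation order. The names encode_id and decode_id,
the 13-digit width, and the "!" separator are all assumptions made for illustration,
not part of ATS v2.

```python
WIDTH = 13  # assumption: creation times are millisecond timestamps of at most 13 digits
SEP = "!"   # assumption: a separator placed between the prefix and the original id

def encode_id(created_ms, entity_id):
    """Build a stored key whose lexicographic order follows creation time."""
    return f"{created_ms:0{WIDTH}d}{SEP}{entity_id}"

def decode_id(stored_key):
    """Recover (created_ms, entity_id) from a stored key."""
    prefix, entity_id = stored_key.split(SEP, 1)
    return int(prefix), entity_id

# container_b was created before container_a, so it should sort first.
entities = [("container_a", 2000), ("container_b", 1000)]
stored_keys = sorted(encode_id(created, eid) for eid, created in entities)
ordered_ids = [decode_id(key)[1] for key in stored_keys]
```

With keys stored this way, a fromId filter could resume from the last stored key
without fetching everything first, and applications would not need to ship their own
encoder.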

> [Atsv2] Add a new filter fromId in REST endpoints
> -------------------------------------------------
>                 Key: YARN-5585
>                 URL: https://issues.apache.org/jira/browse/YARN-5585
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelinereader
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>         Attachments: YARN-5585.v0.patch
> TimelineReader REST APIs provide a lot of filters to retrieve the applications. Along
> with those, it would be good to add a new filter, i.e. fromId, so that entities can be retrieved
> after the fromId.
> Example: if applications are stored in the database as app-1, app-2, ..., app-10,
> *getApps?limit=5* gives app-1 to app-5. But retrieving the next 5 apps is difficult.
> So the proposal is to have fromId in the filter, like *getApps?limit=5&fromId=app-5*,
> which gives the list of apps from app-6 to app-10.
> This is very useful for pagination in the web UI.
