hadoop-yarn-issues mailing list archives

From "Rohith Sharma K S (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints
Date Thu, 06 Oct 2016 05:29:20 GMT

    [ https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15550930#comment-15550930 ]

Rohith Sharma K S commented on YARN-5585:

bq. We also need to be *crystal clear* that timeline clients *must* provide the same prefix
for all subsequent updates of the same entity. I cannot stress that point enough. Rohith,
could you confirm that it is not an issue with Tez to provide the created time for any subsequent
updates for Tez entities?
This is a very important point for TimelineClient users who want to use prefixId. Even though
I am in the minority on introducing an *optional* prefixId, I convinced myself to go ahead with
it because optionality (flexibility) is at least better than a predefined, storage-specific
sort order. That said, the issue is really in the storage layer, and we are trying to solve it
by pushing it up to the API as an optional prefix. That exposes a flaw in the API: a user
can mess up the storage, which results in inconsistent data on retrieval.
I had an offline talk with one of the Tez developers, and he is fine with providing prefixId. He
raised two concerns. First, with multiple JVMs the application programmer has to define a new
protocol for transferring the prefixId. Second, what if a user fails to provide the
prefixId in subsequent updates? The storage would then be messed up, with the data stored in 2
different entries, or possibly more.
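
To illustrate the second concern, here is a minimal Python sketch of a store whose row key includes the prefix. It is not ATSv2 code: the class ToyEntityStore, its methods, the entity id "dag_1", the timestamp value, and the default prefix of 0 are all assumptions made for the illustration.

```python
# Hypothetical in-memory stand-in for a store keyed by (prefix, entity_id).
# None of these names come from the ATSv2 code base.


class ToyEntityStore:
    def __init__(self):
        self.rows = {}  # (prefix, entity_id) -> merged info dict

    def put(self, entity_id, info, prefix=0):
        # The prefix is part of the row key, so a different prefix
        # addresses a different row even for the same entity id.
        row = self.rows.setdefault((prefix, entity_id), {})
        row.update(info)

    def rows_for(self, entity_id):
        # Collect every row that belongs to the given entity id.
        return {key: value for key, value in self.rows.items()
                if key[1] == entity_id}


store = ToyEntityStore()
created_time = 1475731760  # assumed prefix value, e.g. a created time

store.put("dag_1", {"status": "RUNNING"}, prefix=created_time)
store.put("dag_1", {"progress": 0.5}, prefix=created_time)  # same prefix: same row
store.put("dag_1", {"status": "SUCCEEDED"})  # prefix omitted: a second row appears

split = store.rows_for("dag_1")
print(len(split))  # number of rows now holding data for "dag_1"
```

In this sketch the two updates that carry the same prefix are merged into one row, while the update without it lands in a separate row, so a reader that fetches only one of the rows sees incomplete data.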

bq. I'm also realizing that we might have a bug in how we deal with entity id's. I would have
thought that we store the entities in the reverse entity id order, but it appears that the
entity id is encoded into the row key as is (EntityRowKey). Am I reading that right? If so,
this is a bug to fix.
Sorry, I could not follow this. Could you explain in a bit more detail? Do you mean reversing only
the entityId, i.e. if the entityId is "12345" it becomes "54321", or reversing the row key itself?

bq. One other thing to deal with is the query by id. There, we need to be able to distinguish
the case where the data do not have the prefix to begin with and that where data do. Ideally
we would simply use the row key explicitly in the case of data that don't have the prefix
to begin with. For those that do have the prefix, we cannot use the row key to fetch the row
so we need to do something different. I don't think this was done in the current patch, but
this is TBD.
I was thinking of using the same REST API for both cases by using SingleColumnFilter. One con I see
is a table scan over all entities of the entityType, which would show up in read performance.
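
The trade-off can be sketched in Python as follows. This is only an in-memory illustration under assumed names: the row layout (entity_type, prefix, entity_id), the functions get_by_row_key and get_by_id_filter, and the sample data are hypothetical, and the loop stands in for a filtered scan rather than calling any HBase API.

```python
# Toy row layout (an assumption): (entity_type, prefix, entity_id) -> info.


def get_by_row_key(rows, entity_type, prefix, entity_id):
    """Point lookup: possible only when the full row key can be rebuilt."""
    return rows.get((entity_type, prefix, entity_id))


def get_by_id_filter(rows, entity_type, entity_id):
    """Filtered scan: checks the id of every row of the entity type."""
    examined = 0
    matches = []
    for (row_type, _prefix, row_id), info in rows.items():
        if row_type != entity_type:
            continue
        examined += 1
        if row_id == entity_id:
            matches.append(info)
    return matches, examined


# 1000 entities of one type, each stored under its own prefix.
rows = {("TEZ_DAG", 1000 - i, f"dag_{i}"): {"index": i} for i in range(1000)}
rows[("OTHER_TYPE", 0, "x_1")] = {"index": -1}

direct = get_by_row_key(rows, "TEZ_DAG", 1000 - 42, "dag_42")
matches, examined = get_by_id_filter(rows, "TEZ_DAG", "dag_42")
print(direct, matches, examined)
```

Both calls find the same entity, but the filtered variant examines all 1000 rows of the entity type because the prefix is unknown to the caller, which is the read cost mentioned above.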

I will handle the other comments. I will also create the patch on the YARN-5355 branch.

> [Atsv2] Add a new filter fromId in REST endpoints
> -------------------------------------------------
>                 Key: YARN-5585
>                 URL: https://issues.apache.org/jira/browse/YARN-5585
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelinereader
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>            Priority: Critical
>         Attachments: 0001-YARN-5585.patch, YARN-5585-workaround.patch, YARN-5585.v0.patch
> The TimelineReader REST APIs provide many filters for retrieving applications. Along
with those, it would be good to add a new filter, fromId, so that the entities after
the fromId can be retrieved.
> Current behavior: the default limit is set to 100. If there are 1000 entities, a REST
call returns the first/last 100 entities. How can the next set of 100 entities be retrieved,
i.e. 101 to 200 or 900 to 801?
> Example: suppose the applications stored in the database are app-1, app-2, ... app-10.
> *getApps?limit=5* gives app-1 to app-5, but there is no way to retrieve the next 5 apps.
> So the proposal is to have fromId in the filter, like *getApps?limit=5&fromId=app-5*,
which gives the list of apps from app-6 to app-10.
> Since ATS is targeting storage of a large number of entities, getting the next set of
entities using fromId, rather than querying all the entities, is a very common use case.
This is very useful for pagination in the web UI.
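
The proposed behavior can be sketched in a few lines of Python. This is an illustration of the semantics described in the issue, not the TimelineReader implementation: the function get_apps and the in-memory list are assumptions, and fromId is treated as exclusive because the example above returns app-6 onward for fromId=app-5.

```python
def get_apps(apps, limit=100, from_id=None):
    """Return up to `limit` apps that come after `from_id` in stored order."""
    start = 0
    if from_id is not None:
        # Start just past the entity named by from_id (exclusive).
        # list.index raises ValueError if from_id is not stored.
        start = apps.index(from_id) + 1
    return apps[start:start + limit]


# app-1 ... app-10 in the order the store would return them.
apps = [f"app-{i}" for i in range(1, 11)]

first_page = get_apps(apps, limit=5)
second_page = get_apps(apps, limit=5, from_id="app-5")
print(first_page)
print(second_page)
```

A web UI would page through the results by passing the last id of the current page as the fromId of the next request.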

This message was sent by Atlassian JIRA
