hadoop-yarn-issues mailing list archives

From "Varun Saxena (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints
Date Thu, 29 Sep 2016 12:02:20 GMT

    [ https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15532592#comment-15532592 ]

Varun Saxena commented on YARN-5585:

bq. In a distributed cluster, we can expect source of origin of same entity types from different
JVM. For example in MR, what if YarnChild's want to publish its entities with taskId? How
can each yarn child knows about entityPrefixId? Only uniqueness in cluster will be timestamp.
Frankly, by design, application-level entities will be published by the AM. Only the AM has access
to the collector address and, in a secure setup, to the token needed to publish to collectors.
We do not forward this information to containers. An AM can, however, forward it to other
processes, which can then potentially publish entities themselves. But if a specific AM can do that, it
can just as easily pass along the prefix as well. Task-level entities and their child entities, however, will be
different and will have their own unique prefix.

bq. If entityPrefixId is string
We were thinking of making it a long. The intention of the prefix is to help get a sort order, and numbers
can easily achieve that. We haven't reached a conclusion on this though. Needs to be further
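
To illustrate why a numeric prefix helps with sort order, here is a minimal Python sketch. It assumes a store that orders keys byte-wise and a hypothetical key layout of an 8-byte big-endian prefix followed by the entity id. Neither the layout nor the names `make_key` and `split_key` come from the ATSv2 code, and the encoding only sorts correctly for non-negative prefixes.

```python
import struct
from typing import Tuple


def make_key(prefix: int, entity_id: str) -> bytes:
    """Hypothetical key: 8-byte big-endian prefix followed by the entity id."""
    if prefix < 0:
        raise ValueError("prefix must be non-negative to sort byte-wise")
    return struct.pack(">q", prefix) + entity_id.encode("utf-8")


def split_key(key: bytes) -> Tuple[int, str]:
    """Inverse of make_key."""
    (prefix,) = struct.unpack(">q", key[:8])
    return prefix, key[8:].decode("utf-8")


ids = [f"task-{n}" for n in (1, 2, 10)]

# Sorting on the string id alone puts task-10 before task-2.
by_id = sorted(ids)

# Sorting on the encoded key follows the numeric prefix instead.
keys = sorted(make_key(n, f"task-{n}") for n in (10, 2, 1))
by_prefix = [split_key(k)[1] for k in keys]

print(by_id)
print(by_prefix)
```

A string prefix would need zero-padding or a similar convention to get the same effect, which is one argument for a long.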

bq. If we look at the problem , this issue is from storage layer. 
Frankly, we cannot necessarily say ordering is a storage issue, as no storage would naturally
provide a created-time sort ordering. Even insertion order is not guaranteed. We had to do
some plumbing even for LevelDB, and this would be even more difficult for HDFS storage.
Even for the timeline service as a whole (irrespective of storage), technically it should be fine
if it provides you a way to retrieve the entities you want.
I understand, though, that entity retrieval in created-time sort order is the most common use case.
That is why even I was initially of the opinion that we should have inherent support for created-time
ordering. We can go with an index table for created time, as suggested earlier, but this
would incur a read-side penalty. Or we can have created time as part of the entity table row key,
but this would mean a write-side penalty too, because you would not know the created
time of the entity supplied. We can, however, force the user to send created time in every entity.
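
The trade-off between these two options can be sketched with in-memory dictionaries standing in for tables. This is only an illustration under assumed names (`IndexTableStore`, `RowKeyStore`, and the `reads`/`writes` counters are all hypothetical). It is not the ATSv2 schema, and real costs depend on the storage backend.

```python
from typing import Dict, List, Optional, Tuple


class IndexTableStore:
    """Option 1: entity table keyed by id, plus a created-time index table."""

    def __init__(self) -> None:
        self.entities: Dict[str, dict] = {}
        self.index: Dict[Tuple[int, str], str] = {}
        self.writes = 0
        self.reads = 0

    def put(self, entity_id: str, created_time: int, info: dict) -> None:
        self.entities[entity_id] = info
        self.index[(created_time, entity_id)] = entity_id
        self.writes += 2  # entity row + index row

    def newest(self, limit: int) -> List[str]:
        ids = []
        for key in sorted(self.index, reverse=True)[:limit]:
            entity_id = self.index[key]   # read 1: index row
            _ = self.entities[entity_id]  # read 2: entity row
            self.reads += 2
            ids.append(entity_id)
        return ids


class RowKeyStore:
    """Option 2: created time is the leading part of the entity row key."""

    def __init__(self) -> None:
        self.rows: Dict[Tuple[int, str], dict] = {}
        self.writes = 0
        self.reads = 0

    def put(self, entity_id: str, created_time: Optional[int], info: dict) -> None:
        if created_time is None:
            # Without it the row key cannot be built, so the caller must send it.
            raise ValueError("created time is required on every write")
        self.rows[(created_time, entity_id)] = info
        self.writes += 1

    def newest(self, limit: int) -> List[str]:
        keys = sorted(self.rows, reverse=True)[:limit]
        self.reads += len(keys)
        return [entity_id for _, entity_id in keys]


indexed, keyed = IndexTableStore(), RowKeyStore()
for n, created in enumerate((100, 300, 200), start=1):
    indexed.put(f"entity-{n}", created, {})
    keyed.put(f"entity-{n}", created, {})

print(indexed.newest(2), indexed.reads, indexed.writes)
print(keyed.newest(2), keyed.reads, keyed.writes)
```

In this toy model the index table doubles the row lookups per result, while the row-key variant reads less but rejects any write that does not carry a created time.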

As you were not there in the last meeting, your point of view was missing. We can revisit this
in today's meeting.
The only way this can be solved at the timeline service layer without an API change is to
have another table to assist in retrieval. But this would then incur read/write penalties.
Can we do something in a coprocessor, i.e. do something in prePut or preScan, to support the created-time
use case? Well, I am not really aware of the cost this would incur, so will have to

bq. In future, if any other storage is plugged entity prefix would become stale.
Maybe or maybe not. They can potentially use it for indexing as well.

> [Atsv2] Add a new filter fromId in REST endpoints
> -------------------------------------------------
>                 Key: YARN-5585
>                 URL: https://issues.apache.org/jira/browse/YARN-5585
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelinereader
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>            Priority: Critical
>         Attachments: YARN-5585-workaround.patch, YARN-5585.v0.patch
> TimelineReader REST APIs provide a lot of filters to retrieve applications. Along
> with those, it would be good to add a new filter, i.e. fromId, so that entities can be retrieved
> after the fromId.
> Current behavior: the default limit is set to 100. If there are 1000 entities, then a REST
> call gives the first/last 100 entities. How do we retrieve the next set of 100 entities, i.e. 101 to 200
> or 900 to 801?
> Example: if applications are stored in the database as app-1, app-2 ... app-10,
> *getApps?limit=5* gives app-1 to app-5. But there is no way to retrieve the next 5 apps.
> So the proposal is to have fromId in the filter, like *getApps?limit=5&fromId=app-5*,
> which gives the list of apps from app-6 to app-10.
> Since ATS is targeting storage of a large number of entities, it is a very common use case to
> get the next set of entities using fromId rather than querying all the entities. This is very useful
> for pagination in the web UI.
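
The paging behavior proposed in the description can be sketched as follows. This is a minimal Python illustration over an in-memory list, not the TimelineReader implementation: `get_apps` and `iterate_pages` are hypothetical names, and `from_id` is treated as exclusive because the description says fromId=app-5 returns app-6 to app-10.

```python
from typing import Iterator, List, Optional

# Stand-in for the stored applications, in stored order: app-1 ... app-10.
APPS = [f"app-{n}" for n in range(1, 11)]


def get_apps(limit: int, from_id: Optional[str] = None) -> List[str]:
    """Return up to `limit` apps that come after `from_id` in stored order.

    Raises ValueError if `from_id` is not a known app id.
    """
    start = 0 if from_id is None else APPS.index(from_id) + 1
    return APPS[start:start + limit]


def iterate_pages(limit: int) -> Iterator[List[str]]:
    """Walk all apps page by page, using the last id of each page as fromId."""
    from_id = None
    while True:
        page = get_apps(limit, from_id)
        if not page:
            return
        yield page
        from_id = page[-1]


pages = list(iterate_pages(5))
print(pages)
```

A web UI would keep only the last id of the current page and pass it as fromId for the next request, so no page requires scanning the entities before it on the client side.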
