hadoop-yarn-issues mailing list archives

From "Sangjin Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints
Date Thu, 29 Sep 2016 13:24:20 GMT

    [ https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15532764#comment-15532764 ]

Sangjin Lee commented on YARN-5585:

Thanks [~rohithsharma] for your comments and input!

I'd like to structure the proposal in a way that hopefully answers some of your questions
and moves this forward.

To me, one of the key goals here is to keep writes lean. In other words, we would like to avoid
write amplification (no more auxiliary tables or double writes). It then follows that the
client would need to provide this entity prefix not only when the entity is written for the
first time but also *on all subsequent updates*.

Providing this entity prefix on all writes and updates may not be practical or desired for
all cases. I can certainly see that this is not practical for YARN-generic entities (e.g.
containers). So IMO the *optionality* is a must here. If you don't want to have a different
sort order than the entity id order, you shouldn't be forced to do it.

In terms of what the entity prefix should be if you need it, a strong argument can be made
for using created time for everyone. However, again, providing the created timestamp for all
subsequent writes may not be practical. That would mean that the AM would need to keep track
of the created time for all their entities at all times. Perhaps that is trivial for certain
AMs, and not for others. It's all the more reason to come up with a simple prefix scheme that
can be easily provided in many situations. For example, if there is a number that can be easily
computed for your entity, that would be a perfect candidate for the entity prefix.

For Tez, if we introduce the entity prefix and you use the created time for it, it would look
exactly the same from the Tez perspective either way. Whether we have a more flexible entity
prefix or an explicit created time (both would be in the row key), it would work the same. The
client code would do either
client.writeEntity(entity); // pseudo-code
client.writeEntity(entity); // pseudo-code
The rest of the server code, and how data is written, fetched and sorted, would work in the same
way.
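
To illustrate why the two approaches sort the same way, here is a minimal sketch of ordering
by a (prefix, entity id) pair. Everything in it is hypothetical and for illustration only: the
class and field names, the assumption that the prefix precedes the entity id in the row key,
the assumption that an unset prefix defaults to 0, and the ascending order. It is not the
actual ATSv2 row key code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the ATSv2 implementation.
class EntityPrefixOrder {
  // Only the fields that matter for ordering.
  static final class Entity {
    final String id;
    final long prefix; // assumed to be 0 when the client does not set a prefix

    Entity(String id, long prefix) {
      this.id = id;
      this.prefix = prefix;
    }
  }

  // Orders entities the way a (prefix, id) row key would: prefix first, id as tie-breaker.
  static List<String> sortedIds(List<Entity> entities) {
    List<Entity> copy = new ArrayList<>(entities);
    copy.sort((a, b) -> {
      int byPrefix = Long.compare(a.prefix, b.prefix);
      return byPrefix != 0 ? byPrefix : a.id.compareTo(b.id);
    });
    List<String> ids = new ArrayList<>();
    for (Entity e : copy) {
      ids.add(e.id);
    }
    return ids;
  }

  public static void main(String[] args) {
    // Prefix set to the created time: entities come back in creation order.
    List<Entity> withPrefix = new ArrayList<>();
    withPrefix.add(new Entity("dag-a", 2000L));
    withPrefix.add(new Entity("dag-b", 1000L));
    withPrefix.add(new Entity("dag-c", 1500L));
    System.out.println(sortedIds(withPrefix));

    // Prefix left unset (all 0): the order falls back to entity id order.
    List<Entity> withoutPrefix = new ArrayList<>();
    withoutPrefix.add(new Entity("dag-c", 0L));
    withoutPrefix.add(new Entity("dag-a", 0L));
    withoutPrefix.add(new Entity("dag-b", 0L));
    System.out.println(sortedIds(withoutPrefix));
  }
}
```

In this sketch, a client that never sets the prefix gets plain entity id order, which is the
optional behavior argued for above.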

Unfortunately, I won't be able to attend today's call as I am away at a conference. Hopefully
this will help the discussion move forward.

> [Atsv2] Add a new filter fromId in REST endpoints
> -------------------------------------------------
>                 Key: YARN-5585
>                 URL: https://issues.apache.org/jira/browse/YARN-5585
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelinereader
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>            Priority: Critical
>         Attachments: YARN-5585-workaround.patch, YARN-5585.v0.patch
> The TimelineReader REST APIs provide many filters for retrieving applications. Along
with those, it would be good to add a new filter, fromId, so that entities can be retrieved
after the fromId.
> Current behavior: The default limit is 100. If there are 1000 entities, the REST
call returns the first/last 100 entities. How can the next set of 100 entities be retrieved,
i.e. 101 to 200 or 900 to 801?
> Example: Suppose applications app-1, app-2 ... app-10 are stored in the database.
> *getApps?limit=5* gives app-1 to app-5. But there is no way to retrieve the next 5 apps.
> So the proposal is to have fromId in the filter, like *getApps?limit=5&fromId=app-5*,
which gives the list of apps from app-6 to app-10.
> Since ATS targets storage of a large number of entities, getting the next set of entities
using fromId, rather than querying all the entities, is a very common use case. This is very
useful for pagination in the web UI.
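
The fromId paging described in the issue can be sketched as follows. This is a minimal
in-memory illustration, not the TimelineReader code: the class and method names are
hypothetical, a sorted set stands in for the backing store, and fromId is treated as
exclusive to match the app-6 to app-10 example. Ids are zero-padded here because plain
string ordering would otherwise place app-10 before app-2.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical stand-in for a sorted store; not the actual TimelineReader implementation.
class FromIdPaging {
  // Returns up to `limit` ids that sort strictly after `fromId`,
  // or the first `limit` ids when `fromId` is null.
  static List<String> page(NavigableSet<String> ids, String fromId, int limit) {
    NavigableSet<String> tail = (fromId == null) ? ids : ids.tailSet(fromId, false);
    List<String> out = new ArrayList<>();
    for (String id : tail) {
      if (out.size() >= limit) {
        break;
      }
      out.add(id);
    }
    return out;
  }

  public static void main(String[] args) {
    NavigableSet<String> apps = new TreeSet<>();
    for (int i = 1; i <= 10; i++) {
      // Zero-pad so that lexicographic order matches numeric order.
      apps.add((i < 10 ? "app-0" : "app-") + i);
    }
    // First page, then the next page starting after the last id already seen.
    List<String> first = page(apps, null, 5);
    List<String> next = page(apps, first.get(first.size() - 1), 5);
    System.out.println(first);
    System.out.println(next);
  }
}
```

A caller pages through the data by passing the last id of the previous page as the next
fromId, so no page requires reading the entities that precede it.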

