hadoop-yarn-issues mailing list archives

From "Varun Saxena (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6027) Improve /flows API for more flexible filters fromid, collapse, userid
Date Wed, 08 Feb 2017 10:27:42 GMT

    [ https://issues.apache.org/jira/browse/YARN-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857803#comment-15857803 ]

Varun Saxena commented on YARN-6027:

bq.  But it is expected to collapse with date range
Ok, then it's fine. I was thinking we would try to display all the flows (say, 10 per page)
in the UI. If it is based on a date range then it should be fine in terms of performance.
I guess we will probably display flows for the current day only.
We can probably leave a note in the javadoc that, in general, a suitable date range should
be provided for this REST endpoint.

bq.  User can directly provide flow entity ID as fromId.
Oh, you are providing the ID itself. Maybe we should leave a note in the javadoc and
documentation that the cluster part of it will be ignored, and that in the collapse case the
cluster and timestamp will both be ignored. In the UI case, the cluster would be the same as
the one in the REST endpoint, but you can also form the fromId manually and provide a
different cluster ID than the one in the REST URL path param. So we should make the behavior clear.
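A minimal sketch of the parsing behavior described above. The '!' delimiter and the
cluster!user!flowName field order are hypothetical stand-ins here, not necessarily what the
timeline service actually uses:

```java
import java.util.Arrays;

// Hypothetical sketch: split a fromId of the form "cluster!user!flowName"
// and ignore the cluster component, since the REST URL path param already
// fixes the cluster. The real delimiter and field order may differ.
public class FromIdParser {
    static final String DELIMITER = "!";

    // Returns {user, flowName}; the cluster part (index 0) is dropped.
    public static String[] parseIgnoringCluster(String fromId) {
        String[] parts = fromId.split(DELIMITER);
        if (parts.length != 3) {
            throw new IllegalArgumentException("Invalid fromId: " + fromId);
        }
        return Arrays.copyOfRange(parts, 1, 3);
    }
}
```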

bq. If need to parse the errors, then why flow entity id is providing full row key as id?
I think need to change flow entity id format itself.
That is just for reads; we do not make any decisions with it. But now we will. We can encode
or escape the cluster and other fields while creating the ID in FlowActivityEntity itself, but
when the UI displays it, it may have to unescape it. We would also need to unescape it after
splitting fromId. Changing the format won't make much difference, as some delimiter or other
will have to be used, and that will have to be escaped too, right? The cluster ID is a plain
string and we have to assume it can be anything. This would have to be done just to make the
system more robust, even if we are unlikely to see a given delimiter in the cluster name or elsewhere.
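The escape/unescape round trip can be sketched as below. The delimiter and escape characters
are hypothetical, not necessarily what FlowActivityEntity would use:

```java
// Hypothetical escape/unescape for a single-character delimiter, so that a
// cluster ID containing the delimiter survives a round trip through the
// composite fromId. Real code would live next to the ID construction in
// FlowActivityEntity (or a shared utility).
public class IdEscaper {
    static final char DELIM = '!';
    static final char ESCAPE = '\\';

    public static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == DELIM || c == ESCAPE) {
                sb.append(ESCAPE);  // prefix special chars with the escape char
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static String unescape(String s) {
        StringBuilder sb = new StringBuilder();
        boolean escaped = false;
        for (char c : s.toCharArray()) {
            if (!escaped && c == ESCAPE) {
                escaped = true;  // drop the escape char, keep the next char
                continue;
            }
            escaped = false;
            sb.append(c);
        }
        return sb.toString();
    }
}
```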

bq. One optimization I can do is PageFilter can be applied in non-collapse mode
Yeah that can be done.

bq. If you look at the patch, I have removed PageFilter while scanning which gives all the
Ok... Can't we apply the PageFilter in steps in collapse mode? Maybe override getResults itself.
When we use it with a date range it should be fine, but in cases where a date range is not
specified, this may help. What I mean is: get results from the backend with a PageFilter
equivalent to the limit, then collapse, and go back and fetch results again if more records
are required (based on the limit). Something like below. We need to check, however, whether a
PageFilter with limited but possibly multiple fetches will be better than getting all the data.
I suspect the former may be better, especially as the size of the table grows. Not 100% sure though.
int collapsed = 0;
while (collapsed < limit)
   get results with PageFilter = limit
   collapse records
   collapsed = collapsed + number of collapsed flow entities in this iteration
end while
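The loop above can be sketched concretely as follows. The backend fetch is simulated with an
in-memory list, and names like fetchPage and collapsedFlows are hypothetical stand-ins for the
real reader internals, not actual timeline service APIs:

```java
import java.util.*;

// Hypothetical sketch of paged fetching with collapse: request `limit` raw
// rows per round trip (the PageFilter equivalent), collapse duplicates by
// flow name, and keep fetching until we have `limit` collapsed entities or
// the backend is exhausted.
public class CollapsingPager {
    // Simulated backend rows (flow names, possibly repeated across days).
    private final List<String> rows;
    private int offset = 0;

    public CollapsingPager(List<String> rows) { this.rows = rows; }

    // One backend scan limited to `pageSize` rows (PageFilter equivalent).
    private List<String> fetchPage(int pageSize) {
        int end = Math.min(offset + pageSize, rows.size());
        List<String> page = rows.subList(offset, end);
        offset = end;
        return page;
    }

    // Collapse across pages until `limit` distinct flows are collected.
    public List<String> collapsedFlows(int limit) {
        LinkedHashSet<String> collapsed = new LinkedHashSet<>();
        while (collapsed.size() < limit && offset < rows.size()) {
            for (String flow : fetchPage(limit)) {
                collapsed.add(flow);  // set membership collapses duplicates
                if (collapsed.size() == limit) {
                    break;
                }
            }
        }
        return new ArrayList<>(collapsed);
    }
}
```

With a heavily duplicated table this does a few small scans instead of one full scan, which is
the trade-off discussed above.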

Additionally, a few other comments.
# In TimelineEntityFilters class javadoc, we should document collapse.
# In the javadoc for fromId you mention "The fromId values should be same as fromId info field
in flow entities. It defines flow entity id." We do not have a fromId field in flow entities;
I guess you mean id.
# In TimelineReaderWebServices#getFlows, a NumberFormatException can occur for fromId as well.
In handleException we should pass the correct message for this.

> Improve /flows API for more flexible filters fromid, collapse, userid
> ---------------------------------------------------------------------
>                 Key: YARN-6027
>                 URL: https://issues.apache.org/jira/browse/YARN-6027
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>              Labels: yarn-5355-merge-blocker
>         Attachments: YARN-6027-YARN-5355.0001.patch
> In YARN-5585, fromId is supported for retrieving entities. We need a similar filter for
> flow run apps, flow runs, and flows as well.
> Along with supporting fromId, this JIRA should also discuss following points
> * Should we throw an exception for entities/entity retrieval if duplicates found?
> TimelineEntity:
> ** Should equals method also check for idPrefix?
> ** Is idPrefix part of identifiers?

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org
