hadoop-yarn-issues mailing list archives

From "Rohith Sharma K S (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6027) Support fromId for flows/flowrun apps
Date Wed, 04 Jan 2017 13:55:58 GMT

    [ https://issues.apache.org/jira/browse/YARN-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15798315#comment-15798315 ]

Rohith Sharma K S commented on YARN-6027:

Thanks folks for the discussion. 

I had an offline talk with Sunil regarding YARN UI integration with ATSv2. Some of the
discussion points are:
# When flows are queried without any filters, the reader sends duplicate flow entities. Each
duplicated flow contains aggregated flow run details. CMIIAW, I think this is because when
the same flow is run daily, there will be 2 entries (assuming it has run on 2 days) in
FlowActivityTable. *While reading, if no filters are given, a full table scan happens on
FlowActivityTable.* This results in duplicate entries for the same flow name. To me, *the
current behavior of retrieving flows should be restricted to the current day only* (not even
the last 24 hours, which can cause duplicated entries).
# Now let's take the flow activities for a single day. If the number of flows run is huge,
let's say 1000, then the REST API result is only 100 flow names, where 100 is the limit. The
user can query by increasing the limit to 1000, but that is not ideal for UI rendering, which
would run into many issues such as browser OOM. The issue is that the UI does not know how
many flows exist, and a better solution here is to render page by page for a single day. *At
least pagination should be supported for single-day flow activities.* I know that with the
current HBase schema of the flowActivity table, pagination would be difficult to achieve, but
at the API layer there should be a filter for it. Otherwise it is very painful for UI
developers who rely on ATSv2 data.
# Date range and limit filters do not solve the UI rendering issues that pagination solves.
They can only reduce the number of flows returned. Date range is supported, but ranges within
a day, such as 10 AM to 11 AM, are not supported.
# I also see that flow entities contain all the flow run details. Do we really need to embed
flow run details in flow entities? Doesn't that become heavy? I think flow run information
in flow entities should be treated as a filter. However, there is a separate API to get all the
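
To make the first two points concrete, here is a minimal sketch in Python. It is not ATSv2 reader code. The row shape `(day, flow_name, run_ids)`, the function names, and the choice to treat `from_id` as inclusive are all assumptions made for illustration only. It shows how per-day records of the same flow could be collapsed into one entry, and how a fromId-style cursor lets a UI fetch one page at a time without knowing the total count.

```python
from collections import OrderedDict
from typing import List, Optional, Tuple

# Hypothetical per-day record: (day, flow_name, run_ids). This only mimics
# the "one entry per flow per day" behavior described above; it is not the
# real FlowActivityTable row layout.
FlowActivity = Tuple[str, str, List[int]]


def merge_by_flow_name(rows: List[FlowActivity]) -> "OrderedDict[str, List[int]]":
    """Collapse one-entry-per-day records into a single entry per flow name."""
    merged: "OrderedDict[str, List[int]]" = OrderedDict()
    for _day, name, run_ids in rows:
        # setdefault creates a fresh list, so the input rows are not mutated.
        merged.setdefault(name, []).extend(run_ids)
    return merged


def page_from_id(
    ids: List[str], limit: int, from_id: Optional[str] = None
) -> Tuple[List[str], Optional[str]]:
    """Return up to `limit` ids starting at `from_id` (assumed inclusive),
    plus the id the next request should start from, or None on the last page.

    `ids` must be in a stable order across requests.
    """
    start = 0 if from_id is None else ids.index(from_id)
    # Read one extra element so the caller learns the next cursor.
    window = ids[start:start + limit + 1]
    page = window[:limit]
    next_from_id = window[limit] if len(window) > limit else None
    return page, next_from_id
```

A UI would call `page_from_id` repeatedly, passing the returned cursor back as `from_id`, and stop when the cursor is `None`. Each response stays bounded by `limit`, which avoids the large single response described in the second point.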

> Support fromId for flows/flowrun apps
> -------------------------------------
>                 Key: YARN-6027
>                 URL: https://issues.apache.org/jira/browse/YARN-6027
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>              Labels: yarn-5355-merge-blocker
> In YARN-5585, fromId is supported for retrieving entities. We need a similar filter for
> flows/flowRun apps, and for flow run and flow as well.
> Along with supporting fromId, this JIRA should also discuss the following points:
> * Should we throw an exception for entities/entity retrieval if duplicates are found?
> * TimelineEntity:
> ** Should the equals method also check for idPrefix?
> ** Is idPrefix part of the identifiers?
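
The idPrefix question in the description can be illustrated with a small Python sketch. Both classes below are hypothetical stand-ins, not the real TimelineEntity, and the field names are assumptions. The only point is to show how the answer changes what counts as a duplicate.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EntityKey:
    """Hypothetical identifier where id_prefix is NOT part of equality."""
    type: str
    id: str
    # compare=False keeps id_prefix out of the generated __eq__ and __hash__.
    id_prefix: int = field(default=0, compare=False)


@dataclass(frozen=True)
class EntityKeyWithPrefix:
    """Hypothetical identifier where id_prefix IS part of equality."""
    type: str
    id: str
    id_prefix: int = 0


# Same type and id, different prefix.
a, b = EntityKey("APP", "app_1", 1), EntityKey("APP", "app_1", 2)
c, d = EntityKeyWithPrefix("APP", "app_1", 1), EntityKeyWithPrefix("APP", "app_1", 2)

same_without_prefix = a == b  # treated as one entity, i.e. a duplicate
same_with_prefix = c == d     # treated as two distinct entities
```

Under the first definition, two records that differ only in prefix are duplicates, which is the case where throwing an exception on retrieval might apply. Under the second, they are separate entities and no duplicate exists.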
