hadoop-yarn-issues mailing list archives

From "Rohith Sharma K S (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6027) Improve /flows API for more flexible filters fromid, collapse, userid
Date Wed, 08 Feb 2017 02:40:41 GMT

    [ https://issues.apache.org/jira/browse/YARN-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857267#comment-15857267 ]

Rohith Sharma K S commented on YARN-6027:

Thanks [~varun_saxena] for the review.
bq.Do we need cluster ID in fromId because we are ignoring it completely?
Yes, it is required even though it is currently ignored, considering how fromId is used. I do
not want users to have to parse something in order to provide a fromId. A user can directly pass
a flow entity ID as fromId and let the reader server handle it. A cluster ID check can be added
to verify that the context cluster and the cluster ID in fromId are equal. Ideally both should
match; otherwise we can throw an exception.
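
A minimal sketch of the cluster ID check described above, written in Python for illustration only. The ID layout `<cluster>/<dayTimestamp>/<user>@<flowName>`, the function name `check_from_id_cluster`, and the exception type are assumptions made for this example, not the reader server's actual code.

```python
class FromIdMismatchError(ValueError):
    """Raised when the cluster in fromId differs from the context cluster."""


def check_from_id_cluster(from_id, context_cluster_id):
    """Return from_id unchanged if its cluster matches the request context.

    Assumed (hypothetical) ID layout: "<cluster>/<dayTimestamp>/<user>@<flowName>".
    Only the first component is inspected; the rest is passed through as-is.
    """
    cluster, sep, rest = from_id.partition("/")
    if not sep or not rest:
        raise ValueError("malformed fromId: %r" % from_id)
    if cluster != context_cluster_id:
        raise FromIdMismatchError(
            "fromId cluster %r does not match context cluster %r"
            % (cluster, context_cluster_id)
        )
    return from_id
```

The caller passes the flow entity ID it received earlier without taking it apart; the check only guards against an ID from a different cluster being replayed.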

bq. If there is a / in cluster ID we may have to escape it to avoid parsing errors.
If parsing can run into such errors, then why does the flow entity ID expose the full row key
as its ID? I think we need to change the flow entity ID format itself.
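
To make the parsing concern concrete, here is a small Python sketch of escaping a delimiter inside ID components. The `/` delimiter, the `*` escape character, and the helper names `join_id` and `split_id` are all hypothetical choices for this example; they are not taken from the patch.

```python
DELIM = "/"
ESC = "*"


def escape(part):
    """Escape the escape character first, then the delimiter."""
    return part.replace(ESC, ESC + ESC).replace(DELIM, ESC + DELIM)


def join_id(parts):
    """Join components into one ID string, escaping each component."""
    return DELIM.join(escape(p) for p in parts)


def split_id(text):
    """Split an ID produced by join_id back into its components."""
    parts, current, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == ESC and i + 1 < len(text):
            # An escaped character is taken literally.
            current.append(text[i + 1])
            i += 2
        elif ch == DELIM:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    return parts
```

With a cluster ID such as `dc/east`, a plain `str.split("/")` on the unescaped ID yields four pieces instead of three, while `split_id(join_id(...))` returns the original three components.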

bq. If we use collapse, even with fromId, there seems to be a full table scan which will impact
Yes, it does a table scan. But collapse is expected to be used with a date range; otherwise the
default behavior of /flows should be changed to return one day of flows rather than the full
table data. It is an engineering issue, and maybe we can document that performance will be a
bit slow.

bq. Maybe we can send the last real ID in info field of last flow activity entity if previous
query was made with collapse field
Initially the idea was to send the last real ID as fromId in the info field. But flows are stored
per day for each user, so that is not useful. Note that when collapse is used, we must scan to
get all entities and then apply fromId. The scan can't be done halfway, because that would end
up producing redundant entries for the user. Given that the previous comment is addressed, this
should not be an issue.

bq. you have mentioned that fromId validation is happening in getResult method. Could not
find it
Ahh, I think I missed it at the global level. I am validating it in only one condition. I will
validate it at the global level.

bq. In processResults we first get the result from backend while applying limit and then process
result for collapse and fromId filters.
If you look at the patch, I have removed the PageFilter while scanning, so the scan returns all
the data. One optimization I can make is to apply the PageFilter in non-collapse mode, because
in non-collapse mode the scan starts from the given fromId. But the same logic cannot be used
for collapse mode.
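
The difference between the two modes can be sketched in Python as follows. This is only an in-memory illustration under assumed names: rows are hypothetical `(day, user, flow)` tuples ordered newest day first, and `page_non_collapsed` / `page_collapsed` are made-up helpers, not the reader's implementation.

```python
def page_non_collapsed(rows, from_id, limit):
    """Start at from_id and stop after `limit` rows, like a page filter."""
    started = from_id is None
    page = []
    for row in rows:
        if not started:
            if row != from_id:
                continue
            started = True
        page.append(row)
        if len(page) == limit:
            break  # safe to stop early: every row is its own result
    return page


def page_collapsed(rows, from_key, limit):
    """Merge per-day rows by (user, flow) first, then apply from_key and limit."""
    merged = {}
    for day, user, flow in rows:  # full scan: later rows may belong to earlier keys
        merged.setdefault((user, flow), []).append(day)
    keys = list(merged)  # dicts keep insertion order
    # keys.index raises ValueError if from_key is unknown.
    start = keys.index(from_key) if from_key is not None else 0
    return [(key, merged[key]) for key in keys[start:start + limit]]


rows = [
    (3, "alice", "wordcount"),
    (3, "bob", "sort"),
    (2, "alice", "wordcount"),
    (1, "alice", "wordcount"),
]
```

Here `page_collapsed(rows, None, 2)` merges all three days of alice's flow into one entry. Cutting the scan at two rows first, as in `page_collapsed(rows[:2], None, 2)`, would report only day 3 for that flow, and the same flow would show up again on a later page.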

> Improve /flows API for more flexible filters fromid, collapse, userid
> ---------------------------------------------------------------------
>                 Key: YARN-6027
>                 URL: https://issues.apache.org/jira/browse/YARN-6027
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>              Labels: yarn-5355-merge-blocker
>         Attachments: YARN-6027-YARN-5355.0001.patch
> In YARN-5585, fromId is supported for retrieving entities. We need a similar filter for flows/flowRun apps, flow runs, and flows as well.
> Along with supporting fromId, this JIRA should also discuss the following points:
> * Should we throw an exception for entities/entity retrieval if duplicates are found?
> * TimelineEntity:
> ** Should the equals method also check for idPrefix?
> ** Is idPrefix part of the identifiers?

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org
