hadoop-yarn-issues mailing list archives

From "Rohith Sharma K S (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3981) offline collector: support timeline clients not associated with an application
Date Thu, 11 May 2017 11:45:05 GMT

    [ https://issues.apache.org/jira/browse/YARN-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16006278#comment-16006278 ]

Rohith Sharma K S commented on YARN-3981:

Thanks [~vrushalic] for skimming through the doc.

bq. Do I understand it correctly that flow collectors will run on each node that runs an NM
in the cluster?
No. The plan is to start one flow collector service, which starts only one container. One flow
collector service serves all offline timeline clients.
However, the number of flow collectors is admin configurable, so there can be N collector services.
Our TimelineClient will be able to discover all N collector services and make use of only one
at a time.
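
To make the "discover N, use one at a time" idea concrete, here is a rough sketch in Java. It is only an illustration under assumptions: the class name CollectorSelector, its methods, and the reachability check are hypothetical and are not part of the existing TimelineClient API or the design doc.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/**
 * Hypothetical sketch: holds the N discovered flow collector addresses
 * and keeps exactly one of them in use at any time.
 */
public class CollectorSelector {
    private final List<String> collectors; // addresses found by discovery (assumed input)
    private int current = 0;               // index of the collector currently in use

    public CollectorSelector(List<String> discovered) {
        if (discovered == null || discovered.isEmpty()) {
            throw new IllegalArgumentException("at least one flow collector is required");
        }
        this.collectors = new ArrayList<>(discovered);
    }

    /** Returns the single collector currently in use. */
    public String current() {
        return collectors.get(current);
    }

    /**
     * Picks a reachable collector, trying the current one first and then
     * the remaining ones in order. The chosen collector becomes current.
     */
    public String select(Predicate<String> isReachable) {
        for (int i = 0; i < collectors.size(); i++) {
            int idx = (current + i) % collectors.size();
            if (isReachable.test(collectors.get(idx))) {
                current = idx;
                return collectors.get(idx);
            }
        }
        throw new IllegalStateException("no reachable flow collector");
    }
}
```

With this shape, a client keeps writing to the same collector as long as it stays reachable and moves to the next discovered one only when it does not. How discovery and the reachability check would actually work is left open here.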

bq. How much traffic do we think might come in? Would it be similar to app table writes? If
not, is there a possibility we can run this on head node of the cluster like where RM or NNs
run? Not on the same node as RM but a node similar to RM, so that it's "outside" the cluster.
We have fairly big sized clusters and having each node run a collector may not be optimal.
As of today, traffic is much lower than for app collectors. For example, whenever Hive executes
a query, the query details are published to ATSv2. But we cannot make a call based on traffic
that we cannot estimate.

bq. aggregation is not relevant I think for a flow collector. Or do we want to support it?
If not, we don't need to mention it under challenges, it is a non issue.
Yep, aggregation is not relevant. I will update the doc. Btw, is it possible to support aggregation
at flow-run level?

> offline collector: support timeline clients not associated with an application
> ------------------------------------------------------------------------------
>                 Key: YARN-3981
>                 URL: https://issues.apache.org/jira/browse/YARN-3981
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Sangjin Lee
>            Assignee: Rohith Sharma K S
>              Labels: YARN-5355
>         Attachments: YARN-3981- offline-collector-draft.pdf
> In the current v.2 design, all timeline writes must belong in a flow/application context
> (cluster + user + flow + flow run + application).
> But there are use cases that require writing data outside the context of an application.
> One such example is a higher level client (e.g. tez client or hive/oozie/cascading client)
> writing flow-level data that spans multiple applications. We need to find a way to support
