hadoop-yarn-issues mailing list archives

From "Li Lu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3981) support timeline clients not associated with an application
Date Tue, 30 Aug 2016 19:29:20 GMT

    [ https://issues.apache.org/jira/browse/YARN-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15449912#comment-15449912 ]

Li Lu commented on YARN-3981:

Thanks [~rohithsharma]! 

bq. As part of the NM daemon, start a new service, the same as TimeLineWriterWebService. The idea is
that the NM reports all these collector addresses to the RM. Introduce a new API in clientRMservice to
get a collector address. The address is given out by the RM at random (this can be decided later). This
address is used by the timeline client. TimeLineClient exposes a new constructor with a flowName, so
system properties can be written at the flow level.

Actually this looks a little bit similar to the current collector discovery mechanism, where
the NM reports app level collector information to RM, and RM distributes such information
to all containers. 
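
To make the comparison concrete, here is a minimal sketch of the RM-side bookkeeping the quoted proposal implies: NMs report collector addresses, and the RM hands one out at random when a client asks. All names here (CollectorRegistry, report, pickCollector) are hypothetical and do not exist in YARN; the random selection is only the placeholder policy mentioned in the proposal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch only: none of these names are part of YARN.
class CollectorRegistry {
    // NM id -> address of the collector service that NM hosts.
    private final Map<String, String> collectorsByNode = new ConcurrentHashMap<>();
    private final Random random;

    CollectorRegistry(Random random) {
        this.random = random;
    }

    // Called when an NM reports (for example, in a heartbeat) its collector address.
    void report(String nodeId, String collectorAddress) {
        collectorsByNode.put(nodeId, collectorAddress);
    }

    // Called when an NM is lost or decommissioned.
    void remove(String nodeId) {
        collectorsByNode.remove(nodeId);
    }

    // Returns one known collector address chosen at random,
    // or empty if no NM has reported a collector yet.
    Optional<String> pickCollector() {
        List<String> addresses = new ArrayList<>(collectorsByNode.values());
        if (addresses.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(addresses.get(random.nextInt(addresses.size())));
    }
}
```

In this sketch every client lookup is one more call into the RM, so the selection policy and any client-side caching of the returned address would directly affect RM load.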

The difference is that we need to decide explicitly where and when to launch the collectors. The
RM can decide where to launch them, but as of now every collector is associated with the
life cycle of some concrete application (launched as aux-services). Could we launch collectors
as separate processes for this use case?

One concern is that this will increase the load on the RM again. I am not sure whether this will be
a problem on busy clusters with a lot of client connections. However, it is definitely better than
launching a central server daemon to handle all client requests (which would fall back to the old ATS
v1 architecture).

For storing those entities posted from clients, can we put them in the entity table, but just
leave some unknown fields empty? Will that be a concern for the storage API's semantics? 
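
As an illustration of the "leave unknown fields empty" idea, the sketch below models a write context in which the flow run and application parts are optional. WriteContext and its factory methods are hypothetical names, not the actual timeline storage API; the field list follows the context described in the issue (cluster + user + flow + flow run + application).

```java
import java.util.Optional;

// Hypothetical model of a timeline write context; not the actual storage API.
final class WriteContext {
    final String cluster;
    final String user;
    final String flowName;
    final Optional<Long> flowRunId;  // empty for writes outside a flow run
    final Optional<String> appId;    // empty for writes outside an application

    private WriteContext(String cluster, String user, String flowName,
                         Optional<Long> flowRunId, Optional<String> appId) {
        this.cluster = cluster;
        this.user = user;
        this.flowName = flowName;
        this.flowRunId = flowRunId;
        this.appId = appId;
    }

    // Context for a normal, application-scoped write: every field is known.
    static WriteContext forApplication(String cluster, String user, String flowName,
                                       long flowRunId, String appId) {
        return new WriteContext(cluster, user, flowName,
                Optional.of(flowRunId), Optional.of(appId));
    }

    // Context for a flow-level client with no application: unknown fields stay empty.
    static WriteContext forFlow(String cluster, String user, String flowName) {
        return new WriteContext(cluster, user, flowName,
                Optional.empty(), Optional.empty());
    }

    boolean isApplicationScoped() {
        return appId.isPresent();
    }
}
```

Any reader of such rows would have to handle the empty fields explicitly, which is the semantic question raised above.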

> support timeline clients not associated with an application
> -----------------------------------------------------------
>                 Key: YARN-3981
>                 URL: https://issues.apache.org/jira/browse/YARN-3981
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Sangjin Lee
>            Assignee: Rohith Sharma K S
>              Labels: YARN-5355
> In the current v.2 design, all timeline writes must belong in a flow/application context
> (cluster + user + flow + flow run + application).
> But there are use cases that require writing data outside the context of an application.
> One such example is a higher level client (e.g. tez client or hive/oozie/cascading client)
> writing flow-level data that spans multiple applications. We need to find a way to support