hadoop-yarn-issues mailing list archives

From "Junping Du (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS app-appgregator service discovery
Date Tue, 03 Mar 2015 10:48:05 GMT

    [ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344914#comment-14344914 ]

Junping Du commented on YARN-3039:

Thanks for the comments, [~Naganarasimha]!
bq. +1 for this approach. Also if NM uses this new blocking call in AMRMClient to get aggregator
address then there might not be any race conditions for posting AM container's life cycle
events by NM immediately after creation of appAggregator through Aux service.
I discussed this again offline with [~vinodkv] and [~zjshen]. Making TimelineClient wrap
AMRMClient looks too heavyweight, especially for security reasons: the NM would have to take
AMRMTokens in order to use TimelineClient in the future, which makes little sense. To get rid
of the race condition you mentioned above, we propose using the observer pattern so that
TimelineClient can listen for aggregator address updates in the AM or NM (wrapped with retry
logic to tolerate connection failures).
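
A minimal sketch of the observer idea above, for illustration only. All names here
(AggregatorAddressListener, AggregatorAddressPublisher, EntitySender, ListeningTimelineClient)
are hypothetical and are not the actual TimelineClient API or the contents of the attached
patch. It assumes the AM or NM publishes address updates and the client retries a bounded
number of times, re-reading the latest address on each attempt.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical observer interface: notified when the aggregator address changes. */
interface AggregatorAddressListener {
    void onAggregatorAddressUpdated(String address);
}

/** Hypothetical subject living in the AM or NM; it pushes address updates to listeners. */
class AggregatorAddressPublisher {
    private final List<AggregatorAddressListener> listeners = new CopyOnWriteArrayList<>();

    void addListener(AggregatorAddressListener listener) {
        listeners.add(listener);
    }

    void publish(String address) {
        for (AggregatorAddressListener listener : listeners) {
            listener.onAggregatorAddressUpdated(address);
        }
    }
}

/** Hypothetical transport; a real client would issue a request to the aggregator here. */
interface EntitySender {
    void send(String address, String entity) throws IOException;
}

/** Stand-in for a timeline client that learns the aggregator address by listening. */
class ListeningTimelineClient implements AggregatorAddressListener {
    private volatile String aggregatorAddress; // null until the first update arrives
    private final EntitySender sender;
    private final int maxAttempts;
    private final long retryIntervalMs;

    ListeningTimelineClient(EntitySender sender, int maxAttempts, long retryIntervalMs) {
        this.sender = sender;
        this.maxAttempts = maxAttempts;
        this.retryIntervalMs = retryIntervalMs;
    }

    @Override
    public void onAggregatorAddressUpdated(String address) {
        this.aggregatorAddress = address;
    }

    /** Sends one entity and returns the number of attempts used; throws if all attempts fail. */
    int putEntity(String entity) {
        IOException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String address = aggregatorAddress; // re-read: it may have been updated meanwhile
            if (address != null) {
                try {
                    sender.send(address, entity);
                    return attempt;
                } catch (IOException e) {
                    lastFailure = e; // connection failure: fall through and retry
                }
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(retryIntervalMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting to retry", ie);
                }
            }
        }
        throw new IllegalStateException(
            "No reachable aggregator after " + maxAttempts + " attempts", lastFailure);
    }
}
```

In this sketch the client never asks the RM for the address itself; it only reacts to
updates pushed by whichever component (AM or NM) it is registered with.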

bq. Are we just adding a method to get the aggregator address? Or what other APIs are
planned?
Per the comments above, we have no plan to add an API to TimelineClient to talk to the RM directly.

bq. I believe the idea of using the AUX service was to decouple the NM and the Timeline service.
If the NM notifies the RM about new appAggregator creation (based on the AUX service), then
basically the NM should be aware that PerNodeAggregatorServer is configured as an AUX service,
and if it supports rebinding the appAggregator on failure then it should be able to communicate
with this AUX service too. Would this be a clean approach?
I agree we want to decouple things here. However, the AUX service is not the only way to deploy
app aggregators. There are other ways (see the diagram in YARN-3033): app aggregators could be
deployed in a separate process or an independent container, which makes it less sensible to
have a protocol between the AUX service and the RM. I think we should now plan to add a protocol
between the aggregator and the NM, and then notify the RM through the NM-RM heartbeat when an
aggregator registers or rebinds.
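
One possible shape of that flow, as a rough sketch under assumptions: the aggregator reports
its address to the NM, and the NM piggybacks pending registrations on its next heartbeat to
the RM. SketchNodeManager, SketchResourceManager and the map-based payload are hypothetical
stand-ins and do not reflect the real NM-RM heartbeat records.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical NM side: collects aggregator registrations until the next heartbeat. */
class SketchNodeManager {
    // appId -> aggregator address, not yet reported to the RM
    private final Map<String, String> pendingAggregators = new ConcurrentHashMap<>();

    /** Called by an aggregator (aux service, separate process or container) on register or rebind. */
    void reportAggregator(String appId, String address) {
        pendingAggregators.put(appId, address);
    }

    /** Builds the aggregator part of a heartbeat and clears the entries that were included. */
    Map<String, String> buildHeartbeat() {
        Map<String, String> payload = new HashMap<>();
        for (String appId : pendingAggregators.keySet()) {
            String address = pendingAggregators.remove(appId);
            if (address != null) {
                payload.put(appId, address);
            }
        }
        return payload;
    }
}

/** Hypothetical RM side: keeps the latest known aggregator address per application. */
class SketchResourceManager {
    private final Map<String, String> aggregatorByApp = new ConcurrentHashMap<>();

    void onNodeHeartbeat(Map<String, String> registeredAggregators) {
        aggregatorByApp.putAll(registeredAggregators); // a rebind overwrites the old address
    }

    String getAggregatorAddress(String appId) {
        return aggregatorByApp.get(appId);
    }
}
```

Because the NM only forwards whatever was reported to it, the same path would work whether
the aggregator runs as an AUX service, a separate process, or an independent container.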

bq. I also feel we need to support starting a per-app aggregator only if the app requests it
(Zhijie also mentioned this). If not, we can make use of one default aggregator for
all such apps launched in the NM, which is just used to post container entities from
different NMs for these apps.
My 2 cents here: the app aggregator should have logic to consolidate all messages (events and
metrics) for one application into a more complex and flexible new data model. If each NM does
aggregation separately, then it is still a *writer* (like the old timeline service), not an *aggregator*.
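
To make the writer/aggregator distinction concrete, here is a small illustrative sketch. The
class SketchAppAggregator and its summing of metrics are assumptions made for the example;
the actual data model is not defined in this comment. The point is only that one per-app
component sees input from every NM and can combine it, which a per-NM writer cannot do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical per-application aggregator that merges input from several NMs. */
class SketchAppAggregator {
    private final String appId;
    private final List<String> events = new ArrayList<>();
    private final Map<String, Long> metricTotals = new TreeMap<>();

    SketchAppAggregator(String appId) {
        this.appId = appId;
    }

    /** Records an event, tagged with the node it came from. */
    synchronized void addEvent(String sourceNode, String event) {
        events.add(sourceNode + ":" + event);
    }

    /** Folds a per-container metric value into the application-level total. */
    synchronized void addMetric(String name, long value) {
        metricTotals.merge(name, value, Long::sum);
    }

    synchronized long metricTotal(String name) {
        return metricTotals.getOrDefault(name, 0L);
    }

    synchronized String summary() {
        return appId + " events=" + events.size() + " metrics=" + metricTotals;
    }
}
```

A per-NM writer would emit the two container records separately; only a component that
receives both can report the application-level total.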

bq. Has there been any discussion about the RM having its own aggregator? I feel it would be
better for the RM to have one, as it then need not depend on any NMs to post entities.
Agree. I think we are on the same page now.
I will update the proposal to reflect all these discussions (on JIRA and offline).

> [Aggregator wireup] Implement ATS app-appgregator service discovery
> -------------------------------------------------------------------
>                 Key: YARN-3039
>                 URL: https://issues.apache.org/jira/browse/YARN-3039
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Junping Du
>         Attachments: Service Binding for applicationaggregator of ATS (draft).pdf, YARN-3039-no-test.patch
> Per design in YARN-2928, implement ATS writer service discovery. This is essential for
off-node clients to send writes to the right ATS writer. This should also handle the case
of AM failures.

This message was sent by Atlassian JIRA
