hadoop-yarn-issues mailing list archives

From "Vinod Kumar Vavilapalli (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
Date Tue, 13 Jan 2015 02:11:36 GMT

    [ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274597#comment-14274597 ]

Vinod Kumar Vavilapalli commented on YARN-2928:
-----------------------------------------------

Thanks for the design summary, Sangjin.

For public disclosure, a number of YARN community members synced offline about this design
discussion - thanks to Joep Rottinghuis, Karthik Kambatla, Li Lu, Mayank Bansal, Maysam Yabandeh,
Mohammad Kamrul Islam, Ram Venkatesh, Robert Kanter, Sangjin Lee, Vinod Kumar Vavilapalli,
Vrushali Channapattan, Zhijie Shen (in no particular order).

Overall, I'd like to push other efforts like YARN-2141 and YARN-1012 to fit into the architecture
being proposed in this JIRA, so that we don't duplicate stats collection between efforts.

One suggestion on the proposal - for the first cut, instead of spawning a special container per
AM (Section 4.1) to represent an Application Level Aggregator (call it ALA), we can have a per-node
agent which serves multiple AMs running on the same node. Nothing else changes - NMs sending data
still have to discover the ALA, only the ALAs can send data to the underlying storage, etc.
It's just that the ALA is not a special container to begin with. The advantage is that we
can postpone the hard parts, scheduling and fault-tolerance of a special ALA container, till
after we wire everything else. Even long term, for small apps in a cluster, an ALA running inside
or side-by-side with the NM with rate-limits reduces the 'heaviness' of the system. This per-node
agent is very useful outside of this context too. An additional shortcut for now is to potentially
embed the ALA inside the NM using, say, Aux Services. Obviously the biggest problem with a single
ALA per node or an embedded ALA per node is resource-management - which we can defer for now,
given that it still runs system code, till we have everything else figured out.
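
To make the per-node agent idea concrete, here is a rough sketch in Java of one agent holding
a logical ALA per application on the node. All names here (PerNodeAggregator, TimelineSink,
maxBufferedPerApp) are hypothetical and not taken from the design doc or from existing YARN
code, and the per-app buffer cap is only a crude stand-in for the rate-limits mentioned above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the underlying storage that only ALAs may write to. */
interface TimelineSink {
    void write(String appId, List<String> events);
}

/** Hypothetical per-node agent: one logical ALA (a buffer) per app with an AM on this node. */
class PerNodeAggregator {
    private final Map<String, List<String>> buffers = new HashMap<>();
    private final int maxBufferedPerApp; // crude stand-in for a per-app rate limit
    private final TimelineSink sink;

    PerNodeAggregator(int maxBufferedPerApp, TimelineSink sink) {
        this.maxBufferedPerApp = maxBufferedPerApp;
        this.sink = sink;
    }

    /** Called when an AM for this app starts on the node. */
    synchronized void registerApp(String appId) {
        buffers.putIfAbsent(appId, new ArrayList<>());
    }

    /** Returns false if the app has no ALA on this node or its buffer is full. */
    synchronized boolean accept(String appId, String event) {
        List<String> buf = buffers.get(appId);
        if (buf == null || buf.size() >= maxBufferedPerApp) {
            return false;
        }
        buf.add(event);
        return true;
    }

    /** Writes every non-empty buffer to storage and empties it. */
    synchronized void flush() {
        for (Map.Entry<String, List<String>> e : buffers.entrySet()) {
            if (!e.getValue().isEmpty()) {
                sink.write(e.getKey(), new ArrayList<>(e.getValue()));
                e.getValue().clear();
            }
        }
    }

    /** Called when the app finishes; writes out whatever is still buffered. */
    synchronized void unregisterApp(String appId) {
        List<String> buf = buffers.remove(appId);
        if (buf != null && !buf.isEmpty()) {
            sink.write(appId, buf);
        }
    }
}
```

The same class could in principle sit behind an NM auxiliary service or in a separate per-node
process; the sketch does not address discovery of the ALA by other NMs or resource-management.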

On the process side, I propose we do the work on a branch, with a goal of borrowing whatever code
we can from the current Timeline service.

Regarding timelines (pun intended), I'd like to think that we can have a first alpha release of
this as part of, say, 2.8.

> Application Timeline Server (ATS) next gen: phase 1
> ---------------------------------------------------
>
>                 Key: YARN-2928
>                 URL: https://issues.apache.org/jira/browse/YARN-2928
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>            Priority: Critical
>         Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf
>
>
> We have the application timeline server implemented in yarn per YARN-1530 and YARN-321.
> Although it is a great feature, we have recognized several critical issues and features that
> need to be addressed.
> This JIRA proposes the design and implementation changes to address those. This is phase
> 1 of this effort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
