hadoop-yarn-dev mailing list archives

From Daniel Templeton <dan...@cloudera.com>
Subject Re: [Vote] merge feature branch YARN-2915 (Federation) to trunk
Date Tue, 01 Aug 2017 20:25:35 GMT
Subru, sorry for the last-minute contribution... :)  I've been looking 
at the branch, and I have two questions.

First, what's the out-of-box experience regarding the data store? Is the 
expectation that the user will have a database set up and ready to go?  
Will the state store set up the schema automatically, or is that on the 
user?  I don't see that in the docs.

Second, how well does federation play with Kerberos?  Anything special 
that needs to be configured to make it work?


On 7/25/17 8:24 PM, Subru Krishnan wrote:
> Hi all,
> Per earlier discussion [9], I'd like to start a formal vote to merge
> feature YARN Federation (YARN-2915) [1] to trunk. The vote will run for 7
> days, and will end Aug 1 7PM PDT.
> We have been developing the feature in a branch (YARN-2915 [2]) for a
> while, and we are reasonably confident that the state of the feature meets
> the criteria to be merged onto trunk.
> *Key Ideas*:
> YARN’s centralized design allows strict enforcement of scheduling
> invariants and effective resource sharing, but becomes a scalability
> bottleneck (in number of jobs and nodes) well before reaching the scale of
> our clusters (e.g., 20k-50k nodes).
> To address these limitations, we developed a scale-out, federation-based
> solution (YARN-2915). Our architecture scales near-linearly to datacenter
> sized clusters, by partitioning nodes across multiple sub-clusters (each
> running a YARN cluster of a few thousand nodes). Applications can span
> multiple sub-clusters *transparently (i.e. no code change or recompilation
> of existing apps)*, thanks to a layer of indirection that negotiates with
> multiple sub-clusters' Resource Managers on behalf of the application.
> This design is structurally scalable, as it bounds the number of nodes each
> RM is responsible for. Appropriate policies ensure that the majority of
> applications reside within a single sub-cluster, thus further controlling
> the load on each RM. This provides near linear scale-out by simply adding
> more sub-clusters. The same mechanism enables pooling of resources from
> clusters owned and operated by different teams.
> Status:
>     - The version we would like to merge to trunk is termed "MVP" (minimal
>     viable product). The feature will have a complete end-to-end application
>     execution flow with the ability to span a single application across
>     multiple YARN (sub) clusters.
>     - There were 50+ sub-tasks that were completed as part of this
>     effort. Every patch has been reviewed and +1ed by a committer. Thanks to
>     Jian, Wangda, Karthik, Vinod, Varun & Arun for the thorough reviews!
>     - Federation is designed to be built around YARN and consequently has
>     minimal code changes to core YARN. The relevant JIRAs that modify existing
>     YARN code base are YARN-3671 [7] & YARN-3673 [8]. We also paid close
>     attention to ensure that if federation is disabled there is zero impact to
>     existing functionality (disabled by default).
>     - We found a few bugs as we went along which we fixed directly upstream
>     in trunk and/or branch-2.
>     - We have been continuously rebasing the feature branch [2] so the merge
>     should be a straightforward cherry-pick.
>     - The current version has been rather thoroughly tested and is currently
>     deployed in a *10,000+ node federated YARN cluster that's running
>     upwards of 50k jobs daily with a reliability of 99.9%*.
>     - We have a few ideas for follow-up extensions/improvements which are
>     tracked in the umbrella JIRA YARN-5597[3].
> Documentation:
>     - Quick start guide (maven site) - YARN-6484[4].
>     - Overall design doc[5] and the slide-deck [6] we used for our talk at
>     Hadoop Summit 2016 are available in the umbrella jira - YARN-2915.
> Credits:
> This is a group effort that would not have been possible without the ideas
> and hard work of many other folks and we would like to specifically call
> out Giovanni, Botong & Ellen for their invaluable contributions. Also big
> thanks to the many folks in the community (Sriram, Kishore, Sarvesh, Jian,
> Wangda, Karthik, Vinod, Varun, Inigo, Vrushali, Sangjin, Joep, Rohith and
> many more) that helped us shape our ideas and code with very insightful
> feedback and comments.
> Cheers,
> Subru & Carlo
> [1] YARN-2915: https://issues.apache.org/jira/browse/YARN-2915
> [2] https://github.com/apache/hadoop/tree/YARN-2915
> [3] YARN-5597: https://issues.apache.org/jira/browse/YARN-5597
> [4] YARN-6484: https://issues.apache.org/jira/browse/YARN-6484
> [5] https://issues.apache.org/jira/secure/attachment/12733292/Yarn_federation_design_v1.pdf
> [6] https://issues.apache.org/jira/secure/attachment/12819229/YARN-Federation-Hadoop-Summit_final.pptx
> [7] YARN-3671: https://issues.apache.org/jira/browse/YARN-3671
> [8] YARN-3673: https://issues.apache.org/jira/browse/YARN-3673
> [9] http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201706.mbox/%3CCAOScs9bSsZ7mzH15Y%2BSPDU8YuNUAq7QicjXpDoX_tKh3MS4HsA%40mail.gmail.com%3E
