hadoop-yarn-dev mailing list archives

From Daniel Templeton <dan...@cloudera.com>
Subject Re: [Vote] merge feature branch YARN-2915 (Federation) to trunk
Date Tue, 01 Aug 2017 20:50:52 GMT
Thanks, Subru!  Carry on. :)

Daniel

On 8/1/17 1:42 PM, Subru Krishnan wrote:
> Hi Daniel,
>
> You're just in time; Carlo and I were just talking about moving
> forward with the merge :).
>
> To answer your questions:
>
>     1. The expectation is that the user will have a database set up (we
>     only link to the install instructions page), but we do provide the
>     scripts for the schema and stored procedures. This is called out in the
>     doc in the *State Store* section (just before *Running a Sample Job*).
>     Additionally, we are working on a ZK-based implementation of the store;
>     Inigo has a patch in YARN-6900 [1].
>     2. We rely on the existing YARN/Hadoop security mechanisms, as-is, for
>     running applications on Federation, so you should not need any
>     additional Kerberos configuration. Disclaimer: we don't use Kerberos to
>     secure Hadoop but rely on our production infrastructure instead.
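To make the state store discussion more concrete, here is a minimal Java sketch of the two kinds of federation state the store keeps: sub-cluster membership and each application's home sub-cluster. The names `SimpleFederationStore` and `InMemoryFederationStore` are hypothetical and are not Hadoop's actual state store API; the in-memory backend only stands in for the SQL-backed store (whose schema the user creates from the provided scripts) or the ZK-based one in YARN-6900.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified view of the federation state: sub-cluster
// membership plus each application's "home" sub-cluster.
// NOT Hadoop's real state store interface; names are illustrative only.
interface SimpleFederationStore {
  void registerSubCluster(String subClusterId);

  Set<String> activeSubClusters();

  // Records the home sub-cluster only if none is set yet and returns the
  // mapping that is in effect after the call (first writer wins).
  String addApplicationHome(String appId, String subClusterId);

  Optional<String> getApplicationHome(String appId);
}

// In-memory backend standing in for a SQL- or ZooKeeper-backed store.
class InMemoryFederationStore implements SimpleFederationStore {
  private final Set<String> subClusters = ConcurrentHashMap.newKeySet();
  private final Map<String, String> appHomes = new ConcurrentHashMap<>();

  @Override
  public void registerSubCluster(String subClusterId) {
    subClusters.add(subClusterId);
  }

  @Override
  public Set<String> activeSubClusters() {
    return new TreeSet<>(subClusters); // sorted copy for stable iteration
  }

  @Override
  public String addApplicationHome(String appId, String subClusterId) {
    if (!subClusters.contains(subClusterId)) {
      throw new IllegalArgumentException("unknown sub-cluster: " + subClusterId);
    }
    String existing = appHomes.putIfAbsent(appId, subClusterId);
    return existing != null ? existing : subClusterId;
  }

  @Override
  public Optional<String> getApplicationHome(String appId) {
    return Optional.ofNullable(appHomes.get(appId));
  }
}
```

Keeping the store behind an interface is what lets a SQL backend and a ZK backend be swapped without touching the callers.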
>
> Thanks,
> Subru
>
> [1] https://issues.apache.org/jira/browse/YARN-6900
>
> On Tue, Aug 1, 2017 at 1:25 PM, Daniel Templeton <daniel@cloudera.com>
> wrote:
>
>> Subru, sorry for the last-minute contribution... :)  I've been looking at
>> the branch, and I have two questions.
>>
>> First, what's the out-of-box experience regarding the data store? Is the
>> expectation that the user will have a database set up and ready to go?
>> Will the state store set up the schema automatically, or is that on the
>> user?  I don't see that in the docs.
>>
>> Second, how well does federation play with Kerberos?  Anything special
>> that needs to be configured to make it work?
>>
>> Daniel
>>
>> On 7/25/17 8:24 PM, Subru Krishnan wrote:
>>
>>> Hi all,
>>>
>>> Per earlier discussion [9], I'd like to start a formal vote to merge the
>>> YARN Federation feature (YARN-2915) [1] to trunk. The vote will run for 7
>>> days and will end on Aug 1 at 7 PM PDT.
>>>
>>> We have been developing the feature in a branch (YARN-2915 [2]) for a
>>> while, and we are reasonably confident that the state of the feature meets
>>> the criteria to be merged onto trunk.
>>>
>>> *Key Ideas*:
>>>
>>> YARN’s centralized design allows strict enforcement of scheduling
>>> invariants and effective resource sharing, but becomes a scalability
>>> bottleneck (in number of jobs and nodes) well before reaching the scale of
>>> our clusters (e.g., 20k-50k nodes).
>>>
>>>
>>> To address these limitations, we developed a scale-out, federation-based
>>> solution (YARN-2915). Our architecture scales near-linearly to
>>> datacenter-sized clusters by partitioning nodes across multiple
>>> sub-clusters (each running a YARN cluster of a few thousand nodes).
>>> Applications can span multiple sub-clusters *transparently (i.e. no code
>>> change or recompilation of existing apps)*, thanks to a layer of
>>> indirection that negotiates with multiple sub-clusters' Resource Managers
>>> on behalf of the application.
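The negotiation performed by that layer of indirection can be illustrated with a small Java sketch: given a weight per sub-cluster, it splits one application's request for N containers across sub-clusters so that the parts add up to exactly N, using largest-remainder rounding. `WeightedRequestSplitter` is a hypothetical name, not a class from the branch, and a production policy would likely also weigh locality and current sub-cluster load.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: split a request for N containers across sub-clusters
// in proportion to per-sub-cluster weights. Largest-remainder rounding makes
// the per-sub-cluster counts sum to exactly N.
final class WeightedRequestSplitter {
  private WeightedRequestSplitter() {}

  static Map<String, Integer> split(int containers, Map<String, Double> weights) {
    if (containers < 0) throw new IllegalArgumentException("containers < 0");
    double total = 0;
    for (double w : weights.values()) {
      if (w < 0) throw new IllegalArgumentException("negative weight");
      total += w;
    }
    if (total <= 0) throw new IllegalArgumentException("weights must sum to > 0");

    Map<String, Integer> result = new LinkedHashMap<>();
    List<Map.Entry<String, Double>> remainders = new ArrayList<>();
    int assigned = 0;
    for (Map.Entry<String, Double> e : weights.entrySet()) {
      double exact = containers * e.getValue() / total; // ideal fractional share
      int floor = (int) Math.floor(exact);
      result.put(e.getKey(), floor);
      remainders.add(Map.entry(e.getKey(), exact - floor));
      assigned += floor;
    }
    // Give each leftover container to the sub-cluster with the largest
    // remainder; List.sort is stable, so ties follow the map's iteration order.
    remainders.sort(Map.Entry.<String, Double>comparingByValue().reversed());
    for (int i = 0; i < containers - assigned; i++) {
      result.merge(remainders.get(i).getKey(), 1, Integer::sum);
    }
    return result;
  }
}
```

For example, weights of 5, 3 and 2 split a request for 10 containers into 5, 3 and 2; equal weights split the same request into 4, 3 and 3, with the tie going to the first sub-cluster in iteration order.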
>>>
>>>
>>> This design is structurally scalable, as it bounds the number of nodes
>>> each RM is responsible for. Appropriate policies ensure that the majority
>>> of applications reside within a single sub-cluster, further controlling
>>> the load on each RM. This provides near-linear scale-out by simply adding
>>> more sub-clusters. The same mechanism enables pooling of resources from
>>> clusters owned and operated by different teams.
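A hypothetical "home-first" policy shows one way most applications can stay within a single sub-cluster: a home sub-cluster is chosen deterministically from the queue name, containers are placed there up to its free capacity, and only the overflow spills to other sub-clusters. In this sketch anything that still does not fit stays pending at the home RM. `HomeFirstPolicy`, `pickHome` and `place` are illustrative assumptions, not the policy classes in the branch.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical "home-first" placement sketch; not the branch's actual policies.
final class HomeFirstPolicy {
  private HomeFirstPolicy() {}

  // Deterministically maps a queue to one sub-cluster, so all applications
  // of that queue share the same home (String.hashCode is stable across JVMs).
  static String pickHome(String queue, List<String> subClusters) {
    if (subClusters.isEmpty()) throw new IllegalArgumentException("no sub-clusters");
    return subClusters.get(Math.floorMod(queue.hashCode(), subClusters.size()));
  }

  // Fills the home sub-cluster first, spills the overflow to the others in
  // iteration order, and leaves whatever still does not fit pending at home.
  static Map<String, Integer> place(int containers, String home,
                                    Map<String, Integer> freeCapacity) {
    if (containers < 0) throw new IllegalArgumentException("containers < 0");
    Map<String, Integer> placement = new LinkedHashMap<>();
    int remaining = containers;
    int atHome = Math.min(remaining, Math.max(0, freeCapacity.getOrDefault(home, 0)));
    if (atHome > 0) placement.put(home, atHome);
    remaining -= atHome;
    for (Map.Entry<String, Integer> e : freeCapacity.entrySet()) {
      if (remaining == 0) break;
      if (e.getKey().equals(home)) continue;
      int take = Math.min(remaining, Math.max(0, e.getValue()));
      if (take > 0) {
        placement.put(e.getKey(), take);
        remaining -= take;
      }
    }
    if (remaining > 0) placement.merge(home, remaining, Integer::sum); // pending at home RM
    return placement;
  }
}
```

An application that fits in its home sub-cluster touches only that RM; only large applications fan out, which is what bounds per-RM load.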
>>>
>>> Status:
>>>
>>>      - The version we would like to merge to trunk is termed "MVP" (minimum
>>>      viable product). The feature will have a complete end-to-end
>>>      application execution flow with the ability to span a single
>>>      application across multiple YARN (sub-)clusters.
>>>      - More than 50 sub-tasks were completed as part of this effort. Every
>>>      patch has been reviewed and +1ed by a committer. Thanks to Jian,
>>>      Wangda, Karthik, Vinod, Varun & Arun for the thorough reviews!
>>>      - Federation is designed to be built around YARN and consequently
>>>      requires minimal changes to core YARN. The relevant JIRAs that modify
>>>      the existing YARN code base are YARN-3671 [7] & YARN-3673 [8]. We also
>>>      paid close attention to ensuring that federation (disabled by default)
>>>      has zero impact on existing functionality when it is disabled.
>>>      - We found a few bugs along the way, which we fixed directly upstream
>>>      in trunk and/or branch-2.
>>>      - We have been continuously rebasing the feature branch [2], so the
>>>      merge should be a straightforward cherry-pick.
>>>      - The current version has been rather thoroughly tested and is
>>>      currently deployed in a *10,000+ node federated YARN cluster that's
>>>      running upwards of 50k jobs daily with a reliability of 99.9%*.
>>>      - We have a few ideas for follow-up extensions/improvements, which are
>>>      tracked in the umbrella JIRA YARN-5597 [3].
>>>
>>>
>>> Documentation:
>>>
>>>      - Quick start guide (maven site) - YARN-6484 [4].
>>>      - The overall design doc [5] and the slide deck [6] we used for our
>>>      talk at Hadoop Summit 2016 are available in the umbrella JIRA -
>>>      YARN-2915.
>>>
>>>
>>> Credits:
>>>
>>> This is a group effort that would not have been possible without the
>>> ideas and hard work of many other folks, and we would like to specifically
>>> call out Giovanni, Botong & Ellen for their invaluable contributions. Also,
>>> big thanks to the many folks in the community (Sriram, Kishore, Sarvesh,
>>> Jian, Wangda, Karthik, Vinod, Varun, Inigo, Vrushali, Sangjin, Joep,
>>> Rohith and many more) who helped us shape our ideas and code with very
>>> insightful feedback and comments.
>>>
>>> Cheers,
>>> Subru & Carlo
>>>
>>> [1] YARN-2915: https://issues.apache.org/jira/browse/YARN-2915
>>> [2] https://github.com/apache/hadoop/tree/YARN-2915
>>> [3] YARN-5597: https://issues.apache.org/jira/browse/YARN-5597
>>> [4] YARN-6484: https://issues.apache.org/jira/browse/YARN-6484
>>> [5] https://issues.apache.org/jira/secure/attachment/12733292/Yarn_federation_design_v1.pdf
>>> [6] https://issues.apache.org/jira/secure/attachment/12819229/YARN-Federation-Hadoop-Summit_final.pptx
>>> [7] YARN-3671: https://issues.apache.org/jira/browse/YARN-3671
>>> [8] YARN-3673: https://issues.apache.org/jira/browse/YARN-3673
>>> [9] http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201706.mbox/%3CCAOScs9bSsZ7mzH15Y%2BSPDU8YuNUAq7QicjXpDoX_tKh3MS4HsA%40mail.gmail.com%3E
>>>
>>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: yarn-dev-unsubscribe@hadoop.apache.org
>> For additional commands, e-mail: yarn-dev-help@hadoop.apache.org
>>
>>

