hadoop-yarn-dev mailing list archives

From Carlo Aldo Curino <carlo.cur...@gmail.com>
Subject Re: [Vote] merge feature branch YARN-2915 (Federation) to trunk
Date Thu, 27 Jul 2017 21:30:55 GMT
+1

Cheers,
Carlo

On Thu, Jul 27, 2017 at 12:45 PM, Arun Suresh <asuresh@apache.org> wrote:

> +1
>
> Cheers
> -Arun
>
> On Jul 25, 2017 8:24 PM, "Subru Krishnan" <subru@apache.org> wrote:
>
>> Hi all,
>>
>> Per earlier discussion [9], I'd like to start a formal vote to merge
>> feature YARN Federation (YARN-2915) [1] to trunk. The vote will run for 7
>> days, and will end Aug 1 7PM PDT.
>>
>> We have been developing the feature in a branch (YARN-2915 [2]) for a
>> while, and we are reasonably confident that the state of the feature meets
>> the criteria to be merged onto trunk.
>>
>> *Key Ideas*:
>>
>> YARN’s centralized design allows strict enforcement of scheduling
>> invariants and effective resource sharing, but becomes a scalability
>> bottleneck (in number of jobs and nodes) well before reaching the scale of
>> our clusters (e.g., 20k-50k nodes).
>>
>>
>> To address these limitations, we developed a scale-out, federation-based
>> solution (YARN-2915). Our architecture scales near-linearly to
>> datacenter-sized clusters by partitioning nodes across multiple sub-clusters
>> (each running a YARN cluster of a few thousand nodes). Applications can span
>> multiple sub-clusters *transparently (i.e. no code change or recompilation
>> of existing apps)*, thanks to a layer of indirection that negotiates with
>> multiple sub-clusters' Resource Managers on behalf of the application.
>>
>>
>> This design is structurally scalable, as it bounds the number of nodes each
>> RM is responsible for. Appropriate policies ensure that the majority of
>> applications reside within a single sub-cluster, thus further controlling
>> the load on each RM. This provides near-linear scale-out by simply adding
>> more sub-clusters. The same mechanism enables pooling of resources from
>> clusters owned and operated by different teams.
>>
>> Status:
>>
>>    - The version we would like to merge to trunk is termed "MVP" (minimal
>>    viable product). The feature will have a complete end-to-end application
>>    execution flow with the ability to span a single application across
>>    multiple YARN (sub) clusters.
>>    - There were 50+ sub-tasks that were completed as part of this effort.
>>    Every patch has been reviewed and +1ed by a committer. Thanks to Jian,
>>    Wangda, Karthik, Vinod, Varun & Arun for the thorough reviews!
>>    - Federation is designed to be built around YARN and consequently has
>>    minimal code changes to core YARN. The relevant JIRAs that modify the
>>    existing YARN code base are YARN-3671 [7] & YARN-3673 [8]. We also paid
>>    close attention to ensure that if federation is disabled there is zero
>>    impact to existing functionality (federation is disabled by default).
>>    - We found a few bugs as we went along which we fixed directly upstream
>>    in trunk and/or branch-2.
>>    - We have been continuously rebasing the feature branch [2], so the
>>    merge should be a straightforward cherry-pick.
>>    - The current version has been rather thoroughly tested and is currently
>>    deployed in a *10,000+ node federated YARN cluster that's running
>>    upwards of 50k jobs daily with a reliability of 99.9%*.
>>    - We have a few ideas for follow-up extensions/improvements, which are
>>    tracked in the umbrella JIRA YARN-5597 [3].
>>
>>
>> Documentation:
>>
>>    - Quick start guide (maven site) - YARN-6484[4].
>>    - The overall design doc [5] and the slide deck [6] we used for our talk
>>    at Hadoop Summit 2016 are available in the umbrella JIRA, YARN-2915.
>>
>>
>> Credits:
>>
>> This is a group effort that would not have been possible without the ideas
>> and hard work of many other folks, and we would like to specifically call
>> out Giovanni, Botong & Ellen for their invaluable contributions. Also, big
>> thanks to the many folks in the community (Sriram, Kishore, Sarvesh, Jian,
>> Wangda, Karthik, Vinod, Varun, Inigo, Vrushali, Sangjin, Joep, Rohith and
>> many more) who helped us shape our ideas and code with very insightful
>> feedback and comments.
>>
>> Cheers,
>> Subru & Carlo
>>
>> [1] YARN-2915: https://issues.apache.org/jira/browse/YARN-2915
>> [2] https://github.com/apache/hadoop/tree/YARN-2915
>> [3] YARN-5597: https://issues.apache.org/jira/browse/YARN-5597
>> [4] YARN-6484: https://issues.apache.org/jira/browse/YARN-6484
>> [5] https://issues.apache.org/jira/secure/attachment/12733292/Yarn_federation_design_v1.pdf
>> [6] https://issues.apache.org/jira/secure/attachment/12819229/YARN-Federation-Hadoop-Summit_final.pptx
>> [7] YARN-3671: https://issues.apache.org/jira/browse/YARN-3671
>> [8] YARN-3673: https://issues.apache.org/jira/browse/YARN-3673
>> [9] http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201706.mbox/%3CCAOScs9bSsZ7mzH15Y%2BSPDU8YuNUAq7QicjXpDoX_tKh3MS4HsA%40mail.gmail.com%3E
>>
>
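
For readers skimming the thread, the "Key Ideas" in the quoted proposal can be illustrated with a small sketch. This is not the YARN Federation code and does not use any Hadoop API. The function names (partition_nodes, home_subcluster), the 4,000-node bound, and the hash-based placement are all assumptions made for illustration. It only shows the two properties the proposal describes: each RM manages a bounded number of nodes, and a policy gives each application a single home sub-cluster.

```python
import hashlib
from collections import Counter


def partition_nodes(nodes, max_nodes_per_subcluster):
    """Split nodes into sub-clusters so that no single RM exceeds the bound.

    Hypothetical helper: a real deployment assigns nodes by configuration.
    """
    if max_nodes_per_subcluster <= 0:
        raise ValueError("max_nodes_per_subcluster must be positive")
    return [
        nodes[i:i + max_nodes_per_subcluster]
        for i in range(0, len(nodes), max_nodes_per_subcluster)
    ]


def home_subcluster(app_id, num_subclusters):
    """Pick one home sub-cluster per application using a stable hash.

    Hypothetical policy: it stands in for the "appropriate policies"
    mentioned in the proposal, which are not specified in this thread.
    """
    digest = hashlib.sha256(app_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_subclusters


# 20k nodes split into sub-clusters of at most 4,000 nodes each (assumed bound).
nodes = [f"node-{i}" for i in range(20_000)]
subclusters = partition_nodes(nodes, max_nodes_per_subcluster=4_000)

# Each application lands in exactly one home sub-cluster.
load = Counter(
    home_subcluster(f"application_{i}", len(subclusters)) for i in range(1_000)
)
```

Under these assumptions, adding nodes adds sub-clusters rather than enlarging any one of them, which is the sense in which the design is described as structurally scalable.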
