incubator-general mailing list archives

From Hitesh Shah <hit...@apache.org>
Subject Re: [VOTE] Accept REEF into the Apache Incubator
Date Tue, 12 Aug 2014 01:20:01 GMT
+1 (non-binding)

— Hitesh 

On Aug 8, 2014, at 10:40 PM, Byung-Gon Chun <bgchun@gmail.com> wrote:

> Hi,
> 
> Thanks for participating in the proposal discussion on REEF. The discussion
> has quieted down, so I would like to call a vote for acceptance of REEF
> into the Apache Incubator.
> 
> The proposal is attached below, and it is also available at
> https://wiki.apache.org/incubator/ReefProposal
> 
> Let's keep this vote open for three business days, closing the voting on
> August 11, 11:59PM (PDT).
> 
> [] +1 Accept REEF into the Incubator
> [] 0 Don't care
> [] -1 Don't accept REEF because...
> 
> Thanks!
> -Gon
> 
> -- 
> Byung-Gon Chun
> 
> 
> # REEFProposal - Incubator
> 
> 
> # Abstract
> 
> REEF (Retainable Evaluator Execution Framework) is a scale-out
> computing fabric that eases the development of Big Data applications
> on top of resource managers such as Apache YARN and Mesos.
> 
> 
> # Proposal
> 
> REEF is a Big Data system that makes it easy to implement scalable,
> fault-tolerant runtime environments for a range of data processing
> models (e.g., graph processing and machine learning) on top of
> resource managers such as Apache YARN and Mesos. REEF can efficiently
> run multiple heterogeneous frameworks, as well as workflows that
> combine them.
> 
> Additionally, REEF contains two libraries that are of independent
> value: Wake, an event-based programming framework inspired by Rx and
> SEDA, and Tang, a dependency injection framework inspired by Google
> Guice but designed specifically for configuring distributed systems.
> 
> 
> # Background
> 
> The resource management layer, exemplified by Apache YARN and Mesos,
> has emerged as a critical layer in the new scale-out data processing
> stack: resource managers assume the responsibility of multiplexing a
> cluster of shared-nothing machines across heterogeneous
> applications. They operate behind an interface for leasing containers
> (slices of a machine’s resources) to computations in an elastic
> fashion. However, building data processing frameworks directly on this
> layer comes at a high cost: each framework must tackle the same
> challenges (e.g., fault tolerance, task scheduling and coordination)
> and reimplement common mechanisms (e.g., caching, bulk transfers).
> 
> REEF provides a reusable control plane for scheduling and coordinating
> task-level work on cluster resource managers. The REEF design enables
> sophisticated optimizations, such as container reuse and data
> caching, and facilitates workflows that span multiple
> frameworks. Examples include pipelining data between different
> operators in a relational system, retaining state across iterations in
> iterative or recursive data flows, and passing the result of a
> MapReduce job to a machine learning computation.
> 
> 
> # Rationale
> 
> Since REEF is a library that makes it easy to write distributed
> applications on top of Apache YARN or Mesos, the Apache Software Foundation
> is the perfect home for hosting REEF.
> 
> 
> # Current Status
> 
> REEF has been developed mostly by Microsoft, UCLA, and Seoul
> National University.  The REEF codebase is open-sourced under the Apache
> License 2.0 and is currently hosted in a public repository on
> GitHub.
> 
> 
> # Meritocracy
> 
> We plan to build a strong open community by following the Apache
> meritocracy principles. We will work with those who contribute
> significantly to the project and invite them to be its committers.
> 
> 
> # Community
> 
> REEF is currently being used internally at Microsoft.  In addition, SK
> Telecom is building its data analytics infrastructure on top of REEF in
> collaboration with Seoul National University.  We hope to extend our
> contributor base by becoming an Apache Incubator project. We expect REEF
> to attract developers who are interested in creating common building
> blocks that simplify the development of large-scale Big Data
> applications.
> 
> 
> # Core Developers
> 
> Core developers are engineers from Microsoft, Purestorage, UCB, UCLA,
> UW and Seoul National University.
> 
> 
> # Alignment
> 
> REEF depends on many Apache projects. It is built on resource
> managers such as Apache YARN and Apache Mesos, and it uses HDFS as a
> distributed storage layer.
> 
> 
> # Known Risks
> ## Orphaned Products
> 
> The risk of REEF being orphaned is small because Microsoft products
> are built on REEF. The core REEF developers continue to work on REEF
> at Microsoft, UCLA, and Seoul National University, and other
> institutions have shown interest in using REEF as their
> infrastructure.
> 
> ## Inexperience with Open Source
> 
> Several core developers have experience with open source development.
> REEF committers will be guided by mentors with strong Apache open
> source project backgrounds.
> 
> ## Homogeneous Developers
> 
> The initial committers include developers from several institutions,
> including Microsoft, Purestorage, SK Telecom, UCB, UCLA, UW, and Seoul
> National University.
> 
> ## Reliance on Salaried Developers
> 
> Developers from Microsoft are paid to work on REEF. Since REEF is
> used internally at Microsoft, Microsoft will continue to support its
> developers' work on REEF. Engineers and graduate students from UCLA,
> UCB, UW, and Seoul National University also contribute to REEF.  We
> plan to attract active developers from other institutions.
> 
> ## Relationships with Other Apache Products
> 
> Given REEF's position in the big data stack, there are three
> relationships to consider: projects that fit below, on top of, or
> alongside REEF in the stack.
> 
> ### Below REEF: Mesos and YARN
> 
> REEF is designed to facilitate application development on top of
> resource managers.  Hence, its relationship with the aforementioned
> resource managers is symbiotic by design.
> 
> ### On Top of REEF
> 
> Apache Spark, Giraph, MapReduce, and Flink are only some of the
> projects that logically belong at a higher layer of the big data stack
> than REEF.  None of them currently leverages REEF; each has had to
> solve some of the issues REEF addresses on its own.  Our goal is for
> REEF to help developers create an even richer set of future big data
> frameworks.
> 
> ### Alongside REEF
> 
> Apache hosts several projects building intermediate, library layers on
> top of a resource management platform. Twill, Slider, and Tez are
> notable examples in the incubator. These projects share many
> objectives with REEF (and each other).  We expect these parallel
> explorations to converge and differentiate within Apache, as the space
> for distributed applications and deployment is too vast for a single
> answer.
> 
> Apache Twill and REEF both aim to simplify application development on
> top of resource managers.  However, they go about this in different
> ways: Twill simplifies programming by exposing a programming model
> based on Java threads.  REEF, on the other hand, provides a set of
> common building blocks (e.g., job coordination, state passing, cluster
> membership) for building big data processing applications and
> virtualizes the underlying resource managers.  None of this prescribes
> a specific programming model.  As such, REEF occupies a slot ever so
> slightly below Twill in an architecture stack.
> 
> Apache Slider is a framework to make it easy to deploy and manage
> long-running static applications in a YARN cluster. The focus is to
> adapt existing applications such as HBase and Accumulo to run on YARN
> with little modification. Therefore, the goals of Slider and REEF are
> different.
> 
> Apache Tez is a project to develop a generic Directed Acyclic Graph (DAG)
> processing framework with a reusable set of data processing primitives.
> The initial focus is to provide improved data processing capabilities for
> projects like Apache Hive, Apache Pig, and Cascading. Tez is still a single
> framework for DAG processing.  In contrast, REEF provides a generic
> layer on which diverse computation models (DAG, ML, graph processing,
> and interactive query processing) can be built.  More importantly,
> REEF provides a layer that facilitates sharing resources and in-memory
> state across frameworks, and it virtualizes resource managers. Regarding
> reusable data processing primitives, Tez and REEF share the same
> goal.  We hope to collaborate on features that can be shared between
> Tez and REEF.
> 
> Apache Helix automates application-wide management operations which require
> global knowledge and coordination, such as repartitioning of resources and
> scheduling of maintenance tasks. Helix separates global coordination
> concerns from the functional tasks of the application with a state machine
> abstraction. REEF's generic layer makes it easy to program the functional
> and management tasks, which may span small or large groups within the
> application. Helix can work hand-in-hand with REEF by providing the global
> management component for REEF applications.
> 
> ## An Excessive Fascination with the Apache Brand
> 
> The Apache Software Foundation has a reputation as the best place to
> host open source projects. We believe that by joining the Apache
> Software Foundation we will attract many developers who want to
> contribute to innovation in the Big Data platform space.
> 
> 
> # Documentation
> 
> The current documentation for REEF is at
> https://github.com/Microsoft-CISL/REEF as well as on
> http://www.reef-project.org
> 
> 
> # Initial Source
> 
> The REEF codebase is currently hosted at
> https://github.com/Microsoft-CISL/REEF.
> 
> 
> # External Dependencies
> 
> REEF makes extensive use of the vast array of Java libraries from the
> Apache Software Foundation, namely:
> 
> * avro (Apache 2.0)
> * hadoop (Apache 2.0)
> * hdfs (Apache 2.0)
> * yarn (Apache 2.0)
> * commons-cli (Apache 2.0)
> * commons-configuration (Apache 2.0)
> * commons-lang (Apache 2.0)
> * commons-logging (Apache 2.0)
> 
> To the best of our knowledge, REEF's other external dependencies are
> distributed under Apache-compatible licenses:
> 
> * guava-libraries (Apache 2.0)
> * protobuf (BSD)
> * asm (BSD)
> * netty (Apache 2.0)
> * mockito (MIT)
> * junit (EPL 1.0)
> * slf4j (MIT)
> 
> 
> # Cryptography
> 
> REEF will depend on secure Hadoop, which can optionally use Kerberos.
> 
> # Required Resources
> 
> ## Mailing Lists
> 
>  * reef-private for private PMC discussions
>  * reef-dev for technical discussions among contributors and
>    notifications about commits
> 
> ## Subversion Directory
> 
> The REEF team uses Git for source version control:
> git://git.apache.org/reef
> 
> ## Issue Tracking
> 
> JIRA REEF (REEF)
> 
> ## Other Resources
> 
> Jenkins continuous integration testing
> 
> # Initial Committers
> 
> * Markus Weimer
> * Sergiy Matusevych
> * Julia Wang
> * Shravan M Narayanamurthy
> * Yingda Chen
> * Tony Majestro
> * Beysim Sezgin
> * Boris Shulman
> * Russell Sears
> * Jung Ryong Lee
> * You Sun Jung
> * Dong Joon Hyun
> * Josh Rosen
> * Tyson Condie
> * Brandon Myers
> * Yunseong Lee
> * Taegeon Um
> * Youngseok Yang
> * Brian Cho
> * Byung-Gon Chun
> 
> # Affiliations
> 
> * Microsoft:
>  * Markus Weimer
>  * Sergiy Matusevych
>  * Julia Wang
>  * Shravan M Narayanamurthy
>  * Yingda Chen
>  * Tony Majestro
>  * Beysim Sezgin
>  * Boris Shulman
> * Purestorage:
>  * Russell Sears
> * SK Telecom:
>  * Jung Ryong Lee
>  * You Sun Jung
>  * Dong Joon Hyun
> * University of California:
>  * Josh Rosen (Berkeley)
>  * Tyson Condie (LA)
> * University of Washington:
>  * Brandon Myers
> * Seoul National University:
>  * Yunseong Lee
>  * Taegeon Um
>  * Youngseok Yang
>  * Brian Cho
>  * Byung-Gon Chun
> 
> 
> # Sponsors
> 
> ## Champion
> Chris Douglas <cdouglas@apache.org>
> 
> ## Nominated Mentors
> * Chris Mattmann <mattmann@apache.org>
> * Ross Gardler <rgardler@apache.org>
> * Owen O'Malley <omalley@apache.org>
> 
> ## Sponsoring Entity
> The Apache Incubator


---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org

