incubator-general mailing list archives

From Roman Shaposhnik <...@apache.org>
Subject Re: [PROPOSAL] REEF for the Apache Incubator
Date Fri, 08 Aug 2014 23:32:08 GMT
Looks like the feedback has been well received.

Any reason not to start a vote?

Thanks,
Roman.

On Mon, Aug 4, 2014 at 11:12 PM, Byung-Gon Chun <bgchun@gmail.com> wrote:
> Hi Jake,
>
> Thank you for the comment.
>
> We had discussions on how to structure mailing lists with our mentors.
> We took our mentors' suggestions to start with a minimal set (two mailing
> lists) not to miss important discussions and to split them if there are
> demands.
>
> Thanks!
> -Gon
>
> ---
> Byung-Gon Chun
>
> On Tue, Aug 5, 2014 at 3:04 AM, Jake Farrell <jfarrell@apache.org> wrote:
>
>> I would suggest using the following format for the mailing lists (you have
>> the older format listed) and also splitting dev and commits. A lot of new
>> projects have also been splitting JIRA notifications out of dev to cut
>> down on noise on the dev list; add issues@reef if you want to do
>> this.
>>
>> private@reef for private PMC discussions
>> dev@reef for technical discussions
>> commits@reef notification about commits
>> issues@reef jira notifications
>>
>> -Jake
>>
>>
>>
>> On Fri, Aug 1, 2014 at 3:14 AM, Byung-Gon Chun <bgchun@gmail.com> wrote:
>>
>> > Hi everyone,
>> >
>> > I would like to propose REEF to be an Apache Incubator project. REEF is a
>> > scale-out computing fabric that eases the development of Big Data
>> > applications on top of resource managers such as Apache YARN and Mesos.
>> >
>> > The proposal is included in plain text below. I would also like to put
>> > it on the wiki, but I don't have privileges to create wiki pages.
>> >
>> > I look forward to hearing everyone's thoughts and feedback!
>> >
>> > -Gon
>> >
>> > --
>> > Byung-Gon Chun
>> >
>> >
>> > ===
>> >
>> > # REEFProposal - Incubator
>> >
>> >
>> > # Abstract
>> >
>> > REEF (Retainable Evaluator Execution Framework) is a scale-out
>> > computing fabric that eases the development of Big Data applications
>> > on top of resource managers such as Apache YARN and Mesos.
>> >
>> >
>> > # Proposal
>> >
>> > REEF is a Big Data system that makes it easy to implement scalable,
>> > fault-tolerant runtime environments for a range of data processing
>> > models (e.g., graph processing and machine learning) on top of
>> > resource managers such as Apache YARN and Mesos. REEF can efficiently
>> > run multiple heterogeneous frameworks, as well as workflows that
>> > combine them.
>> >
>> > Additionally, REEF contains two libraries that are of independent
>> > value: Wake is an event-based programming framework inspired by Rx and
>> > SEDA.  Tang is a dependency injection framework inspired by Google
>> > Guice, but designed specifically for configuring distributed systems.
>> >
>> >
>> > # Background
>> >
>> > Resource management layers such as Apache YARN and Mesos have
>> > emerged as a critical part of the new scale-out data processing
>> > stack: resource managers assume the responsibility of multiplexing a
>> > cluster of shared-nothing machines across heterogeneous
>> > applications. They operate behind an interface for leasing containers
>> > (slices of a machine's resources) to computations in an elastic
>> > fashion. However, building data processing frameworks directly on this
>> > layer comes at a high cost: each framework must tackle the same
>> > challenges (e.g., fault tolerance, task scheduling and coordination)
>> > and reimplement common mechanisms (e.g., caching, bulk transfers).
>> >
>> > REEF provides a reusable control plane for scheduling and coordinating
>> > task-level work on cluster resource managers. The REEF design enables
>> > sophisticated optimizations, such as container reuse and data
>> > caching, and facilitates workflows that span multiple
>> > frameworks. Examples include pipelining data between different
>> > operators in a relational system, retaining state across iterations in
>> > iterative or recursive data flows, and passing the result of a
>> > MapReduce job to a machine learning computation.
>> >
>> >
>> > # Rationale
>> >
>> > Since REEF is a library that makes it easy to write distributed
>> > applications on top of Apache YARN or Mesos, the Apache Software
>> > Foundation is the perfect home for REEF.
>> >
>> >
>> > # Current Status
>> >
>> > REEF has been developed mostly by Microsoft, UCLA, and Seoul
>> > National University.  The REEF codebase is open source under the
>> > Apache License 2.0 and is currently hosted in a public repository on
>> > GitHub.
>> >
>> >
>> > # Meritocracy
>> >
>> > We plan to build a strong open community by following the Apache
>> > meritocracy principles. We will work with those who contribute
>> > significantly to the project and invite them to be its committers.
>> >
>> >
>> > # Community
>> >
>> > REEF is currently used internally at Microsoft.  In addition, SK
>> > Telecom is building its data analytics infrastructure on top of REEF in
>> > collaboration with Seoul National University.  We hope to extend our
>> > contributor base by becoming an Apache Incubator project. We expect
>> > REEF to attract developers who are interested in creating common
>> > building blocks that simplify the development of large-scale Big Data
>> > applications.
>> >
>> >
>> > # Core Developers
>> >
>> > Core developers are engineers from Microsoft, Purestorage, UCB, UCLA,
>> > UW and Seoul National University.
>> >
>> >
>> > # Alignment
>> >
>> > REEF depends on many Apache projects. It is built on resource
>> > managers such as Apache YARN and Apache Mesos, and it uses HDFS as a
>> > distributed storage layer.
>> >
>> >
>> > # Known Risks
>> > ## Orphaned Products
>> >
>> > The risk of REEF being orphaned is small because Microsoft products
>> > are built on REEF. The core REEF developers continue to work on REEF
>> > at Microsoft, UCLA, and Seoul National University, and other
>> > institutions have shown interest in using REEF as their
>> > infrastructure.
>> >
>> > ## Inexperience with Open Source
>> >
>> > Several core developers have experience with open source development.
>> > REEF committers will be guided by mentors with strong backgrounds in
>> > Apache open source projects.
>> >
>> > ## Homogeneous Developers
>> >
>> > The initial committers include developers from several institutions
>> > including Microsoft, Purestorage, UCB, UCLA, and Seoul National
>> > University.
>> >
>> > ## Reliance on Salaried Developers
>> >
>> > Developers from Microsoft are paid to work on REEF. Since REEF is
>> > used internally at Microsoft, Microsoft will continue to support its
>> > developers working on REEF. Engineers and graduate students from
>> > UCLA, UCB, UW, and Seoul National University also contribute to REEF.
>> > We plan to attract active developers from other institutions.
>> >
>> > ## Relationships with Other Apache Products
>> >
>> > Given REEF's position in the big data stack, there are three
>> > relationships to consider: projects that fit below, on top of, or
>> > alongside REEF in the stack.
>> >
>> > ### Below REEF: Mesos and YARN
>> >
>> > REEF is designed to facilitate application development on top of
>> > resource managers.  Hence, its relationship with the aforementioned
>> > resource managers is symbiotic by design.
>> >
>> > ### On Top of REEF
>> >
>> > Apache Spark, Giraph, MapReduce, and Flink are only some of the
>> > projects that logically belong at a higher layer of the big data stack
>> > than REEF.  None of these currently leverages REEF; each has had to
>> > solve some of the issues REEF addresses on its own.  Our goal is for
>> > REEF to help developers create an even richer set of future big data
>> > frameworks.
>> >
>> > ### Alongside REEF
>> >
>> > Apache hosts several projects that build intermediate library layers on
>> > top of a resource management platform. Twill, Slider, and Tez are
>> > notable examples in the Incubator. These projects share many
>> > objectives with REEF (and each other).  We expect these parallel
>> > explorations to converge and differentiate within Apache, as the space
>> > for distributed applications and deployment is too vast for a single
>> > answer.
>> >
>> > Apache Twill and REEF both aim to simplify application development on
>> > top of resource managers.  However, they go about this in different
>> > ways: Twill simplifies programming by exposing a programming model
>> > similar to Java threads.  REEF, on the other hand, provides a set of
>> > common building blocks (e.g., job coordination, state passing, cluster
>> > membership) for building big data processing applications, and it
>> > virtualizes the underlying resource managers.  None of this prescribes
>> > a specific programming model.  As such, REEF occupies a slot slightly
>> > below Twill in an architecture stack.
>> >
>> > Apache Slider is a framework to make it easy to deploy and manage
>> > long-running static applications in a YARN cluster. The focus is to
>> > adapt existing applications such as HBase and Accumulo to run on YARN
>> > with little modification. Therefore, the goals of Slider and REEF are
>> > different.
>> >
>> > Apache Tez is a project to develop a generic Directed Acyclic Graph (DAG)
>> > processing framework with a reusable set of data processing primitives.
>> > The initial focus is to provide improved data processing capabilities for
>> > projects like Apache Hive, Apache Pig, and Cascading. Tez is still a
>> > single framework for DAG processing.  In contrast, REEF provides a generic
>> > layer on which diverse computation models (DAG, ML, graph processing,
>> > and interactive query processing) can be built.  More importantly,
>> > REEF provides a layer that facilitates sharing resources and in-memory
>> > state across frameworks, and it virtualizes resource managers.
>> > Regarding reusable data processing primitives, Tez and REEF share the
>> > same goal.  We hope to collaborate on features that can be shared
>> > between Tez and REEF.
>> >
>> >
>> > ## An Excessive Fascination with the Apache Brand
>> >
>> > The Apache Software Foundation has a reputation as the best place to
>> > host open source projects. We believe that by joining the Apache
>> > Software Foundation we will attract many developers who want to
>> > contribute to innovation in the Big Data platform space.
>> >
>> >
>> > # Documentation
>> >
>> > The current documentation for REEF is at
>> > https://github.com/Microsoft-CISL/REEF as well as on
>> > http://www.reef-project.org
>> >
>> >
>> > # Initial Source
>> >
>> > The REEF codebase is currently hosted at
>> > https://github.com/Microsoft-CISL/REEF.
>> >
>> >
>> > # External Dependencies
>> >
>> > REEF makes extensive use of the vast array of Java libraries from the
>> > Apache Software Foundation, namely:
>> >
>> >  * avro (Apache 2.0)
>> >  * hadoop (Apache 2.0)
>> >  * hdfs (Apache 2.0)
>> >  * yarn (Apache 2.0)
>> >  * commons-cli (Apache 2.0)
>> >  * commons-configuration (Apache 2.0)
>> >  * commons-lang (Apache 2.0)
>> >  * commons-logging (Apache 2.0)
>> >
>> > To the best of our knowledge, the external dependencies of REEF are
>> > distributed under Apache-compatible licenses:
>> >
>> >  * guava-libraries (Apache 2.0)
>> >  * protobuf (BSD)
>> >  * asm (BSD)
>> >  * netty (Apache 2.0)
>> >  * mockito (MIT)
>> >  * junit (EPL 1.0)
>> >  * slf4j (MIT)
>> >
>> >
>> > # Cryptography
>> >
>> > REEF will depend on secure Hadoop, which can optionally use Kerberos.
>> >
>> > # Required Resources
>> >
>> > ## Mailing Lists
>> >
>> >   * reef-private for private PMC discussions
>> >   * reef-dev for technical discussions among contributors and
>> >                  notification about commits
>> >
>> > ## Subversion Directory
>> >
>> > The REEF team uses Git for source version control:
>> > git://git.apache.org/reef
>> >
>> > ## Issue Tracking
>> >
>> > JIRA REEF (REEF)
>> >
>> > ## Other Resources
>> >
>> > Jenkins continuous integration testing
>> >
>> > # Initial Committers
>> >
>> >  * Markus Weimer
>> >  * Sergiy Matusevych
>> >  * Julia Wang
>> >  * Shravan M Narayanamurthy
>> >  * Yingda Chen
>> >  * Tony Majestro
>> >  * Beysim Sezgin
>> >  * Boris Shulman
>> >  * Russell Sears
>> >  * Jung Ryong Lee
>> >  * You Sun Jung
>> >  * Dong Joon Hyun
>> >  * Josh Rosen
>> >  * Tyson Condie
>> >  * Brandon Myers
>> >  * Yunseong Lee
>> >  * Taegeon Um
>> >  * Youngseok Yang
>> >  * Brian Cho
>> >  * Byung-Gon Chun
>> >
>> > # Affiliations
>> >
>> >  * Microsoft:
>> >   * Markus Weimer
>> >   * Sergiy Matusevych
>> >   * Julia Wang
>> >   * Shravan M Narayanamurthy
>> >   * Yingda Chen
>> >   * Tony Majestro
>> >   * Beysim Sezgin
>> >   * Boris Shulman
>> >  * Purestorage:
>> >   * Russell Sears
>> >  * SK Telecom:
>> >   * Jung Ryong Lee
>> >   * You Sun Jung
>> >   * Dong Joon Hyun
>> >  * University of California:
>> >   * Josh Rosen (Berkeley)
>> >   * Tyson Condie (LA)
>> >  * University of Washington:
>> >   * Brandon Myers
>> >  * Seoul National University:
>> >   * Yunseong Lee
>> >   * Taegeon Um
>> >   * Youngseok Yang
>> >   * Brian Cho
>> >   * Byung-Gon Chun
>> >
>> >
>> > # Sponsors
>> >
>> > ## Champions
>> > Chris Douglas <cdouglas@apache.org>
>> >
>> > ## Nominated Mentors
>> >  * Chris Mattmann <mattmann@apache.org>
>> >  * Ross Gardler <rgardler@apache.org>
>> >  * Owen O'Malley <omalley@apache.org>
>> >
>> > ## Sponsoring Entity
>> > The Apache Incubator
>> >
>>
>
>
>
> --
> Byung-Gon Chun


