incubator-general mailing list archives

From "Mattmann, Chris A (398J)" <chris.a.mattm...@jpl.nasa.gov>
Subject Re: [PROPOSAL] Apache Spark for the Incubator
Date Tue, 04 Jun 2013 01:44:06 GMT
Dear Konstantin,

Thanks! The incoming Spark project is excited about the relationship
with Bigtop that could happen here.

As for new committers, after conferring with the Spark project
members, we would like to adopt a simple policy of having all new
committers not add themselves to the wiki as of yet, but simply
join the project mailing lists when they are created, and then from
there, contribute. The other mentors, the Spark community, and I are
committed to being inclusive, so hopefully it won't take too long for
anybody to become a PPMC member/committer on the project after some
demonstrated contributions.

Thanks for your interest and again for your kind words.

Cheers!

Chris


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++






-----Original Message-----
From: Konstantin Boudnik <cos@apache.org>
Reply-To: "general@incubator.apache.org" <general@incubator.apache.org>
Date: Friday, May 31, 2013 12:29 PM
To: "general@incubator.apache.org" <general@incubator.apache.org>
Subject: Re: [PROPOSAL] Apache Spark for the Incubator

>Great news!
>
>Definitely +1 (non-binding, I guess) on adding Spark to the family
>of ASF projects!
>
>I would also like to express my interest in contributing to the project
>and moving it forward to graduation! Bigtop has been packaging and
>providing Spark as a part of Hadoop 1.x software stacks for some time,
>and hopefully will be able to offer it as a part of the Hadoop 2.x line
>in the coming days.
>
>Dr. Konstantin Boudnik
>  Hadoop committer
>  BigTop PMC
>
>On Fri, May 31, 2013 at 06:03PM, Mattmann, Chris A (398J) wrote:
>> Hi Folks,
>> 
>> I'm pleased to bring you a proposal to the Apache Incubator for the Apache
>> Spark project: https://wiki.apache.org/incubator/SparkProposal
>> 
>> The work originates from the Berkeley AMPLab, together with a number of
>> industry participants and other institutions. Spark is a framework for
>> large-scale data analysis on clusters, with a particular focus on
>> low-latency operations. The source code is written in Scala, and provides
>> a number of APIs and bindings in various programming languages.
>> 
>> The proposal text is copied to the bottom of this email. I'm going to leave
>> this thread open for the next week for discussion. Once it's died down,
>> I'll call an official VOTE.
>> 
>> Suresh, Ross G. -- heads up -- this project may be of interest to you both
>> and we would welcome you guys as additional mentors. We currently have 3
>> mentors committed to the project, but would love to have more. People
>> interested in contributing should declare their interest here on the
>> general@incubator thread and those potential contributors will be
>> discussed by the incoming Spark community.
>> 
>> Questions -- let's hear 'em! :)
>> 
>> Cheers,
>> Chris
>> ("Champion", incoming Apache Spark)
>> 
>> === Abstract ===
>> Spark is an open source system for large-scale data analysis on clusters.
>> 
>> === Proposal ===
>> Spark is an open source system for fast and flexible large-scale data
>> analysis. Spark provides a general purpose runtime that supports
>> low-latency execution in several forms. These include interactive
>> exploration of very large datasets, near real-time stream processing, and
>> ad-hoc SQL analytics (through higher layer extensions). Spark interfaces
>> with HDFS, HBase, Cassandra and several other storage layers, and
>> exposes APIs in Scala, Java and Python.
>> 
>> === Background ===
>> Spark started as a U.C. Berkeley research project, designed to efficiently
>> run machine learning algorithms on large datasets. Over time, it has
>> evolved into a general computing engine as outlined above. Spark's
>> developer community has also grown to include additional institutions,
>> such as universities, research labs, and corporations. Funding has been
>> provided by various institutions including the U.S. National Science
>> Foundation, DARPA, and a number of industry sponsors. See:
>> https://amplab.cs.berkeley.edu/sponsors/ for full details.
>> 
>> === Rationale ===
>> As the number of contributors to Spark has grown, we have sought a
>> long-term home for the project, and we believe the Apache foundation would
>> be a great fit. Spark is a natural fit for the Apache foundation: Spark
>> already interoperates with several existing Apache projects (HDFS, HBase,
>> Hive, Cassandra, Avro and Flume to name a few). The Spark team is familiar
>> with the Apache process and subscribes to the Apache mission - the
>> team includes multiple Apache committers already. Finally, joining Apache
>> will help coordinate the development effort of the growing number of
>> organizations which contribute to Spark.
>> 
>> == Initial Goals ==
>> The initial goals will most likely be to move the existing codebase to
>> Apache and integrate with the Apache development process. Furthermore, we
>> plan for incremental development and releases in line with the Apache
>> guidelines.
>> 
>> === Current Status ===
>> == Meritocracy ==
>> The Spark project already operates on meritocratic principles. Today,
>> Spark has several developers and has accepted multiple major patches from
>> outside of U.C. Berkeley. While this process has remained mostly informal
>> (we do not have an official committer list), an implicit organization
>> exists in which individuals who contribute major components act as
>> maintainers for those modules. If accepted, the Spark project would
>> include several of these participants as committers from the outset. We
>> will work to identify all committers and PPMC members for the project and
>> to operate under the ASF meritocratic principles.
>> 
>> === Community ===
>> Acceptance into the Apache foundation would bolster the already strong
>> user and developer community around Spark. That community includes dozens
>> of contributors from several institutions, a meetup group with several
>> hundred members, and an active mailing list composed of hundreds of users.
>> 
>> === Core Developers ===
>> The core developers of our project are listed in our contributors and
>> initial PPMC below. Though many are at UC Berkeley, there is a
>> representative cross sampling of other organizations including Quantifind,
>> Microsoft, Yahoo!, ClearStory Data, Bizo, Intel, Tagged and Webtrends.
>> 
>> 
>> === Alignment ===
>> Our proposed effort aligns with several ongoing BIGDATA and U.S. National
>> priority funding interests including the NSF and its Expeditions program,
>> and the DARPA XDATA project. Our industry partners and collaborators are
>> well aligned with our code base.
>> 
>> There are also a number of related Apache projects and dependencies that
>> will be mentioned in the Relationship with Other Apache Products section.
>> 
>> == Known Risks ==
>> 
>> === Orphaned Products ===
>> Given the current level of investment in Spark, the risk of the project
>> being abandoned is minimal. There are several constituents who are highly
>> incentivized to continue development. The U.C. Berkeley AMPLab relies on
>> Spark as a platform for a large number of long-term research projects.
>> Several companies have built verticalized products which are tightly
>> dependent on Spark. Other companies have made significant internal
>> infrastructure investments in Spark.
>> 
>> === Inexperience with Open Source ===
>> Spark has existed as a healthy open source project for several years.
>> During that time, Matei and others have curated an open-source community
>> successfully, attracting developers from a diverse group of companies
>> including Quantifind, Microsoft, Yahoo!, ClearStory Data, Bizo, Intel, and
>> Webtrends.
>> 
>> === Homogenous Developers ===
>> The initial list of committers includes developers from several
>> institutions, including Quantifind, Microsoft, Yahoo!, ClearStory Data,
>> Bizo, Intel, and Webtrends.
>> 
>> === Reliance on Salaried Developers ===
>> Like most open source projects, Spark receives substantial support from
>> salaried developers. A large fraction of Spark development is supported by
>> graduate students at U.C. Berkeley in the course of research degrees -
>> this is more a "volunteer" relationship, since in most cases students
>> contribute vastly more than is necessary to immediately support research.
>> In addition, those working from within corporations often devote "after
>> hours" or spare time to the project - and these come from several
>> organizations. We will work to ensure that the project can continue to be
>> stewarded and to move forward independently of salaried developers.
>> 
>> 
>> === Relationship with Other Apache Products ===
>> Spark inter-operates with several existing Apache products by supporting
>> them as storage layers: Apache Cassandra, Apache HBase, and Apache Hadoop
>> (HDFS). It also uses several Apache components internally including Apache
>> Maven and several Apache Commons libraries. Finally, Shark (a higher layer
>> framework built on Spark) inter-operates with Apache Hive. We will explore
>> the relationship between Spark and Apache Gora, which also provides
>> in-memory object storage (Champion Mattmann was the Champion for Apache
>> Gora so we expect alignment and cross pollination between our efforts).
>> 
>> Spark offers an alternative computation engine to Apache Hadoop
>> (MapReduce). Unlike MapReduce, Spark is designed for lower-latency and
>> interactive workloads. This makes the projects complementary: many users
>> run MapReduce and Spark side-by-side.
>> 
>> === An Excessive Fascination with the Apache Brand ===
>> Spark is already a healthy and relatively well known open source project.
>> This proposal is not for the purpose of generating publicity. Rather, the
>> primary benefits to joining Apache are those outlined in the Rationale
>> section.
>> 
>> === Documentation ===
>> The reader will find these websites highly relevant:
>>  * Spark website: http://spark-project.org/
>>  * Spark documentation: http://spark-project.org/documentation/
>>  * Issue tracking: https://spark-project.atlassian.net/
>>  * Codebase: https://github.com/mesos/spark
>>  * User group: https://groups.google.com/group/spark-users
>> 
>> == Initial Source ==
>> The Spark codebase is currently hosted on Github:
>> https://github.com/mesos/spark. This is the exact codebase that we would
>> migrate to the Apache foundation.
>> 
>> == Source and Intellectual Property Submission Plan ==
>> Currently, the Spark codebase is distributed under a BSD license. The vast
>> majority of code has copyright held by the University of California. Upon
>> entering Apache, Spark will migrate to an Apache License with all
>> copyright assigned to the Apache Foundation. The University of California
>> will transfer all copyright to the Apache Foundation. In certain cases
>> where individuals hold copyright, we will have individuals sign over
>> copyright to the Apache foundation as well.
>> 
>> Going forward, all commits would assign copyright directly to the Apache
>> foundation through our signed Individual Contributor License Agreements
>> for all initial committers on the project.
>> 
>> 
>> == External Dependencies ==
>> To the best of our knowledge, all dependencies of Spark are distributed
>> under Apache compatible licenses. Upon acceptance to the incubator, we
>> would begin a thorough analysis of all transitive dependencies to verify
>> this fact and introduce license checking into the build and release
>> process (for instance integrating Apache Rat).
>> 
>> == Required Resources ==
>> === Mailing list ===
>> We will migrate the existing Spark mailing lists as follows:
>> 
>>  * spark-users@googlegroups --> users@spark.incubator.apache.org
>>  * spark-developers@googlegroups --> dev@spark.incubator.apache.org
>>  * spark-commits are hosted on Github, so we would request
>> commits@spark.incubator.apache.org
>> 
>> The latter is to be consistent with the new PIAO naming scheme for
>> podlings.
>> 
>> === Source control ===
>> The Spark team would like to use Git for source control, due to our
>> current use of Git.
>> We request a writeable Git repo for Spark, and mirroring to be set up to
>> Github through INFRA. Champion Mattmann can assist with creating INFRA
>> tickets for this.
>> 
>> === Issue Tracking ===
>> Spark currently uses a hosted JIRA deployment for issue tracking. We will
>> migrate to the Apache JIRA:
>> http://issues.apache.org/jira/browse/SPARK
>> 
>> == Initial Committers ==
>>  * Matei Zaharia <matei@apache.org>
>>  * Ankur Dave <ankurdave@gmail.com>
>>  * Tathagata Das <tdas@eecs.berkeley.edu>
>>  * Haoyuan Li <haoyuan@cs.berkeley.edu>
>>  * Josh Rosen <joshrosen@cs.berkeley.edu>
>>  * Reynold Xin <rxin@cs.berkeley.edu>
>>  * Shivaram Venkataraman <shivaram@eecs.berkeley.edu>
>>  * Mosharaf Chowdhury <mosharaf@cs.berkeley.edu>
>>  * Charles Reiss <charles@eecs.berkeley.edu>
>>  * Andy Konwinski <andykonwinski@gmail.com>
>>  * Patrick Wendell <pwendell@eecs.berkeley.edu>
>>  * Imran Rashid <imran@quantifind.com>
>>  * Ryan LeCompte <lecompte@gmail.com>
>>  * Ravi Pandya <ravip@exchange.microsoft.com>
>>  * Ram Sriharsha <harshars@yahoo-inc.com>
>>  * Robert Evans <evans@yahoo-inc.com>
>>  * Mridul Muralidharan <mridulm@yahoo-inc.com>
>>  * Thomas Dudziak <tomdz@clearstorydata.com>
>>  * Mark Hamstra <mark@clearstorydata.com>
>>  * Stephen Haberman <stephen.haberman@gmail.com>
>>  * Shane Huang <shannie.huang@gmail.com>
>>  * Andrew Xia <xiajunluan@gmail.com>
>>  * Nick Pentreath <nick.pentreath@gmail.com>
>>  * Sean McNamara <sean.mcnamara@webtrends.com>
>> 
>> == Affiliations ==
>> The initial committers are from nine organizations: UC Berkeley,
>> Quantifind, Microsoft, Yahoo!, ClearStory Data, Bizo, Intel, Mxit and
>> Webtrends.
>> 
>>  * Matei Zaharia (UCB)
>>  * Ankur Dave (UCB)
>>  * Tathagata Das (UCB)
>>  * Haoyuan Li (UCB)
>>  * Josh Rosen (UCB)
>>  * Reynold Xin (UCB)
>>  * Shivaram Venkataraman (UCB)
>>  * Mosharaf Chowdhury (UCB)
>>  * Charles Reiss (UCB)
>>  * Andy Konwinski (UCB)
>>  * Patrick Wendell (UCB)
>>  * Imran Rashid (Quantifind)
>>  * Ryan LeCompte (Quantifind)
>>  * Ravi Pandya (Microsoft)
>>  * Ram Sriharsha (Yahoo!)
>>  * Robert Evans (Yahoo!)
>>  * Mridul Muralidharan (Yahoo!)
>>  * Thomas Dudziak (ClearStory)
>>  * Mark Hamstra (ClearStory)
>>  * Stephen Haberman (Bizo)
>>  * Shane Huang (Intel)
>>  * Andrew Xia (Intel)
>>  * Nick Pentreath (Mxit)
>>  * Sean McNamara (Webtrends)
>> 
>> == Sponsors ==
>> === Champion ===
>>  * Chris Mattmann
>> 
>> === Nominated Mentors ===
>>  * Chris Mattmann
>>  * Paul Ramirez 
>>  * Andrew Hart 
>> 
>> === Sponsoring Entity ===
>>  The Apache Incubator
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>> For additional commands, e-mail: general-help@incubator.apache.org
>> 
