Date: Fri, 28 Jun 2013 22:41:07 -0700
From: Konstantin Boudnik
To: general@incubator.apache.org
Reply-To: general@incubator.apache.org
Subject: Re: [PROPOSAL] Apache Spark for the Incubator
Message-ID: <20130629054107.GC4212@tpx>
References:
 <20130531192943.GD17179@linspire.com>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="vOmOzSkFvhd7u8Ms"
Content-Disposition: inline
X-Organization: It's something of 'Cos
X-PGP-Key: http://www.boudnik.org/~cos/pubkey.asc
User-Agent: Mutt/1.5.21 (2010-09-15)

--vOmOzSkFvhd7u8Ms
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline

That makes sense. Thanks for the update - I am still catching up on my emails
backed up because of the Hadoop summit.

Cos

On Tue, Jun 04, 2013 at 01:44AM, Mattmann, Chris A (398J) wrote:
> Dear Konstantin,
>
> Thanks! The incoming Spark project is excited about the relationship
> with Bigtop that could happen here.
>
> As for new committers, after conferring with the Spark project
> members, we would like to adopt a simple policy of having all new
> committers not add themselves to the wiki as of yet, but simply
> join the project mailing lists when they are created, and then from
> there, contribute. I and other mentors, and the Spark community are
> committed to being inclusive, so hopefully won't take too long for
> anybody to become a PPMC member/committer on the project after some
> demonstrated contributions.
>
> Thanks for your interest and again for your kind words.
>
> Cheers!
>
> Chris
>
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattmann@nasa.gov
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>
> -----Original Message-----
> From: Konstantin Boudnik
> Reply-To: "general@incubator.apache.org"
> Date: Friday, May 31, 2013 12:29 PM
> To: "general@incubator.apache.org"
> Subject: Re: [PROPOSAL] Apache Spark for the Incubator
>
> >Great news!
> >
> >Definitely +1 (non-binding, I guess) on adding Spark to the family
> >of ASF projects!
> >
> >I also express the interest to contribute to the project and move it
> >forward to the graduation! Bigtop has been packaging and providing Spark
> >as a part of Hadoop 1.x software stacks for some time; and hopefully
> >would be able to offer it as a part of Hadoop 2.x line in the coming
> >days.
> >
> >Dr. Konstantin Boudnik
> > Hadoop committer
> > BigTop PMC
> >
> >On Fri, May 31, 2013 at 06:03PM, Mattmann, Chris A (398J) wrote:
> >> Hi Folks,
> >>
> >> I'm pleased to bring you a proposal to the Apache Incubator for the
> >> Apache Spark project: https://wiki.apache.org/incubator/SparkProposal
> >>
> >> The work originates from the Berkeley AMPLab, a number of industry
> >> participants, and other institutions. Spark is a framework for
> >> large-scale data analysis on clusters, with a particular focus on low
> >> latency operations.
> >> The source code is written in Scala, and provides a number of APIs and
> >> bindings in various programming languages.
> >>
> >> The proposal text is copied to the bottom of this email. I'm going to
> >> leave this thread open for the next week for discussion. Once it's died
> >> down, I'll call an official VOTE.
> >>
> >> Suresh, Ross G. -- heads up -- this project may be of interest to you
> >> both and we would welcome you guys as additional mentors. We currently
> >> have 3 mentors committed to the project, but would love to have more.
> >> People interested in contributing should declare their interest here on
> >> the general@incubator thread and those potential contributors will be
> >> discussed by the incoming Spark community.
> >>
> >> Questions -- let's hear 'em! :)
> >>
> >> Cheers,
> >> Chris
> >> ("Champion", incoming Apache Spark)
> >>
> >> === Abstract ===
> >> Spark is an open source system for large-scale data analysis on
> >> clusters.
> >>
> >> === Proposal ===
> >> Spark is an open source system for fast and flexible large-scale data
> >> analysis. Spark provides a general purpose runtime that supports
> >> low-latency execution in several forms. These include interactive
> >> exploration of very large datasets, near real-time stream processing,
> >> and ad-hoc SQL analytics (through higher layer extensions). Spark
> >> interfaces with HDFS, HBase, Cassandra and several other storage
> >> layers, and exposes APIs in Scala, Java and Python.
> >>
> >> Background
> >> Spark started as a U.C. Berkeley research project, designed to
> >> efficiently run machine learning algorithms on large datasets. Over
> >> time, it has evolved into a general computing engine as outlined above.
> >> Spark's developer community has also grown to include additional
> >> institutions, such as universities, research labs, and corporations.
> >> Funding has been provided by various institutions including the U.S.
> >> National Science Foundation, DARPA, and a number of industry sponsors.
> >> See: https://amplab.cs.berkeley.edu/sponsors/ for full details.
> >>
> >> === Rationale ===
> >> As the number of contributors to Spark has grown, we have sought a
> >> long-term home for the project, and we believe the Apache foundation
> >> would be a great fit. Spark is a natural fit for the Apache foundation:
> >> Spark already interoperates with several existing Apache projects
> >> (HDFS, HBase, Hive, Cassandra, Avro and Flume to name a few). The Spark
> >> team is familiar with the Apache process and subscribes to the Apache
> >> mission - the team includes multiple Apache committers already.
> >> Finally, joining Apache will help coordinate the development effort of
> >> the growing number of organizations which contribute to Spark.
> >>
> >> == Initial Goals ==
> >> The initial goals will most likely be to move the existing codebase to
> >> Apache and integrate with the Apache development process. Furthermore,
> >> we plan for incremental development, and releases in line with the
> >> Apache guidelines.
> >>
> >> === Current Status ===
> >> == Meritocracy ==
> >> The Spark project already operates on meritocratic principles. Today,
> >> Spark has several developers and has accepted multiple major patches
> >> from outside of U.C. Berkeley. While this process has remained mostly
> >> informal (we do not have an official committer list), an implicit
> >> organization exists in which individuals who contribute major
> >> components act as maintainers for those modules. If accepted, the Spark
> >> project would include several of these participants as committers from
> >> the outset. We will work to identify all committers and PPMC members
> >> for the project and to operate under the ASF meritocratic principles.
> >>
> >> === Community ===
> >> Acceptance into the Apache foundation would bolster the already strong
> >> user and developer community around Spark. That community includes
> >> dozens of contributors from several institutions, a meetup group with
> >> several hundred members, and an active mailing list composed of
> >> hundreds of users.
> >>
> >> Core Developers
> >> The core developers of our project are listed in our contributors and
> >> initial PPMC below. Though many are at UC Berkeley, there is a
> >> representative cross sampling of other organizations including
> >> Quantifind, Microsoft, Yahoo!, ClearStory Data, Bizo, Intel, Tagged and
> >> Webtrends.
> >>
> >>
> >> === Alignment ===
> >> Our proposed effort aligns with several ongoing BIGDATA and U.S.
> >> National priority funding interests including the NSF and its
> >> Expeditions program, and the DARPA XDATA project. Our industry partners
> >> and collaborators are well aligned with our code base.
> >>
> >> There are also a number of related Apache projects and dependencies,
> >> which are covered in the Relationship with Other Apache Products
> >> section.
> >>
> >> == Known Risks ==
> >>
> >> === Orphaned Products ===
> >> Given the current level of investment in Spark, the risk of the project
> >> being abandoned is minimal. There are several constituents who are
> >> highly incentivized to continue development. The U.C. Berkeley AMPLab
> >> relies on Spark as a platform for a large number of long-term research
> >> projects. Several companies have built verticalized products which are
> >> tightly dependent on Spark. Other companies have made significant
> >> internal infrastructure investments in Spark.
> >>
> >> === Inexperience with Open Source ===
> >> Spark has existed as a healthy open source project for several years.
> >> During that time, Matei and others have curated an open-source
> >> community successfully, attracting developers from a diverse group of
> >> companies including Quantifind, Microsoft, Yahoo!, ClearStory Data,
> >> Bizo, Intel, and Webtrends.
> >>
> >> === Homogenous Developers ===
> >> The initial list of committers includes developers from several
> >> institutions, including Quantifind, Microsoft, Yahoo!, ClearStory Data,
> >> Bizo, Intel, and Webtrends.
> >>
> >> === Reliance on Salaried Developers ===
> >> Like most open source projects, Spark receives substantial support
> >> from salaried developers. A large fraction of Spark development is
> >> supported by graduate students at U.C. Berkeley in the course of
> >> research degrees - this is more a "volunteer" relationship, since in
> >> most cases students contribute vastly more than is necessary to
> >> immediately support research. In addition, those working from within
> >> corporations often devote "after hours" or spare time to the project -
> >> and these come from several organizations. We will work to ensure that
> >> the project can continue to be stewarded and to move forward
> >> independent of salaried developers.
> >>
> >>
> >> === Relationship with Other Apache Products ===
> >> Spark inter-operates with several existing Apache products by
> >> supporting them as storage layers: Apache Cassandra, Apache HBase, and
> >> Apache Hadoop (HDFS). It also uses several Apache components internally
> >> including Apache Maven and several Apache Commons libraries. Finally,
> >> Shark (a higher layer framework built on Spark) inter-operates with
> >> Apache Hive.
> >> We will explore the relationship between Spark and Apache Gora, which
> >> also provides in-memory object storage (Champion Mattmann was the
> >> Champion for Apache Gora so we expect alignment and cross pollination
> >> between our efforts).
> >>
> >> Spark offers an alternative computation engine to Apache Hadoop
> >> (MapReduce). Unlike MapReduce, Spark is designed for lower-latency and
> >> interactive workloads. This makes the projects complementary: many
> >> users run MapReduce and Spark side-by-side.
> >>
> >> === An Excessive Fascination with the Apache Brand ===
> >> Spark is already a healthy and relatively well known open source
> >> project. This proposal is not for the purpose of generating publicity.
> >> Rather, the primary benefits to joining Apache are those outlined in
> >> the Rationale section.
> >>
> >> === Documentation ===
> >> The reader will find these websites highly relevant:
> >> * Spark website: http://spark-project.org/
> >> * Spark documentation: http://spark-project.org/documentation/
> >> * Issue tracking: https://spark-project.atlassian.net/
> >> * Codebase: https://github.com/mesos/spark
> >> * User group: https://groups.google.com/group/spark-users
> >>
> >> == Initial Source ==
> >> The Spark codebase is currently hosted on Github:
> >> https://github.com/mesos/spark. This is the exact codebase that we
> >> would migrate to the Apache foundation.
> >>
> >> Source and Intellectual Property Submission Plan
> >> Currently, the Spark codebase is distributed under a BSD license. The
> >> vast majority of code has copyright held by the University of
> >> California. Upon entering Apache, Spark will migrate to an Apache
> >> License with all copyright assigned to the Apache Foundation. The
> >> University of California will transfer all copyright to the Apache
> >> Foundation.
> >> In certain cases where individuals hold copyright, we will have
> >> individuals sign over copyright to the Apache foundation as well.
> >>
> >> Going forward, all commits would assign copyright directly to the
> >> Apache foundation through our signed Individual Contributor License
> >> Agreements for all initial committers on the project.
> >>
> >>
> >> == External Dependencies ==
> >> To the best of our knowledge, all dependencies of Spark are distributed
> >> under Apache compatible licenses. Upon acceptance to the incubator, we
> >> would begin a thorough analysis of all transitive dependencies to
> >> verify this fact and introduce license checking into the build and
> >> release process (for instance integrating Apache Rat).
> >>
> >> == Required Resources ==
> >> === Mailing list ===
> >> We will migrate the existing Spark mailing lists as follows:
> >>
> >> * spark-users@googlegroups --> users@spark.incubator.apache.org
> >> * spark-developers@googlegroups --> dev@spark.incubator.apache.org
> >> * spark-commits are hosted on Github, so we would request
> >>   commits@spark.incubator.apache.org
> >>
> >> The latter is to be consistent with the new PIAO naming scheme for
> >> podlings.
> >>
> >> === Source control ===
> >> The Spark team would like to use Git for source control, due to our
> >> current use of Git.
> >> We request a writeable Git repo for Spark, and mirroring to be set up
> >> to Github through INFRA. Champion Mattmann can assist with creating
> >> INFRA tickets for this.
> >>
> >> === Issue Tracking ===
> >> Spark currently uses a hosted JIRA deployment for issue tracking. We
> >> will migrate to the Apache JIRA.
> >> http://issues.apache.org/jira/browse/SPARK
> >>
> >> == Initial Committers ==
> >> * Matei Zaharia
> >> * Ankur Dave
> >> * Tathagata Das
> >> * Haoyuan Li
> >> * Josh Rosen
> >> * Reynold Xin
> >> * Shivaram Venkataraman
> >> * Mosharaf Chowdhury
> >> * Charles Reiss
> >> * Andy Konwinski
> >> * Patrick Wendell
> >> * Imran Rashid
> >> * Ryan LeCompte
> >> * Ravi Pandya
> >> * Ram Sriharsha
> >> * Robert Evans
> >> * Mridul Muralidharan
> >> * Thomas Dudziak
> >> * Mark Hamstra
> >> * Stephen Haberman
> >> * Shane Huang
> >> * Andrew Xia
> >> * Nick Pentreath
> >> * Sean McNamara
> >>
> >> == Affiliations ==
> >> The initial committers are from nine organizations: UC Berkeley,
> >> Quantifind, Microsoft, Yahoo!, ClearStory Data, Bizo, Intel, Mxit and
> >> Webtrends.
> >>
> >> * Matei Zaharia (UCB)
> >> * Ankur Dave (UCB)
> >> * Tathagata Das (UCB)
> >> * Haoyuan Li (UCB)
> >> * Josh Rosen (UCB)
> >> * Reynold Xin (UCB)
> >> * Shivaram Venkataraman (UCB)
> >> * Mosharaf Chowdhury (UCB)
> >> * Charles Reiss (UCB)
> >> * Andy Konwinski (UCB)
> >> * Patrick Wendell (UCB)
> >> * Imran Rashid (Quantifind)
> >> * Ryan LeCompte (Quantifind)
> >> * Ravi Pandya (Microsoft)
> >> * Ram Sriharsha (Yahoo!)
> >> * Robert Evans (Yahoo!)
> >> * Mridul Muralidharan (Yahoo!)
> >> * Thomas Dudziak (ClearStory)
> >> * Mark Hamstra (ClearStory)
> >> * Stephen Haberman (Bizo)
> >> * Shane Huang (Intel)
> >> * Andrew Xia (Intel)
> >> * Nick Pentreath (Mxit)
> >> * Sean McNamara (Webtrends)
> >>
> >> == Sponsors ==
> >> === Champion ===
> >> * Chris Mattmann
> >>
> >> === Nominated Mentors ===
> >> * Chris Mattmann
> >> * Paul Ramirez
> >> * Andrew Hart
> >>
> >> === Sponsoring Entity ===
> >> The Apache Incubator
> >>
> >>
> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> Chris Mattmann, Ph.D.
> >> Senior Computer Scientist
> >> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> >> Office: 171-266B, Mailstop: 171-246
> >> Email: chris.a.mattmann@nasa.gov
> >> WWW: http://sunset.usc.edu/~mattmann/
> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> Adjunct Assistant Professor, Computer Science Department
> >> University of Southern California, Los Angeles, CA 90089 USA
> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> >> For additional commands, e-mail: general-help@incubator.apache.org
> >>
> >
>

--vOmOzSkFvhd7u8Ms
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: Digital signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)

iF4EAREIAAYFAlHOc3MACgkQenyFlstYjhKpYwD+LTH3OszE9AV49kQaPiUG+xgA
SXht5Vonc+YFe9GO6J8BANYR73t/JIQV3zGkh4Ni2bqbJy+UQtBQfpqIUaLqRKA2
=IlPD
-----END PGP SIGNATURE-----

--vOmOzSkFvhd7u8Ms--