Subject: Re: [VOTE] Apache Spark for the Incubator
From: Hitesh Shah
Date: Sat, 8 Jun 2013 00:25:49 -0700
Message-Id: <11517A04-1F90-43C2-83E0-7FD2EBB0A74B@hortonworks.com>
To: general@incubator.apache.org

+1 (non-binding)

-- Hitesh

On Jun 7, 2013, at 10:34 PM, Mattmann, Chris A (398J) wrote:

> Hi Folks,
>
> OK discussion has died down, time to VOTE to accept Spark into the
> Apache Incubator. I'll let the VOTE run for at least a week.
>
> So far I've heard +1s from the following folks, so no need for them
> to VOTE again unless they want to change their VOTE:
>
> +1
>
> Chris Mattmann*
> Konstantin Boudnik
> Henry Saputra*
> Reynold Xin
> Pei Chen
> Roman Shaposhnik*
> Suresh Marru*
>
> * -indicates IPMC
>
> [ ] +1 Accept Spark into the Apache Incubator.
> [ ] +0 Don't care.
> [ ] -1 Don't accept Spark into the Apache Incubator because..
>
> Proposal text is below.
>
> === Abstract ===
> Spark is an open source system for large-scale data analysis on clusters.
>
> === Proposal ===
> Spark is an open source system for fast and flexible large-scale data
> analysis.
> Spark provides a general purpose runtime that supports
> low-latency execution in several forms. These include interactive
> exploration of very large datasets, near real-time stream processing, and
> ad-hoc SQL analytics (through higher layer extensions). Spark interfaces
> with HDFS, HBase, Cassandra and several other storage layers, and
> exposes APIs in Scala, Java and Python.
>
> === Background ===
> Spark started as a U.C. Berkeley research project, designed to efficiently
> run machine learning algorithms on large datasets. Over time, it has
> evolved into a general computing engine as outlined above. Spark's
> developer community has also grown to include additional institutions,
> such as universities, research labs, and corporations. Funding has been
> provided by various institutions including the U.S. National Science
> Foundation, DARPA, and a number of industry sponsors. See:
> https://amplab.cs.berkeley.edu/sponsors/ for full details.
>
> === Rationale ===
> As the number of contributors to Spark has grown, we have sought a
> long-term home for the project, and we believe the Apache foundation would
> be a great fit. Spark is a natural fit for the Apache foundation: Spark
> already interoperates with several existing Apache projects (HDFS, HBase,
> Hive, Cassandra, Avro and Flume to name a few). The Spark team is familiar
> with the Apache process and subscribes to the Apache mission - the
> team includes multiple Apache committers already. Finally, joining Apache
> will help coordinate the development effort of the growing number of
> organizations which contribute to Spark.
>
> == Initial Goals ==
> The initial goals will most likely be to move the existing codebase to
> Apache and integrate with the Apache development process. Furthermore, we
> plan for incremental development and releases, in line with the Apache
> guidelines.
>
> === Current Status ===
> == Meritocracy ==
> The Spark project already operates on meritocratic principles. Today,
> Spark has several developers and has accepted multiple major patches from
> outside of U.C. Berkeley. While this process has remained mostly informal
> (we do not have an official committer list), an implicit organization
> exists in which individuals who contribute major components act as
> maintainers for those modules. If accepted, the Spark project would
> include several of these participants as committers from the outset. We
> will work to identify all committers and PPMC members for the project and
> to operate under the ASF meritocratic principles.
>
> === Community ===
> Acceptance into the Apache foundation would bolster the already strong
> user and developer community around Spark. That community includes dozens
> of contributors from several institutions, a meetup group with several
> hundred members, and an active mailing list composed of hundreds of users.
>
> === Core Developers ===
> The core developers of our project are listed in our contributors and
> initial PPMC below. Though many are at U.C. Berkeley, there is a
> representative cross sampling of other organizations including Quantifind,
> Microsoft, Yahoo!, ClearStory Data, Bizo, Intel, Tagged and Webtrends.
>
>
> === Alignment ===
> Our proposed effort aligns with several ongoing BIGDATA and U.S. National
> priority funding interests including the NSF and its Expeditions program,
> and the DARPA XDATA project. Our industry partners and collaborators are
> well aligned with our code base.
>
> There are also a number of related Apache projects and dependencies that
> will be mentioned in the Relationship with Other Apache Products section.
>
> == Known Risks ==
>
> === Orphaned Products ===
> Given the current level of investment in Spark, the risk of the project
> being abandoned is minimal. There are several constituents who are highly
> incentivized to continue development. The U.C. Berkeley AMPLab relies on
> Spark as a platform for a large number of long-term research projects.
> Several companies have built verticalized products which are tightly
> dependent on Spark. Other companies have devoted significant internal
> infrastructure investment to Spark.
>
> === Inexperience with Open Source ===
> Spark has existed as a healthy open source project for several years.
> During that time, Matei and others have curated an open-source community
> successfully, attracting developers from a diverse group of companies
> including Quantifind, Microsoft, Yahoo!, ClearStory Data, Bizo, Intel, and
> Webtrends.
>
> === Homogenous Developers ===
> The initial list of committers includes developers from several
> institutions, including Quantifind, Microsoft, Yahoo!, ClearStory Data,
> Bizo, Intel, and Webtrends.
>
> === Reliance on Salaried Developers ===
> Like most open source projects, Spark receives substantial support from
> salaried developers. A large fraction of Spark development is supported by
> graduate students at U.C. Berkeley in the course of research degrees -
> this is more a "volunteer" relationship, since in most cases students
> contribute vastly more than is necessary to immediately support research.
> In addition, those working from within corporations often devote "after
> hours" or spare time to the project - and these come from several
> organizations. We will work to ensure that the project can continue to
> be stewarded and to move forward independently of salaried developers.
>
>
> === Relationship with Other Apache Products ===
> Spark interoperates with several existing Apache products by supporting
> them as storage layers: Apache Cassandra, Apache HBase, and Apache Hadoop
> (HDFS). It also uses several Apache components internally including Apache
> Maven and several Apache Commons libraries. Finally, Shark (a higher layer
> framework built on Spark) interoperates with Apache Hive. We will explore
> the relationship between Spark and Apache Gora, which also provides
> in-memory object storage (Champion Mattmann was the Champion for Apache
> Gora so we expect alignment and cross pollination between our efforts).
>
> Spark offers an alternative computation engine to Apache Hadoop
> (MapReduce). Unlike MapReduce, Spark is designed for lower-latency and
> interactive workloads. This makes the projects complementary: many users
> run MapReduce and Spark side-by-side.
>
> === An Excessive Fascination with the Apache Brand ===
> Spark is already a healthy and relatively well known open source project.
> This proposal is not for the purpose of generating publicity. Rather, the
> primary benefits to joining Apache are those outlined in the Rationale
> section.
>
> === Documentation ===
> The reader will find these websites highly relevant:
> * Spark website: http://spark-project.org/
> * Spark documentation: http://spark-project.org/documentation/
> * Issue tracking: https://spark-project.atlassian.net/
> * Codebase: https://github.com/mesos/spark
> * User group: https://groups.google.com/group/spark-users
>
> == Initial Source ==
> The Spark codebase is currently hosted on Github:
> https://github.com/mesos/spark. This is the exact codebase that we would
> migrate to the Apache foundation.
>
> === Source and Intellectual Property Submission Plan ===
> Currently, the Spark codebase is distributed under a BSD license.
> The vast majority of code has copyright held by the University of
> California. Upon
> entering Apache, Spark will migrate to an Apache License with all
> copyright assigned to the Apache Foundation. The University of California
> will transfer all copyright to the Apache Foundation. In certain cases
> where individuals hold copyright, we will have individuals sign over
> copyright to the Apache foundation as well.
>
> Going forward, all commits would assign copyright directly to the Apache
> foundation through our signed Individual Contributor License Agreements
> for all initial committers on the project.
>
>
> == External Dependencies ==
> To the best of our knowledge, all dependencies of Spark are distributed
> under Apache compatible licenses. Upon acceptance to the incubator, we
> would begin a thorough analysis of all transitive dependencies to verify
> this fact and introduce license checking into the build and release
> process (for instance integrating Apache Rat).
>
> == Required Resources ==
> === Mailing list ===
> We will migrate the existing Spark mailing lists as follows:
>
> * spark-users@googlegroups --> users@spark.incubator.apache.org
> * spark-developers@googlegroups --> dev@spark.incubator.apache.org
> * spark-commits are hosted on Github, so we would request
> commits@spark.incubator.apache.org
>
> The latter is to be consistent with the new PIAO naming scheme for
> podlings.
>
> === Source control ===
> The Spark team would like to use Git for source control, due to our
> current use of Git.
> We request a writeable Git repo for Spark, and mirroring to be set up to
> Github through INFRA. Champion Mattmann can assist with creating INFRA
> tickets for this.
>
> === Issue Tracking ===
> Spark currently uses a hosted JIRA deployment for issue tracking. We will
> migrate to the Apache JIRA.
> http://issues.apache.org/jira/browse/SPARK
>
> == Initial Committers ==
> * Matei Zaharia
> * Ankur Dave
> * Tathagata Das
> * Haoyuan Li
> * Josh Rosen
> * Reynold Xin
> * Shivaram Venkataraman
> * Mosharaf Chowdhury
> * Charles Reiss
> * Andy Konwinski
> * Patrick Wendell
> * Imran Rashid
> * Ryan LeCompte
> * Ravi Pandya
> * Ram Sriharsha
> * Robert Evans
> * Mridul Muralidharan
> * Thomas Dudziak
> * Mark Hamstra
> * Stephen Haberman
> * Jason Dai
> * Shane Huang
> * Andrew Xia
> * Nick Pentreath
> * Sean McNamara
>
> == Affiliations ==
> The initial committers are from nine organizations: UC Berkeley,
> Quantifind, Microsoft, Yahoo!, ClearStory Data, Bizo, Intel, Mxit and
> Webtrends.
>
> * Matei Zaharia (UCB)
> * Ankur Dave (UCB)
> * Tathagata Das (UCB)
> * Haoyuan Li (UCB)
> * Josh Rosen (UCB)
> * Reynold Xin (UCB)
> * Shivaram Venkataraman (UCB)
> * Mosharaf Chowdhury (UCB)
> * Charles Reiss (UCB)
> * Andy Konwinski (UCB)
> * Patrick Wendell (UCB)
> * Imran Rashid (Quantifind)
> * Ryan LeCompte (Quantifind)
> * Ravi Pandya (Microsoft)
> * Ram Sriharsha (Yahoo!)
> * Robert Evans (Yahoo!)
> * Mridul Muralidharan (Yahoo!)
> * Thomas Dudziak (ClearStory)
> * Mark Hamstra (ClearStory)
> * Stephen Haberman (Bizo)
> * Jason Dai (Intel)
> * Shane Huang (Intel)
> * Andrew Xia (Intel)
> * Nick Pentreath (Mxit)
> * Sean McNamara (Webtrends)
>
> == Sponsors ==
> === Champion ===
> * Chris Mattmann
>
> === Nominated Mentors ===
> * Chris Mattmann
> * Paul Ramirez
> * Andrew Hart
> * Thomas Dudziak
> * Suresh Marru
> * Henry Saputra
>
> === Sponsoring Entity ===
> The Apache Incubator
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattmann@nasa.gov
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> For additional commands, e-mail: general-help@incubator.apache.org
>