From: Ted Dunning
Date: Thu, 18 Dec 2014 21:42:57 -0800
Subject: Re: [VOTE] Accept Zeppelin into the Apache Incubator
To: "general@incubator.apache.org"

+1 (binding)

On Thu, Dec 18, 2014 at 9:29 PM, Roman Shaposhnik wrote:
>
> Following the discussion earlier:
> http://s.apache.org/kTp
>
> I would like to call a VOTE for accepting
> Zeppelin as a new Incubator project.
>
> The proposal is available at:
> https://wiki.apache.org/incubator/ZeppelinProposal
> and is also attached to the end of this email.
>
> Vote is open until at least Sunday, 21st December 2014,
> 23:59:00 PST
>
> [ ] +1 Accept Zeppelin into the Incubator
> [ ] ±0 Indifferent to the acceptance of Zeppelin
> [ ] -1 Do not accept Zeppelin because ...
>
> Thanks,
> Roman.
>
> == Abstract ==
> Zeppelin is a collaborative data analytics and visualization tool for
> distributed, general-purpose data processing systems such as Apache
> Spark, Apache Flink, etc.
>
> == Proposal ==
> Zeppelin is a modern web-based tool for data scientists to
> collaborate over large-scale data exploration and visualization
> projects. It is a notebook-style interpreter that enables sharing of
> collaborative analysis sessions between users. Zeppelin is independent of
> the execution framework itself. The current version runs on top of Apache
> Spark but it has pluggable interpreter APIs to support other data
> processing systems.
> More execution frameworks could be added at a
> later date, e.g. Apache Flink and Crunch, as well as SQL-like backends such
> as Hive, Tajo, MRQL.
>
> We have a strong preference for the project to be called Zeppelin. In
> case that may not be feasible, alternative names could be: “Mir”,
> “Yuga” or “Sora”.
>
> == Background ==
> A large-scale data analysis workflow includes multiple steps like data
> acquisition, pre-processing, visualization, etc. and may include
> inter-operation of multiple different tools and technologies. With the
> spread of open source general-purpose data processing systems
> like Spark there is a lack of open source, modern user-friendly tools
> that combine the strengths of an interpreted language for data analysis with
> new in-browser visualization libraries and collaborative capabilities.
>
> Zeppelin initially started as a GUI tool for a diverse set of
> SQL-over-Hadoop systems like Hive, Presto, Shark, etc. It has been open
> source since its inception in Sep 2013. Later, it became clear that
> there was a need for a greater web-based tool for data scientists to
> collaborate on data exploration over large-scale projects, not
> limited to SQL. So Zeppelin integrated full support of Apache Spark
> while adding a collaborative environment with the ability to run and
> share interpreter sessions in-browser.
>
> == Rationale ==
> There are no open source alternatives for a collaborative
> notebook-based interpreter with support of multiple distributed data
> processing systems.
>
> As the number of companies adopting and contributing back to Zeppelin is
> growing, we think that having a long-term home at the Apache Foundation
> would be a great fit for the project, ensuring that processes and
> procedures are in place to keep the project and community “healthy” and
> free of any commercial, political or legal faults.
>
> == Initial Goals ==
> The initial goals will be to move the existing codebase to Apache and
> integrate with the Apache development process. This includes moving
> all infrastructure that we currently maintain, such as: a website, a
> mailing list, an issue tracker and a Jenkins CI, as mentioned in the
> “Required Resources” section of this proposal.
> Once this is accomplished, we plan for incremental development and
> releases that follow the Apache guidelines.
> To increase adoption the major goal for the project would be to
> provide integration with as many projects from the Apache data ecosystem
> as possible, including new interpreters for Apache Hive, Apache Drill
> and adding a Zeppelin distribution to Apache Bigtop.
> On the community building side the main goal is to attract a diverse
> set of contributors by promoting Zeppelin to a wide variety of
> engineers, starting Zeppelin user groups around the globe and by
> engaging with other existing Apache project communities online.
>
>
> == Current Status ==
> Currently, Zeppelin has 4 released versions and is used in production
> at a number of companies across the globe mentioned in the Affiliations
> section. The current implementation status is pre-release, with the public
> API not yet finalized. The main and default backend processing
> engine is currently Apache Spark, with consistent support of SparkSQL.
> Zeppelin is distributed as a binary package which includes an embedded
> webserver, the application itself, a set of libraries and startup/shutdown
> scripts. No platform-specific installation packages are provided yet
> but it is something we are looking to provide as part of Apache Bigtop
> integration.
> The project codebase is currently hosted at github.com, which will form
> the basis of the Apache git repository.
>
> === Meritocracy ===
> Zeppelin is an open source project that already leverages meritocracy
> principles.
> It was started by a handful of people and now it has
> multiple contributors, although as the number of contributions grows we
> want to build a diverse developer and user community that is governed
> by the "Apache way". Users and new contributors will be treated with
> respect and welcomed; they will earn merit in the project by tendering
> quality patches and support that move the project forward. Those with
> a proven support and quality patch track record will be encouraged to
> become committers.
>
> === Community ===
> Zeppelin already has a burgeoning community of users spread across the
> world that leverage and contribute to the code base and mailing list.
> We hope that being part of the Apache Foundation will help to grow it more
> and convert some of the users into active contributors to the project.
>
> === Core Developers ===
> The core developers of Zeppelin are listed in our contributors and
> initial PPMC below. It is a diverse group of people from two
> companies, NFLabs and Between, as mentioned in the Affiliations section,
> including at least one Apache committer and PPMC member, Lee Moon Soo,
> of the Apache MRQL project.
>
> === Alignment ===
> Zeppelin is already integrated with Apache Spark. Integration with
> Apache Tajo and Apache MRQL is currently being
> worked on. Apache Flink is a potential next integration step. We also
> plan to add a binary distribution of Zeppelin to Apache Bigtop to
> align it with the whole ASF Hadoop data stack.
>
> == Known Risks ==
> We feel that for Zeppelin to become as successful as it can be, it
> needs to be picked up by as many back-end systems as possible, not
> only Apache Spark.
>
> === Orphaned Products ===
> Initial code contributors were from the same company but in the last few
> months we have seen signs of global adoption: at least 2 more companies
> in Europe and the US have products based on the Zeppelin codebase.
> Other companies use Zeppelin in production for their data analytics
> workflows. We believe that this, plus the fact that Zeppelin already
> has contributors from different companies, mitigates this risk well.
>
> === Inexperience with Open Source ===
> Zeppelin was born as an open source project from scratch. The majority of
> the current core contributors have experience working on other open
> source projects. We also expect that as we grow the community further
> based on meritocracy and with the guidance of more experienced mentors
> this will have a positive influence on the project in the long term.
>
> === Homogenous Developers ===
> The initial committers are from the same region but there are already 2
> companies in Europe that contribute to Zeppelin, and others in the US
> are also reviewing it and are active on the mailing list. We are
> committed to creating a diverse mix of developers from all over the world.
>
> === Reliance on Salaried Developers ===
> Most of the Zeppelin contributors use it as their tool of choice either
> internally in their own companies or distribute it as part of their
> product.
> The backend-agnostic design helps to keep it as the tool of choice for a
> diverse community of data analysts even if they move from one employer to
> another.
> There is also at least one university in the US with students who
> potentially might use Zeppelin for R&D projects.
>
> === Relationship with Other Apache Products ===
> Right now Zeppelin relies on Apache Spark to run distributed tasks
> across a cluster of machines, but its abstract interpreter design
> allows it to work with other systems like Apache MRQL, Apache Crunch
> as well as SQL-based systems like Apache Tajo, Apache Hive.
>
> === An Excessive Fascination with the Apache Brand ===
> We believe that joining Apache will help us attract more contributors
> to Zeppelin, by giving us a well-defined, transparent development and
> governance process under a known brand. The reason for this proposal
> is not to gain publicity, but to further strengthen the longevity of
> the project without affiliation with any particular company. There are
> no plans to use the Apache brand in press releases or to advertise
> its acceptance into the Apache Incubator.
>
> === Documentation ===
> Additional documentation on Zeppelin may be found on its GitHub website:
> * Zeppelin overview:
> https://github.com/NFLabs/zeppelin/blob/master/README.md
> * Zeppelin docs: http://zeppelin-project.org/docs/index.html
> * Zeppelin road map:
> https://github.com/NFLabs/zeppelin/blob/master/Roadmap.md
> * Zeppelin issue tracking:
> https://zeppelin-project.atlassian.net/browse/ZEPPELIN
> * Zeppelin codebase: https://github.com/NFLabs/zeppelin
> * User group: https://groups.google.com/group/zeppelin-developers
>
> == Initial Source ==
> The Zeppelin codebase is currently hosted on GitHub:
> https://github.com/NFLabs/zeppelin
>
> === Source and Intellectual Property Submission Plan ===
> Currently, the Zeppelin codebase is distributed under an Apache 2.0
> License.
>
> == External Dependencies ==
> To the best of our knowledge, all other dependencies of Zeppelin are
> distributed under Apache compatible licenses (e.g.
> junit is EPL,
> Eclipse Public License v1.0, atmosphere-jersey is CDDL 1.0 and
> dom4j:dom4 is BSD licensed, org.slf4j and
> org.java-websocket:Java-WebSocket are MIT).
> Only org.reflections:reflections
> https://github.com/ronmamo/reflections is WTFPL 2.0, which should not
> be a problem per https://issues.apache.org/jira/browse/LEGAL-135
> Upon acceptance to the incubator, we would begin a thorough analysis
> of all transitive dependencies to verify this information and
> introduce license checking into the build and release process by
> integrating with Apache Rat.
>
> == Required Resources ==
> === Mailing list ===
> We will migrate the existing Zeppelin mailing lists as follows:
> * zeppelin-developers@googlegroups.com -->
> dev@zeppelin.incubator.apache.org
> * users@zeppelin.incubator.apache.org
> * private@zeppelin.incubator.apache.org for PPMC members
> * commits@zeppelin.incubator.apache.org
> The latter is to be consistent with the new PIAO naming scheme for
> podlings.
>
> === Source control ===
> The Zeppelin team would like to use Git for source control, as it already
> uses Git. We request a writeable Git repo for Zeppelin, and mirroring
> to be set up to GitHub through INFRA.
> https://git-wip-us.apache.org/repos/asf/incubator-zeppelin.git
>
> === Issue Tracking ===
> Zeppelin currently uses the Jira tracking system
> https://zeppelin-project.atlassian.net/browse/ZEPPELIN. We will
> migrate to the Apache JIRA:
> http://issues.apache.org/jira/browse/ZEPPELIN
>
>
> === Other Resources ===
> * Jenkins/Hudson for builds and test running.
> * Wiki for documentation purposes
> * Blog to improve project dissemination
>
> == Initial Committers ==
> * Lee Moon Soo
> * Anthony Corbacho, CLA submitted
> * Damien Corneau, CLA submitted
> * Alexander Bezzubov, CLA confirmed
> * Kevin Sangwoo Kim, CLA confirmed
>
> == Affiliations ==
> * Lee Moon Soo: NFLabs
> * Anthony Corbacho: NFLabs
> * Damien Corneau: NFLabs
> * Alexander Bezzubov: NFLabs
> * Kevin Sangwoo Kim: VCNC (a.k.a. Between)
>
> == Sponsors ==
> === Champion ===
> * Roman Shaposhnik
>
> === Nominated Mentors ===
> * Konstantin Boudnik
> * Ted Dunning
> * Henry Saputra
> * Roman Shaposhnik
> * Hyunsik Choi
>
> === Sponsoring Entity ===
> The Apache Incubator
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> For additional commands, e-mail: general-help@incubator.apache.org