From: Arvind Prabhakar
Date: Tue, 24 Nov 2015 12:08:10 -0800
Subject: Re: [VOTE] Accept Kudu into the Apache Incubator
To: general@incubator.apache.org

+1 (binding)

Regards,
Arvind Prabhakar

On Tue, Nov 24, 2015 at 11:32 AM, Todd Lipcon wrote:

> Hi all,
>
> Discussion on the [DISCUSS] thread seems to have wound down, so I'd like to
> call a VOTE on acceptance of Kudu into the ASF Incubator. The proposal is
> pasted below and also available on the wiki at:
> https://wiki.apache.org/incubator/KuduProposal
>
> The proposal is unchanged since the original version, except for the
> addition of Carl Steinbach as a Mentor.
>
> Please cast your votes:
>
> [] +1, accept Kudu into the Incubator
> [] +/-0, positive/negative non-counted expression of feelings
> [] -1, do not accept Kudu into the incubator (please state reasoning)
>
> Given the US holiday this week, I imagine many folks are traveling or
> otherwise offline. So, let's run the vote for a full week rather than the
> traditional 72 hours. Unless the IPMC objects to the extended voting
> period, the vote will close on Tues, Dec 1st at noon PST.
>
> Thanks
> -Todd
> -----
>
> = Kudu Proposal =
>
> == Abstract ==
>
> Kudu is a distributed columnar storage engine built for the Apache Hadoop
> ecosystem.
>
> == Proposal ==
>
> Kudu is an open source storage engine for structured data which supports
> low-latency random access together with efficient analytical access
> patterns. Kudu distributes data using horizontal partitioning and
> replicates each partition using Raft consensus, providing low
> mean-time-to-recovery and low tail latencies. Kudu is designed within the
> context of the Apache Hadoop ecosystem and supports many integrations with
> other data analytics projects both inside and outside of the Apache
> Software Foundation.
>
> We propose to incubate Kudu as a project of the Apache Software Foundation.
>
> == Background ==
>
> In recent years, explosive growth in the amount of data being generated and
> captured by enterprises has resulted in the rapid adoption of open source
> technology which is able to store massive data sets at scale and at low
> cost.
> In particular, the Apache Hadoop ecosystem has become a focal point
> for such “big data” workloads, because many traditional open source
> database systems have lagged in offering a scalable alternative.
>
> Structured storage in the Hadoop ecosystem has typically been achieved in
> two ways: for static data sets, data is typically stored on Apache HDFS
> using binary data formats such as Apache Avro or Apache Parquet. However,
> neither HDFS nor these formats has any provision for updating individual
> records, or for efficient random access. Mutable data sets are typically
> stored in semi-structured stores such as Apache HBase or Apache Cassandra.
> These systems allow for low-latency record-level reads and writes, but lag
> far behind the static file formats in terms of sequential read throughput
> for applications such as SQL-based analytics or machine learning.
>
> Kudu is a new storage system designed and implemented from the ground up to
> fill this gap between high-throughput sequential-access storage systems
> such as HDFS and low-latency random-access systems such as HBase or
> Cassandra. While these existing systems continue to hold advantages in some
> situations, Kudu offers a “happy medium” alternative that can dramatically
> simplify the architecture of many common workloads. In particular, Kudu
> offers a simple API for row-level inserts, updates, and deletes, while
> providing table scans at throughputs similar to Parquet, a commonly-used
> columnar format for static data.
>
> More information on Kudu can be found at the existing open source project
> website: http://getkudu.io and in particular in the Kudu white-paper PDF:
> http://getkudu.io/kudu.pdf from which the above was excerpted.
>
> == Rationale ==
>
> As described above, Kudu fills an important gap in the open source storage
> ecosystem.
> After our initial open source project release in September 2015,
> we have seen a great amount of interest across a diverse set of users and
> companies. We believe that, as a storage system, it is critical to build an
> equally diverse set of contributors in the development community. Our
> experiences as committers and PMC members on other Apache projects have
> taught us the value of diverse communities in ensuring both longevity and
> high quality for such foundational systems.
>
> == Initial Goals ==
>
>  * Move the existing codebase, website, documentation, and mailing lists to
>    Apache-hosted infrastructure
>  * Work with the infrastructure team to implement and approve our code
>    review, build, and testing workflows in the context of the ASF
>  * Incremental development and releases per Apache guidelines
>
> == Current Status ==
>
> ==== Releases ====
>
> Kudu has undergone one public release, tagged here
> https://github.com/cloudera/kudu/tree/kudu0.5.0-release
>
> This initial release was not performed in the typical ASF fashion -- no
> source tarball was released, but rather only convenience binaries made
> available in Cloudera’s repositories. We will adopt the ASF source release
> process upon joining the incubator.
>
> ==== Source ====
>
> Kudu’s source is currently hosted on GitHub at
> https://github.com/cloudera/kudu
>
> This repository will be transitioned to Apache’s git hosting during
> incubation.
>
> ==== Code review ====
>
> Kudu’s code reviews are currently public and hosted on Gerrit at
> http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu
>
> The Kudu developer community is very happy with gerrit and hopes to work
> with the Apache Infrastructure team to figure out how we can continue to
> use Gerrit within ASF policies.
>
> ==== Issue tracking ====
>
> Kudu’s bug and feature tracking is hosted on JIRA at:
> https://issues.cloudera.org/projects/KUDU/summary
>
> This JIRA instance contains bugs and development discussion dating back 2
> years prior to Kudu’s open source release and will provide an initial seed
> for the ASF JIRA.
>
> ==== Community discussion ====
>
> Kudu has several public discussion forums, linked here:
> http://getkudu.io/community.html
>
> ==== Build Infrastructure ====
>
> The Kudu Gerrit instance is configured to only allow patches to be
> committed after running them through an extensive set of pre-commit tests
> and code lints. The project currently makes use of elastic public cloud
> resources to perform these tests. Until this point, these resources have
> been internal to Cloudera, though we are currently investing in moving to a
> publicly accessible infrastructure.
>
> ==== Development practices ====
>
> Given that Kudu is a persistent storage engine, the community has a high
> quality bar for contributions to its core. We have a firm belief that high
> quality is achieved through automation, not manual inspection, and hence
> put a focus on thorough testing and build infrastructure to ensure that
> bar. The development community also practices review-then-commit for all
> changes to ensure that changes are accompanied by appropriate tests, are
> well commented, etc.
>
> Rather than seeing these practices as barriers to contribution, we believe
> that a fully automated and standardized review and testing practice makes
> it easier for new contributors to have patches accepted. Any new developer
> may post a patch to Gerrit using the same workflow as a seasoned
> contributor, and the same suite of tests will be automatically run. If the
> tests pass, a committer can quickly review and commit the contribution from
> their web browser.
>
> === Meritocracy ===
>
> We believe strongly in meritocracy in electing committers and PMC members.
> We believe that contributions can come in forms other than just code: for
> example, one of our initial proposed committers has contributed solely in
> the area of project documentation. We will encourage contributions and
> participation of all types, and ensure that contributors are appropriately
> recognized.
>
> === Community ===
>
> Though Kudu is relatively new as an open source project, it has already
> seen promising growth in its community across several organizations:
>
>  * '''Cloudera''' is the original development sponsor for Kudu.
>  * '''Xiaomi''' has been helping to develop and optimize Kudu for a new
>    production use case, contributing code, benchmarks, feedback, and
>    conference talks.
>  * '''Intel''' has contributed optimizations related to their hardware
>    technologies.
>  * '''Dropbox''' has been experimenting with Kudu for a machine monitoring
>    use case, and has been contributing bug reports and product feedback.
>  * '''Dremio''' is working on integration with Apache Drill and exploring
>    using Kudu in a production use case.
>  * Several community-built Docker images, tutorials, and blog posts have
>    sprouted up since Kudu’s release.
>
> By bringing Kudu to Apache, we hope to encourage further contribution from
> the above organizations as well as to engage new users and contributors in
> the community.
>
> === Core Developers ===
>
> Kudu was initially developed as a project at Cloudera. Most of the
> contributions to date have been by developers employed by Cloudera.
>
> Many of the developers are committers or PMC members on other Apache
> projects.
>
> === Alignment ===
>
> As a project in the big data ecosystem, Kudu is aligned with several other
> ASF projects.
> Kudu includes input/output format integration with Apache
> Hadoop, and this integration can also provide a bridge to Apache Spark. We
> are planning to integrate with Apache Hive in the near future. We also
> integrate closely with Cloudera Impala, which is also currently being
> proposed for incubation. We have also scheduled a hackathon with the Apache
> Drill team to work on integration with that query engine.
>
> == Known Risks ==
>
> === Orphaned Products ===
>
> The risk of Kudu being abandoned is low. Cloudera has invested a great deal
> in the initial development of the project, and intends to grow its
> investment over time as Kudu becomes a product adopted by its customer
> base. Several other organizations are also experimenting with Kudu for
> production use cases which would live for many years.
>
> === Inexperience with Open Source ===
>
> Kudu has been released in the open for less than two months. However, from
> our very first public announcement we have been committed to open-source
> style development:
>
>  * our code reviews are fully public and documented on a mailing list
>  * our daily development chatter is in a public chat room
>  * we send out weekly “community status” reports highlighting news and
>    contributions
>  * we published our entire JIRA history and discuss bugs in the open
>  * we published our entire Git commit history, going back three years (no
>    squashing)
>
> Several of the initial committers are experienced open source developers,
> several being committers and/or PMC members on other ASF projects (Hadoop,
> HBase, Thrift, Flume, et al). Those who are not ASF committers have
> experience on non-ASF open source projects (Kiji, open-vm-tools, et al).
>
> === Homogenous Developers ===
>
> The initial committers are employees or former employees of Cloudera.
> However, the committers are spread across multiple offices (Palo Alto, San
> Francisco, Melbourne), so the team is familiar with working in a
> distributed environment across varied time zones.
>
> The project has received some contributions from developers outside of
> Cloudera, and is starting to attract a ''user'' community as well. We hope
> to continue to encourage contributions from these developers and community
> members and grow them into committers after they have had time to continue
> their contributions.
>
> === Reliance on Salaried Developers ===
>
> As mentioned above, the majority of development up to this point has been
> sponsored by Cloudera. We have seen several community users participate in
> discussions who are hobbyists interested in distributed systems and
> databases, and hope that they will continue their participation in the
> project going forward.
>
> === Relationships with Other Apache Products ===
>
> Kudu is currently related to the following other Apache projects:
>
>  * Hadoop: Kudu provides MapReduce input/output formats for integration
>  * Spark: Kudu integrates with Spark via the above-mentioned input formats,
>    and work is progressing on support for Spark Data Frames and Spark SQL.
>
> The Kudu team has reached out to several other Apache projects to start
> discussing integrations, including Flume, Kafka, Hive, and Drill.
>
> Kudu integrates with Impala, which is also being proposed for incubation.
>
> Kudu is already collaborating on ValueVector, a proposed TLP spinning out
> from the Apache Drill community.
>
> We look forward to continuing to integrate and collaborate with these
> communities.
>
> === An Excessive Fascination with the Apache Brand ===
>
> Many of the initial committers are already experienced Apache committers,
> and understand the true value provided by the Apache Way and the principles
> of the ASF.
> We believe that this development and contribution model is
> especially appropriate for storage products, where Apache’s
> community-over-code philosophy ensures long term viability and
> consensus-based participation.
>
> == Documentation ==
>
>  * Documentation is written in AsciiDoc and committed in the Kudu source
>    repository:
>
>    * https://github.com/cloudera/kudu/tree/master/docs
>
>  * The Kudu web site is version-controlled on the ‘gh-pages’ branch of the
>    above repository.
>
>  * A LaTeX whitepaper is also published, and the source is available within
>    the same repository.
>  * APIs are documented within the source code as JavaDoc or C++-style
>    documentation comments.
>  * Many design documents are stored within the source code repository as
>    text files next to the code being documented.
>
> == Source and Intellectual Property Submission Plan ==
>
> The Kudu codebase and web site is currently hosted on GitHub and will be
> transitioned to the ASF repositories during incubation. Kudu is already
> licensed under the Apache 2.0 license.
>
> Some portions of the code are imported from other open source projects
> under the Apache 2.0, BSD, or MIT licenses, with copyrights held by authors
> other than the initial committers. These copyright notices are maintained
> in those files as well as a top-level NOTICE.txt file. We believe this to
> be permissible under the license terms and ASF policies, and confirmed via
> a recent thread on general@incubator.apache.org.
>
> The “Kudu” name is not a registered trademark, though before the initial
> release of the project, we performed a trademark search and Cloudera’s
> legal counsel deemed it acceptable in the context of a data storage engine.
> There exists an unrelated open source project by the same name related to
> deployments on Microsoft’s Azure cloud service.
> We have been in contact
> with legal counsel from Microsoft and have obtained their approval for the
> use of the Kudu name.
>
> Cloudera currently owns several domain names related to Kudu (getkudu.io,
> kududb.io, et al) which will be transferred to the ASF and redirected to
> the official page during incubation.
>
> Portions of Kudu are protected by pending or published patents owned by
> Cloudera. Given the protections already granted by the Apache License, we
> do not anticipate any explicit licensing or transfer of this intellectual
> property.
>
> == External Dependencies ==
>
> The full set of dependencies and licenses are listed in
> https://github.com/cloudera/kudu/blob/master/LICENSE.txt
>
> and summarized here:
>
>  * '''Twitter Bootstrap''': Apache 2.0
>  * '''d3''': BSD 3-clause
>  * '''epoch JS library''': MIT
>  * '''lz4''': BSD 2-clause
>  * '''gflags''': BSD 3-clause
>  * '''glog''': BSD 3-clause
>  * '''gperftools''': BSD 3-clause
>  * '''libev''': BSD 2-clause
>  * '''squeasel''': MIT license
>  * '''protobuf''': BSD 3-clause
>  * '''rapidjson''': MIT
>  * '''snappy''': BSD 3-clause
>  * '''trace-viewer''': BSD 3-clause
>  * '''zlib''': zlib license
>  * '''llvm''': University of Illinois/NCSA Open Source (BSD-alike)
>  * '''bitshuffle''': MIT
>  * '''boost''': Boost license
>  * '''curl''': MIT
>  * '''libunwind''': MIT
>  * '''nvml''': BSD 3-clause
>  * '''cyrus-sasl''': Cyrus SASL license (BSD-alike)
>  * '''openssl''': OpenSSL License (BSD-alike)
>
>  * '''Guava''': Apache 2.0
>  * '''StumbleUpon Async''': BSD
>  * '''Apache Hadoop''': Apache 2.0
>  * '''Apache log4j''': Apache 2.0
>  * '''Netty''': Apache 2.0
>  * '''slf4j''': MIT
>  * '''Apache Commons''': Apache 2.0
>  * '''murmur''': Apache 2.0
>
> '''Build/test-only dependencies''':
>
>  * '''CMake''': BSD 3-clause
>  * '''gcovr''': BSD 3-clause
>  * '''gmock''': BSD 3-clause
>  * '''Apache Maven''': Apache 2.0
>  * '''JUnit''': EPL
>  * '''Mockito''': MIT
>
> == Cryptography ==
>
> Kudu does
> not currently include any cryptography-related code.
>
> == Required Resources ==
>
> === Mailing lists ===
>
>  * private@kudu.incubator.apache.org (PMC)
>  * commits@kudu.incubator.apache.org (git push emails)
>  * issues@kudu.incubator.apache.org (JIRA issue feed)
>  * dev@kudu.incubator.apache.org (Gerrit code reviews plus dev discussion)
>  * user@kudu.incubator.apache.org (User questions)
>
> === Repository ===
>
>  * git://git.apache.org/kudu
>
> === Gerrit ===
>
> We hope to continue using Gerrit for our code review and commit workflow.
> The Kudu team has already been in contact with Jake Farrell to start
> discussions on how Gerrit can fit into the ASF. We know that several other
> ASF projects and podlings are also interested in Gerrit.
>
> If the Infrastructure team does not have the bandwidth to support Gerrit,
> we will continue to support our own instance of Gerrit for Kudu, and make
> the necessary integrations such that commits are properly authenticated and
> maintain sufficient provenance to uphold the ASF standards (e.g. via the
> solution adopted by the AsterixDB podling).
>
> == Issue Tracking ==
>
> We would like to import our current JIRA project into the ASF JIRA, such
> that our historical commit messages and code comments continue to reference
> the appropriate bug numbers.
>
> == Initial Committers ==
>
>  * Adar Dembo adar@cloudera.com
>  * Alex Feinberg alex@strlen.net
>  * Andrew Wang wang@apache.org
>  * Dan Burkert dan@cloudera.com
>  * David Alves dralves@apache.org
>  * Jean-Daniel Cryans jdcryans@apache.org
>  * Mike Percy mpercy@apache.org
>  * Misty Stanley-Jones misty@apache.org
>  * Todd Lipcon todd@apache.org
>
> The initial list of committers was seeded by listing those contributors who
> have contributed 20 or more patches in the last 12 months, indicating that
> they are active and have achieved merit through participation on the
> project.
> We chose not to include other contributors who either have not yet
> contributed a significant number of patches, or whose contributions are far
> in the past and we don’t expect to be active within the ASF.
>
> == Affiliations ==
>
>  * Adar Dembo - Cloudera
>  * Alex Feinberg - Forward Networks
>  * Andrew Wang - Cloudera
>  * Dan Burkert - Cloudera
>  * David Alves - Cloudera
>  * Jean-Daniel Cryans - Cloudera
>  * Mike Percy - Cloudera
>  * Misty Stanley-Jones - Cloudera
>  * Todd Lipcon - Cloudera
>
> == Sponsors ==
>
> === Champion ===
>
>  * Todd Lipcon
>
> === Nominated Mentors ===
>
>  * Jake Farrell - ASF Member and Infra team member, Acquia
>  * Brock Noland - ASF Member, StreamSets
>  * Michael Stack - ASF Member, Cloudera
>  * Jarek Jarcec Cecho - ASF Member, Cloudera
>  * Chris Mattmann - ASF Member, NASA JPL and USC
>  * Julien Le Dem - Incubator PMC, Dremio
>  * Carl Steinbach - ASF Member, LinkedIn
>
> === Sponsoring Entity ===
>
> The Apache Incubator