incubator-general mailing list archives

From "Adunuthula, Seshu" <sadunuth...@ebay.com>
Subject Re: [DISCUSS] Impala incubator proposal
Date Wed, 18 Nov 2015 00:53:55 GMT
Awesome! Glad to see this becoming part of ASF…


On 11/17/15, 10:49 AM, "Henry Robinson" <henry@cloudera.com> wrote:

>Hi all -
>
>We'd like to start a discussion regarding a proposal to submit Impala to
>the Apache Incubator.
>
>The proposal text is available on the Wiki here:
>https://wiki.apache.org/incubator/ImpalaProposal
>
>and pasted below for convenience.
>
>I'm excited to make this proposal, and look forward to the community's
>input!
>
>Best,
>Henry
>
>
>= Abstract =
>Impala is a high-performance C++ and Java SQL query engine for data stored
>in Apache Hadoop-based clusters.
>
>= Proposal =
>
>We propose to contribute the Impala codebase and associated artifacts (e.g.
>documentation, web-site content etc.) to the Apache Software Foundation
>with the intent of forming a productive, meritocratic and open community
>around Impala's continued development, according to the 'Apache Way'.
>
>Cloudera owns several trademarks regarding Impala, and proposes to transfer
>ownership of those trademarks in full to the ASF.
>
>= Background =
>Engineers at Cloudera developed Impala and released it as an
>Apache-licensed open-source project in Fall 2012. Impala was written as a
>brand-new, modern C++ SQL engine targeted from the start for data stored in
>Apache Hadoop clusters.
>
>Impala's most important benefit to users is high performance, which makes it
>well suited to common enterprise analytic and business intelligence
>workloads. This is achieved by a number of software techniques, including:
>native support for data stored in HDFS and related filesystems,
>just-in-time compilation and optimization of individual query plans, a
>high-performance C++ codebase and a massively-parallel distributed
>architecture. In benchmarks, Impala is routinely among the
>highest-performing SQL query engines.
>
>= Rationale =
>
>Despite the exciting innovation in the so-called 'big-data' space, SQL
>remains by far the most common interface for interacting with data in both
>traditional warehouses and modern 'big-data' clusters. There is clearly a
>need, as evidenced by the eager adoption of Impala and other SQL engines in
>enterprise contexts, for a query engine that offers the familiar SQL
>interface, but that has been specifically designed to operate in massive,
>distributed clusters rather than in traditional, fixed-hardware,
>warehouse-specific deployments. Impala is one such query engine.
>
>We believe that the ASF is the right venue to foster an open-source
>community around Impala's development. We expect that Impala will benefit
>from more productive collaboration with related Apache projects, and that
>under the auspices of the ASF it will attract talented contributors who
>will push Impala's development forward at pace.
>
>We believe that the timing is right for Impala's development to move
>wholesale to the ASF: Impala is well-established, has been Apache-licensed
>open-source for more than three years, and the core project is relatively
>stable. We are excited to see where an ASF-based community can take Impala
>from this strong starting point.
>
>= Initial Goals =
>Our initial goals are as follows:
>
>* Establish ASF-compatible engineering practices and workflows
>* Refactor and publish existing internal build scripts and test
>infrastructure, in order to make them usable by any community member.
>* Transfer source code, documentation and associated artifacts to the ASF.
>* Grow the user and developer communities
>
>= Current Status =
>
>Impala is developed as an Apache-licensed open-source project. The source
>code is available at http://github.com/cloudera/Impala, and developer
>documentation is at https://github.com/cloudera/Impala/wiki. The majority
>of commits to the project have come from Cloudera-employed developers, but
>we have accepted some contributions from individuals from other
>organizations.
>
>All code reviews are done via a public instance of the Gerrit review tool
>at http://gerrit.cloudera.org:8080/, and discussed on a public mailing
>list. All patches must be reviewed before they are accepted into the
>codebase, via a voting mechanism that is similar to that used on Apache
>projects such as Hadoop and HBase.
>
>Before a patch is committed, it must pass a suite of pre-commit tests.
>These tests are currently run on Cloudera's internal infrastructure. One of
>our initial goals will be to work with the ASF Infrastructure team to find
>an acceptable way to run these tests on publicly accessible machines.
>
>Issues are tracked in JIRA at https://issues.cloudera.org/projects/IMPALA,
>in a way that is very similar to existing practices at other ASF
>projects.
>
>= Meritocracy =
>
>We understand the central importance of meritocracy to the Apache Way. We
>will work to establish a welcoming, fair and meritocratic community, in
>part by expanding the set of committers on the project. Although Impala's
>committer list will initially be dominated by members of the Impala
>engineering team at Cloudera, we look forward to growing a rich user and
>developer community.
>
>= Community =
>Impala has a strong user community (see
>https://groups.google.com/a/cloudera.org/forum/#!forum/impala-user), and a
>growing developer community (see
>https://groups.google.com/a/cloudera.org/forum/#!forum/impala-dev). We wish
>to attract more developers to the project, and we believe that the ASF's
>open and meritocratic philosophy will help us with this. We note the
>success of other, similar projects already part of the ASF.
>
>= Core Developers =
>Most, but not all, of Impala's core developers are not currently
>affiliated with the ASF, and will require new ICLAs.
>
>= Alignment =
>Impala is related to several other Apache projects:
>
>* Data that is read by Impala is very often stored in Apache Hadoop
>clusters powered by the HDFS filesystem.
>* Impala can also read data stored in Apache HBase.
>* Metadata for databases, tables and so on is read by Impala from Apache
>Hive.
>* The preferred data format for HDFS-based tables is Apache Parquet, and
>Apache Avro is also a supported data format.
>* Impala is closely integrated with Kudu, which is also being proposed to
>the Incubator.
>* Impala uses Apache Thrift as its RPC and serialization framework of
>choice.
>
>= Known Risks =
>
>== Orphaned Products ==
>Impala is used by most of Cloudera's customers, and Cloudera remains
>committed to developing and supporting the project. Cloudera has a strong
>track record in standing behind projects that were contributed to the ASF
>by its employees, including Apache Flume, Apache Sqoop, and others. Other
>companies both ship and support Impala, lending credence to the idea that
>Impala is not at risk of being suddenly orphaned.
>
>== Inexperience with Open Source ==
>Although all committers on the initial list have significant experience
>with at least one open-source project (namely Impala), fewer have much
>experience with ASF-based software projects as contributors and community
>members. However, with the guidance of our mentors, committers who do have
>ASF experience, and time to learn during Incubation, we are confident that
>the project can be run in accordance with Apache principles on an ongoing
>basis.
>
>== Homogeneous Developers ==
>
>The initial committers are employees of Cloudera.
>
>The project has received some contributions from developers outside of
>Cloudera, from individuals belonging to organizations such as Intel and
>Google, from hobbyists and from students using Impala to advance their
>understanding of distributed databases. The project attracted an active
>user community as well. We hope to continue to encourage contributions from
>these developers and community members and grow them into committers after
>they have had time to continue their contributions.
>
>== Reliance on Salaried Developers ==
>
>Many of Impala's initial set of committers work full-time on Impala, and
>are paid to do so. However, as mentioned elsewhere, we anticipate growth in
>the developer community, which we hope will include hobbyists and academics
>who have an interest in distributed data systems.
>
>== An Excessive Fascination with the Apache Brand ==
>Although we hope that Impala benefits from the Apache Brand, any reflected
>goodwill to Cloudera as the contributing entity is not the goal of
>establishing Impala as an Apache project. We will work with the Incubator
>PMC and the PRC to ensure that the Apache Brand is respected.
>
>= Documentation =
>Impala: A Modern, Open-Source SQL Engine for Hadoop
>(http://www.cidrdb.org/cidr2015/Papers/CIDR15_Paper28.pdf)
>
>Impala's developer wiki (https://github.com/cloudera/Impala/wiki)
>
>Impala's auto-generated API documentation
>(http://impala.io/doc/html/index.html)
>
>= Initial Source =
>Impala's initial source contribution will come from
>http://github.com/cloudera/Impala/.
>
>= External Dependencies =
>
>Impala depends upon a number of third-party libraries, which we list below.
>We intend to compile a LICENSE.txt file shortly (see
>https://issues.cloudera.org/browse/IMPALA-2670).
>
>* Google gflags (BSD)
>* Google glog (BSD)
>* Apache Thrift (Apache Software License v2.0)
>* Apache Commons (Apache Software License v2.0)
>* Apache Hadoop (Apache Software License v2.0)
>* Apache HBase (Apache Software License v2.0)
>* Apache Hive (Apache Software License v2.0)
>* Boost (Boost Software License)
>* OpenLDAP (OpenLDAP Software License)
>* rapidjson (MIT)
>* Google RE2 (BSD-style)
>* lz4 (BSD)
>* snappy (BSD)
>* cyrus-sasl (CMU License)
>* Apache Avro (Apache Software License v2.0)
>* Cloudera squeasel (Apache Software License v2.0)
>* Apache HTrace (Incubating) (Apache Software License v2.0)
>* Apache Sentry (Incubating) (Apache Software License v2.0)
>* Apache Shiro (Apache Software License v2.0)
>* Twitter Bootstrap (Apache Software License v2.0)
>* d3 (BSD)
>* LLVM (BSD-like)
>
>Build and test dependencies:
>
>* ant (Apache Software License v2.0)
>* maven (Apache Software License v2.0)
>* cmake (BSD)
>* clang (BSD)
>* Google gtest (BSD)
>
>= Required Resources =
>
>We request that the following resources be created for the project to use:
>
>== Mailing lists ==
>
>* private@impala.incubator.apache.org (moderated subscriptions)
>* commits@impala.incubator.apache.org
>* dev@impala.incubator.apache.org
>* issues@impala.incubator.apache.org
>* user@impala.incubator.apache.org
>
>== Git repository ==
>https://git.apache.org/impala.git
>
>== JIRA instance ==
>JIRA project IMPALA (project key IMPALA or IMP)
>
>== Other Resources ==
>We hope to continue using Gerrit for our code review and commit workflow.
>We are taking part in the discussions that the Kudu team at Cloudera has
>been having with Jake Farrell about how Gerrit can fit into the ASF. We
>know that several other ASF projects or podlings are also interested in
>Gerrit.
>
>If the Infrastructure team does not have the bandwidth to support Gerrit,
>we will continue to support our own instance of Gerrit for Impala, and make
>the necessary integrations so that commits are properly authenticated and
>maintain sufficient provenance to uphold the ASF standards (e.g. via the
>solution adopted by the AsterixDB podling).
>
>= Initial Committers =
>
>* Tim Armstrong
>* Alex Behm
>* Taras Bobrovytsky
>* Casey Ching
>* Martin Grund
>* Daniel Hecht
>* Michael Ho
>* Matthew Jacobs
>* Ishaan Joshi
>* Marcel Kornacker
>* Sailesh Mukil
>* Henry Robinson
>* John Russell
>* Dimitris Tsirogiannis
>* Skye Wanderman-Milne
>* Juan Yu
>
>== Affiliations ==
>All: Cloudera Inc.
>
>= Sponsors =
>
>== Champion ==
>Tom White
>
>== Nominated Mentors ==
>Tom White
>Todd Lipcon
>Carl Steinbach
>
>= Sponsoring Entity =
>We ask that the Incubator PMC sponsor this proposal.

