From: Henry Robinson
Date: Tue, 17 Nov 2015 16:38:29 -0800
Subject: Re: [DISCUSS] Impala incubator proposal
To: general@incubator.apache.org
Reply-To: general@incubator.apache.org
In-Reply-To: <20151118003039.GG4878@tpx>
References: <20151118003039.GG4878@tpx>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=001a113f93bea52b3c0524c5dd94

--001a113f93bea52b3c0524c5dd94
Content-Type: text/plain; charset=UTF-8

On 17 November 2015 at 16:30, Konstantin Boudnik wrote:

> On Tue, Nov 17, 2015 at 02:26PM, Henry Robinson wrote:
> > Hi Henry -
> >
> > Absolutely, although I want to point out that only two of our three mentors
>
> which clearly constitutes "almost all" as Henry pointed out :)
>
> On a different note: have the initial developers considered a possibility of
> bringing the code to Apache Drill, which in my uneducated view seems to be
> covering most of the bases in this case?
>

We hadn't considered it, but speaking for myself I don't think it's a viable
idea. In my opinion, being somewhat familiar with Drill and of course very
familiar with Impala, there's less overlap than you might imagine. For
example, Drill has the interesting idea of being a 'schema-free' query engine
(see https://drill.apache.org/); Impala relies on traditional schemas. There
are other design decisions that differ between the two, which mean that they
are only superficially similar but have quite different implementations.

We hope to continue to collaborate with Apache Drill and other related
projects where appropriate - serialisation formats, client APIs and so on.

Best,
Henry

> Thanks,
>   Cos
>
> > are Cloudera employees. That said, we'd of course be delighted to consider
> > any additional offers of mentorship.
> >
> > Best,
> > Henry
> >
> > On 17 November 2015 at 14:17, Henry Saputra wrote:
> >
> > > Glad to have the proposal :)
> > >
> > > Immediate glance would show almost all, including mentors, are coming from
> > > Cloudera. I think it would be beneficial for the podling to have at
> > > least mentors from different org to provide bit of balance.
> > >
> > > - Henry
> > >
> > > On Tuesday, November 17, 2015, Henry Robinson wrote:
> > >
> > > > Hi all -
> > > >
> > > > We'd like to start a discussion regarding a proposal to submit Impala to
> > > > the Apache Incubator.
> > > >
> > > > The proposal text is available on the Wiki here:
> > > > https://wiki.apache.org/incubator/ImpalaProposal
> > > >
> > > > and pasted below for convenience.
> > > >
> > > > I'm excited to make this proposal, and look forward to the community's
> > > > input!
> > > >
> > > > Best,
> > > > Henry
> > > >
> > > >
> > > > = Abstract =
> > > > Impala is a high-performance C++ and Java SQL query engine for data stored
> > > > in Apache Hadoop-based clusters.
> > > >
> > > > = Proposal =
> > > >
> > > > We propose to contribute the Impala codebase and associated artifacts (e.g.
> > > > documentation, web-site content etc.) to the Apache Software Foundation
> > > > with the intent of forming a productive, meritocratic and open community
> > > > around Impala’s continued development, according to the ‘Apache Way’.
> > > >
> > > > Cloudera owns several trademarks regarding Impala, and proposes to transfer
> > > > ownership of those trademarks in full to the ASF.
> > > >
> > > > = Background =
> > > > Engineers at Cloudera developed Impala and released it as an
> > > > Apache-licensed open-source project in Fall 2012. Impala was written as a
> > > > brand-new, modern C++ SQL engine targeted from the start for data stored in
> > > > Apache Hadoop clusters.
> > > >
> > > > Impala’s most important benefit to users is high performance, making it
> > > > extremely appropriate for common enterprise analytic and business
> > > > intelligence workloads. This is achieved by a number of software
> > > > techniques, including: native support for data stored in HDFS and related
> > > > filesystems, just-in-time compilation and optimization of individual query
> > > > plans, a high-performance C++ codebase and a massively-parallel distributed
> > > > architecture. In benchmarks, Impala is routinely amongst the very highest
> > > > performing SQL query engines.
> > > >
> > > > = Rationale =
> > > >
> > > > Despite the exciting innovation in the so-called ‘big-data’ space, SQL
> > > > remains by far the most common interface for interacting with data in both
> > > > traditional warehouses and modern ‘big-data’ clusters. There is clearly a
> > > > need, as evidenced by the eager adoption of Impala and other SQL engines in
> > > > enterprise contexts, for a query engine that offers the familiar SQL
> > > > interface, but that has been specifically designed to operate in massive,
> > > > distributed clusters rather than in traditional, fixed-hardware,
> > > > warehouse-specific deployments. Impala is one such query engine.
> > > >
> > > > We believe that the ASF is the right venue to foster an open-source
> > > > community around Impala’s development. We expect that Impala will benefit
> > > > from more productive collaboration with related Apache projects, and under
> > > > the auspices of the ASF will attract talented contributors who will push
> > > > Impala’s development forward at pace.
> > > >
> > > > We believe that the timing is right for Impala’s development to move
> > > > wholesale to the ASF: Impala is well-established, has been Apache-licensed
> > > > open-source for more than three years, and the core project is relatively
> > > > stable. We are excited to see where an ASF-based community can take Impala
> > > > from this strong starting point.
> > > >
> > > > = Initial Goals =
> > > > Our initial goals are as follows:
> > > >
> > > > * Establish ASF-compatible engineering practices and workflows
> > > > * Refactor and publish existing internal build scripts and test
> > > > infrastructure, in order to make them usable by any community member.
> > > > * Transfer source code, documentation and associated artifacts to the ASF.
> > > > * Grow the user and developer communities
> > > >
> > > > = Current Status =
> > > >
> > > > Impala is developed as an Apache-licensed open-source project. The source
> > > > code is available at http://github.com/cloudera/Impala, and developer
> > > > documentation is at https://github.com/cloudera/Impala/wiki. The majority
> > > > of commits to the project have come from Cloudera-employed developers, but
> > > > we have accepted some contributions from individuals from other
> > > > organizations.
> > > >
> > > > All code reviews are done via a public instance of the Gerrit review tool
> > > > at http://gerrit.cloudera.org:8080/, and discussed on a public mailing
> > > > list. All patches must be reviewed before they are accepted into the
> > > > codebase, via a voting mechanism that is similar to that used on Apache
> > > > projects such as Hadoop and HBase.
> > > >
> > > > Before a patch is committed, it must pass a suite of pre-commit tests.
> > > > These tests are currently run on Cloudera’s internal infrastructure. One of
> > > > our initial goals will be to work with the ASF Infrastructure team to find
> > > > a way to run these tests in an acceptable way on publicly accessible
> > > > machines.
> > > >
> > > > Issues are tracked in JIRA at https://issues.cloudera.org/projects/IMPALA,
> > > > in a way that is extremely similar to existing practices at other ASF
> > > > projects.
> > > >
> > > > = Meritocracy =
> > > >
> > > > We understand the central importance of meritocracy to the Apache Way. We
> > > > will work to establish a welcoming, fair and meritocratic community, in
> > > > part by expanding the set of committers on the project. Although Impala’s
> > > > committer list will initially be dominated by members of the Impala
> > > > engineering team at Cloudera, we look forward to growing a rich user and
> > > > developer community.
> > > >
> > > > = Community =
> > > > Impala has a strong user community (see
> > > > https://groups.google.com/a/cloudera.org/forum/#!forum/impala-user), and a
> > > > growing developer community (see
> > > > https://groups.google.com/a/cloudera.org/forum/#!forum/impala-dev). We wish
> > > > to attract more developers to the project, and we believe that the ASF’s
> > > > open and meritocratic philosophy will help us with this. We note the
> > > > success of other, similar projects already part of the ASF.
> > > >
> > > > = Core Developers =
> > > > Most - but not all - of Impala’s core developers are not currently
> > > > affiliated with the ASF, and will require new ICLAs.
> > > >
> > > > = Alignment =
> > > > Impala is related to several other Apache projects:
> > > >
> > > > * Data that is read by Impala is very often stored in Apache Hadoop
> > > > clusters powered by the HDFS filesystem.
> > > > * Impala can also read data stored in Apache HBase
> > > > * Metadata for databases, tables and so on is read by Impala from Apache
> > > > Hive.
> > > > * The preferred data format for HDFS-based tables is Apache Parquet, and
> > > > Apache Avro is also a supported data format.
> > > > * Impala is closely integrated with Kudu, which is also being proposed to
> > > > the Incubator.
> > > > * Impala uses Apache Thrift as its RPC and serialization framework of
> > > > choice.
> > > >
> > > > = Known Risks =
> > > >
> > > > == Orphaned Products ==
> > > > Impala is used by most of Cloudera’s customers, and Cloudera remains
> > > > committed to developing and supporting the project. Cloudera has a strong
> > > > track record in standing behind projects that were contributed to the ASF
> > > > by its employees, including Apache Flume, Apache Sqoop, and others.
> > > > Other companies both ship and support Impala, lending credence to the idea
> > > > that Impala is not at risk of being suddenly orphaned.
> > > >
> > > > == Inexperience with Open Source ==
> > > > Although all committers on the initial list have significant experience
> > > > with at least one open-source project - namely Impala - fewer have much
> > > > experience with ASF-based software projects as contributors and community
> > > > members. However, with the guidance of our mentors, committers who do have
> > > > ASF experience, and time to learn during Incubation, we are confident that
> > > > the project can be run in accordance with Apache principles on an ongoing
> > > > basis.
> > > >
> > > > == Homogeneous Developers ==
> > > >
> > > > The initial committers are employees of Cloudera.
> > > >
> > > > The project has received some contributions from developers outside of
> > > > Cloudera, from individuals belonging to organizations such as Intel and
> > > > Google, from hobbyists and from students using Impala to advance their
> > > > understanding of distributed databases. The project attracted an active
> > > > user community as well. We hope to continue to encourage contributions from
> > > > these developers and community members and grow them into committers after
> > > > they have had time to continue their contributions.
> > > >
> > > > == Reliance on Salaried Developers ==
> > > >
> > > > Many of Impala’s initial set of committers work full-time on Impala, and
> > > > are paid to do so. However, as mentioned elsewhere, we anticipate growth in
> > > > the developer community which we hope will include hobbyists and academics
> > > > who have an interest in distributed data systems.
> > > >
> > > > == An Excessive Fascination with the Apache Brand ==
> > > > Although we hope that Impala benefits from the Apache Brand, any reflected
> > > > goodwill to Cloudera as the contributing entity is not the goal of
> > > > establishing Impala as an Apache project. We will work with the Incubator
> > > > PMC and the PRC to ensure that the Apache Brand is respected.
> > > >
> > > > = Documentation =
> > > > Impala: A Modern, Open-Source SQL Engine for Hadoop
> > > > (http://www.cidrdb.org/cidr2015/Papers/CIDR15_Paper28.pdf)
> > > >
> > > > Impala’s developer wiki (https://github.com/cloudera/Impala/wiki)
> > > >
> > > > Impala’s auto-generated API documentation
> > > > (http://impala.io/doc/html/index.html)
> > > >
> > > > = Initial Source =
> > > > Impala’s initial source contribution will come from
> > > > http://github.com/cloudera/Impala/.
> > > >
> > > > = External Dependencies =
> > > >
> > > > Impala depends upon a number of third-party libraries, which we list below.
> > > > We intend to compile a LICENSE.txt file in the very short term (see
> > > > https://issues.cloudera.org/browse/IMPALA-2670).
> > > >
> > > > * Google gflags (BSD)
> > > > * Google glog (BSD)
> > > > * Apache Thrift (Apache Software License v2.0)
> > > > * Apache Commons (Apache Software License v2.0)
> > > > * Apache Hadoop (Apache Software License v2.0)
> > > > * Apache HBase (Apache Software License v2.0)
> > > > * Apache Hive (Apache Software License v2.0)
> > > > * Boost (Boost Software License)
> > > > * OpenLDAP (OpenLDAP Software License)
> > > > * rapidjson (MIT)
> > > > * Google RE2 (BSD-style)
> > > > * lz4 (BSD)
> > > > * snappy (BSD)
> > > > * cyrus-sasl (CMU License)
> > > > * Apache Avro (Apache Software License v2.0)
> > > > * Cloudera squeasel (Apache Software License v2.0)
> > > > * Apache htrace (Incubating) (Apache Software License v2.0)
> > > > * Apache Sentry (Incubating) (Apache Software License v2.0)
> > > > * Apache Shiro (Apache Software License v2.0)
> > > > * Twitter Bootstrap (Apache Software License v2.0)
> > > > * d3 (BSD)
> > > > * LLVM (BSD-like)
> > > >
> > > > Build and test dependencies:
> > > >
> > > > * ant (Apache Software License v2.0)
> > > > * maven (Apache Software License v2.0)
> > > > * cmake (BSD)
> > > > * clang (BSD)
> > > > * Google gtest (Apache Software License v2.0)
> > > >
> > > > = Required Resources =
> > > >
> > > > We request that the following resources be created for the project to use:
> > > >
> > > > == Mailing lists ==
> > > >
> > > > * private@impala.incubator.apache.org (moderated subscriptions)
> > > > * commits@impala.incubator.apache.org
> > > > * dev@impala.incubator.apache.org
> > > > * issues@impala.incubator.apache.org
> > > > * user@impala.incubator.apache.org
> > > >
> > > > == Git repository ==
> > > > https://git.apache.org/impala.git
> > > >
> > > > == JIRA instance ==
> > > > JIRA project IMPALA (IMPALA or IMP)
> > > >
> > > > == Other Resources ==
> > > > We hope to continue using Gerrit for our code review and
> > > > commit workflow. We are involved in the discussions that the Kudu team at
> > > > Cloudera has been having with Jake Farrell on how Gerrit can fit into
> > > > the ASF. We know that several other ASF projects or podlings are also
> > > > interested in Gerrit.
> > > >
> > > > If the Infrastructure team does not have the bandwidth to support Gerrit,
> > > > we will continue to support our own instance of Gerrit for Impala, and make
> > > > the necessary integrations such that commits are properly authenticated and
> > > > maintain sufficient provenance to uphold the ASF standards (e.g. via the
> > > > solution adopted by the AsterixDB podling).
> > > >
> > > > = Initial Committers =
> > > >
> > > > * Tim Armstrong
> > > > * Alex Behm
> > > > * Taras Bobrovytsky
> > > > * Casey Ching
> > > > * Martin Grund
> > > > * Daniel Hecht
> > > > * Michael Ho
> > > > * Matthew Jacobs
> > > > * Ishaan Joshi
> > > > * Marcel Kornacker
> > > > * Sailesh Mukil
> > > > * Henry Robinson
> > > > * John Russell
> > > > * Dimitris Tsirogiannis
> > > > * Skye Wanderman-Milne
> > > > * Juan Yu
> > > >
> > > > == Affiliations ==
> > > > All: Cloudera Inc.
> > > >
> > > > = Sponsors =
> > > >
> > > > == Champion ==
> > > > Tom White
> > > >
> > > > == Nominated Mentors ==
> > > > Tom White
> > > > Todd Lipcon
> > > > Carl Steinbach
> > > >
> > > > = Sponsoring Entity =
> > > > We ask that the Incubator PMC sponsor this proposal.
> > > >
> > >
> >
> >
> > --
> > Henry Robinson
> > Software Engineer
> > Cloudera
> > 415-994-6679
>

--
Henry Robinson
Software Engineer
Cloudera
415-994-6679

--001a113f93bea52b3c0524c5dd94--