From: Caleb Welton
To: "general@incubator.apache.org"
Date: Tue, 1 Sep 2015 08:35:42 -1000
Subject: Re: [VOTE] Accept HAWQ into the Apache Incubator
Message-Id: <3C0BB31A-E937-448B-A662-EB83EF32D658@yahoo.com>

+1 (non-binding)

Caleb

> On Aug 31, 2015, at 8:47 AM, Roman Shaposhnik wrote:
>
> Following the discussion earlier:
> http://s.apache.org/Gaf
>
> I would like to call a VOTE for accepting HAWQ
> as a new incubator project.
>
> The proposal is available at:
> https://wiki.apache.org/incubator/HAWQProposal
> and is also included at the bottom of this email.
>
> Vote is open until at least Thu, 3 September 2015, 23:59:00 PST
>
> [ ] +1 accept HAWQ into the Apache Incubator
> [ ] ±0
> [ ] -1 because...
>
> Thanks,
> Roman.
>
> == Abstract ==
>
> HAWQ is an advanced enterprise SQL on Hadoop analytic engine built
> around a robust and high-performance massively-parallel processing
> (MPP) SQL framework evolved from Pivotal Greenplum DatabaseⓇ.
>
> HAWQ runs natively on Apache HadoopⓇ clusters by tightly integrating
> with HDFS and YARN. HAWQ supports multiple Hadoop file formats such as
> Apache Parquet, native HDFS, and Apache Avro. HAWQ is configured and
> managed as a Hadoop service in Apache Ambari. HAWQ is 100% ANSI SQL
> compliant (supporting ANSI SQL-92, SQL-99, and SQL-2003, plus OLAP
> extensions) and supports open database connectivity (ODBC) and Java
> database connectivity (JDBC), as well. Most business intelligence,
> data analysis and data visualization tools work with HAWQ out of the
> box without the need for specialized drivers.
>
> A unique aspect of HAWQ is its integration of statistical and machine
> learning capabilities that can be natively invoked from SQL or (in the
> context of PL/Python, PL/Java or PL/R) in massively parallel modes and
> applied to large data sets across a Hadoop cluster. These capabilities
> are provided through MADlib, an existing open source, parallel
> machine-learning library. Given the close ties between the two
> development communities, the MADlib community has expressed interest
> in joining HAWQ on its journey into the ASF Incubator and will be
> submitting a separate, concurrent proposal.
>
> HAWQ will provide more robust and higher performing options for Hadoop
> environments that demand best-in-class data analytics for
> business-critical purposes. HAWQ is implemented in C and C++.
>
> HAWQ has a few runtime dependencies whose licenses are on the ASF
> Category X list:
> * gperf (GPL Version 3)
> * libgsasl (LGPL Version 2.1)
> * libuuid-2.26 (LGPL Version 2)
> However, given the runtime (dynamic linking) nature of these
> dependencies, they do not prevent HAWQ from being considered an ASF
> project.
>
> == Proposal ==
> The goal of this proposal is to bring the core of Pivotal Software,
> Inc.’s (Pivotal) Pivotal HAWQⓇ codebase into the Apache Software
> Foundation (ASF) in order to build a vibrant, diverse and
> self-governed open source community around the technology. Pivotal has
> agreed to transfer the brand name "HAWQ" to the Apache Software
> Foundation and will stop using HAWQ to refer to this software if the
> project gets accepted into the ASF Incubator under the name of "Apache
> HAWQ (incubating)". Pivotal will continue to market and sell an
> analytic engine product that includes Apache HAWQ (incubating). While
> HAWQ is our primary choice for the project's name, in anticipation of
> any potential issues with PODLINGNAMESEARCH we have come up with two
> alternative names: (1) Hornet; or (2) Grove.
>
> Pivotal is submitting this proposal to donate the HAWQ source code and
> associated artifacts (documentation, web site content, wiki, etc.) to
> the Apache Software Foundation Incubator under the Apache License,
> Version 2.0 and is asking the Incubator PMC to establish an open source
> community.
>
> == Background ==
> While the ecosystem of open source SQL-on-Hadoop solutions is fairly
> developed by now, HAWQ has several unique features that will set it
> apart from existing ASF and non-ASF projects. HAWQ made its debut in
> 2013 as a closed source product leveraging a decade's worth of product
> development effort invested in Greenplum DatabaseⓇ. Since then HAWQ
> has rapidly gained a solid customer base and has become available on
> non-Pivotal distributions of Hadoop.
> In 2015 HAWQ still leverages the rock-solid foundation of Greenplum
> Database, while at the same time embracing the elasticity and resource
> management native to Hadoop applications. This allows HAWQ to provide
> superior SQL-on-Hadoop performance, scalability and coverage while
> also providing massively parallel machine learning capabilities and
> support for native Hadoop file formats. In addition, HAWQ's advanced
> features include support for complex joins, a rich and compliant SQL
> dialect, and industry-differentiating data federation capabilities.
> Dynamic pipelining and a pluggable query optimizer architecture enable
> HAWQ to perform queries on Hadoop with the speed and scalability
> required for enterprise data warehouse (EDW) workloads. HAWQ provides
> strong support for low-latency analytic SQL queries, coupled with
> massively parallel machine learning capabilities. This enables
> discovery-based analysis of large data sets and rapid, iterative
> development of data analytics applications that apply deep machine
> learning, significantly shortening data-driven innovation cycles for
> the enterprise.
>
> Hundreds of companies and thousands of servers are running
> mission-critical applications on HAWQ today, managing petabytes of data.
>
> == Rationale ==
> Hadoop and HDFS-based data management architectures continue their
> expansion into the enterprise. As the amount of data stored on Hadoop
> clusters grows, unlocking the analytics capabilities and democratizing
> access to that treasure trove of data becomes one of the key concerns.
> While Hadoop has no shortage of purposefully designed analytical
> frameworks, the easiest and most cost-effective way to onboard the
> largest number of data consumers is to offer SQL APIs for data
> retrieval at scale. Of course, given the high velocity of innovation
> happening in the underlying Hadoop ecosystem, any SQL-on-Hadoop
> solution has to keep up with the community. We strongly believe that
> in the Big Data space, this can be optimally achieved through a
> vibrant, diverse, self-governed community collectively innovating
> around a single codebase while at the same time cross-pollinating with
> various other data management communities. The Apache Software
> Foundation is the ideal place to meet those ambitious goals. We also
> believe that our initial experience of bringing Pivotal GemFireⓇ into
> the ASF as Apache Geode (incubating) could be leveraged, improving the
> chances of HAWQ becoming a vibrant Apache community.
>
> == Initial Goals ==
> Our initial goals are to bring HAWQ into the ASF, transition internal
> engineering processes into the open, and foster a collaborative
> development model according to the "Apache Way." Pivotal and its
> partners plan to develop new functionality in an open,
> community-driven way. To get there, the existing internal build, test
> and release processes will be refactored to support open development.
>
> == Current Status ==
> Currently, the project code base is commercially licensed and is not
> available to the general public. The documentation and wiki pages are
> available at FIXME. Although Pivotal HAWQ was developed as a
> proprietary, closed-source product, its roots are in the PostgreSQL
> community, and the internal engineering practices adopted by the
> development team lend themselves well to an open, collaborative and
> meritocratic environment.
>
> The Pivotal HAWQ team has always focused on building a robust end user
> community of paying and non-paying customers. The existing
> documentation, along with StackOverflow and other similar forums, is
> expected to facilitate conversations between our existing users so as
> to transform them into an active community of HAWQ members,
> stakeholders and developers.
>
> === Meritocracy ===
> Our proposed list of initial committers includes the current HAWQ R&D
> team, Pivotal Field Engineers, and several existing partners. This
> group will form a base for the broader community we will invite to
> collaborate on the codebase. We intend to radically expand the initial
> developer and user community by running the project in accordance with
> the "Apache Way". Users and new contributors will be treated with
> respect and welcomed. By participating in the community and providing
> quality patches/support that move the project forward, contributors
> will earn merit. They also will be encouraged to provide non-code
> contributions (documentation, events, community management, etc.) and
> will gain merit for doing so. Those with a proven support and quality
> track record will be encouraged to become committers.
>
> === Community ===
> If HAWQ is accepted for incubation, the primary initial goal will be
> transitioning the core community towards embracing the Apache Way of
> project governance.
> We would solicit major existing contributors to
> become committers on the project from the start.
>
> === Core Developers ===
>
> A few of HAWQ's core developers are skilled in working as part of
> openly governed Apache communities (mainly around the Hadoop
> ecosystem). That said, most of the core developers are currently NOT
> affiliated with the ASF and would require new ICLAs before committing
> to the project.
>
> === Alignment ===
> The following existing ASF projects can be considered when reviewing
> the HAWQ proposal:
>
> Apache Hadoop is a distributed storage and processing framework for
> very large datasets, focusing primarily on batch processing for
> analytic purposes. HAWQ builds on top of two key pieces of Hadoop:
> YARN and HDFS. HAWQ's community roadmap includes plans for
> contributing to Hadoop around HDFS features and increasing support for
> C and C++ clients.
>
> Apache Spark™ is a fast engine for processing large datasets,
> typically from a Hadoop cluster, and performing batch, streaming,
> interactive, or machine learning workloads. Recently, Apache Spark
> has embraced SQL-like APIs around DataFrames at its core. Because of
> that, we would expect a level of collaboration between the two
> projects when it comes to query optimization and exposing HAWQ tables
> to Spark analytical pipelines.
>
> Apache Hive™ is data warehouse software that facilitates querying
> and managing large datasets residing in distributed storage. Hive
> provides a mechanism to project structure onto this data and query the
> data using a SQL-like language called HiveQL. Hive also provides
> HCatalog capabilities as a table and storage management layer for
> Hadoop, enabling users with different data processing tools to more
> easily define structure for the data on the grid.
> Currently, core Hive and HAWQ are viewed as complementary solutions,
> but we expect close integration with HCatalog given its dominant
> position for metadata management on Hadoop clusters.
>
> Apache Phoenix is a high-performance relational database layer over
> HBase for low-latency applications. Given Phoenix's exclusive focus on
> HBase for its data management backend and its overall architecture
> around HBase's co-processors, it is unlikely that there will be much
> collaboration between the two projects.
>
> == Known Risks ==
> Development has been sponsored mostly by a single company (or its
> predecessors) thus far and coordinated mainly by the core Pivotal HAWQ
> team.
>
> For the project to fully transition to the Apache Way governance
> model, development must shift towards the meritocracy-centric model of
> growing a community of contributors, balanced with the need for
> extreme stability and core implementation coherency.
>
> The tools and development practices in place for the Pivotal HAWQ
> product are compatible with the ASF infrastructure, and thus we do not
> anticipate any on-boarding pains.
>
> The project currently includes a modified version of PostgreSQL 8.3
> source code. Given the ASF's position that the PostgreSQL License is
> compatible with the Apache License version 2.0, we do NOT anticipate
> any issues with licensing the code base. However, any new capabilities
> developed by the HAWQ team once part of the ASF would need to be
> consumed by the PostgreSQL community under the Apache License version
> 2.0.
>
> === Orphaned products ===
> Pivotal is fully committed to maintaining its position as one of the
> leading providers of SQL-on-Hadoop solutions, and the corresponding
> Pivotal commercial product will continue to be based on the HAWQ
> project.
> Moreover, Pivotal has a vested interest in making HAWQ
> successful by driving its close integration with both existing
> projects contributed by Pivotal, including Apache Geode (incubating)
> and MADlib (which is requesting Incubation), and sister ASF projects.
> We expect this to further reduce the risk of orphaning the product.
>
> === Inexperience with Open Source ===
> Pivotal has embraced open source software since its formation by
> employing contributors/committers and by shepherding open source
> projects like Cloud Foundry, Spring, RabbitMQ and MADlib. Individuals
> working at Pivotal have experience with the formation of vibrant
> communities around open technologies, starting with the Cloud Foundry
> Foundation and continuing with the creation of a community around
> Apache Geode (incubating). Although some of the initial committers
> have not had the experience of developing entirely open source,
> community-driven projects, we expect to bring to bear the open
> development practices that have proven successful on longstanding
> Pivotal open source projects to the HAWQ community. Additionally,
> several ASF veterans have agreed to mentor the project and are listed
> in this proposal. The project will rely on their collective guidance
> and wisdom to quickly transition the entire team of initial committers
> towards practicing the Apache Way.
>
> === Homogeneous Developers ===
> While most of the initial committers are employed by Pivotal, we have
> already seen a healthy level of interest from existing customers and
> partners. We intend to convert that interest directly into
> participation and will be investing in activities to recruit
> additional committers from other companies.
>
> === Reliance on Salaried Developers ===
> Most of the contributors are paid to work in the Big Data space.
> While they might move on from their current employers, they are
> unlikely to venture far from their core expertise and thus will
> continue to be engaged with the project regardless of who employs them.
>
> === Relationships with Other Apache Products ===
> As mentioned in the Alignment section, HAWQ may consider various
> degrees of integration and code exchange with the Apache Hadoop,
> Apache Spark and Apache Hive projects. We expect integration points to
> be inside and outside the project. We look forward to collaborating
> with these communities as well as other communities under the Apache
> umbrella.
>
> === An Excessive Fascination with the Apache Brand ===
> While we intend to leverage the Apache ‘branding’ when talking to
> other projects as a testament to our project’s ‘neutrality’, we have
> no plans for making use of the Apache brand in press releases or
> posting billboards advertising acceptance of HAWQ into the Apache
> Incubator.
>
> == Documentation ==
> The documentation is currently available at http://hawq.docs.pivotal.io/
>
> == Initial Source ==
> Initial source code will be available immediately after the Incubator
> PMC approves HAWQ joining the Incubator and will be licensed under the
> Apache License v2.
>
> == Source and Intellectual Property Submission Plan ==
> As soon as HAWQ is approved to join the Incubator, the source code
> will be transitioned via an exhibit to Pivotal's current Software
> Grant Agreement onto ASF infrastructure and in turn made available
> under the Apache License, version 2.0. We know of no legal
> encumbrances that would inhibit the transfer of source code to the
> ASF.
>
> == External Dependencies ==
>
> Runtime dependencies:
> * gimli (BSD)
> * openldap (The OpenLDAP Public License)
> * openssl (OpenSSL License and the Original SSLeay License, BSD style)
> * proj (MIT)
> * yaml (Creative Commons Attribution 2.0 License)
> * python (Python Software Foundation License Version 2)
> * apr-util (Apache Version 2.0)
> * bzip2 (BSD-style License)
> * curl (MIT/X Derivate License)
> * gperf (GPL Version 3)
> * protobuf (Google)
> * libevent (BSD)
> * json-c (https://github.com/json-c/json-c/blob/master/COPYING)
> * krb5 (MIT)
> * pcre (BSD)
> * libedit (BSD)
> * libxml2 (MIT)
> * zlib (Permissive Free Software License)
> * libgsasl (LGPL Version 2.1)
> * thrift (Apache Version 2.0)
> * snappy (Apache Version 2.0 (up to 1.0.1)/New BSD)
> * libuuid-2.26 (LGPL Version 2)
> * apache hadoop (Apache Version 2.0)
> * apache avro (Apache Version 2.0)
> * glog (BSD)
> * googlemock (BSD)
>
> Build-only dependencies:
> * ant (Apache Version 2.0)
> * maven (Apache Version 2.0)
> * cmake (BSD)
>
> Test-only dependencies:
> * googletest (BSD)
>
> Cryptography: N/A
>
> == Required Resources ==
>
> === Mailing lists ===
> * private@hawq.incubator.apache.org (moderated subscriptions)
> * commits@hawq.incubator.apache.org
> * dev@hawq.incubator.apache.org
> * issues@hawq.incubator.apache.org
> * user@hawq.incubator.apache.org
>
> === Git Repository ===
> https://git-wip-us.apache.org/repos/asf/incubator-hawq.git
>
> === Issue Tracking ===
> JIRA Project HAWQ (HAWQ)
>
> === Other Resources ===
>
> Setting up regular builds for HAWQ on builds.apache.org will require
> integration with Docker support.
>
> == Initial Committers ==
> * Lirong Jian
> * Hubert Huan Zhang
> * Radar Da Lei
> * Ivan Yanqing Weng
> * Zhanwei Wang
> * Yi Jin
> * Lili Ma
> * Jiali Yao
> * Zhenglin Tao
> * Ruilong Huo
> * Ming Li
> * Wen Lin
> * Lei Chang
> * Alexander V Denissov
> * Newton Alex
> * Oleksandr Diachenko
> * Jun Aoki
> * Bhuvnesh Chaudhary
> * Vineet Goel
> * Shivram Mani
> * Noa Horn
> * Sujeet S Varakhedi
> * Junwei (Jimmy) Da
> * Ting (Goden) Yao
> * Mohammad F (Foyzur) Rahman
> * Entong Shen
> * George C Caragea
> * Amr El-Helw
> * Mohamed F Soliman
> * Venkatesh (Venky) Raghavan
> * Carlos Garcia
> * Zixi (Jesse) Zhang
> * Michael P Schubert
> * C.J. Jameson
> * Jacob Frank
> * Ben Calegari
> * Shoabe Shariff
> * Rob Day-Reynolds
> * Mel S Kiyama
> * Charles Alan Litzell
> * David Yozie
> * Ed Espino
> * Caleb Welton
> * Parham Parvizi
> * Dan Baskette
> * Christian Tzolov
> * Tushar Pednekar
> * Greg Chase
> * Chloe Jackson
> * Michael Nixon
> * Roman Shaposhnik
> * Alan Gates
> * Owen O'Malley
> * Thejas Nair
> * Don Bosco Durai
> * Konstantin Boudnik
> * Sergey Soldatov
> * Atri Sharma
>
> == Affiliations ==
> * Barclays: Atri Sharma
> * Bloomberg: Justin Erenkrantz
> * Hortonworks: Alan Gates, Owen O'Malley, Thejas Nair, Don Bosco Durai
> * WANDisco: Konstantin Boudnik, Sergey Soldatov
> * Pivotal: everyone else on this proposal
>
> == Sponsors ==
>
> === Champion ===
> Roman Shaposhnik
>
> === Nominated Mentors ===
>
> The initial mentors are listed below:
> * Alan Gates - Apache Member, Hortonworks
> * Owen O'Malley - Apache Member, Hortonworks
> * Thejas Nair - Apache Member, Hortonworks
> * Konstantin Boudnik - Apache Member, WANDisco
> * Roman Shaposhnik - Apache Member, Pivotal
> * Justin Erenkrantz - Apache Member, Bloomberg
>
> === Sponsoring Entity ===
> We would like to propose the Apache Incubator as the sponsor of this
> project.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> For additional commands, e-mail: general-help@incubator.apache.org