incubator-general mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Don Bosco Durai <bo...@apache.org>
Subject Re: [VOTE] Accept HAWQ into the Apache Incubator
Date Tue, 01 Sep 2015 03:08:53 GMT
+1 (non-binding)

Bosco

On 8/31/15, 7:55 PM, "Thejas Nair" <thejas.nair@gmail.com> wrote:

>+1
>
>On Mon, Aug 31, 2015 at 7:01 PM, Luke Han <luke.hq@gmail.com> wrote:
>> +1 (non-binding)
>>
>>
>> Best Regards!
>> ---------------------
>>
>> Luke Han
>>
>> On Tue, Sep 1, 2015 at 9:41 AM, Chris Douglas <cdouglas@apache.org>
>>wrote:
>>
>>> +1 -C
>>>
>>> On Mon, Aug 31, 2015 at 11:47 AM, Roman Shaposhnik <rvs@apache.org>
>>>wrote:
>>> > Following the discussion earlier:
>>> >    http://s.apache.org/Gaf
>>> >
>>> > I would like to call a VOTE for accepting HAWQ
>>> > as a new incubator project.
>>> >
>>> > The proposal is available at:
>>> >     https://wiki.apache.org/incubator/HAWQProposal
>>> > and is also included at the bottom of this email.
>>> >
>>> > Vote is open until at least Thu, 3 September 2015, 23:59:00 PST
>>> >
>>> >  [ ] +1 accept HAWQ into the Apache Incubator
>>> >  [ ] ±0
>>> >  [ ] -1 because...
>>> >
>>> > Thanks,
>>> > Roman.
>>> >
>>> > == Abstract ==
>>> >
>>> > HAWQ is an advanced enterprise SQL on Hadoop analytic engine built
>>> > around a robust and high-performance massively-parallel processing
>>> > (MPP) SQL framework evolved from Pivotal Greenplum Database?.
>>> >
>>> > HAWQ runs natively on Apache Hadoop? clusters by tightly integrating
>>> > with HDFS and YARN. HAWQ supports multiple Hadoop file formats such
>>>as
>>> > Apache Parquet, native HDFS, and Apache Avro. HAWQ is configured and
>>> > managed as a Hadoop service in Apache Ambari. HAWQ is 100% ANSI SQL
>>> > compliant (supporting ANSI SQL-92, SQL-99, and SQL-2003, plus OLAP
>>> > extensions) and supports open database connectivity (ODBC) and Java
>>> > database connectivity (JDBC), as well. Most business intelligence,
>>> > data analysis and data visualization tools work with HAWQ out of the
>>> > box without the need for specialized drivers.
>>> >
>>> > A unique aspect of HAWQ is its integration of statistical and machine
>>> > learning capabilities that can be natively invoked from SQL or (in
>>>the
>>> > context of PL/Python, PL/Java or PL/R) in massively parallel modes
>>>and
>>> > applied to large data sets across a Hadoop cluster. These
>>>capabilities
>>> > are provided through MADlib ­ an existing open source, parallel
>>> > machine-learning library. Given the close ties between the two
>>> > development communities, the MADlib community has expressed interest
>>> > in joining HAWQ on its journey into the ASF Incubator and will be
>>> > submitting a separate, concurrent proposal.
>>> >
>>> > HAWQ will provide more robust and higher performing options for
>>>Hadoop
>>> > environments that demand best-in-class data analytics for business
>>> > critical purposes. HAWQ is implemented in C and C++.
>>> >
>>> > HAWQ has a few runtime dependencies licensed under the Cat X list:
>>> >   * gperf (GPL Version 3)
>>> >   * libgsasl (LGPL Version 2.1)
>>> >   * libuuid-2.26 (LGPL Version 2)
>>> > However, given the runtime (dynamic linking) nature of these
>>> > dependencies it doesn't represent a problem for HAWQ to be considered
>>> > an ASF project.
>>> >
>>> > == Proposal ==
>>> > The goal of this proposal is to bring the core of Pivotal Software,
>>> > Inc.’s (Pivotal) Pivotal HAWQ? codebase into the Apache Software
>>> > Foundation (ASF) in order to build a vibrant, diverse and
>>> > self-governed open source community around the technology. Pivotal
>>>has
>>> > agreed to transfer the brand name "HAWQ" to Apache Software
>>>Foundation
>>> > and will stop using HAWQ to refer to this software if the project
>>>gets
>>> > accepted into the ASF Incubator under the name of "Apache HAWQ
>>> > (incubating)". Pivotal will continue to market and sell an analytic
>>> > engine product that includes Apache HAWQ (incubating). While HAWQ is
>>> > our primary choice for a name of the project, in anticipation of any
>>> > potential issues with PODLINGNAMESEARCH we have come up with two
>>> > alternative names: (1) Hornet; or (2) Grove.
>>> >
>>> > Pivotal is submitting this proposal to donate the HAWQ source code
>>>and
>>> > associated artifacts (documentation, web site content, wiki, etc.) to
>>> > the Apache Software Foundation Incubator under the Apache License,
>>> > Version 2.0 and is asking Incubator PMC to establish an open source
>>> > community.
>>> >
>>> > == Background ==
>>> > While the ecosystem of open source SQL-on-Hadoop solutions is fairly
>>> > developed by now, HAWQ has several unique features that will set it
>>> > apart from existing ASF and non-ASF projects. HAWQ made its debut in
>>> > 2013 as a closed source product leveraging a decade's worth of
>>>product
>>> > development effort invested in Greenplum Database?. Since then HAWQ
>>> > has rapidly gained a solid customer base and became available on
>>> > non-Pivotal distributions of Hadoop.
>>> > In 2015 HAWQ still leverages the rock solid foundation of Greenplum
>>> > Database, while at the same time embracing elasticity and resource
>>> > management native to Hadoop applications. This allows HAWQ to provide
>>> > superior SQL on Hadoop performance, scalability and coverage while
>>> > also providing massively-parallel machine learning capabilities and
>>> > support for native Hadoop file formats. In addition, HAWQ's advanced
>>> > features include support for complex joins, rich and compliant SQL
>>> > dialect and industry-differentiating data federation capabilities.
>>> > Dynamic pipelining and pluggable query optimizer architecture enable
>>> > HAWQ to perform queries on Hadoop with the speed and scalability
>>> > required for enterprise data warehouse (EDW) workloads. HAWQ provides
>>> > strong support for low-latency analytic SQL queries, coupled with
>>> > massively parallel machine learning capabilities. This enables
>>> > discovery-based analysis of large data sets and rapid, iterative
>>> > development of data analytics applications that apply deep machine
>>> > learning ­ significantly shortening data-driven innovation cycles for
>>> > the enterprise.
>>> >
>>> > Hundreds of companies and thousands of servers are running
>>> > mission-critical applications today on HAWQ managing over PBs of
>>>data.
>>> >
>>> > == Rationale ==
>>> > Hadoop and HDFS-based data management architectures continue their
>>> > expansion into the enterprise. As the amount of data stored on Hadoop
>>> > clusters grows, unlocking the analytics capabilities and
>>>democratizing
>>> > access to that treasure trove of data becomes one of the key
>>>concerns.
>>> > While Hadoop has no shortage of purposefully designed analytical
>>> > frameworks, the easiest and most cost-effective way to onboard the
>>> > largest amount of data consumers is provided by offering SQL APIs for
>>> > data retrieval at scale. Of course, given the high velocity of
>>> > innovation happening in the underlying Hadoop ecosystem, any
>>> > SQL-on-Hadoop solution has to keep up with the community. We strongly
>>> > believe that in the Big Data space, this can be optimally achieved
>>> > through a vibrant, diverse, self-governed community collectively
>>> > innovating around a single codebase while at the same time
>>> > cross-pollinating with various other data management communities.
>>> > Apache Software Foundation is the ideal place to meet those ambitious
>>> > goals. We also believe that our initial experience of bringing
>>>Pivotal
>>> > Gemfire? into ASF as Apache Geode (incubating) could be leveraged
>>>thus
>>> > improving the chances of HAWQ becoming a vibrant Apache community.
>>> >
>>> > == Initial Goals ==
>>> > Our initial goals are to bring HAWQ into the ASF, transition internal
>>> > engineering processes into the open, and foster a collaborative
>>> > development model according to the "Apache Way." Pivotal and its
>>> > partners plan to develop new functionality in an open,
>>> > community-driven way. To get there, the existing internal build, test
>>> > and release processes will be refactored to support open development.
>>> >
>>> > == Current Status ==
>>> > Currently, the project code base is commercially licensed and is not
>>> > available to the general public. The documentation and wiki pages are
>>> > available at FIXME. Although Pivotal HAWQ was developed as a
>>> > proprietary, closed-source product, its roots are in the PostgreSQL
>>> > community and the internal engineering practices adopted by the
>>> > development team lend themselves well to an open, collaborative and
>>> > meritocratic environment.
>>> >
>>> > The Pivotal HAWQ team has always focused on building a robust end
>>>user
>>> > community of paying and non-paying customers. The existing
>>> > documentation along with StackOverflow and other similar forums are
>>> > expected to facilitate conversions between our existing users so as
>>>to
>>> > transform them into an active community of HAWQ members, stakeholders
>>> > and developers.
>>> >
>>> > === Meritocracy ===
>>> > Our proposed list of initial committers include the current HAWQ R&D
>>> > team, Pivotal Field Engineers, and several existing partners. This
>>> > group will form a base for the broader community we will invite to
>>> > collaborate on the codebase. We intend to radically expand the
>>>initial
>>> > developer and user community by running the project in accordance
>>>with
>>> > the "Apache Way". Users and new contributors will be treated with
>>> > respect and welcomed. By participating in the community and providing
>>> > quality patches/support that move the project forward, contributors
>>> > will earn merit. They also will be encouraged to provide non-code
>>> > contributions (documentation, events, community management, etc.) and
>>> > will gain merit for doing so. Those with a proven support and quality
>>> > track record will be encouraged to become committers.
>>> >
>>> > === Community ===
>>> > If HAWQ is accepted for incubation, the primary initial goal will be
>>> > transitioning the core community towards embracing the Apache Way of
>>> > project governance. We would solicit major existing contributors to
>>> > become committers on the project from the start.
>>> >
>>> > === Core Developers ===
>>> >
>>> > A few of HAWQ's core developers are skilled in working as part of
>>> > openly governed Apache communities (mainly around Hadoop ecosystem).
>>> > That said, most of the core developers are currently NOT affiliated
>>> > with the ASF and would require new ICLAs before committing to the
>>> > project.
>>> >
>>> > === Alignment ===
>>> > The following existing ASF projects can be considered when reviewing
>>> > HAWQ proposal:
>>> >
>>> > Apache Hadoop is a distributed storage and processing framework for
>>> > very large datasets, focusing primarily on batch processing for
>>> > analytic purposes. HAWQ builds on top of two key pieces of Hadoop:
>>> > YARN and HDFS. HAWQ's community roadmap includes plans for
>>> > contributing Hadoop around HDFS features and increasing support for C
>>> > and C++ clients.
>>> >
>>> > Apache Spark™ is a fast engine for processing large datasets,
>>> > typically from a Hadoop cluster, and performing batch, streaming,
>>> > interactive, or machine learning workloads.  Recently, Apache Spark
>>> > has embraced SQL-like APIs around DataFrames at its core. Because of
>>> > that we would expect a level of collaboration between the two
>>>projects
>>> > when it comes to query optimization and exposing HAWQ tables to Spark
>>> > analytical pipelines.
>>> >
>>> > Apache Hive™ is a data warehouse software that facilitates querying
>>> > and managing large datasets residing in distributed storage. Hive
>>> > provides a mechanism to project structure onto this data and query
>>>the
>>> > data using a SQL-like language called HiveQL. Hive is also providing
>>> > HCatalog capabilities as table and storage management layer for
>>> > Hadoop, enabling users with different data processing tools to more
>>> > easily define structure for the data on the grid. Currently the core
>>> > Hive and HAWQ are viewed as complimentary solutions, but we expect
>>> > close integration with HCatalog given its dominant position for
>>> > metadata management on the Hadoop clusters.
>>> >
>>> > Apache Phoenix is a high performance relational database layer over
>>> > HBase for low latency applications. Given Phoenix's exclusive focus
>>>on
>>> > HBase for its data management backend and its overall architecture
>>> > around HBase's co-processors, it is unlikely that there will be much
>>> > collaboration between the two projects.
>>> >
>>> > == Known Risks ==
>>> > Development has been sponsored mostly by a single company (or its
>>> > predecessors) thus far and coordinated mainly by the core Pivotal
>>>HAWQ
>>> > team.
>>> >
>>> > For the project to fully transition to the Apache Way governance
>>> > model, development must shift towards the meritocracy-centric model
>>>of
>>> > growing a community of contributors balanced with the needs for
>>> > extreme stability and core implementation coherency.
>>> >
>>> > The tools and development practices in place for the Pivotal HAWQ
>>> > product are compatible with the ASF infrastructure and thus we do not
>>> > anticipate any on-boarding pains.
>>> >
>>> > The project currently includes a modified version of PostgreSQL 8.3
>>> > source code. Given the ASF's position that the PostgreSQL License is
>>> > compatible with the Apache License version 2.0, we do NOT anticipate
>>> > any issues with licensing the code base. However, any new
>>>capabilities
>>> > developed by the HAWQ team once part of the ASF would need to be
>>> > consumed by the PostgreSQL community under the Apache License version
>>> > 2.0.
>>> >
>>> > === Orphaned products ===
>>> > Pivotal is fully committed to maintaining its position as one of the
>>> > leading providers of SQL-on-Hadoop solutions and the corresponding
>>> > Pivotal commercial product will continue to be based on the HAWQ
>>> > project. Moreover, Pivotal has a vested interest in making HAWQ
>>> > successful by driving its close integration with both existing
>>> > projects contributed by Pivotal including Apache Geode (incubating)
>>> > and MADlib (which is requesting Incubation), and sister ASF projects.
>>> > We expect this to further reduces the risk of orphaning the product.
>>> >
>>> > === Inexperience with Open Source ===
>>> > Pivotal has embraced open source software since its formation by
>>> > employing contributors/committers and by shepherding open source
>>> > projects like Cloud Foundry, Spring, RabbitMQ and MADlib. Individuals
>>> > working at Pivotal have experience with the formation of vibrant
>>> > communities around open technologies with the Cloud Foundry
>>> > Foundation, and continuing with the creation of a community around
>>> > Apache Geode (incubating).  Although some of the initial committers
>>> > have not had the experience of developing entirely open source,
>>> > community-driven projects, we expect to bring to bear the open
>>> > development practices that have proven successful on longstanding
>>> > Pivotal open source projects to the HAWQ community.  Additionally,
>>> > several ASF veterans have agreed to mentor the project and are listed
>>> > in this proposal. The project will rely on their collective guidance
>>> > and wisdom to quickly transition the entire team of initial
>>>committers
>>> > towards practicing the Apache Way.
>>> >
>>> > === Homogeneous Developers ===
>>> > While most of the initial committers are employed by Pivotal, we have
>>> > already seen a healthy level of interest from existing customers and
>>> > partners. We intend to convert that interest directly into
>>> > participation and will be investing in activities to recruit
>>> > additional committers from other companies.
>>> >
>>> > === Reliance on Salaried Developers ===
>>> > Most of the contributors are paid to work in the Big Data space.
>>>While
>>> > they might wander from their current employers, they are unlikely to
>>> > venture far from their core expertise and thus will continue to be
>>> > engaged with the project regardless of their current employers.
>>> >
>>> > === Relationships with Other Apache Products ===
>>> > As mentioned in the Alignment section, HAWQ may consider various
>>> > degrees of integration and code exchange with Apache Hadoop, Apache
>>> > Spark and Apache Hive projects. We expect integration points to be
>>> > inside and outside the project. We look forward to collaborating with
>>> > these communities as well as other communities under the Apache
>>> > umbrella.
>>> >
>>> > === An Excessive Fascination with the Apache Brand ===
>>> > While we intend to leverage the Apache ‘branding’ when talking to
>>> > other projects as testament of our project’s ‘neutrality’, we have
no
>>> > plans for making use of Apache brand in press releases nor posting
>>> > billboards advertising acceptance of HAWQ into Apache Incubator.
>>> >
>>> > == Documentation ==
>>> > The documentation is currently available at
>>>http://hawq.docs.pivotal.io/
>>> >
>>> > == Initial Source ==
>>> > Initial source code will be available immediately after Incubator PMC
>>> > approves HAWQ joining the Incubator and will be licensed under the
>>> > Apache License v2.
>>> >
>>> > == Source and Intellectual Property Submission Plan ==
>>> > As soon as HAWQ is approved to join the Incubator, the source code
>>> > will be transitioned via an exhibit to Pivotal's current Software
>>> > Grant Agreement onto ASF infrastructure and in turn made available
>>> > under the Apache License, version 2.0.  We know of no legal
>>> > encumberments that would inhibit the transfer of source code to the
>>> > ASF.
>>> >
>>> > == External Dependencies ==
>>> >
>>> > Runtime dependencies:
>>> >   * gimli (BSD)
>>> >   * openldap (The OpenLDAP Public License)
>>> >   * openssl (OpenSSL License and the Original SSLeay License, BSD
>>>style)
>>> >   * proj (MIT)
>>> >   * yaml (Creative Commons Attribution 2.0 License)
>>> >   * python (Python Software Foundation License Version 2)
>>> >   * apr-util (Apache Version 2.0)
>>> >   * bzip2 (BSD-style License)
>>> >   * curl (MIT/X Derivate License)
>>> >   * gperf (GPL Version 3)
>>> >   * protobuf (Google)
>>> >   * libevent (BSD)
>>> >   * json-c (https://github.com/json-c/json-c/blob/master/COPYING)
>>> >   * krb5 (MIT)
>>> >   * pcre (BSD)
>>> >   * libedit (BSD)
>>> >   * libxml2 (MIT)
>>> >   * zlib (Permissive Free Software License)
>>> >   * libgsasl (LGPL Version 2.1)
>>> >   * thrift (Apache Version 2.0)
>>> >   * snappy (Apache Version 2.0 (up to 1.0.1)/New BSD)
>>> >   * libuuid-2.26 (LGPL Version 2)
>>> >   * apache hadoop (Apache Version 2.0)
>>> >   * apache avro (Apache Version 2.0)
>>> >   * glog (BSD)
>>> >   * googlemock (BSD)
>>> >
>>> > Build only dependencies:
>>> >   * ant (Apache Version 2.0)
>>> >   * maven (Apache Version 2.0)
>>> >   * cmake (BSD)
>>> >
>>> > Test only dependencies:
>>> >   * googletest (BSD)
>>> >
>>> > Cryptography N/A
>>> >
>>> > == Required Resources ==
>>> >
>>> > === Mailing lists ===
>>> >   * private@hawq.incubator.apache.org (moderated subscriptions)
>>> >   * commits@hawq.incubator.apache.org
>>> >   * dev@hawq.incubator.apache.org
>>> >   * issues@hawq.incubator.apache.org
>>> >   * user@hawq.incubator.apache.org
>>> >
>>> > === Git Repository ===
>>> > https://git-wip-us.apache.org/repos/asf/incubator-hawq.git
>>> >
>>> > === Issue Tracking ===
>>> > JIRA Project HAWQ (HAWQ)
>>> >
>>> > === Other Resources ===
>>> >
>>> > Means of setting up regular builds for HAWQ on builds.apache.org will
>>> > require integration with Docker support.
>>> >
>>> > == Initial Committers ==
>>> >   * Lirong Jian
>>> >   * Hubert Huan Zhang
>>> >   * Radar Da Lei
>>> >   * Ivan Yanqing Weng
>>> >   * Zhanwei Wang
>>> >   * Yi Jin
>>> >   * Lili Ma
>>> >   * Jiali Yao
>>> >   * Zhenglin Tao
>>> >   * Ruilong Huo
>>> >   * Ming Li
>>> >   * Wen Lin
>>> >   * Lei Chang
>>> >   * Alexander V Denissov
>>> >   * Newton Alex
>>> >   * Oleksandr Diachenko
>>> >   * Jun Aoki
>>> >   * Bhuvnesh Chaudhary
>>> >   * Vineet Goel
>>> >   * Shivram Mani
>>> >   * Noa Horn
>>> >   * Sujeet S Varakhedi
>>> >   * Junwei (Jimmy) Da
>>> >   * Ting (Goden) Yao
>>> >   * Mohammad F (Foyzur) Rahman
>>> >   * Entong Shen
>>> >   * George C Caragea
>>> >   * Amr El-Helw
>>> >   * Mohamed F Soliman
>>> >   * Venkatesh (Venky) Raghavan
>>> >   * Carlos Garcia
>>> >   * Zixi (Jesse) Zhang
>>> >   * Michael P Schubert
>>> >   * C.J. Jameson
>>> >   * Jacob Frank
>>> >   * Ben Calegari
>>> >   * Shoabe Shariff
>>> >   * Rob Day-Reynolds
>>> >   * Mel S Kiyama
>>> >   * Charles Alan Litzell
>>> >   * David Yozie
>>> >   * Ed Espino
>>> >   * Caleb Welton
>>> >   * Parham Parvizi
>>> >   * Dan Baskette
>>> >   * Christian Tzolov
>>> >   * Tushar Pednekar
>>> >   * Greg Chase
>>> >   * Chloe Jackson
>>> >   * Michael Nixon
>>> >   * Roman Shaposhnik
>>> >   * Alan Gates
>>> >   * Owen O'Malley
>>> >   * Thejas Nair
>>> >   * Don Bosco Durai
>>> >   * Konstantin Boudnik
>>> >   * Sergey Soldatov
>>> >   * Atri Sharma
>>> >
>>> > == Affiliations ==
>>> >   * Barclays:  Atri Sharma
>>> >   * Bloomberg: Justin Erenkrantz
>>> >   * Hortonworks: Alan Gates, Owen O'Malley, Thejas Nair, Don Bosco
>>>Durai
>>> >   * WANDisco: Konstantin Boudnik, Sergey Soldatov
>>> >   * Pivotal: everyone else on this proposal
>>> >
>>> > == Sponsors ==
>>> >
>>> > === Champion ===
>>> > Roman Shaposhnik
>>> >
>>> > === Nominated Mentors ===
>>> >
>>> > The initial mentors are listed below:
>>> >   * Alan Gates - Apache Member, Hortonworks
>>> >   * Owen O'Malley - Apache Member, Hortonworks
>>> >   * Thejas Nair - Apache Member, Hortonworks
>>> >   * Konstantin Boudnik - Apache Member, WANDisco
>>> >   * Roman Shaposhnik - Apache Member, Pivotal
>>> >   * Justin Erenkrantz - Apache Member, Bloomberg
>>> >
>>> > === Sponsoring Entity ===
>>> > We would like to propose Apache incubator to sponsor this project.
>>> >
>>> > ---------------------------------------------------------------------
>>> > To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>>> > For additional commands, e-mail: general-help@incubator.apache.org
>>> >
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>>> For additional commands, e-mail: general-help@incubator.apache.org
>>>
>>>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>For additional commands, e-mail: general-help@incubator.apache.org
>



---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org


Mime
View raw message