incubator-general mailing list archives

From Alexander Bezzubov <...@apache.org>
Subject Re: [VOTE] Accept Quickstep into the Apache Incubator
Date Sat, 26 Mar 2016 03:40:04 GMT
+1 (non-binding)

--
Alex

On Wed, Mar 23, 2016 at 8:16 AM, Julian Hyde <jhyde@apache.org> wrote:

> +1 (binding)
>
> > On Mar 22, 2016, at 3:00 PM, Chris Douglas <cdouglas@apache.org> wrote:
> >
> > +1 (binding) -C
> >
> > On Tue, Mar 22, 2016 at 2:01 PM, Roman Shaposhnik <rvs@apache.org> wrote:
> >> Hi!
> >>
> >> Quickstep proposal was made available for discussion last week
> >>    https://wiki.apache.org/incubator/QuickstepProposal
> >> and the feedback so far seems to be positive.
> >>
> >> Please vote to accept Quickstep into the Apache Incubator.
> >> The vote will be open until Mon 3/28 noon PST.
> >>
> >> [ ] +1 Accept Quickstep into the Apache Incubator
> >> [ ] +0 Abstain
> >> [ ] -1 Don't accept Quickstep into the Apache Incubator because ...
> >>
> >> == Abstract ==
> >>
> >> Quickstep is a high-performance database engine. It is designed to (1)
> >> convert data to insights at bare-metal speed, (2) support multiple
> >> query surfaces including SQL (the first, and current, version only
> >> supports SQL), and (3) deliver bare-metal performance on any hardware
> >> (including running on a laptop, running on a high-end (single node)
> >> server, and running on a distributed cluster). Since its inception,
> >> the project has been planned to deliver a high-performance single node
> >> system first, followed by a distributed system.
> >>
> >> Quickstep is composed of several different modules that handle
> >> different concerns of a database system. The main modules are:
> >>  * Utility - Reusable general-purpose code that is used by many other modules.
> >>  * Threading - Provides a cross-platform abstraction for threads and
> >> synchronization primitives that abstract the underlying OS threading
> >> features.
> >>  * Types - The core type system used across all of Quickstep. Handles
> >> details of how SQL types are stored, parsed, serialized &
> >> deserialized, and converted. Also includes basic containers for typed
> >> values (tuples and column-vectors) and low-level operations that apply
> >> to typed values (e.g. basic arithmetic and comparisons).
> >>  * Catalog - Tracks database schema as well as physical storage
> >> information for relations (e.g. which physical blocks store a
> >> relation's data, and any physical partitioning and placement
> >> information).
> >>  * Storage - Physically stores relational data in self-contained,
> >> self-describing blocks, both in-memory and on persistent storage (disk
> >> or a distributed filesystem). Also includes some heavyweight run-time
> >> data structures used in query processing (e.g. hash tables for join
> >> and aggregation). Includes a buffer manager component for managing
> >> memory use and a file manager component that handles data persistence.
> >>  * Compression - Implements ordered dictionary compression. Several
> >> storage formats in the Storage module are capable of storing
> >> compressed column data and evaluating some expressions directly on
> >> compressed data without decompressing. The common code supporting
> >> compression is in this module.
> >>  * Expressions - Builds on the simple operations provided by the
> >> Types module to support arbitrarily complex expressions over data,
> >> including scalar expressions, predicates, and aggregate functions with
> >> and without grouping.
> >>  * Relational Operators - This module provides the building blocks
> >> for queries in Quickstep. A query is represented as a directed acyclic
> >> graph of relational operators, each of which is responsible for
> >> applying some relational-algebraic operation(s) to transform its
> >> input. Operators generate individual self-contained "work orders" that
> >> can be executed independently. Most operators are parallelism-friendly
> >> and generate one work-order per storage block of input.
> >>  * Query Execution - Handles the actual scheduling and execution of
> >> work from a query at runtime. The central class is the Foreman, an
> >> independent thread with a global view of the query plan and progress.
> >> The Foreman dispatches work-orders to stateless Worker threads and
> >> monitors their progress, and also coordinates streaming of partial
> >> results between producers and consumers in a query plan DAG to
> >> maximize parallelism. This module also includes the QueryContext
> >> class, which holds global shared state for an individual query and is
> >> designed to support easy serialization/deserialization for distributed
> >> execution.
> >>  * Parser - A simple SQL lexer and parser that parses SQL syntax into
> >> an abstract syntax tree for consumption by the Query Optimizer.
> >>  * Query Optimizer - Takes the abstract syntax tree generated by the
> >> parser and transforms it into a runnable query-plan DAG for the Query
> >> Execution module. The Query Optimizer is responsible for resolving
> >> references to relations and attributes in the query, checking it for
> >> semantic correctness, and applying optimizations (e.g. filter
> >> pushdown, column pruning, join ordering) as part of the transformation
> >> process.
> >>  * Command-Line Interface - An interactive SQL shell interface to Quickstep.
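To make the work-order model in the Relational Operators and Query Execution entries more concrete, here is a minimal C++ sketch under heavily simplified assumptions: each operator emits one work order per input block, and a single scheduling loop (standing in for the Foreman) hands the work orders of every operator whose producers have all finished to a pool of worker threads. Streaming of partial results between producers and consumers is deliberately left out. `Operator`, `ExecuteDag` and `DemoTotal` are hypothetical names for illustration only, not Quickstep's actual Foreman, WorkOrder or Worker API.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical relational operator in a query-plan DAG. It emits one
// self-contained work order per input storage block.
struct Operator {
  std::vector<std::size_t> producers;                // upstream operators
  std::size_t num_blocks = 0;                        // work orders to emit
  std::function<void(std::size_t block)> run_block;  // body of a work order
};

// Simplified "Foreman": schedules every operator whose producers have all
// finished and hands its work orders to stateless worker threads.
// Assumes `dag` is acyclic and num_workers > 0; no result streaming.
void ExecuteDag(const std::vector<Operator>& dag, unsigned num_workers) {
  std::mutex mu;
  std::condition_variable work_cv, done_cv;
  std::queue<std::pair<std::size_t, std::size_t>> work;  // (operator, block)
  std::vector<std::size_t> remaining(dag.size());
  std::vector<bool> scheduled(dag.size(), false), finished(dag.size(), false);
  std::size_t num_finished = 0;
  bool shutdown = false;
  for (std::size_t i = 0; i < dag.size(); ++i) remaining[i] = dag[i].num_blocks;

  // Worker: pop a work order, run it without holding the lock, report back.
  auto worker = [&] {
    for (;;) {
      std::pair<std::size_t, std::size_t> wo;
      {
        std::unique_lock<std::mutex> lock(mu);
        work_cv.wait(lock, [&] { return shutdown || !work.empty(); });
        if (work.empty()) return;  // shut down and no work left
        wo = work.front();
        work.pop();
      }
      dag[wo.first].run_block(wo.second);
      std::lock_guard<std::mutex> lock(mu);
      if (--remaining[wo.first] == 0) {  // last work order of this operator
        finished[wo.first] = true;
        ++num_finished;
        done_cv.notify_one();  // wake the scheduling loop
      }
    }
  };
  std::vector<std::thread> workers;
  for (unsigned w = 0; w < num_workers; ++w) workers.emplace_back(worker);

  std::unique_lock<std::mutex> lock(mu);
  while (num_finished < dag.size()) {
    bool progress = false;
    for (std::size_t i = 0; i < dag.size(); ++i) {
      if (scheduled[i]) continue;
      bool ready = true;
      for (std::size_t p : dag[i].producers) ready = ready && finished[p];
      if (!ready) continue;
      scheduled[i] = true;
      progress = true;
      if (dag[i].num_blocks == 0) {  // nothing to run: finished immediately
        finished[i] = true;
        ++num_finished;
      }
      for (std::size_t b = 0; b < dag[i].num_blocks; ++b) work.emplace(i, b);
      work_cv.notify_all();
    }
    if (!progress) done_cv.wait(lock);  // wait for some operator to finish
  }
  shutdown = true;
  lock.unlock();
  work_cv.notify_all();
  for (std::thread& t : workers) t.join();
}

// Usage: a 4-block "scan" feeding a 1-block "aggregate" (toy data).
int DemoTotal() {
  std::vector<int> blocks = {1, 2, 3, 4};
  std::vector<int> partial(blocks.size(), 0);
  int total = 0;
  std::vector<Operator> dag(2);
  dag[0].num_blocks = blocks.size();
  dag[0].run_block = [&](std::size_t b) { partial[b] = blocks[b] * 10; };
  dag[1].producers = {0};
  dag[1].num_blocks = 1;
  dag[1].run_block = [&](std::size_t) {
    for (int p : partial) total += p;
  };
  ExecuteDag(dag, 3);
  return total;  // 10 + 20 + 30 + 40
}
```

Because each work order touches only its own block, the workers need no coordination beyond the shared queue; only the scheduling loop knows the shape of the DAG, which loosely corresponds to the Foreman's global view described above.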
> >>
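The Compression entry mentions evaluating expressions directly on compressed data. With an ordered dictionary, code order matches value order, so a range predicate can be rewritten as a comparison on codes. The C++ sketch below shows that idea for a `value < literal` predicate on a string column; `OrderedDictionary`, `SelectLessThan` and `DemoSelect` are hypothetical names, and the sketch is not taken from Quickstep's Compression module.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical ordered dictionary: codes are positions in the sorted list
// of distinct values, so comparing codes is equivalent to comparing values.
class OrderedDictionary {
 public:
  explicit OrderedDictionary(std::vector<std::string> column)
      : values_(std::move(column)) {
    std::sort(values_.begin(), values_.end());
    values_.erase(std::unique(values_.begin(), values_.end()), values_.end());
  }

  // Precondition: `value` occurs in the dictionary.
  std::uint32_t Encode(const std::string& value) const {
    return LowerBound(value);
  }

  const std::string& Decode(std::uint32_t code) const { return values_[code]; }

  // Index of the first dictionary entry >= `literal`. The literal itself
  // does not have to be in the dictionary.
  std::uint32_t LowerBound(const std::string& literal) const {
    return static_cast<std::uint32_t>(
        std::lower_bound(values_.begin(), values_.end(), literal) -
        values_.begin());
  }

 private:
  std::vector<std::string> values_;  // sorted and distinct
};

// Evaluates "value < literal" directly on the compressed codes: a value is
// below the literal exactly when its code is below LowerBound(literal).
std::vector<std::size_t> SelectLessThan(const OrderedDictionary& dict,
                                        const std::vector<std::uint32_t>& codes,
                                        const std::string& literal) {
  const std::uint32_t bound = dict.LowerBound(literal);
  std::vector<std::size_t> matches;
  for (std::size_t row = 0; row < codes.size(); ++row) {
    if (codes[row] < bound) matches.push_back(row);
  }
  return matches;
}

// Usage: compress a small string column, then filter it without decoding.
std::vector<std::size_t> DemoSelect() {
  const std::vector<std::string> column = {"pear", "apple", "fig", "apple",
                                           "kiwi"};
  const OrderedDictionary dict(column);  // apple=0, fig=1, kiwi=2, pear=3
  std::vector<std::uint32_t> codes;
  for (const std::string& v : column) codes.push_back(dict.Encode(v));
  return SelectLessThan(dict, codes, "grape");  // rows 1, 2, 3
}
```

The literal is translated to a code boundary once per predicate, after which the scan compares small fixed-width integers instead of strings; decoding is only needed for the rows that are finally returned.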
> >> Quickstep is implemented in C++ and does not require many external
> >> libraries to run. Quickstep is currently an open source project
> >> licensed under the Apache License Version 2.0 and governed by a group
> >> of engineers at Pivotal.
> >>
> >> Quickstep began in 2011 as a research project in the Computer Sciences
> >> Department at the University of Wisconsin
> >> (https://quickstep.cs.wisc.edu/), and the copyrights underlying the
> >> project were transferred to a company called Quickstep Technologies,
> >> which was acquired by Pivotal in 2015.
> >>
> >> == Proposal ==
> >> The goal of this proposal is to bring an already existing open source
> >> project into the Apache Software Foundation (ASF) family thus
> >> leveraging a very successful “Apache Way” governance model in order to
> >> increase community participation and diversity. We hope that it will
> >> allow us to build a vibrant, diverse and self-governed open source
> >> community around the technology. Pivotal has agreed to transfer the
> >> brand name "Quickstep" to ASF and will stop using Quickstep to refer
> >> to this software if the project gets accepted into the ASF Incubator
> >> under the name of "Apache Quickstep (incubating)". Pivotal may market
> >> and sell products that include Apache Quickstep (incubating) under a
> >> different brand name, but no determination has been made regarding
> >> that. While Quickstep is our primary choice for a name of the project,
> >> in anticipation of any potential issues with PODLINGNAMESEARCH we have
> >> come up with two alternative names: (1) Bolero or (2) Hustle.
> >>
> >> Pivotal is submitting this proposal to transfer the Quickstep source
> >> code and associated artifacts (documentation, web site content, wiki,
> >> etc.) from its current GitHub location to the ASF Incubator under the
> >> Apache License, Version 2.0 and is asking the Incubator PMC to
> >> establish an open source community.
> >>
> >> == Background ==
> >>
> >> Quickstep is a next-generation relational data processing kernel
> >> currently being developed as a collaboration between the academic
> >> community and Pivotal. Quickstep aims to deliver efficient and
> >> sustainable data processing performance on current and future hardware
> >> by using a hardware-software co-design philosophy.
> >>
> >> For the hardware available today, this means effectively exploiting
> >> large main memories, fast on-die CPU caches, highly parallel
> >> multi-core CPUs, and NVRAM storage technologies.
> >>
> >> For the hardware available in the future, the project aims to
> >> co-design hardware and software primitives that will allow data
> >> processing kernels to work on increasing amounts of data economically
> >> -- both from the raw performance perspective, and from the perspective
> >> of the energy consumed by data processing kernels.
> >>
> >> == Rationale ==
> >>
> >> In the past decade, ASF has established itself as one of the
> >> quintessential sources of innovation in data management and data
> >> processing frameworks. At the same time, there is a clear need for a
> >> modern, flexible framework capable of exploiting the hardware
> >> characteristics of today and to make it available as a set of building
> >> blocks to as wide a community of developers as possible. We strongly
> >> believe that Quickstep technology can benefit a broader ecosystem of
> >> database developers and researchers but this "world domination" needs
> >> to be achieved through a vibrant, diverse, self-governed community
> >> collectively innovating around a single codebase while at the same
> >> time cross-pollinating with various other data management communities.
> >> ASF is the ideal place to meet those ambitious goals. We also believe
> >> that our experience bringing various Pivotal data products into ASF
> >> family - including Apache Geode (incubating), Apache HAWQ (incubating)
> >> and Apache MADlib (incubating) can be leveraged to make the Quickstep
> >> transition a success, thus improving the chances of it becoming a
> >> truly vibrant Apache community.
> >>
> >> == Initial Goals ==
> >>
> >> Our initial goals are to bring Quickstep into ASF, transition internal
> >> engineering processes into the open, and foster a collaborative
> >> development model according to the "Apache Way." Pivotal and its
> >> academic partners plan to develop new functionality in an open,
> >> community-driven way. To get there, the existing internal build, test
> >> and release processes will be refactored to support open development.
> >>
> >> == Current Status ==
> >>
> >> Currently, the project code base is licensed under the Apache License
> >> v.2 and is available in a GitHub repository
> >> https://github.com/pivotalsoftware/quickstep . The documentation and
> >> wiki pages are available in the same repository. Throughout its
> >> history, Quickstep was developed in a hybrid closed/open source mode, but it
> >> has its roots in open source database management communities. The
> >> internal engineering practices adopted by the development team lend
> >> themselves well to an open, collaborative and meritocratic
> >> environment.
> >>
> >> The Quickstep team has always focused on building a robust end user
> >> community of researchers. The existing documentation, along with
> >> various publications, is expected to help convert our existing users
> >> into an active community of Quickstep members, stakeholders and
> >> developers.
> >>
> >> == Meritocracy ==
> >>
> >> Our proposed list of initial committers includes the current Quickstep
> >> R&D team and several existing academic partners. This group will form
> >> a base for the broader community we will invite to collaborate on the
> >> codebase. We intend to radically expand the initial developer and user
> >> community by running the project in accordance with the "Apache Way".
> >> Users and new contributors will be treated with respect and welcomed.
> >> By participating in the community and providing quality
> >> patches/support that move the project forward, contributors will earn
> >> merit. They also will be encouraged to provide non-code contributions
> >> (documentation, events, community management, etc.) and will gain
> >> merit for doing so. Those with a proven support and quality track
> >> record will be encouraged to become committers.
> >>
> >> == Community ==
> >>
> >> If Quickstep is accepted for incubation, the primary initial goal will
> >> be transitioning the core community towards embracing the Apache Way
> >> of project governance. We would solicit major existing contributors to
> >> become committers on the project from the start.
> >>
> >> == Core Developers ==
> >> A small percentage of Quickstep core developers are skilled in working
> >> as part of openly governed Apache communities (mainly around the
> >> Hadoop ecosystem). That said, most of the core developers are
> >> currently NOT affiliated with the ASF and would require new ICLAs
> >> before committing to the project.
> >>
> >> == Alignment ==
> >> The following existing ASF projects can be considered when reviewing
> >> the Quickstep proposal:
> >>  * Apache Hive: Potential alignment here is to consider a version of
> >> Hive that runs on the Quickstep executor.
> >>  * Apache HAWQ (incubating): Potential alignment here is to consider
> >> exchanging ideas and/or code for execution across both systems.
> >>  * Apache YARN: Work has started on a distributed version of
> >> Quickstep, and its current path is to run as a YARN application.
> >>  * Apache Mesos: Potential alignment here is for Quickstep to run in
> >> Apache Mesos.
> >>
> >> == Known Risks ==
> >> Development has so far been done mostly by a tightly knit group of
> >> University of Wisconsin researchers, was later sponsored mostly by a
> >> single company (Pivotal), and has been coordinated mainly by the core
> >> Quickstep team. The Quickstep team now spans Pivotal and the
> >> University of Wisconsin.
> >>
> >> For the project to fully transition to the Apache Way governance
> >> model, development must shift towards the meritocracy-centric model of
> >> growing a community of contributors balanced with the needs for
> >> extreme stability and core implementation coherency. The tools and
> >> development practices in place for the Quickstep product are
> >> compatible with the ASF infrastructure and thus we do not anticipate
> >> any on-boarding pains.
> >>
> >> The project went through a very thorough vetting when Pivotal open
> >> sourced it under the Apache License v. 2.0 only a few months
> >> ago. This gives us reasonable confidence to conclude that the code
> >> base is clean and free from IP complications.
> >>
> >> == Orphaned Products ==
> >> Pivotal is fully committed to maintaining its position as one of the
> >> leading providers of database management and data processing solutions
> >> and the corresponding Pivotal commercial product will continue to be
> >> developed around the Quickstep project.
> >>
> >> Moreover, Pivotal has a vested interest in making Quickstep successful
> >> by driving its close integration with both existing projects
> >> contributed to open source by Pivotal including Apache HAWQ
> >> (incubating) and Greenplum Database, and sister ASF projects. We
> >> expect this to further reduce the risk of orphaning the product.
> >>
> >> == Inexperience with Open Source ==
> >> Pivotal has embraced open source software since its formation by
> >> employing contributors/committers and by shepherding open source
> >> projects like Cloud Foundry, Spring, RabbitMQ and MADlib. Individuals
> >> working at Pivotal have experience with the formation of vibrant
> >> communities around open technologies with the Cloud Foundry
> >> Foundation, and continuing with the creation of a community around
> >> Apache Geode (incubating), Apache HAWQ (incubating) and Apache MADlib
> >> (incubating). Although some of the initial committers have not had the
> >> experience of developing entirely open source, community-driven
> >> projects, we expect to bring to bear the open development practices
> >> that have proven successful on longstanding Pivotal open source
> >> projects to the Quickstep community. Additionally, several ASF
> >> veterans have agreed to mentor the project and are listed in this
> >> proposal. The project will rely on their collective guidance and
> >> wisdom to quickly transition the entire team of initial committers
> >> towards practicing the Apache Way.
> >>
> >> == Homogeneous Developers ==
> >> While many of the initial committers are employed by Pivotal or at the
> >> University of Wisconsin, we have already seen a healthy level of
> >> interest from existing customers and partners. We intend to convert
> >> that interest directly into participation and will be investing in
> >> activities to recruit additional committers from other companies.
> >>
> >> == Reliance on Salaried Developers ==
> >> Many of the contributors are paid to work in the Big Data and data
> >> processing space and nearly all are committed to a career in that
> >> space. While they might wander from their current employers, they are
> >> unlikely to venture far from their core expertise and thus will
> >> continue to be engaged with the project regardless of their current
> >> employers.
> >>
> >> == Relationships with Other Apache Products ==
> >> As mentioned in the Alignment section, Quickstep may consider various
> >> degrees of integration and code exchange with Apache Hive, Apache HAWQ
> >> (incubating), Apache YARN and Apache Mesos.
> >>
> >> == An Excessive Fascination with the Apache Brand ==
> >> While we intend to leverage the Apache ‘branding’ when talking to
> >> other projects as a testament to our project’s ‘neutrality’, we have no
> >> plans for making use of the Apache brand in press releases or for posting
> >> billboards advertising acceptance of Quickstep into the Apache Incubator.
> >>
> >> == Documentation ==
> >> The documentation is currently available at http://quickstep.cs.wisc.edu/
> >>
> >> == Initial Source ==
> >> Initial source code is currently licensed under Apache License v.2 and
> >> is available at https://github.com/pivotalsoftware/quickstep.
> >>
> >> == Source and Intellectual Property Submission Plan ==
> >> As soon as Quickstep is approved to join the Incubator, the source
> >> code will be transitioned via an exhibit to Pivotal's current Software
> >> Grant Agreement onto ASF infrastructure. We know of no legal
> >> encumbrances inhibiting the transfer of source code to the ASF.
> >>
> >> == External Dependencies ==
> >>
> >> Runtime dependencies:
> >> * farmhash: https://github.com/google/farmhash [License: MIT]
> >> * gflags: https://github.com/gflags/gflags [License: BSD]
> >> * glog: https://github.com/google/glog [License: BSD]
> >> * gperftools: https://github.com/gperftools/gperftools [License: BSD]
> >> * linenoise: https://github.com/antirez/linenoise [License: BSD 2-Clause]
> >> * protobuf: https://github.com/google/protobuf [License: BSD]
> >>
> >> Build only dependencies:
> >> * cmake: https://cmake.org/ [License: BSD]
> >> * bison: https://www.gnu.org/software/bison/ [License: GPL with
> >> exception for generated parsers]
> >> * flex: http://flex.sourceforge.net [License: BSD]
> >>
> >> Test only dependencies:
> >> * benchmark: https://github.com/google/benchmark [License: Apache 2.0]
> >> * cpplint: https://github.com/google/styleguide [License: BSD]
> >> * gtest: https://github.com/google/googletest [License: BSD]
> >> * iwyu: http://include-what-you-use.org/ [License: UIUC BSD-Like]
> >>
> >> Cryptography: N/A
> >>
> >> == Required Resources ==
> >>
> >> === Mailing lists ===
> >>  * private@quickstep.incubator.apache.org (moderated subscriptions)
> >>  * commits@quickstep.incubator.apache.org
> >>  * dev@quickstep.incubator.apache.org
> >>  * issues@quickstep.incubator.apache.org
> >>  * user@quickstep.incubator.apache.org
> >>
> >> === Git Repository ===
> >>  https://git-wip-us.apache.org/repos/asf/incubator-quickstep.git
> >>
> >> === Issue Tracking ===
> >>
> >> JIRA Project QUICKSTEP (QUICKSTEP)
> >>
> >> === Other Resources ===
> >> Setting up regular builds for Quickstep on builds.apache.org will
> >> require Docker support.
> >>
> >> == Initial Committers ==
> >> * Jignesh M. Patel
> >> * Harshad Deshmukh
> >> * Jianqiao Zhu
> >> * Zuyu Zhang
> >> * Marc Spehlmann
> >> * Saket Saurabh
> >> * Hakan Memisoglu
> >> * Rogers Jeffrey Leo John
> >> * Adalbert Gerald Soosai Raj
> >> * Udip Pant
> >> * Siddharth Suresh
> >> * Rathijit Sen
> >> * Craig Chasseur
> >> * Qiang Zeng
> >> * Shoban Chandrabose
> >> * Navneet Potti
> >> * Yinan Li
> >> * Sangmin Shin
> >> * James Paton
> >> * Shixuan Fan
> >> * Roman Shaposhnik
> >> * Konstantin Boudnik
> >> * Julian Hyde
> >> * Dhruba Borthakur
> >>
> >> == Affiliations ==
> >> * Pivotal: Jignesh M. Patel, Zuyu Zhang, Roman Shaposhnik
> >> * Google: Craig Chasseur
> >> * Facebook: James Paton, Dhruba Borthakur
> >> * Pinterest: Sangmin Shin
> >> * Microsoft: Yinan Li
> >> * Hortonworks: Julian Hyde
> >> * Memcore: Konstantin Boudnik
> >> * University of Wisconsin (and supported in part by Pivotal): Everyone else
> >>
> >> == Sponsors ==
> >>
> >> === Champion ===
> >> Roman Shaposhnik
> >>
> >> === Nominated Mentors ===
> >> The initial mentors are listed below:
> >> * Konstantin Boudnik - Apache Member, Memcore
> >> * Roman Shaposhnik - Apache Member, Pivotal
> >> * Julian Hyde - IPMC Member, Hortonworks
> >>
> >> === Sponsoring Entity ===
> >> We would like to propose the Apache Incubator as the sponsor of this project.
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> >> For additional commands, e-mail: general-help@incubator.apache.org
> >>
> >
> >
>
>
>
>
