incubator-cvs mailing list archives

From Apache Wiki <>
Subject [Incubator Wiki] Update of "QuickstepProposal" by RomanShaposhnik
Date Tue, 15 Mar 2016 23:49:18 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Incubator Wiki" for change notification.

The "QuickstepProposal" page has been changed by RomanShaposhnik:

New page:
== Abstract ==

Quickstep is a high-performance database engine. It is designed to (1) convert data to insights
at bare-metal speed, (2) support multiple query surfaces, including SQL (the first and current
version supports only SQL), and (3) deliver bare-metal performance on any hardware (including
running on a laptop, running on a high-end (single node) server, and running on a distributed
cluster). Since its inception, the project has been planned to deliver a high-performance
single node system first, followed by a distributed system.

Quickstep is composed of several different modules that handle different concerns of a database
system. The main modules are:
  * Utility - Reusable general-purpose code that is used by many other modules.
  * Threading - Provides a cross-platform abstraction for threads and synchronization primitives
that abstract the underlying OS threading features.
  * Types - The core type system used across all of Quickstep. Handles details of how SQL
types are stored, parsed, serialized & deserialized, and converted. Also includes basic
containers for typed values (tuples and column-vectors) and low-level operations that apply
to typed values (e.g. basic arithmetic and comparisons).
  * Catalog - Tracks database schema as well as physical storage information for relations
(e.g. which physical blocks store a relation's data, and any physical partitioning and placement).
  * Storage - Physically stores relational data in self-contained, self-describing blocks,
both in-memory and on persistent storage (disk or a distributed filesystem). Also includes
some heavyweight run-time data structures used in query processing (e.g. hash tables for join
and aggregation). Includes a buffer manager component for managing memory use and a file manager
component that handles data persistence.
  * Compression - Implements ordered dictionary compression. Several storage formats in the
Storage module are capable of storing compressed column data and evaluating some expressions
directly on compressed data without decompressing. The common code supporting compression
is in this module.
  * Expressions - Builds on the simple operations provided by the Types module to support
arbitrarily complex expressions over data, including scalar expressions, predicates, and aggregate
functions with and without grouping.
  * Relational Operators - This module provides the building blocks for queries in Quickstep.
A query is represented as a directed acyclic graph of relational operators, each of which
is responsible for applying some relational-algebraic operation(s) to transform its input.
Operators generate individual self-contained "work orders" that can be executed independently.
Most operators are parallelism-friendly and generate one work-order per storage block of input.
  * Query Execution - Handles the actual scheduling and execution of work from a query at
runtime. The central class is the Foreman, an independent thread with a global view of the
query plan and progress. The Foreman dispatches work-orders to stateless Worker threads and
monitors their progress, and also coordinates streaming of partial results between producers
and consumers in a query plan DAG to maximize parallelism. This module also includes the QueryContext
class, which holds global shared state for an individual query and is designed to support
easy serialization/deserialization for distributed execution.
  * Parser - A simple SQL lexer and parser that parses SQL syntax into an abstract syntax
tree for consumption by the Query Optimizer.
  * Query Optimizer - Takes the abstract syntax tree generated by the parser and transforms
it into a runnable query-plan DAG for the Query Execution module. The Query Optimizer is responsible
for resolving references to relations and attributes in the query, checking it for semantic
correctness, and applying optimizations (e.g. filter pushdown, column pruning, join ordering)
as part of the transformation process.
  * Command-Line Interface - An interactive SQL shell interface to Quickstep.
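
As a rough illustration of the ordered dictionary compression mentioned for the Compression module, the sketch below shows why an order-preserving dictionary lets a comparison predicate run on codes without decompressing. It is a minimal sketch under stated assumptions, not Quickstep's actual code: the names DictionaryColumn, Compress and SelectLessThan are hypothetical and do not come from the Quickstep codebase.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical illustration only; these are not Quickstep's classes.
struct DictionaryColumn {
  std::vector<std::string> dictionary;  // Sorted, distinct values.
  std::vector<std::uint32_t> codes;     // One dictionary index per row.
};

// Builds a sorted dictionary so that code order matches value order.
DictionaryColumn Compress(const std::vector<std::string> &values) {
  DictionaryColumn column;
  column.dictionary = values;
  std::sort(column.dictionary.begin(), column.dictionary.end());
  column.dictionary.erase(
      std::unique(column.dictionary.begin(), column.dictionary.end()),
      column.dictionary.end());

  column.codes.reserve(values.size());
  for (const std::string &value : values) {
    // Every value is present in the dictionary, so lower_bound finds it.
    const auto it = std::lower_bound(column.dictionary.begin(),
                                     column.dictionary.end(), value);
    column.codes.push_back(
        static_cast<std::uint32_t>(it - column.dictionary.begin()));
  }
  return column;
}

// Returns the rows whose value is < literal, comparing integer codes only.
std::vector<std::size_t> SelectLessThan(const DictionaryColumn &column,
                                        const std::string &literal) {
  // Translate the literal once: index of the first entry >= literal.
  const auto bound = std::lower_bound(column.dictionary.begin(),
                                      column.dictionary.end(), literal);
  const std::uint32_t limit_code =
      static_cast<std::uint32_t>(bound - column.dictionary.begin());

  std::vector<std::size_t> matches;
  for (std::size_t row = 0; row < column.codes.size(); ++row) {
    if (column.codes[row] < limit_code) {
      matches.push_back(row);
    }
  }
  return matches;
}
```

Because the dictionary is sorted, the literal is translated to a code boundary once and the per-row work becomes an integer comparison. The literal itself does not need to appear in the column.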

Quickstep is implemented in C++ and does not require many external libraries to run. Quickstep
is currently an open source project licensed under the Apache License Version 2.0 and governed
by a group of engineers at Pivotal.

Quickstep began in 2011 as a research project in the Computer Sciences Department at the University
of Wisconsin. The copyrights underlying the project were later
transferred to a company called Quickstep Technologies, which was acquired by Pivotal in 2015.

== Proposal ==
The goal of this proposal is to bring an already existing open source project into the Apache
Software Foundation (ASF) family thus leveraging a very successful “Apache Way” governance
model in order to increase community participation and diversity. We hope that it will allow
us to build a vibrant, diverse and self-governed open source community around the technology.
Pivotal has agreed to transfer the brand name "Quickstep" to ASF and will stop using Quickstep
to refer to this software if the project gets accepted into the ASF Incubator under the name
of "Apache Quickstep (incubating)". Pivotal may market and sell products that include Apache
Quickstep (incubating) under a different brand name, but no determination has been made regarding
that. While Quickstep is our primary choice for a name of the project, in anticipation of
any potential issues with PODLINGNAMESEARCH we have come up with two alternative names: (1)
Bolero or (2) Hustle. 

Pivotal is submitting this proposal to transfer the Quickstep source code and associated artifacts
(documentation, web site content, wiki, etc.) from its current GitHub location to the ASF
Incubator under the Apache License, Version 2.0 and is asking the Incubator PMC to establish
an open source community.

== Background ==

Quickstep is a next-generation relational data processing kernel currently being developed
as a collaboration between the academic community and Pivotal. Quickstep aims to deliver efficient
and sustainable data processing performance on current and future hardware by using a hardware-software
co-design philosophy.

For the hardware available today, this means effectively exploiting large main memories, fast
on-die CPU caches, highly parallel multi-core CPUs, and NVRAM storage technologies.

For the hardware available in the future, the project aims to co-design hardware and software
primitives that will allow data processing kernels to work on increasing amounts of data economically
-- both from the raw performance perspective, and from the perspective of the energy consumed
by data processing kernels.

== Rationale ==

In the past decade, ASF has established itself as one of the quintessential sources of innovation
in data management and data processing frameworks. At the same time, there is a clear need
for a modern, flexible framework capable of exploiting the hardware characteristics of today
and for making it available as a set of building blocks to as wide a community of developers as
possible. We strongly believe that Quickstep technology can benefit a broader ecosystem of
database developers and researchers but this "world domination" needs to be achieved through
a vibrant, diverse, self-governed community collectively innovating around a single codebase
while at the same time cross-pollinating with various other data management communities. ASF
is the ideal place to meet those ambitious goals. We also believe that our experience bringing
various Pivotal data products into the ASF family, including Apache Geode (incubating), Apache
HAWQ (incubating) and Apache MADlib (incubating), can be leveraged to make the Quickstep transition
a success, thus improving the chances of it becoming a truly vibrant Apache community.

== Initial Goals ==

Our initial goals are to bring Quickstep into ASF, transition internal engineering processes
into the open, and foster a collaborative development model according to the "Apache Way."
Pivotal and its academic partners plan to develop new functionality in an open, community-driven
way. To get there, the existing internal build, test and release processes will be refactored
to support open development.

== Current Status ==

Currently, the project code base is licensed under the Apache License v.2 and is available
in a GitHub repository. The documentation and
wiki pages are available at the same repository. Throughout its history Quickstep was developed
in a hybrid closed/open source mode but it has its roots in open source database management
communities. The internal engineering practices adopted by the development team lend themselves
well to an open, collaborative and meritocratic environment.

The Quickstep team has always focused on building a robust end user community of researchers.
The existing documentation, along with various publications, is expected to facilitate conversations
among our existing users so as to transform them into an active community of Quickstep members,
stakeholders and developers.

== Meritocracy ==

Our proposed list of initial committers includes the current Quickstep R&D team and several
existing academic partners. This group will form a base for the broader community we will
invite to collaborate on the codebase. We intend to radically expand the initial developer
and user community by running the project in accordance with the "Apache Way". Users and new
contributors will be treated with respect and welcomed. By participating in the community
and providing quality patches/support that move the project forward, contributors will earn
merit. They also will be encouraged to provide non-code contributions (documentation, events,
community management, etc.) and will gain merit for doing so. Those with a proven support
and quality track record will be encouraged to become committers.

== Community ==

If Quickstep is accepted for incubation, the primary initial goal will be transitioning the
core community towards embracing the Apache Way of project governance. We would solicit major
existing contributors to become committers on the project from the start.

== Core Developers ==
A small percentage of Quickstep core developers are skilled in working as part of openly governed
Apache communities (mainly around the Hadoop ecosystem). That said, most of the core developers
are currently NOT affiliated with the ASF and would require new ICLAs before committing to
the project.

== Alignment ==
The following existing ASF projects can be considered when reviewing the Quickstep proposal:
  * Apache Hive: Potential alignment here is to consider a version of Hive that runs on the
Quickstep executor.
  * Apache HAWQ (incubating): Potential alignment here is to consider exchanging ideas and/or
code for execution across both systems. 
  * Apache YARN: Work has started on a distributed version of Quickstep, and its current path
is to run as a YARN application.
  * Apache Mesos: Potential alignment here is for Quickstep to run in Apache Mesos.

== Known Risks ==
Thus far, development has been done mostly by a tightly knit group of University of Wisconsin
researchers, was later sponsored mostly by a single company (Pivotal), and has been coordinated mainly
by the core Quickstep team. The Quickstep team now spans Pivotal and the University of Wisconsin.

For the project to fully transition to the Apache Way governance model, development must shift
towards the meritocracy-centric model of growing a community of contributors balanced with
the needs for extreme stability and core implementation coherency. The tools and development
practices in place for the Quickstep product are compatible with the ASF infrastructure and
thus we do not anticipate any on-boarding pains.

The project went through a very thorough vetting as part of Pivotal open sourcing it under
the Apache License v. 2.0 only a few months ago. This gives us reasonable confidence to conclude
that the code base is clean and free from IP complications.

== Orphaned products ==
Pivotal is fully committed to maintaining its position as one of the leading providers of
database management and data processing solutions and the corresponding Pivotal commercial
product will continue to be developed around the Quickstep project. 

Moreover, Pivotal has a vested interest in making Quickstep successful by driving its close
integration both with existing projects contributed to open source by Pivotal, including Apache
HAWQ (incubating) and Greenplum Database, and with sister ASF projects. We expect this to further
reduce the risk of orphaning the product.

== Inexperience with Open Source ==
Pivotal has embraced open source software since its formation by employing contributors/committers
and by shepherding open source projects like Cloud Foundry, Spring, RabbitMQ and MADlib. Individuals
working at Pivotal have experience with the formation of vibrant communities around open technologies
starting with the Cloud Foundry Foundation and continuing with the creation of a community around
Apache Geode (incubating), Apache HAWQ (incubating) and Apache MADlib (incubating). Although
some of the initial committers have not had the experience of developing entirely open source,
community-driven projects, we expect to bring to bear the open development practices that
have proven successful on longstanding Pivotal open source projects to the Quickstep community.
Additionally, several ASF veterans have agreed to mentor the project and are listed in this
proposal. The project will rely on their collective guidance and wisdom to quickly transition
the entire team of initial committers towards practicing the Apache Way.

== Homogeneous Developers ==
While many of the initial committers are employed by Pivotal or at the University of Wisconsin,
we have already seen a healthy level of interest from existing customers and partners. We
intend to convert that interest directly into participation and will be investing in activities
to recruit additional committers from other companies.

== Reliance on Salaried Developers ==
Many of the contributors are paid to work in the Big Data and data processing space and nearly
all are committed to a career in that space. While they might wander from their current employers,
they are unlikely to venture far from their core expertise and thus will continue to be engaged
with the project regardless of their current employers.

== Relationships with Other Apache Products ==
As mentioned in the Alignment section, Quickstep may consider various degrees of integration
and code exchange with Apache Hive, Apache HAWQ (incubating), Apache YARN and Apache Mesos.

== An Excessive Fascination with the Apache Brand ==
While we intend to leverage the Apache ‘branding’ when talking to other projects as a testament
to our project’s ‘neutrality’, we have no plans to use the Apache brand in press
releases or to post billboards advertising acceptance of Quickstep into the Apache Incubator.

== Documentation ==
The documentation is currently available at

== Initial Source ==
Initial source code is currently licensed under Apache License v.2 and is available at

== Source and Intellectual Property Submission Plan ==
As soon as Quickstep is approved to join the Incubator, the source code will be transitioned
via an exhibit to Pivotal's current Software Grant Agreement onto ASF infrastructure. We know
of no legal encumbrances inhibiting the transfer of source code to the ASF.

== External Dependencies ==

Runtime dependencies:
 * farmhash: [License: MIT]
 * gflags: [License: BSD]
 * glog: [License: BSD]
 * gperftools: [License: BSD]
 * linenoise: [License: BSD 2-Clause]
 * protobuf: [License: BSD]

Build only dependencies:
 * cmake: [License: BSD]
 * bison: [License: GPL with exception for generated parsers]
 * flex: [License: BSD]

Test only dependencies:
 * benchmark: [License: Apache 2.0]
 * cpplint: [License: BSD]
 * gtest: [License: BSD]
 * iwyu: [License: UIUC BSD-Like]

Cryptography: N/A

== Required Resources ==

=== Mailing lists ===
  * (moderated subscriptions)

=== Git Repository ===

=== Issue Tracking ===


=== Other Resources ===
Means of setting up regular builds for Quickstep on will require integration
with Docker support.

== Initial Committers ==
 * Jignesh M. Patel
 * Harshad Deshmukh
 * Craig Chasseur
 * Jianqiao Zhu
 * Zuyu Zhang
 * Marc Spehlmann
 * Saket Saurabh
 * Hakan Memisoglu
 * Adalbert Gerald Soosai Raj
 * Udip Pant
 * Siddharth Suresh
 * Rathijit Sen
 * Qiang Zeng
 * Shoban Chandrabose
 * Navneet Potti
 * Yinan Li
 * Sangmin Shin
 * James Paton
 * Shixuan Fan
 * Roman Shaposhnik
 * Konstantin Boudnik
 * Julian Hyde
 * Dhruba Borthakur

== Affiliations ==
 * Pivotal: Jignesh M. Patel, Zuyu Zhang, Roman Shaposhnik
 * Google: Craig Chasseur
 * Facebook: James Paton, Dhruba Borthakur
 * Pinterest: Sangmin Shin
 * Microsoft: Yinan Li
 * Hortonworks: Julian Hyde
 * Memcore: Konstantin Boudnik
 * University of Wisconsin (and supported in part by Pivotal): Everyone else

== Sponsors ==

=== Champion ===
Roman Shaposhnik

=== Nominated Mentors ===
The initial mentors are listed below:
 * Konstantin Boudnik - Apache Member, Memcore
 * Roman Shaposhnik - Apache Member, Pivotal
 * Julian Hyde - IPMC Member, Hortonworks

=== Sponsoring Entity ===
We would like to propose that the Apache Incubator sponsor this project.
