From: zuyuz@apache.org
To: commits@quickstep.incubator.apache.org
Subject: incubator-quickstep git commit: Add a quick start guide. Move the old README to a developer README.
Date: Thu, 14 Jul 2016 00:51:31 +0000 (UTC)

Repository: incubator-quickstep
Updated Branches:
  refs/heads/master b4e25edce -> 8181f1550


Add a quick start guide. Move the old README to a developer README.

Project: http://git-wip-us.apache.org/repos/asf/incubator-quickstep/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-quickstep/commit/8181f155
Tree: http://git-wip-us.apache.org/repos/asf/incubator-quickstep/tree/8181f155
Diff: http://git-wip-us.apache.org/repos/asf/incubator-quickstep/diff/8181f155

Branch: refs/heads/master
Commit: 8181f1550dd9630f5bdcfe13da6df561324f6b55
Parents: b4e25ed
Author: Jignesh Patel
Authored: Wed Jul 13 16:09:54 2016 -0500
Committer: Zuyu Zhang
Committed: Wed Jul 13 17:50:01 2016 -0700

----------------------------------------------------------------------
 DEV_README.md | 108 ++++++++++++++++++++++++++++++++
 README.md     | 176 +++++++++++++++++++++--------------------------------
 2 files changed, 176 insertions(+), 108 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-quickstep/blob/8181f155/DEV_README.md
----------------------------------------------------------------------
diff --git a/DEV_README.md b/DEV_README.md
new file mode 100644
index 0000000..04d5d66
--- /dev/null
+++ b/DEV_README.md
@@ -0,0 +1,108 @@
+# Apache Quickstep (Incubating)
+
+[![Travis Widget]][Travis]
+
+[Travis]: https://travis-ci.org/apache/incubator-quickstep
+[Travis Widget]: https://travis-ci.org/apache/incubator-quickstep.svg?branch=master
+
+Apache Quickstep is an experimental high-performance database engine designed with the
+aim of Data at Bare-Metal Speed. It began life in 2011 as a
+[research project at the University of Wisconsin](https://quickstep.cs.wisc.edu)
+and was acquired by [Pivotal](https://pivotal.io) in 2015.
+Quickstep entered incubation at the
+[Apache Software Foundation](https://www.apache.org) in April, 2016.
+
+## Getting Started (Building)
+
+A [build guide](BUILDING.md) is available which includes instructions for
+building Quickstep for the first time. You may also find it useful to use one
+of the [pre-made Vagrant boxes](build/vagrant) for Quickstep that are already
+set up with all of the development tools needed to build Quickstep.
+
+## Documentation
+
+All publicly-visible classes and functions in the Quickstep code base have
+Doxygen documentation. Simply run `doxygen` in the root of the Quickstep source
+to generate browsable HTML documentation. Of course, the Doxygen comments
+should also be useful when reading header files directly.
+
+In addition to the Doxygen and inline code comments explaining implementation
+details, a high-level overview for each module that comprises Quickstep is
+included in the README files in each subdirectory.
+
+## Architectural Overview
+
+Quickstep is composed of several different modules that handle different
+concerns of a database system. The main modules are:
+
+* [Utility](utility) - Reusable general-purpose code that is used by many
+  other modules.
+* [Threading](threading) - Provides a cross-platform abstraction for threads
+  and synchronization primitives that abstracts the underlying OS threading
+  features.
+* [Types](types) - The core type system used across all of Quickstep. Handles
+  details of how SQL types are stored, parsed, serialized & deserialized, and
+  converted. Also includes basic containers for typed values (tuples and
+  column-vectors) and low-level operations that apply to typed values (e.g.
+  basic arithmetic and comparisons).
+* [Catalog](catalog) - Keeps track of database schema as well as physical
+  storage information for relations (e.g. which physical blocks store a
+  relation's data, and any physical partitioning and placement information).
+* [Storage](storage) - Handles the physical storage of relation data in
+  self-contained, self-describing blocks, both in-memory and on persistent
+  storage (disk or a distributed filesystem). Also includes some heavyweight
+  run-time data structures used in query processing (e.g. hash tables for join
+  and aggregation). Includes a buffer manager component for managing memory
+  use and a file manager component that handles data persistence.
+* [Compression](compression) - A simple implementation of ordered dictionary
+  compression. Several storage formats in the Storage module are capable of
+  storing compressed column data and evaluating some expressions directly on
+  compressed data without decompressing. The common code supporting compression
+  is in this module.
+* [Expressions](expressions) - This module builds on the simple operations
+  provided by the Types module to support arbitrarily complex expressions over
+  data, including scalar expressions, predicates, and aggregate functions with
+  and without grouping.
+* [Relational Operators](relational_operators) - This module provides the
+  building blocks for queries in Quickstep. A query is represented as a
+  directed acyclic graph of relational operators, each of which is responsible
+  for applying some relational-algebraic operation(s) to transform its input.
+  Operators generate individual self-contained "work orders" that can be
+  executed independently. Most operators are parallelism-friendly and generate
+  one work-order per storage block of input.
+* [Query Execution](query_execution) - Handles the actual scheduling and
+  execution of work from a query at runtime. The central class is the Foreman,
+  an independent thread with a global view of the query plan and progress. The
+  Foreman dispatches work-orders to stateless Worker threads and monitors their
+  progress, and also coordinates streaming of partial results between producers
+  and consumers in a query plan DAG to maximize parallelism. This module also
+  includes the QueryContext class, which holds global shared state for an
+  individual query and is designed to support easy
+  serialization/deserialization for distributed execution.
+* [Parser](parser) - A simple SQL lexer and parser that parses SQL syntax into
+  an abstract syntax tree for consumption by the Query Optimizer.
+* [Query Optimizer](query_optimizer) - Takes the abstract syntax tree generated
+  by the parser and transforms it into a runnable query-plan DAG for the Query
+  Execution module. The Query Optimizer is responsible for resolving references
+  to relations and attributes in the query, checking it for semantic
+  correctness, and applying optimizations (e.g. filter pushdown, column
+  pruning, join ordering) as part of the transformation process.
+* [Command-Line Interface](cli) - An interactive SQL shell interface to
+  Quickstep.
+
+## Licensing
+
+Quickstep is licensed under the Apache License, Version 2.0. See [LICENSE](https://github.com/pivotalsoftware/quickstep/blob/master/LICENSE) for the full license text.
+
+## Disclaimer
+Apache Quickstep is an effort undergoing incubation at the Apache Software
+Foundation (ASF), sponsored by the Apache Incubator PMC.
+
+Incubation is required of all newly accepted projects until a further
+review indicates that the infrastructure, communications, and decision
+making process have stabilized in a manner consistent with other
+successful ASF projects.
+
+While incubation status is not necessarily a reflection of the
+completeness or stability of the code, it does indicate that the
+project has yet to be fully endorsed by the ASF.

http://git-wip-us.apache.org/repos/asf/incubator-quickstep/blob/8181f155/README.md
----------------------------------------------------------------------
diff --git a/README.md b/README.md
index 04d5d66..bdfb7fc 100644
--- a/README.md
+++ b/README.md
@@ -1,108 +1,68 @@
-# Apache Quickstep (Incubating)
-
-[![Travis Widget]][Travis]
-
-[Travis]: https://travis-ci.org/apache/incubator-quickstep
-[Travis Widget]: https://travis-ci.org/apache/incubator-quickstep.svg?branch=master
-
-Apache Quickstep is an experimental high-performance database engine designed with the
-aim of Data at Bare-Metal Speed. It began life in 2011 as a
-[research project at the University of Wisconsin](https://quickstep.cs.wisc.edu)
-and was acquired by [Pivotal](https://pivotal.io) in 2015.
-Quickstep entered incubation at the
-[Apache Software Foundation](https://www.apache.org) in April, 2016.
-
-## Getting Started (Building)
-
-A [build guide](BUILDING.md) is available which includes instructions for
-building Quickstep for the first time. You may also find it useful to use one
-of the [pre-made Vagrant boxes](build/vagrant) for Quickstep that are already
-set up with all of the development tools needed to build Quickstep.
-
-## Documentation
-
-All publicly-visible classes and functions in the Quickstep code base have
-Doxygen documentation. Simply run `doxygen` in the root of the Quickstep source
-to generate browsable HTML documentation. Of course, the Doxygen comments
-should also be useful when reading header files directly.
-
-In addition to the Doxygen and inline code comments explaining implementation
-details, a high-level overview for each module that comprises Quickstep is
-included in the README files in each subdirectory.
-
-## Architectural Overview
-
-Quickstep is composed of several different modules that handle different
-concerns of a database system. The main modules are:
-
-* [Utility](utility) - Reusable general-purpose code that is used by many
-  other modules.
-* [Threading](threading) - Provides a cross-platform abstraction for threads
-  and synchronization primitives that abstracts the underlying OS threading
-  features.
-* [Types](types) - The core type system used across all of Quickstep. Handles
-  details of how SQL types are stored, parsed, serialized & deserialized, and
-  converted. Also includes basic containers for typed values (tuples and
-  column-vectors) and low-level operations that apply to typed values (e.g.
-  basic arithmetic and comparisons).
-* [Catalog](catalog) - Keeps track of database schema as well as physical
-  storage information for relations (e.g. which physical blocks store a
-  relation's data, and any physical partitioning and placement information).
-* [Storage](storage) - Handles the physical storage of relation data in
-  self-contained, self-describing blocks, both in-memory and on persistent
-  storage (disk or a distributed filesystem). Also includes some heavyweight
-  run-time data structures used in query processing (e.g. hash tables for join
-  and aggregation). Includes a buffer manager component for managing memory
-  use and a file manager component that handles data persistence.
-* [Compression](compression) - A simple implementation of ordered dictionary
-  compression. Several storage formats in the Storage module are capable of
-  storing compressed column data and evaluating some expressions directly on
-  compressed data without decompressing. The common code supporting compression
-  is in this module.
-* [Expressions](expressions) - This module builds on the simple operations
-  provided by the Types module to support arbitrarily complex expressions over
-  data, including scalar expressions, predicates, and aggregate functions with
-  and without grouping.
-* [Relational Operators](relational_operators) - This module provides the
-  building blocks for queries in Quickstep. A query is represented as a
-  directed acyclic graph of relational operators, each of which is responsible
-  for applying some relational-algebraic operation(s) to transform its input.
-  Operators generate individual self-contained "work orders" that can be
-  executed independently. Most operators are parallelism-friendly and generate
-  one work-order per storage block of input.
-* [Query Execution](query_execution) - Handles the actual scheduling and
-  execution of work from a query at runtime. The central class is the Foreman,
-  an independent thread with a global view of the query plan and progress. The
-  Foreman dispatches work-orders to stateless Worker threads and monitors their
-  progress, and also coordinates streaming of partial results between producers
-  and consumers in a query plan DAG to maximize parallelism. This module also
-  includes the QueryContext class, which holds global shared state for an
-  individual query and is designed to support easy
-  serialization/deserialization for distributed execution.
-* [Parser](parser) - A simple SQL lexer and parser that parses SQL syntax into
-  an abstract syntax tree for consumption by the Query Optimizer.
-* [Query Optimizer](query_optimizer) - Takes the abstract syntax tree generated
-  by the parser and transforms it into a runnable query-plan DAG for the Query
-  Execution module. The Query Optimizer is responsible for resolving references
-  to relations and attributes in the query, checking it for semantic
-  correctness, and applying optimizations (e.g. filter pushdown, column
-  pruning, join ordering) as part of the transformation process.
-* [Command-Line Interface](cli) - An interactive SQL shell interface to
-  Quickstep.
-
-## Licensing
-
-Quickstep is licensed under the Apache License, Version 2.0. See [LICENSE](https://github.com/pivotalsoftware/quickstep/blob/master/LICENSE) for the full license text.
-
-## Disclaimer
-Apache Quickstep is an effort undergoing incubation at the Apache Software
-Foundation (ASF), sponsored by the Apache Incubator PMC.
-
-Incubation is required of all newly accepted projects until a further
-review indicates that the infrastructure, communications, and decision
-making process have stabilized in a manner consistent with other
-successful ASF projects.
-
-While incubation status is not necessarily a reflection of the
-completeness or stability of the code, it does indicate that the
-project has yet to be fully endorsed by the ASF.
+# Apache Quickstep (Incubating)
+
+[![Travis Widget]][Travis]
+
+[Travis]: https://travis-ci.org/apache/incubator-quickstep
+[Travis Widget]: https://travis-ci.org/apache/incubator-quickstep.svg?branch=master
+
+## What is Quickstep?
+Apache Quickstep is a high-performance database engine designed to exploit the full potential of the hardware that is packed into modern computing boxes (servers and laptops). The initial version (available now!) targets single-node in-memory environments. If your data spills over the memory limit, Quickstep will still work, so you don't have to obsessively worry about the in-memory part. Also, if your working set fits in memory, then Quickstep will transparently and automatically figure that out, and cache that hot set to deliver in-memory performance.
+
+Distributed execution is the next big feature for Quickstep.
+
+Quickstep began life in 2011 as a
+[research project at the University of Wisconsin](https://www.cs.wisc.edu/~jignesh)
+and entered incubation at the
+[Apache Software Foundation](https://www.apache.org) in April, 2016.
+
+## Why Quickstep?
+Did you know that the hardware that you have in your laptop was spread across a small cluster just a decade ago? (PS: Hopefully you are not using a very old laptop!) If you look at a high-end server box, then that packs compute and storage power that was a full rack about 5 years ago! And, the way hardware technology is going, that box is going to become even more powerful in the future. In fact, it is likely that the computing power in each box is going to grow faster than other hardware components (e.g. networking) in data centers. So, if you care about performance and/or total operating costs, paying attention to single-box performance is likely to be super important in the long run.
+
+In other words, there is a small data center in an individual compute box today! Quickstep aims to allow you to fully exploit the potential of that data center that is hidden in each individual box today. We call this the **scaling-in approach**, and it complements a scaling-out approach. But without scaling-in, you are overpaying (by a lot!) when you run your data service.
+
+## What are the key ingredients?
+
+Modern computing boxes contain a large number of computing cores and large main-memory configurations. Quickstep allows you to fully exploit these hardware resources using novel data processing, data storage, and query processing methods that include:
+
+1. A unique **decoupling of data-flow from control-flow** for query execution that allows for unlimited intra- and inter-query parallelism. Thus, using all the processing cores effectively.
+
+2. A **template meta-programming** framework that provides fast vectorized query execution. Thus, using each processor cycle very efficiently.
+
+3. A **hybrid data storage** architecture that includes columnar and row-store. Yes, this may surprise some of you, but sometimes a row-store beats a column-store!
+
+And, it is **open source!**
+
+## Giving it a spin
+
+1. Check out the code: ```git clone https://git-wip-us.apache.org/repos/asf/incubator-quickstep.git quickstep```
+2. Then, go to the code directory: ```cd quickstep```
+3. Initialize the dependencies: ```git submodule init```
+4. Check out the dependencies: ```git submodule update```
+5. Go into the build directory: ```cd build```
+6. Create the Makefile: ```cmake -D CMAKE_BUILD_TYPE=Release ..```
+7. Build: ```make -j4```. Note you may replace the 4 with the number of cores on your machine.
+8. Start quickstep: ```./quickstep_cli_shell --initialize_db=true```. You can now fire SQL queries. To quit, you can type in ```quit;```. Your data is stored in the directory ```qsstor```.
+
+
+## Additional pointers
+
+1. For other build options, see the more comprehensive [build guide](BUILDING.md).
+2. To get started as a developer, you should start with the [code organization guide](DEV_README.md).
+
+
+## Licensing
+
+Quickstep is licensed under the Apache License, Version 2.0. See [LICENSE](https://github.com/pivotalsoftware/quickstep/blob/master/LICENSE) for the full license text.
+
+## Disclaimer
+Apache Quickstep is an effort undergoing incubation at the Apache Software
+Foundation (ASF), sponsored by the Apache Incubator PMC.
+
+Incubation is required of all newly accepted projects until a further
+review indicates that the infrastructure, communications, and decision
+making process have stabilized in a manner consistent with other
+successful ASF projects.
+
+While incubation status is not necessarily a reflection of the
+completeness or stability of the code, it does indicate that the
+project has yet to be fully endorsed by the ASF.
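
For readers who want to try the eight "Giving it a spin" steps from the new README in one go, they could be chained roughly as follows. This is only a sketch and not part of the commit: the commands themselves are copied from the README, while `run_step`, `quickstep_quickstart`, `QS_DRY_RUN`, and `QS_JOBS` are hypothetical helper names introduced here for illustration.

```shell
#!/bin/sh
# Sketch only: chains the eight "Giving it a spin" commands from the README.
# run_step, quickstep_quickstart, QS_DRY_RUN and QS_JOBS are hypothetical
# names introduced for this example; they are not part of Quickstep.

# Print a command, then execute it unless QS_DRY_RUN is set to 1.
run_step() {
  echo "running: $*"
  if [ "${QS_DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

# Run the steps in README order and stop at the first failure.
# QS_JOBS (assumed name) replaces the 4 in "make -j4"; it defaults to 4.
quickstep_quickstart() {
  run_step git clone https://git-wip-us.apache.org/repos/asf/incubator-quickstep.git quickstep &&
  run_step cd quickstep &&
  run_step git submodule init &&
  run_step git submodule update &&
  run_step cd build &&
  run_step cmake -D CMAKE_BUILD_TYPE=Release .. &&
  run_step make "-j${QS_JOBS:-4}" &&
  run_step ./quickstep_cli_shell --initialize_db=true
}
```

With `QS_DRY_RUN=1` set, calling `quickstep_quickstart` only prints the commands, which is a cheap way to review them before cloning and building anything. Without it, the last step starts the interactive SQL shell, as described in step 8.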