quickstep-commits mailing list archives

From hbdeshm...@apache.org
Subject incubator-quickstep-site git commit: A blog post on storage format
Date Thu, 06 Apr 2017 12:42:48 GMT
Repository: incubator-quickstep-site
Updated Branches:
  refs/heads/asf-site 1a407a1bc -> 725d8b94e


A blog post on storage format

- Added CREATE TABLE statements.
- Added source code links.
- Added images for storage formats
- Added figure links in the post.
- Added table style


Project: http://git-wip-us.apache.org/repos/asf/incubator-quickstep-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-quickstep-site/commit/725d8b94
Tree: http://git-wip-us.apache.org/repos/asf/incubator-quickstep-site/tree/725d8b94
Diff: http://git-wip-us.apache.org/repos/asf/incubator-quickstep-site/diff/725d8b94

Branch: refs/heads/asf-site
Commit: 725d8b94eaa554b590f75e344217e8ced6b83d25
Parents: 1a407a1
Author: Harshad Deshmukh <d.harshad17@gmail.com>
Authored: Thu Mar 30 17:22:49 2017 -0500
Committer: Harshad Deshmukh <harshad@cs.wisc.edu>
Committed: Thu Apr 6 07:41:28 2017 -0500

----------------------------------------------------------------------
 ...017-03-30-storage-formats-quickstep.markdown | 136 ++++++++++
 _sass/custom.scss                               |   9 +-
 assets/storage-format-column-store.jpg          | Bin 0 -> 32318 bytes
 .../storage-format-compressed-column-store.jpg  | Bin 0 -> 30927 bytes
 assets/storage-format-row-store.jpg             | Bin 0 -> 32071 bytes
 content/about/index.html                        |   4 +-
 content/assets/main.css                         |   5 +
 content/assets/storage-format-column-store.jpg  | Bin 0 -> 32318 bytes
 .../storage-format-compressed-column-store.jpg  | Bin 0 -> 30927 bytes
 content/assets/storage-format-row-store.jpg     | Bin 0 -> 32071 bytes
 content/blog/index.html                         |  11 +-
 content/feed.xml                                | 142 +++++++++-
 content/feed.xslt.xml                           |   4 -
 content/guides/2016/12/10/FirstQuery.html       |   4 +-
 .../2017/03/30/storage-formats-quickstep.html   | 256 +++++++++++++++++++
 content/index.html                              |   4 +-
 content/release/index.html                      |   4 +-
 17 files changed, 556 insertions(+), 23 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-quickstep-site/blob/725d8b94/_posts/2017-03-30-storage-formats-quickstep.markdown
----------------------------------------------------------------------
diff --git a/_posts/2017-03-30-storage-formats-quickstep.markdown b/_posts/2017-03-30-storage-formats-quickstep.markdown
new file mode 100644
index 0000000..75483a4
--- /dev/null
+++ b/_posts/2017-03-30-storage-formats-quickstep.markdown
@@ -0,0 +1,136 @@
+---
+layout: post
+title:  "Storage Formats in Quickstep"
+date: 2017-03-30 14:00:30 -0600
+categories: guides
+author: Harshad
+short: Quickstep offers a variety of storage formats. This post explains what these formats are and how to use them.
+---
+One of the strengths of Quickstep is the variety of storage formats it offers for relational data. The storage management in Quickstep can work with all these formats, and each format comes with its own strengths and weaknesses. The foundation of this work was laid in a [research paper](http://www.vldb.org/pvldb/vol6/p1474-chasseur.pdf) from the Patel Wisconsin Database group that appeared in [VLDB 2014](http://vldb.org/) (International Conference on Very Large Data Bases). In this post, I will provide a brief primer on what these storage formats mean and how to use them in the Quickstep system. I will also provide links to the relevant source code files, so that readers can start exploring the code base.
+
+# Storage Formats 101
+A storage format refers to how the data in a table is laid out in memory. Let's consider a toy relational schema.
+
+{% highlight sql %}
+CREATE TABLE employee (ID INT, Age INT, Name VARCHAR(8));
+{% endhighlight %}
+
+A tuple in the employee table consists of three attributes: ID (an integer), Age (an integer), and Name (a string of at most 8 characters). For simplicity, let us assume that Name is stored as a fixed-length 8-byte attribute; in general it could be variable length. Let's also populate our table with a few tuples.
+
+{% highlight sql %}
+INSERT into employee values (101, 30, 'Jennifer');
+INSERT into employee values (102, 25, 'Jim');
+INSERT into employee values (103, 35, 'David');
+{% endhighlight %}
+
+Conceptually, our table looks as follows:
+
+| ID 	| Age 	|   Name   	|
+|:--:	|:---:	|:--------:	|
+|  101 	|  30 	| Jennifer 	|
+|  102  |  25 	|    Jim   	|
+|  103 	|  35 	|   David  	|
+
+Dropping a table is easy:
+
+{% highlight sql %}
+DROP TABLE employee;
+{% endhighlight %}
+
+Next, let us dig into understanding the different ways in which this table can be stored in memory. 
+
+## Row Store
+
+The first storage format we will look at is *row store*, which is a popular format, and one that is intuitive too.
+
+The figure below shows what the row store format looks like in memory for our toy example. The offsets indicate how far each memory location is from the beginning of the block.
+
+![Row Store]({{ base }}/assets/storage-format-row-store.jpg)
+
+In the row store format, the attributes of a tuple are stored in table definition order. In memory, the ID (101) of the first tuple is stored at some address, followed by the Age field for that tuple (30), and then the Name field ("Jennifer"). The second tuple is stored in memory sequentially after the first tuple, and so on.
+
+## Column Store
+
+Let's now take a look at the *column store* format. The figure below depicts a column store layout for our toy example. 
+
+![Column Store]({{ base }}/assets/storage-format-column-store.jpg)
+
+In the column store format, the values from the same column are stored together. The order in which the values from a column are stored remains the same for all the columns. Notice the ID values 101, 102, 103 are stored contiguously. The corresponding Age fields 30, 25 and 35 are stored contiguously and likewise for the Name column.
+
+## Compression
+
+Compression is a standard technique to reduce the storage footprint. There is a large body of work on various compression techniques, which we won't cover here. Let's look at how compression can be applied in our toy example. The ID column has three values: 101, 102, and 103. If they are stored as regular integers, each of them occupies 4 bytes of memory. We can reduce the memory consumed in storing these three values. Observe that all three values are very close to 100. If we remember the base value 100 and record only the difference of each value from this base, then we only need to store the values 1, 2, and 3. To store these three *deltas*, we don't even need a 4-byte integer; a 1-byte value is enough.
+
+Putting it all together, the picture below shows the compressed column store format in which the compression is applied on the ID column. 
+
+![Column Store with Compression]({{ base }}/assets/storage-format-compressed-column-store.jpg)
+
+Observe that the memory occupied by the three tuples together is only 43 bytes, compared to 48 bytes in the row store and the column store formats. As the number of tuples increases, there may be more opportunities for compression and thus more memory savings.
+
+There are some aspects related to storage management which deserve a blog post of their own. This includes the block-based storage design, and the impact of the above storage formats on various analytical queries. I hope to write more blog posts to cover these topics in the future.
+
+# Creating tables with various storage formats in Quickstep
+
+Now I will show you how to use the above storage formats to create tables in Quickstep. The storage format specification is part of the `CREATE TABLE` SQL command. We will continue with our toy example above to illustrate the various ways in which we can use the storage formats in Quickstep.
+
+{% highlight sql %}
+CREATE TABLE employee (
+ID INT NOT NULL, 
+Age INT NOT NULL, 
+Name VARCHAR(8) NOT NULL
+) WITH BLOCKPROPERTIES (
+  TYPE split_rowstore,
+  BLOCKSIZEMB 4);
+{% endhighlight %}
+
+I will now describe some keywords used in the above SQL statement. The keyword `BLOCKPROPERTIES` refers to the storage properties of this table. The above command means that all the blocks in the employee table will use the row store format (the name `split_rowstore` means just simple row store), with each block sized to a maximum of 4 MB.
+
+In the example below, all the blocks in the employee table use the compressed column store format. The values in each block are sorted by `ID`, and compression is applied to the `ID` and `Name` columns.
+
+The `COMPRESS` keyword accepts either a list of one or more columns or `ALL`, which selects every column in the table. With a column list, the statement looks like this:
+
+{% highlight sql %}
+CREATE TABLE employee (
+ID INT NOT NULL, 
+Age INT NOT NULL, 
+Name VARCHAR(8) NOT NULL
+) WITH BLOCKPROPERTIES (
+  TYPE compressed_columnstore,
+  SORT ID,
+  COMPRESS (ID, Name),
+  BLOCKSIZEMB 4);
+{% endhighlight %}
+
+Finally, the command below indicates that the compression is applied on all the columns.
+
+{% highlight sql %}
+CREATE TABLE employee (
+ID INT NOT NULL, 
+Age INT NOT NULL, 
+Name VARCHAR(8) NOT NULL
+) WITH BLOCKPROPERTIES (
+  TYPE compressed_columnstore,
+  SORT ID,
+  COMPRESS ALL,
+  BLOCKSIZEMB 4);
+{% endhighlight %}
+
+# Implementation Details
+
+The above illustrations are meant to explain the various storage formats. In the actual implementation, in each storage block, there is a separate region to store the variable length attributes. For such attributes, in the row store implementation, we use a pointer (or offset) to point to the true location of the variable length attribute.
+
+In the current implementation, the compressed column store format requires that all the variable length attributes be compressed.
+
+For folks interested in looking at the source code for these storage formats, I provide links to the relevant source code files below. Our code is well documented (doxygen) for the most part, so it should be easy to read.
+
+[Parsing the block properties](https://github.com/apache/incubator-quickstep/blob/master/parser/ParseBlockProperties.hpp)
+
+[Row store implementation](https://github.com/apache/incubator-quickstep/blob/master/storage/SplitRowStoreTupleStorageSubBlock.hpp)
+
+[Basic column store implementation](https://github.com/apache/incubator-quickstep/blob/master/storage/BasicColumnStoreTupleStorageSubBlock.hpp)
+
+[Compressed column store implementation](https://github.com/apache/incubator-quickstep/blob/master/storage/CompressedColumnStoreTupleStorageSubBlock.hpp)
+
+# Conclusion
+
+I hope this blog post was useful and gives you some idea about the various storage formats implemented in Quickstep. If you have questions, please send us an email at dev@quickstep.incubator.apache.org.
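
The row store and column store layouts described in the post above can be checked with a short Python sketch. It is only an illustration under stated assumptions (little-endian 4-byte integers, and Name padded to a fixed 8 bytes as the post assumes for simplicity); it is not Quickstep's storage code, and the names `ROWS`, `row_store`, and `column_store` are made up for this example.

```python
import struct

# Toy rows from the post: (ID, Age, Name). Name is treated as a fixed
# 8-byte field (an assumption taken from the post's simplification).
ROWS = [(101, 30, b"Jennifer"), (102, 25, b"Jim"), (103, 35, b"David")]


def row_store(rows):
    """Lay out each tuple contiguously: ID, Age, Name, then the next tuple."""
    return b"".join(struct.pack("<ii8s", *row) for row in rows)


def column_store(rows):
    """Lay out all IDs first, then all Ages, then all Names."""
    ids = b"".join(struct.pack("<i", r[0]) for r in rows)
    ages = b"".join(struct.pack("<i", r[1]) for r in rows)
    names = b"".join(struct.pack("<8s", r[2]) for r in rows)
    return ids + ages + names


row_block = row_store(ROWS)
col_block = column_store(ROWS)

# Row store: the second tuple starts 16 bytes (4 + 4 + 8) into the block.
second_id_row = struct.unpack_from("<i", row_block, 16)[0]

# Column store: the second ID sits right after the first, at offset 4,
# and the Age column starts after the three IDs, at offset 12.
second_id_col = struct.unpack_from("<i", col_block, 4)[0]
first_age_col = struct.unpack_from("<i", col_block, 12)[0]
```

Under these assumptions both layouts occupy 48 bytes for the three tuples; only the order of the bytes within the block differs.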

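The delta idea from the Compression section of the post can be sketched in Python. This assumes the base value is stored once as a 4-byte integer and each delta as one unsigned byte, which is one accounting that reproduces the post's 43-byte total; the figure itself is not included in this message, so that breakdown is an assumption. `delta_encode` and `delta_decode` are hypothetical helpers, not Quickstep functions, and the sketch does not describe Quickstep's actual compression scheme.

```python
import struct

IDS = [101, 102, 103]
BASE = 100  # base value used in the post's example


def delta_encode(values, base):
    """Store the base once (4 bytes), then one unsigned byte per value."""
    deltas = [v - base for v in values]
    if not all(0 <= d <= 255 for d in deltas):
        raise ValueError("a delta does not fit in one unsigned byte")
    return struct.pack("<i", base) + bytes(deltas)


def delta_decode(blob):
    """Recover the original values from the base and the 1-byte deltas."""
    (base,) = struct.unpack_from("<i", blob, 0)
    return [base + d for d in blob[4:]]


encoded_ids = delta_encode(IDS, BASE)

# Uncompressed Age (3 x 4 bytes) and Name (3 x 8 bytes) columns.
other_columns = 3 * 4 + 3 * 8

uncompressed_total = 4 * len(IDS) + other_columns
compressed_total = len(encoded_ids) + other_columns
```

With these assumptions `uncompressed_total` is 48 and `compressed_total` is 43, and `delta_decode(encoded_ids)` returns the original IDs.
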
http://git-wip-us.apache.org/repos/asf/incubator-quickstep-site/blob/725d8b94/_sass/custom.scss
----------------------------------------------------------------------
diff --git a/_sass/custom.scss b/_sass/custom.scss
index 2581b5d..86be895 100644
--- a/_sass/custom.scss
+++ b/_sass/custom.scss
@@ -17,4 +17,11 @@
 .big-margin {
   margin-top: 50px;
   margin-bottom: 50px;
-}
\ No newline at end of file
+}
+
+table {
+  border: black;
+  border-width: 2px;
+  border-style: solid;
+}
+

http://git-wip-us.apache.org/repos/asf/incubator-quickstep-site/blob/725d8b94/assets/storage-format-column-store.jpg
----------------------------------------------------------------------
diff --git a/assets/storage-format-column-store.jpg b/assets/storage-format-column-store.jpg
new file mode 100644
index 0000000..ffd92bb
Binary files /dev/null and b/assets/storage-format-column-store.jpg differ

http://git-wip-us.apache.org/repos/asf/incubator-quickstep-site/blob/725d8b94/assets/storage-format-compressed-column-store.jpg
----------------------------------------------------------------------
diff --git a/assets/storage-format-compressed-column-store.jpg b/assets/storage-format-compressed-column-store.jpg
new file mode 100644
index 0000000..34ee090
Binary files /dev/null and b/assets/storage-format-compressed-column-store.jpg differ

http://git-wip-us.apache.org/repos/asf/incubator-quickstep-site/blob/725d8b94/assets/storage-format-row-store.jpg
----------------------------------------------------------------------
diff --git a/assets/storage-format-row-store.jpg b/assets/storage-format-row-store.jpg
new file mode 100644
index 0000000..efe2d7b
Binary files /dev/null and b/assets/storage-format-row-store.jpg differ

http://git-wip-us.apache.org/repos/asf/incubator-quickstep-site/blob/725d8b94/content/about/index.html
----------------------------------------------------------------------
diff --git a/content/about/index.html b/content/about/index.html
index 2ce1fdd..0adab57 100644
--- a/content/about/index.html
+++ b/content/about/index.html
@@ -10,7 +10,7 @@
   <meta name="description" content="Quickstep is a next-generation data processing platform designed for high-performance analytical queries.">
 
   <link rel="stylesheet" href="/assets/main.css">
-  <link rel="canonical" href="http://localhost:4000/about/">
+  <link rel="canonical" href="http://quickstep.apache.org//about/">
   <link rel="alternate" type="application/rss+xml" title="Apache Quickstep (Incubating)" href="/feed.xml">
   
   
@@ -54,8 +54,6 @@
         
           
         
-          
-        
       </div>
     </nav>
 

http://git-wip-us.apache.org/repos/asf/incubator-quickstep-site/blob/725d8b94/content/assets/main.css
----------------------------------------------------------------------
diff --git a/content/assets/main.css b/content/assets/main.css
index 69d76ee..ac1c9b8 100644
--- a/content/assets/main.css
+++ b/content/assets/main.css
@@ -469,3 +469,8 @@ pre {
 .big-margin {
   margin-top: 50px;
   margin-bottom: 50px; }
+
+table {
+  border: black;
+  border-width: 2px;
+  border-style: solid; }

http://git-wip-us.apache.org/repos/asf/incubator-quickstep-site/blob/725d8b94/content/assets/storage-format-column-store.jpg
----------------------------------------------------------------------
diff --git a/content/assets/storage-format-column-store.jpg b/content/assets/storage-format-column-store.jpg
new file mode 100644
index 0000000..ffd92bb
Binary files /dev/null and b/content/assets/storage-format-column-store.jpg differ

http://git-wip-us.apache.org/repos/asf/incubator-quickstep-site/blob/725d8b94/content/assets/storage-format-compressed-column-store.jpg
----------------------------------------------------------------------
diff --git a/content/assets/storage-format-compressed-column-store.jpg b/content/assets/storage-format-compressed-column-store.jpg
new file mode 100644
index 0000000..34ee090
Binary files /dev/null and b/content/assets/storage-format-compressed-column-store.jpg differ

http://git-wip-us.apache.org/repos/asf/incubator-quickstep-site/blob/725d8b94/content/assets/storage-format-row-store.jpg
----------------------------------------------------------------------
diff --git a/content/assets/storage-format-row-store.jpg b/content/assets/storage-format-row-store.jpg
new file mode 100644
index 0000000..efe2d7b
Binary files /dev/null and b/content/assets/storage-format-row-store.jpg differ

http://git-wip-us.apache.org/repos/asf/incubator-quickstep-site/blob/725d8b94/content/blog/index.html
----------------------------------------------------------------------
diff --git a/content/blog/index.html b/content/blog/index.html
index ad3221d..6ca08a4 100644
--- a/content/blog/index.html
+++ b/content/blog/index.html
@@ -10,7 +10,7 @@
   <meta name="description" content="Quickstep is a next-generation data processing platform designed for high-performance analytical queries.">
 
   <link rel="stylesheet" href="/assets/main.css">
-  <link rel="canonical" href="http://localhost:4000/blog/">
+  <link rel="canonical" href="http://quickstep.apache.org//blog/">
   <link rel="alternate" type="application/rss+xml" title="Apache Quickstep (Incubating)" href="/feed.xml">
   
   
@@ -54,8 +54,6 @@
         
           
         
-          
-        
       </div>
     </nav>
 
@@ -77,6 +75,13 @@
     
       <li>
         <h2>
+          <a class="post-link" href="/guides/2017/03/30/storage-formats-quickstep.html">Storage Formats in Quickstep</a>
+        </h2>
+        <span class="post-meta">Mar 30, 2017</span>
+      </li>
+    
+      <li>
+        <h2>
           <a class="post-link" href="/guides/2016/12/10/FirstQuery.html">Your First Query</a>
         </h2>
         <span class="post-meta">Dec 10, 2016</span>

http://git-wip-us.apache.org/repos/asf/incubator-quickstep-site/blob/725d8b94/content/feed.xml
----------------------------------------------------------------------
diff --git a/content/feed.xml b/content/feed.xml
index 83c994d..bf9d227 100644
--- a/content/feed.xml
+++ b/content/feed.xml
@@ -1,5 +1,141 @@
-<?xml version="1.0" encoding="utf-8"?><?xml-stylesheet type="text/xml" href="http://localhost:4000/feed.xslt.xml"?><feed xmlns="http://www.w3.org/2005/Atom"><generator uri="http://jekyllrb.com" version="3.3.1">Jekyll</generator><link href="http://localhost:4000/feed.xml" rel="self" type="application/atom+xml" /><link href="http://localhost:4000/" rel="alternate" type="text/html" /><updated>2017-03-27T15:47:02-05:00</updated><id>http://localhost:4000//</id><title type="html">Apache Quickstep (Incubating)</title><subtitle>Quickstep is a next-generation data processing platform designed  for high-performance analytical queries.
-</subtitle><entry><title type="html">Your First Query</title><link href="http://localhost:4000/guides/2016/12/10/FirstQuery.html" rel="alternate" type="text/html" title="Your First Query" /><published>2016-12-10T12:29:09-06:00</published><updated>2016-12-10T12:29:09-06:00</updated><id>http://localhost:4000/guides/2016/12/10/FirstQuery</id><content type="html" xml:base="http://localhost:4000/guides/2016/12/10/FirstQuery.html">&lt;p&gt;For this tutorial, I’m going to assume you’re running in a unix environment. If you’re having trouble building on Windows, try asking the dev community (&lt;a href=&quot;mailto:dev@quickstep.incubating.apache.org&quot;&gt;dev@quickstep.incubating.apache.org&lt;/a&gt;). You can also find a complete guide &lt;a href=&quot;https://github.com/cramja/incubator-quickstep/blob/master/BUILDING.md&quot;&gt;here in our documentation&lt;/a&gt;.&lt;/p&gt;
+<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.3.1">Jekyll</generator><link href="http://quickstep.apache.org//feed.xml" rel="self" type="application/atom+xml" /><link href="http://quickstep.apache.org//" rel="alternate" type="text/html" /><updated>2017-04-06T07:40:28-05:00</updated><id>http://quickstep.apache.org//</id><title type="html">Apache Quickstep (Incubating)</title><subtitle>Quickstep is a next-generation data processing platform designed  for high-performance analytical queries.
+</subtitle><entry><title type="html">Storage Formats in Quickstep</title><link href="http://quickstep.apache.org//guides/2017/03/30/storage-formats-quickstep.html" rel="alternate" type="text/html" title="Storage Formats in Quickstep" /><published>2017-03-30T15:00:30-05:00</published><updated>2017-03-30T15:00:30-05:00</updated><id>http://quickstep.apache.org//guides/2017/03/30/storage-formats-quickstep</id><content type="html" xml:base="http://quickstep.apache.org//guides/2017/03/30/storage-formats-quickstep.html">&lt;p&gt;One of the strengths of Quickstep is the variety of storage formats it offers for relational data. The storage management in Quickstep can work with all these formats, and each format comes with its own strengths and weaknesses. The foundation of this work was laid in a &lt;a href=&quot;http://www.vldb.org/pvldb/vol6/p1474-chasseur.pdf&quot;&gt;research paper&lt;/a&gt; from the Patel Wisconsin Database group that appeared in &lt;a href=&quot;http://vldb.org/&quot;&gt;VLDB 2014&lt;/a&gt; (International Conference on Very Large Data Bases). In this post, I will provide a brief primer on what these storage formats mean and how to use them in the Quickstep system. I will also provide links to the relevant source code files, so that readers can start exploring the code base.&lt;/p&gt;
+
+&lt;h1 id=&quot;storage-formats-101&quot;&gt;Storage Formats 101&lt;/h1&gt;
+&lt;p&gt;A storage format refers to how the data in a table is laid out in memory. Let’s consider a toy relational schema.&lt;/p&gt;
+
+&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;employee&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ID&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;INT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Age&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;INT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Name&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;VARCHAR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;));&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
+
+&lt;p&gt;A tuple in the employee table consists of three attributes: ID (an integer), Age (an integer), and Name (a string of at most 8 characters). For simplicity, let us assume that Name is stored as a fixed-length 8-byte attribute; in general it could be variable length. Let’s also populate our table with a few tuples.&lt;/p&gt;
+
+&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;into&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;employee&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;values&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;101&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;30&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'Jennifer'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
+&lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;into&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;employee&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;values&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;102&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;25&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'Jim'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
+&lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;into&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;employee&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;values&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;103&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;35&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'David'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
+
+&lt;p&gt;Conceptually, our table looks as follows:&lt;/p&gt;
+
+&lt;table&gt;
+  &lt;thead&gt;
+    &lt;tr&gt;
+      &lt;th style=&quot;text-align: center&quot;&gt;ID&lt;/th&gt;
+      &lt;th style=&quot;text-align: center&quot;&gt;Age&lt;/th&gt;
+      &lt;th style=&quot;text-align: center&quot;&gt;Name&lt;/th&gt;
+    &lt;/tr&gt;
+  &lt;/thead&gt;
+  &lt;tbody&gt;
+    &lt;tr&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;101&lt;/td&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;30&lt;/td&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;Jennifer&lt;/td&gt;
+    &lt;/tr&gt;
+    &lt;tr&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;102&lt;/td&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;25&lt;/td&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;Jim&lt;/td&gt;
+    &lt;/tr&gt;
+    &lt;tr&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;103&lt;/td&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;35&lt;/td&gt;
+      &lt;td style=&quot;text-align: center&quot;&gt;David&lt;/td&gt;
+    &lt;/tr&gt;
+  &lt;/tbody&gt;
+&lt;/table&gt;
+
+&lt;p&gt;Dropping a table is easy:&lt;/p&gt;
+
+&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;DROP&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;employee&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
+
+&lt;p&gt;Next, let us dig into understanding the different ways in which this table can be stored in memory.&lt;/p&gt;
+
+&lt;h2 id=&quot;row-store&quot;&gt;Row Store&lt;/h2&gt;
+
+&lt;p&gt;The first storage format we will look at is &lt;em&gt;row store&lt;/em&gt;, which is a popular format, and one that is intuitive too.&lt;/p&gt;
+
+&lt;p&gt;The figure below shows what the row store format looks like in memory for our toy example. The offsets indicate how far each memory location is from the beginning of the block.&lt;/p&gt;
+
+&lt;p&gt;&lt;img src=&quot;/assets/storage-format-row-store.jpg&quot; alt=&quot;Row Store&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;In the row store format, the attributes of a tuple are stored in table definition order. In memory, the ID (101) of the first tuple is stored at some address, followed by the Age field for that tuple (30), and then the Name field (“Jennifer”). The second tuple is stored in memory sequentially after the first tuple, and so on.&lt;/p&gt;
+
+&lt;h2 id=&quot;column-store&quot;&gt;Column Store&lt;/h2&gt;
+
+&lt;p&gt;Let’s now take a look at the &lt;em&gt;column store&lt;/em&gt; format. The figure below depicts a column store layout for our toy example.&lt;/p&gt;
+
+&lt;p&gt;&lt;img src=&quot;/assets/storage-format-column-store.jpg&quot; alt=&quot;Column Store&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;In the column store format, the values from the same column are stored together. The order in which the values from a column are stored remains the same for all the columns. Notice the ID values 101, 102, 103 are stored contiguously. The corresponding Age fields 30, 25 and 35 are stored contiguously and likewise for the Name column.&lt;/p&gt;
+
+&lt;h2 id=&quot;compression&quot;&gt;Compression&lt;/h2&gt;
+
+&lt;p&gt;Compression is a standard technique to reduce the storage footprint. There is a large body of work on various compression techniques, which we won’t cover here. Let’s look at how compression can be applied in our toy example. The ID column has three values: 101, 102, and 103. If they are stored as regular integers, each of them occupies 4 bytes of memory. We can reduce the memory consumed in storing these three values. Observe that all three values are very close to 100. If we remember the base value 100 and record only the difference of each value from this base, then we only need to store the values 1, 2, and 3. To store these three &lt;em&gt;deltas&lt;/em&gt;, we don’t even need a 4-byte integer; a 1-byte value is enough.&lt;/p&gt;
+
+&lt;p&gt;Putting it all together, the picture below shows the compressed column store format in which the compression is applied on the ID column.&lt;/p&gt;
+
+&lt;p&gt;&lt;img src=&quot;/assets/storage-format-compressed-column-store.jpg&quot; alt=&quot;Column Store with Compression&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;Observe that the memory occupied by the three tuples together is only 43 bytes, compared to 48 bytes in the row store and the column store formats. As the number of tuples increases, there may be more opportunities for compression and thus more memory savings.&lt;/p&gt;
+
+&lt;p&gt;There are some aspects related to storage management which deserve a blog post of their own. This includes the block-based storage design, and the impact of the above storage formats on various analytical queries. I hope to write more blog posts to cover these topics in the future.&lt;/p&gt;
+
+&lt;h1 id=&quot;creating-tables-with-various-storage-formats-in-quickstep&quot;&gt;Creating tables with various storage formats in Quickstep&lt;/h1&gt;
+
+&lt;p&gt;Now I will show you how to use the above storage formats to create tables in Quickstep. The storage format specification is part of the &lt;code class=&quot;highlighter-rouge&quot;&gt;CREATE TABLE&lt;/code&gt; SQL command. We will continue with our toy example above to illustrate the various ways in which we can use the storage formats in Quickstep.&lt;/p&gt;
+
+&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;employee&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
+&lt;span class=&quot;n&quot;&gt;ID&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;INT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NOT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; 
+&lt;span class=&quot;n&quot;&gt;Age&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;INT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NOT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; 
+&lt;span class=&quot;n&quot;&gt;Name&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;VARCHAR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NOT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;
+&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;BLOCKPROPERTIES&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
+  &lt;span class=&quot;k&quot;&gt;TYPE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;split_rowstore&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
+  &lt;span class=&quot;n&quot;&gt;BLOCKSIZEMB&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
+
+&lt;p&gt;I will now describe some keywords used in the above SQL statement. The keyword &lt;code class=&quot;highlighter-rouge&quot;&gt;BLOCKPROPERTIES&lt;/code&gt; refers to the storage properties of this table. The above command means that all the blocks in the employee table will use the row store format (the name &lt;code class=&quot;highlighter-rouge&quot;&gt;split_rowstore&lt;/code&gt; means just simple row store), with each block sized to a maximum of 4 MB.&lt;/p&gt;
+
+&lt;p&gt;In the example below, all the blocks in the employee table have compressed column store as their storage format. The values in each block are sorted by the &lt;code class=&quot;highlighter-rouge&quot;&gt;ID&lt;/code&gt; values, and the compression is applied on the &lt;code class=&quot;highlighter-rouge&quot;&gt;ID&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;Name&lt;/code&gt; columns.&lt;/p&gt;
+
+&lt;p&gt;The &lt;code class=&quot;highlighter-rouge&quot;&gt;COMPRESS&lt;/code&gt; keyword accepts either a list of one or more columns from the table, or &lt;code class=&quot;highlighter-rouge&quot;&gt;ALL&lt;/code&gt; to compress every column. With an explicit column list, the statement looks like this:&lt;/p&gt;
+
+&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;employee&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
+&lt;span class=&quot;n&quot;&gt;ID&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;INT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NOT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; 
+&lt;span class=&quot;n&quot;&gt;Age&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;INT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NOT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; 
+&lt;span class=&quot;n&quot;&gt;Name&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;VARCHAR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NOT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;
+&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;BLOCKPROPERTIES&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
+  &lt;span class=&quot;k&quot;&gt;TYPE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;compressed_columnstore&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
+  &lt;span class=&quot;n&quot;&gt;SORT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ID&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
+  &lt;span class=&quot;n&quot;&gt;COMPRESS&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ID&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
+  &lt;span class=&quot;n&quot;&gt;BLOCKSIZEMB&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
+
+&lt;p&gt;Finally, the command below indicates that the compression is applied on all the columns.&lt;/p&gt;
+
+&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;employee&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
+&lt;span class=&quot;n&quot;&gt;ID&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;INT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NOT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; 
+&lt;span class=&quot;n&quot;&gt;Age&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;INT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NOT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; 
+&lt;span class=&quot;n&quot;&gt;Name&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;VARCHAR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NOT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;
+&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;BLOCKPROPERTIES&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
+  &lt;span class=&quot;k&quot;&gt;TYPE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;compressed_columnstore&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
+  &lt;span class=&quot;n&quot;&gt;SORT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ID&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
+  &lt;span class=&quot;n&quot;&gt;COMPRESS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ALL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
+  &lt;span class=&quot;n&quot;&gt;BLOCKSIZEMB&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
+
+&lt;h1 id=&quot;implementation-details&quot;&gt;Implementation Details&lt;/h1&gt;
+
+&lt;p&gt;The above illustrations are meant to explain the various storage formats. In the actual implementation, in each storage block, there is a separate region to store the variable length attributes. For such attributes, in the row store implementation, we use a pointer (or offset) to point to the true location of the variable length attribute.&lt;/p&gt;
+
+&lt;p&gt;In the current implementation, the compressed column store format requires that all the variable length attributes be compressed.&lt;/p&gt;
+
+&lt;p&gt;For folks interested in looking at the source code for these storage formats, I provide links to the relevant source code files below. Our code is well documented (doxygen) for the most part, so it should be easy to read.&lt;/p&gt;
+
+&lt;p&gt;&lt;a href=&quot;https://github.com/apache/incubator-quickstep/blob/master/parser/ParseBlockProperties.hpp&quot;&gt;Parsing the block properties&lt;/a&gt;&lt;/p&gt;
+
+&lt;p&gt;&lt;a href=&quot;https://github.com/apache/incubator-quickstep/blob/master/storage/SplitRowStoreTupleStorageSubBlock.hpp&quot;&gt;Row store implementation&lt;/a&gt;&lt;/p&gt;
+
+&lt;p&gt;&lt;a href=&quot;https://github.com/apache/incubator-quickstep/blob/master/storage/BasicColumnStoreTupleStorageSubBlock.hpp&quot;&gt;Basic column store implementation&lt;/a&gt;&lt;/p&gt;
+
+&lt;p&gt;&lt;a href=&quot;https://github.com/apache/incubator-quickstep/blob/master/storage/CompressedColumnStoreTupleStorageSubBlock.hpp&quot;&gt;Compressed column store implementation&lt;/a&gt;&lt;/p&gt;
+
+&lt;h1 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h1&gt;
+
+&lt;p&gt;I hope this blog post was useful and gives you some idea about the various storage formats implemented in Quickstep. If you have questions, please shoot us an email on dev@quickstep.incubator.apache.org.&lt;/p&gt;</content><author><name>Harshad</name></author><summary type="html">One of the strengths of Quickstep is the variety of storage formats it offers to store the relational data. The storage management in Quickstep can work with all these formats, and each format comes with its own strengths and weaknesses. The foundation of this work was laid in a research paper from the Patel Wisconsin Database group that appeared in VLDB 2014 (International Conference on Very Large Databases). In this post, I will provide a brief primer on what these storage formats mean and how to use them in the Quickstep system. I will also provide links to the relevant source code files, so that the readers can start exploring the code base.</summary></entry><entry><title type="html">Your First
  Query</title><link href="http://quickstep.apache.org//guides/2016/12/10/FirstQuery.html" rel="alternate" type="text/html" title="Your First Query" /><published>2016-12-10T12:29:09-06:00</published><updated>2016-12-10T12:29:09-06:00</updated><id>http://quickstep.apache.org//guides/2016/12/10/FirstQuery</id><content type="html" xml:base="http://quickstep.apache.org//guides/2016/12/10/FirstQuery.html">&lt;p&gt;For this tutorial, I’m going to assume you’re running in a unix environment. If you’re having trouble building on Windows, try asking the dev community (&lt;a href=&quot;mailto:dev@quickstep.incubating.apache.org&quot;&gt;dev@quickstep.incubating.apache.org&lt;/a&gt;). You can also find a complete guide &lt;a href=&quot;https://github.com/cramja/incubator-quickstep/blob/master/BUILDING.md&quot;&gt;here in our documentation&lt;/a&gt;.&lt;/p&gt;
 
 &lt;p&gt;If you’re going to build Quickstep, you’ll first need to clone it from Github and initialize the submodules&lt;/p&gt;
 
@@ -40,4 +176,4 @@ make -j4 quickstep_cli_shell&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
 &lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;my_numbers&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;k&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;k&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1969&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1337&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;generate_series&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1000000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;gs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;k&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AVG&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;my_numbers&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;GROUP&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
 
-&lt;p&gt;Of course, that query is meaningless but it should give you some idea of the sophistication of the SQL interface. This post is meant to give a taste of how I would get started with Quickstep. If it’s not enough or you want more information, we’ve been really good about updating our documentation. Checkout our &lt;a href=&quot;https://github.com/apache/incubator-quickstep/blob/master/README.md&quot;&gt;README&lt;/a&gt; and &lt;a href=&quot;https://github.com/apache/incubator-quickstep/blob/master/DEV_README.md&quot;&gt;DEV_GUIDE&lt;/a&gt; for more pointers!&lt;/p&gt;</content><author><name>Marc</name></author><summary type="html">For this tutorial, I’m going to assume you’re running in a unix environment. If you’re having trouble building on Windows, try asking the dev community (dev@quickstep.incubating.apache.org). You can also find a complete guide here in our documentation.</summary></entry></feed>
+&lt;p&gt;Of course, that query is meaningless but it should give you some idea of the sophistication of the SQL interface. This post is meant to give a taste of how I would get started with Quickstep. If it’s not enough or you want more information, we’ve been really good about updating our documentation. Checkout our &lt;a href=&quot;https://github.com/apache/incubator-quickstep/blob/master/README.md&quot;&gt;README&lt;/a&gt; and &lt;a href=&quot;https://github.com/apache/incubator-quickstep/blob/master/DEV_README.md&quot;&gt;DEV_GUIDE&lt;/a&gt; for more pointers!&lt;/p&gt;</content><author><name>Marc</name></author><summary type="html">For this tutorial, I’m going to assume you’re running in a unix environment. If you’re having trouble building on Windows, try asking the dev community (dev@quickstep.incubating.apache.org). You can also find a complete guide here in our documentation.</summary></entry></feed>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-quickstep-site/blob/725d8b94/content/feed.xslt.xml
----------------------------------------------------------------------
diff --git a/content/feed.xslt.xml b/content/feed.xslt.xml
deleted file mode 100644
index 15901c1..0000000
--- a/content/feed.xslt.xml
+++ /dev/null
@@ -1,4 +0,0 @@
-<?xml version="1.0" encoding="utf-8"?><xsl:transform  version="1.0"
-  xmlns:a="http://www.w3.org/2005/Atom"
-  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
-><xsl:strip-space elements="*"/><xsl:output method="text"/><xsl:template match="*"/><xsl:template match="a:feed"><xsl:text>Atom Feed:</xsl:text><xsl:value-of select="a:id"/><xsl:text>&#10;</xsl:text><xsl:apply-templates/></xsl:template><xsl:template match="a:entry"><xsl:text>  ----------------------------------------&#10;</xsl:text><xsl:text>  Feed entry:</xsl:text><xsl:value-of select="a:id"/><xsl:text>&#10;</xsl:text><xsl:apply-templates/></xsl:template><xsl:template match="a:title"><xsl:if test="parent::a:entry"><xsl:value-of select="'  '"/></xsl:if><xsl:value-of select="local-name()"/>:<xsl:apply-templates/><xsl:text>&#10;</xsl:text></xsl:template><xsl:template match="a:published|a:updated"><xsl:if test="parent::a:entry"><xsl:value-of select="'  '"/></xsl:if><xsl:value-of select="local-name()"/>:<xsl:apply-templates/><xsl:text>&#10;</xsl:text></xsl:template></xsl:transform>

http://git-wip-us.apache.org/repos/asf/incubator-quickstep-site/blob/725d8b94/content/guides/2016/12/10/FirstQuery.html
----------------------------------------------------------------------
diff --git a/content/guides/2016/12/10/FirstQuery.html b/content/guides/2016/12/10/FirstQuery.html
index 1f7d1e8..db859d1 100644
--- a/content/guides/2016/12/10/FirstQuery.html
+++ b/content/guides/2016/12/10/FirstQuery.html
@@ -10,7 +10,7 @@
   <meta name="description" content="For this tutorial, I’m going to assume you’re running in a unix environment. If you’re having trouble building on Windows, try asking the dev community (dev@...">
 
   <link rel="stylesheet" href="/assets/main.css">
-  <link rel="canonical" href="http://localhost:4000/guides/2016/12/10/FirstQuery.html">
+  <link rel="canonical" href="http://quickstep.apache.org//guides/2016/12/10/FirstQuery.html">
   <link rel="alternate" type="application/rss+xml" title="Apache Quickstep (Incubating)" href="/feed.xml">
   
   
@@ -54,8 +54,6 @@
         
           
         
-          
-        
       </div>
     </nav>
 

http://git-wip-us.apache.org/repos/asf/incubator-quickstep-site/blob/725d8b94/content/guides/2017/03/30/storage-formats-quickstep.html
----------------------------------------------------------------------
diff --git a/content/guides/2017/03/30/storage-formats-quickstep.html b/content/guides/2017/03/30/storage-formats-quickstep.html
new file mode 100644
index 0000000..17ef4b6
--- /dev/null
+++ b/content/guides/2017/03/30/storage-formats-quickstep.html
@@ -0,0 +1,256 @@
+<!DOCTYPE html>
+<html lang="en">
+
+  <head>
+  <meta charset="utf-8">
+  <meta http-equiv="X-UA-Compatible" content="IE=edge">
+  <meta name="viewport" content="width=device-width, initial-scale=1">
+
+  <title>Storage Formats in Quickstep</title>
+  <meta name="description" content="One of the strengths of Quickstep is the variety of storage formats it offers to store the relational data. The storage management in Quickstep can work with...">
+
+  <link rel="stylesheet" href="/assets/main.css">
+  <link rel="canonical" href="http://quickstep.apache.org//guides/2017/03/30/storage-formats-quickstep.html">
+  <link rel="alternate" type="application/rss+xml" title="Apache Quickstep (Incubating)" href="/feed.xml">
+  
+  
+</head>
+
+
+  <body>
+
+    <header class="site-header" role="banner">
+
+  <div class="wrapper">
+
+    <a class="site-title" href="/">Apache Quickstep (Incubating)</a>
+
+    <nav class="site-nav">
+      <span class="menu-icon">
+        <svg viewBox="0 0 18 15" width="18px" height="15px">
+          <path fill="#424242" d="M18,1.484c0,0.82-0.665,1.484-1.484,1.484H1.484C0.665,2.969,0,2.304,0,1.484l0,0C0,0.665,0.665,0,1.484,0 h15.031C17.335,0,18,0.665,18,1.484L18,1.484z"/>
+          <path fill="#424242" d="M18,7.516C18,8.335,17.335,9,16.516,9H1.484C0.665,9,0,8.335,0,7.516l0,0c0-0.82,0.665-1.484,1.484-1.484 h15.031C17.335,6.031,18,6.696,18,7.516L18,7.516z"/>
+          <path fill="#424242" d="M18,13.516C18,14.335,17.335,15,16.516,15H1.484C0.665,15,0,14.335,0,13.516l0,0 c0-0.82,0.665-1.484,1.484-1.484h15.031C17.335,12.031,18,12.696,18,13.516L18,13.516z"/>
+        </svg>
+      </span>
+
+      <div class="trigger">
+        
+          
+          <a class="page-link" href="/about/">About</a>
+          
+        
+          
+          <a class="page-link" href="/blog/">Blog</a>
+          
+        
+          
+        
+          
+          <a class="page-link" href="/release/">Releases</a>
+          
+        
+          
+        
+          
+        
+      </div>
+    </nav>
+
+  </div>
+
+</header>
+
+
+    <main class="page-content" aria-label="Content">
+      <div class="wrapper">
+        <article class="post" itemscope itemtype="http://schema.org/BlogPosting">
+
+  <header class="post-header">
+    <h1 class="post-title" itemprop="name headline">Storage Formats in Quickstep</h1>
+    <p class="post-meta"><time datetime="2017-03-30T15:00:30-05:00" itemprop="datePublished">Mar 30, 2017</time> • <span itemprop="author" itemscope itemtype="http://schema.org/Person"><span itemprop="name">Harshad</span></span></p>
+  </header>
+
+  <div class="post-content" itemprop="articleBody">
+    <p>One of the strengths of Quickstep is the variety of storage formats it offers to store the relational data. The storage management in Quickstep can work with all these formats, and each format comes with its own strengths and weaknesses. The foundation of this work was laid in a <a href="http://www.vldb.org/pvldb/vol6/p1474-chasseur.pdf">research paper</a> from the Patel Wisconsin Database group that appeared in <a href="http://vldb.org/">VLDB 2014</a> (International Conference on Very Large Databases). In this post, I will provide a brief primer on what these storage formats mean and how to use them in the Quickstep system. I will also provide links to the relevant source code files, so that the readers can start exploring the code base.</p>
+
+<h1 id="storage-formats-101">Storage Formats 101</h1>
+<p>A storage format refers to how the data in a table is laid out in memory. Let’s consider a toy relational schema.</p>
+
+<figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">employee</span> <span class="p">(</span><span class="n">ID</span> <span class="n">INT</span><span class="p">,</span> <span class="n">Age</span> <span class="n">INT</span><span class="p">,</span> <span class="n">Name</span> <span class="n">VARCHAR</span><span class="p">(</span><span class="mi">8</span><span class="p">));</span></code></pre></figure>
+
+<p>A tuple in the employee table consists of three attributes: ID (an integer), Age (an integer), and Name (a character string of at most 8 bytes). For simplicity, let us assume that Name is a fixed-length attribute, although in general it could be variable length. Let’s also populate our table with a few tuples.</p>
+
+<figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span class="k">INSERT</span> <span class="k">into</span> <span class="n">employee</span> <span class="k">values</span> <span class="p">(</span><span class="mi">101</span><span class="p">,</span> <span class="mi">30</span><span class="p">,</span> <span class="s1">'Jennifer'</span><span class="p">);</span>
+<span class="k">INSERT</span> <span class="k">into</span> <span class="n">employee</span> <span class="k">values</span> <span class="p">(</span><span class="mi">102</span><span class="p">,</span> <span class="mi">25</span><span class="p">,</span> <span class="s1">'Jim'</span><span class="p">);</span>
+<span class="k">INSERT</span> <span class="k">into</span> <span class="n">employee</span> <span class="k">values</span> <span class="p">(</span><span class="mi">103</span><span class="p">,</span> <span class="mi">35</span><span class="p">,</span> <span class="s1">'David'</span><span class="p">);</span></code></pre></figure>
+
+<p>Conceptually, our table looks as follows:</p>
+
+<table>
+  <thead>
+    <tr>
+      <th style="text-align: center">ID</th>
+      <th style="text-align: center">Age</th>
+      <th style="text-align: center">Name</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: center">101</td>
+      <td style="text-align: center">30</td>
+      <td style="text-align: center">Jennifer</td>
+    </tr>
+    <tr>
+      <td style="text-align: center">102</td>
+      <td style="text-align: center">25</td>
+      <td style="text-align: center">Jim</td>
+    </tr>
+    <tr>
+      <td style="text-align: center">103</td>
+      <td style="text-align: center">35</td>
+      <td style="text-align: center">David</td>
+    </tr>
+  </tbody>
+</table>
+
+<p>Dropping a table is easy:</p>
+
+<figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span class="k">DROP</span> <span class="k">TABLE</span> <span class="n">employee</span><span class="p">;</span></code></pre></figure>
+
+<p>Next, let us dig into understanding the different ways in which this table can be stored in memory.</p>
+
+<h2 id="row-store">Row Store</h2>
+
+<p>The first storage format we will look at is <em>row store</em>, which is both popular and intuitive.</p>
+
+<p>The figure below shows what the row store format for our toy example looks like in memory. The offsets indicate how far each memory location is from the beginning of the block.</p>
+
+<p><img src="/assets/storage-format-row-store.jpg" alt="Row Store" /></p>
+
+<p>In the row store format, a tuple is stored in the table definition order. In memory, the ID (101) of the first tuple is stored at some address, and it is followed by the age field for that tuple (30), and then the name field (“Jennifer”). The second tuple is stored in memory sequentially after the first tuple, and so on.</p>
+
+<h2 id="column-store">Column Store</h2>
+
+<p>Let’s now take a look at the <em>column store</em> format. The figure below depicts a column store layout for our toy example.</p>
+
+<p><img src="/assets/storage-format-column-store.jpg" alt="Column Store" /></p>
+
+<p>In the column store format, the values from the same column are stored together. The order in which the values from a column are stored remains the same for all the columns. Notice that the ID values 101, 102 and 103 are stored contiguously. The corresponding Age values 30, 25 and 35 are also stored contiguously, and likewise for the Name column.</p>
+
+<h2 id="compression">Compression</h2>
+
+<p>Compression is a standard technique to reduce the storage footprint. There is a large body of work on various compression techniques, which we won’t cover here. Let’s look at how compression can be applied in our toy example. The ID column has three values: 101, 102 and 103. If they are stored as regular integers, each of them will occupy 4 bytes of memory. Interestingly, we can reduce the memory consumed in storing these three values. Observe that these three values are very close to 100. If we remember the base value 100 and only record the difference of each value from this base value, then we only need to store the values 1, 2, and 3. To store these three <em>deltas</em>, we don’t even need a 4-byte integer; a 1-byte value is enough.</p>
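+
+<p>To make the deltas concrete, here is a small illustrative query against the toy table, assuming the table still exists and holds the three tuples inserted earlier. The base value 100 is taken from the discussion above; the query only demonstrates the arithmetic and says nothing about how Quickstep encodes the column internally.</p>
+
+<figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span class="c1">-- Illustration only: each ID expressed as a delta from the assumed base value 100.</span>
+<span class="k">SELECT</span> <span class="n">ID</span><span class="p">,</span> <span class="n">ID</span> <span class="o">-</span> <span class="mi">100</span> <span class="k">AS</span> <span class="n">delta</span> <span class="k">FROM</span> <span class="n">employee</span><span class="p">;</span></code></pre></figure>
+
+<p>For our three tuples the deltas are 1, 2 and 3, each of which fits in a single byte.</p>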
+
+<p>Putting it all together, the picture below shows the compressed column store format in which the compression is applied on the ID column.</p>
+
+<p><img src="/assets/storage-format-compressed-column-store.jpg" alt="Column Store with Compression" /></p>
+
+<p>Observe that the memory occupied by the three tuples together is only 43 bytes, compared to 48 bytes in the row store and the column store formats. As the number of tuples increases, there may be more opportunities for compression and thus more memory savings.</p>
+
+<p>There are some aspects related to storage management which deserve a blog post of their own. These include the block-based storage design and the impact of the above storage formats on various analytical queries. I hope to write more blog posts to cover these topics in the future.</p>
+
+<h1 id="creating-tables-with-various-storage-formats-in-quickstep">Creating tables with various storage formats in Quickstep</h1>
+
+<p>Now I will show you how to use the above storage formats to create tables in Quickstep. The storage format specification is part of the <code class="highlighter-rouge">CREATE TABLE</code> SQL command. We will continue with our toy example above to illustrate the various ways in which we can use the storage formats in Quickstep.</p>
+
+<figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">employee</span> <span class="p">(</span>
+<span class="n">ID</span> <span class="n">INT</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span> 
+<span class="n">Age</span> <span class="n">INT</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span> 
+<span class="n">Name</span> <span class="n">VARCHAR</span><span class="p">(</span><span class="mi">8</span><span class="p">)</span> <span class="k">NOT</span> <span class="k">NULL</span>
+<span class="p">)</span> <span class="k">WITH</span> <span class="n">BLOCKPROPERTIES</span> <span class="p">(</span>
+  <span class="k">TYPE</span> <span class="n">split_rowstore</span><span class="p">,</span>
+  <span class="n">BLOCKSIZEMB</span> <span class="mi">4</span><span class="p">);</span></code></pre></figure>
+
+<p>I will now describe some keywords used in the above SQL statement. The keyword <code class="highlighter-rouge">BLOCKPROPERTIES</code> introduces the storage properties of this table. The above command means that all the blocks in the employee table will use the row store format (the name <code class="highlighter-rouge">split_rowstore</code> simply denotes a plain row store), with each block sized to a maximum of 4 MB.</p>
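+
+<p>The basic (uncompressed) column store format described earlier can presumably be requested in the same way. The statement below is only a sketch: the type name <code class="highlighter-rouge">columnstore</code> and the use of <code class="highlighter-rouge">SORT</code> with it are assumptions that the examples in this post do not confirm, so check the block properties parser linked under Implementation Details for the type names your build accepts.</p>
+
+<figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span class="c1">-- Sketch: the TYPE name 'columnstore' is an assumption, not taken from this post.</span>
+<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">employee</span> <span class="p">(</span>
+<span class="n">ID</span> <span class="n">INT</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
+<span class="n">Age</span> <span class="n">INT</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
+<span class="n">Name</span> <span class="n">VARCHAR</span><span class="p">(</span><span class="mi">8</span><span class="p">)</span> <span class="k">NOT</span> <span class="k">NULL</span>
+<span class="p">)</span> <span class="k">WITH</span> <span class="n">BLOCKPROPERTIES</span> <span class="p">(</span>
+  <span class="k">TYPE</span> <span class="n">columnstore</span><span class="p">,</span>
+  <span class="n">SORT</span> <span class="n">ID</span><span class="p">,</span>
+  <span class="n">BLOCKSIZEMB</span> <span class="mi">4</span><span class="p">);</span></code></pre></figure>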
+
+<p>In the example below, all the blocks in the employee table have compressed column store as their storage format. The values in each block are sorted by the <code class="highlighter-rouge">ID</code> values, and the compression is applied on the <code class="highlighter-rouge">ID</code> and <code class="highlighter-rouge">Name</code> columns.</p>
+
+<p>The <code class="highlighter-rouge">COMPRESS</code> keyword accepts either a list of one or more columns from the table, or <code class="highlighter-rouge">ALL</code> to compress every column. With an explicit column list, the statement looks like this:</p>
+
+<figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">employee</span> <span class="p">(</span>
+<span class="n">ID</span> <span class="n">INT</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span> 
+<span class="n">Age</span> <span class="n">INT</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span> 
+<span class="n">Name</span> <span class="n">VARCHAR</span><span class="p">(</span><span class="mi">8</span><span class="p">)</span> <span class="k">NOT</span> <span class="k">NULL</span>
+<span class="p">)</span> <span class="k">WITH</span> <span class="n">BLOCKPROPERTIES</span> <span class="p">(</span>
+  <span class="k">TYPE</span> <span class="n">compressed_columnstore</span><span class="p">,</span>
+  <span class="n">SORT</span> <span class="n">ID</span><span class="p">,</span>
+  <span class="n">COMPRESS</span> <span class="p">(</span><span class="n">ID</span><span class="p">,</span> <span class="n">Name</span><span class="p">),</span>
+  <span class="n">BLOCKSIZEMB</span> <span class="mi">4</span><span class="p">);</span></code></pre></figure>
+
+<p>Finally, the command below indicates that the compression is applied on all the columns.</p>
+
+<figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">employee</span> <span class="p">(</span>
+<span class="n">ID</span> <span class="n">INT</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span> 
+<span class="n">Age</span> <span class="n">INT</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span> 
+<span class="n">Name</span> <span class="n">VARCHAR</span><span class="p">(</span><span class="mi">8</span><span class="p">)</span> <span class="k">NOT</span> <span class="k">NULL</span>
+<span class="p">)</span> <span class="k">WITH</span> <span class="n">BLOCKPROPERTIES</span> <span class="p">(</span>
+  <span class="k">TYPE</span> <span class="n">compressed_columnstore</span><span class="p">,</span>
+  <span class="n">SORT</span> <span class="n">ID</span><span class="p">,</span>
+  <span class="n">COMPRESS</span> <span class="k">ALL</span><span class="p">,</span>
+  <span class="n">BLOCKSIZEMB</span> <span class="mi">4</span><span class="p">);</span></code></pre></figure>
+
+<h1 id="implementation-details">Implementation Details</h1>
+
+<p>The above illustrations are meant to explain the various storage formats. In the actual implementation, in each storage block, there is a separate region to store the variable length attributes. For such attributes, in the row store implementation, we use a pointer (or offset) to point to the true location of the variable length attribute.</p>
+
+<p>In the current implementation, the compressed column store format requires that all the variable length attributes be compressed.</p>
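+
+<p>As a sketch of what this constraint implies for our toy schema: <code class="highlighter-rouge">Name</code> is declared as <code class="highlighter-rouge">VARCHAR</code>, a variable-length type, so a compressed column store table would list at least <code class="highlighter-rouge">Name</code> under <code class="highlighter-rouge">COMPRESS</code>, while fixed-length columns such as <code class="highlighter-rouge">Age</code> may be left uncompressed. The single-column list is a hypothetical variation on the earlier examples, and how Quickstep reports a violation of the rule is not covered here.</p>
+
+<figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span class="c1">-- Sketch: Name is the only variable-length column, so it must appear in COMPRESS.</span>
+<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">employee</span> <span class="p">(</span>
+<span class="n">ID</span> <span class="n">INT</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
+<span class="n">Age</span> <span class="n">INT</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
+<span class="n">Name</span> <span class="n">VARCHAR</span><span class="p">(</span><span class="mi">8</span><span class="p">)</span> <span class="k">NOT</span> <span class="k">NULL</span>
+<span class="p">)</span> <span class="k">WITH</span> <span class="n">BLOCKPROPERTIES</span> <span class="p">(</span>
+  <span class="k">TYPE</span> <span class="n">compressed_columnstore</span><span class="p">,</span>
+  <span class="n">SORT</span> <span class="n">ID</span><span class="p">,</span>
+  <span class="n">COMPRESS</span> <span class="p">(</span><span class="n">Name</span><span class="p">),</span>
+  <span class="n">BLOCKSIZEMB</span> <span class="mi">4</span><span class="p">);</span></code></pre></figure>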
+
+<p>For folks interested in looking at the source code for these storage formats, I provide links to the relevant source code files below. Our code is well documented (doxygen) for the most part, so it should be easy to read.</p>
+
+<p><a href="https://github.com/apache/incubator-quickstep/blob/master/parser/ParseBlockProperties.hpp">Parsing the block properties</a></p>
+
+<p><a href="https://github.com/apache/incubator-quickstep/blob/master/storage/SplitRowStoreTupleStorageSubBlock.hpp">Row store implementation</a></p>
+
+<p><a href="https://github.com/apache/incubator-quickstep/blob/master/storage/BasicColumnStoreTupleStorageSubBlock.hpp">Basic column store implementation</a></p>
+
+<p><a href="https://github.com/apache/incubator-quickstep/blob/master/storage/CompressedColumnStoreTupleStorageSubBlock.hpp">Compressed column store implementation</a></p>
+
+<h1 id="conclusion">Conclusion</h1>
+
+<p>I hope this blog post was useful and gave you some idea about the various storage formats implemented in Quickstep. If you have questions, please send us an email at dev@quickstep.incubator.apache.org.</p>
+
+  </div>
+
+  
+</article>
+
+      </div>
+    </main>
+
+    
+<footer class="site-footer">
+
+  <div class="wrapper">
+    <div class="footer-col-wrapper">
+      <div class="footer-col footer-col-1">
+        <img src="/assets/incubator-logo.png" />
+        <ul class="contact-list">
+          <li>
+          <h3>
+            
+              Apache Quickstep (Incubating)
+            
+            </h3>
+            </li>
+            
+            <li><a href="mailto:dev@quickstep.incubator.apache.org">dev@quickstep.incubator.apache.org</a></li>
+            
+        </ul>
+        <p>Quickstep is a next-generation data processing platform designed for high-performance analytical queries.
+</p>
+      </div>
+
+      <div class="footer-col footer-col-2">
+        <h3>Disclaimer</h3>
+        <small>Apache Quickstep is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.</small>
+      </div>
+    </div>
+
+  </div>
+
+</footer>
+
+  </body>
+
+</html>

http://git-wip-us.apache.org/repos/asf/incubator-quickstep-site/blob/725d8b94/content/index.html
----------------------------------------------------------------------
diff --git a/content/index.html b/content/index.html
index 80708d6..c79ad03 100644
--- a/content/index.html
+++ b/content/index.html
@@ -10,7 +10,7 @@
   <meta name="description" content="Quickstep is a next-generation data processing platform designed for high-performance analytical queries.">
 
   <link rel="stylesheet" href="/assets/main.css">
-  <link rel="canonical" href="http://localhost:4000/">
+  <link rel="canonical" href="http://quickstep.apache.org//">
   <link rel="alternate" type="application/rss+xml" title="Apache Quickstep (Incubating)" href="/feed.xml">
   
   
@@ -54,8 +54,6 @@
         
           
         
-          
-        
       </div>
     </nav>
 

http://git-wip-us.apache.org/repos/asf/incubator-quickstep-site/blob/725d8b94/content/release/index.html
----------------------------------------------------------------------
diff --git a/content/release/index.html b/content/release/index.html
index fc3b5bf..4510d35 100644
--- a/content/release/index.html
+++ b/content/release/index.html
@@ -10,7 +10,7 @@
   <meta name="description" content="Quickstep is a next-generation data processing platform designed for high-performance analytical queries.">
 
   <link rel="stylesheet" href="/assets/main.css">
-  <link rel="canonical" href="http://localhost:4000/release/">
+  <link rel="canonical" href="http://quickstep.apache.org//release/">
   <link rel="alternate" type="application/rss+xml" title="Apache Quickstep (Incubating)" href="/feed.xml">
   
   
@@ -54,8 +54,6 @@
         
           
         
-          
-        
       </div>
     </nav>
 

