drill-commits mailing list archives

From tshi...@apache.org
Subject svn commit: r1651949 [13/13] - in /drill/site/trunk/content/drill: ./ blog/2014/11/19/sql-on-mongodb/ blog/2014/12/02/drill-top-level-project/ blog/2014/12/09/running-sql-queries-on-amazon-s3/ blog/2014/12/11/apache-drill-qa-panelist-spotlight/ blog/20...
Date Thu, 15 Jan 2015 05:11:48 GMT
Added: drill/site/trunk/content/drill/docs/using-jdbc-to-access-apache-drill-from-squirrel/index.html
URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/docs/using-jdbc-to-access-apache-drill-from-squirrel/index.html?rev=1651949&view=auto
==============================================================================
--- drill/site/trunk/content/drill/docs/using-jdbc-to-access-apache-drill-from-squirrel/index.html (added)
+++ drill/site/trunk/content/drill/docs/using-jdbc-to-access-apache-drill-from-squirrel/index.html Thu Jan 15 05:11:44 2015
@@ -0,0 +1,228 @@
+<!DOCTYPE html>
+<html>
+
+<head>
+
+<meta charset="UTF-8">
+
+
+<title>Using JDBC to Access Apache Drill from SQuirreL - Apache Drill</title>
+
+<link href="/css/syntax.css" rel="stylesheet" type="text/css">
+<link href="/css/style.css" rel="stylesheet" type="text/css">
+<link href="/css/arrows.css" rel="stylesheet" type="text/css">
+<link href="/css/button.css" rel="stylesheet" type="text/css">
+
+<link rel="shortcut icon" href="/favicon.ico" type="image/x-icon">
+<link rel="icon" href="/favicon.ico" type="image/x-icon">
+
+<script language="javascript" type="text/javascript" src="/js/lib/jquery-1.11.1.min.js"></script>
+<script language="javascript" type="text/javascript" src="/js/lib/jquery.easing.1.3.js"></script>
+<script language="javascript" type="text/javascript" src="/js/modernizr.custom.js"></script>
+<script language="javascript" type="text/javascript" src="/js/script.js"></script>
+
+</head>
+
+<body onResize="resized();">
+
+<div class="bui"></div>
+
+<div id="search">
+<input type="text" placeholder="Enter search term here">
+</div>
+
+<div id="menu" class="mw">
+<ul>
+  <li class="logo"><a href="/"></a></li>
+  <li>
+    <a href="/overview/">Documentation</a>
+    <ul>
+      <li><a href="/overview/">Overview&nbsp;&nbsp;&nbsp;&nbsp;</a></li>
+      <li><a href="https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes" target="_blank">Drill in 10 Minutes</a></li>
+      <li><a href="/why/">Why Drill? &nbsp;&nbsp;&nbsp;&nbsp;</a></li>
+      <li><a href="/architecture/">Architecture</a></li>
+    </ul>
+  </li>
+  <li>
+    <a href="/community/">Community</a>
+    <ul>
+      <li><a href="/team/">Team</a></li>
+      <li><a href="/community/#events">Events and Meetups</a></li>
+      <li><a href="/community/#mailinglists">Mailing Lists</a></li>
+      <li><a href="/community/#getinvolved">Get Involved</a></li>
+      <li><a href="https://issues.apache.org/jira/browse/DRILL/" target="_blank">Issue Tracker</a></li>
+      <li><a href="https://github.com/apache/drill" target="_blank">GitHub</a></li>
+    </ul>
+  </li>
+  <li><a href="/faq/">FAQ</a></li>
+  <li><a href="/blog/">Blog</a></li>
+  <li style="width:30px; padding-left: 2px; padding-right:10px"><a href="https://twitter.com/apachedrill" target="_blank"><img src="/images/twitterbw.png" alt="" align="center" width="22" style="padding: 0px 10px 1px 0px;"></a> </li>
+  <li class="l"><span>&nbsp;</span></li>
+  <li class="d"><a href="/download/">Download</a></li>
+</ul>
+</div>
+
+<div class="int_title">
+<h1>Using JDBC to Access Apache Drill from SQuirreL</h1>
+
+</div>
+
+<div class="int_text" align="left"><p>You can connect to Drill through a JDBC client tool, such as SQuirreL, on
+Windows, Linux, and Mac OS X systems, to access all of your data sources
+registered with Drill. An embedded JDBC driver is included with Drill.
+Configure the JDBC driver in the SQuirreL client to connect to Drill from
+SQuirreL. This document provides instructions for connecting to Drill from
+SQuirreL on Windows.</p>
+
+<p>To use the Drill JDBC driver with SQuirreL on Windows, complete the following
+steps:</p>
+
+<ul>
+<li>Step 1: Getting the Drill JDBC Driver </li>
+<li>Step 2: Installing and Starting SQuirreL</li>
+<li>Step 3: Adding the Drill JDBC Driver to SQuirreL</li>
+<li>Step 4: Running a Drill Query from SQuirreL</li>
+</ul>
+
+<p>For information about how to use SQuirreL, refer to the <a href="http://squirrel-sql.sourceforge.net/user-manual/quick_start.html">SQuirreL Quick
+Start</a>
+guide.</p>
+
+<h3 id="prerequisites">Prerequisites</h3>
+
+<ul>
+<li>SQuirreL requires JRE 7</li>
+<li>Drill installed in distributed mode on one or multiple nodes in a cluster. Refer to the <a href="https://cwiki.apache.org/confluence/display/DRILL/Install+Drill">Install Drill</a> documentation for more information.</li>
+<li><p>The client must be able to resolve the hostname of each Drill node to its IP address. Verify that an entry for each Drill node exists on the client machine.<br>
+If an entry does not exist, create one for each Drill node.</p>
+
+<ul>
+<li>For Windows, create the entry in the %WINDIR%\system32\drivers\etc\hosts file.</li>
+<li>For Linux and Mac, create the entry in /etc/hosts.<br>
+&lt;drill-machine-IP&gt; &lt;drill-machine-hostname&gt;<br>
+Example: <code>127.0.1.1 maprdemo</code></li>
+</ul></li>
+</ul>
+
+<h2 id="step-1:-getting-the-drill-jdbc-driver">Step 1: Getting the Drill JDBC Driver</h2>
+
+<p>The Drill JDBC Driver <code>JAR</code> file must exist in a directory on your Windows
+machine in order to configure the driver in the SQuirreL client.</p>
+
+<p>You can copy the Drill JDBC <code>JAR</code> file from the following Drill installation
+directory on the node with Drill installed, to a directory on your Windows
+machine:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">&lt;drill_installation_directory&gt;/jars/jdbc-driver/drill-jdbc-all-0.7.0-SNAPSHOT.jar
+</code></pre></div>
+<p>Or, you can download the <a href="http://www.apache.org/dyn/closer.cgi/drill/drill-0.7.0/apache-drill-0.7.0.tar.gz">apache-drill-0.7.0.tar.gz</a>
+file to a location on your Windows machine, and
+extract the contents of the file. You may need to use a decompression utility,
+such as <a href="http://www.7-zip.org/">7-Zip</a>, to extract the archive. Once extracted,
+you can locate the driver in the following directory:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">&lt;windows_directory&gt;\apache-drill-&lt;version&gt;\jars\jdbc-driver\drill-jdbc-all-0.7.0-SNAPSHOT.jar
+</code></pre></div>
+<h2 id="step-2:-installing-and-starting-squirrel">Step 2: Installing and Starting SQuirreL</h2>
+
+<p>To install and start SQuirreL, complete the following steps:</p>
+
+<ol>
+<li><p>Download the SQuirreL JAR file for Windows from the following location:<br>
+<a href="http://www.squirrelsql.org/#installation">http://www.squirrelsql.org/#installation</a></p></li>
+<li><p>Double-click the SQuirreL <code>JAR</code> file. The SQuirreL installation wizard walks you through the installation process.</p></li>
+<li><p>When installation completes, navigate to the SQuirreL installation folder and then double-click <code>squirrel-sql.bat</code> to start SQuirreL.</p></li>
+</ol>
+
+<h2 id="step-3:-adding-the-drill-jdbc-driver-to-squirrel">Step 3: Adding the Drill JDBC Driver to SQuirreL</h2>
+
+<p>To add the Drill JDBC Driver to SQuirreL, define the driver and create a
+database alias. The alias is a specific instance of the driver configuration.
+SQuirreL uses the driver definition and alias to connect to Drill so you can
+access data sources that you have registered with Drill.</p>
+
+<h3 id="a.-define-the-driver">A. Define the Driver</h3>
+
+<p>To define the Drill JDBC Driver, complete the following steps:</p>
+
+<ol>
+<li>In the SQuirreL toolbar, select <strong>Drivers &gt; New Driver</strong>. The Add Driver dialog box appears.</li>
+</ol>
+
+<p><img src="../../../img/40.png" alt=""></p>
+
+<ol start="2">
+<li><p>Enter the following information:</p>
+
+<p><table class="confluenceTable"><tbody>
+<tr><td valign="top"><p><strong>Option</strong></p></td><td valign="top"><p><strong>Description</strong></p></td></tr>
+<tr><td valign="top"><p>Name</p></td><td valign="top"><p>Name for the Drill JDBC Driver</p></td></tr>
+<tr><td valign="top"><p>Example URL</p></td><td valign="top"><p><code>jdbc:drill:zk=&lt;<em>zookeeper_quorum</em>&gt;[;schema=&lt;<em>schema_to_use_as_default</em>&gt;]</code></p><p><strong>Example:</strong> <code>jdbc:drill:zk=maprdemo:5181</code></p><p><strong>Note:</strong> The default ZooKeeper port is 2181. In a MapR cluster, the ZooKeeper port is 5181.</p></td></tr>
+<tr><td valign="top"><p>Website URL</p></td><td valign="top"><p><code>jdbc:drill:zk=&lt;<em>zookeeper_quorum</em>&gt;[;schema=&lt;<em>schema_to_use_as_default</em>&gt;]</code></p><p><strong>Example:</strong> <code>jdbc:drill:zk=maprdemo:5181</code></p><p><strong>Note:</strong> The default ZooKeeper port is 2181. In a MapR cluster, the ZooKeeper port is 5181.</p></td></tr>
+<tr><td valign="top"><p>Extra Class Path</p></td><td valign="top"><p>Click <strong>Add</strong> and navigate to the JDBC <code>JAR</code> file location in the Windows directory:<br /><code>&lt;windows_directory&gt;/jars/jdbc-driver/drill-jdbc-all-0.6.0-incubating.jar</code></p><p>Select the <code>JAR</code> file, click <strong>Open</strong>, and then click <strong>List Drivers</strong>.</p></td></tr>
+<tr><td valign="top"><p>Class Name</p></td><td valign="top"><p>Select <code>org.apache.drill.jdbc.Driver</code> from the drop-down menu.</p></td></tr>
+</tbody></table></p></li>
+<li><p>Click <strong>OK</strong>. The SQuirreL client displays a message stating that the driver registration is successful, and you can see the driver in the Drivers panel.  </p>
+
+<p><img src="../../../img/52.png" alt=""></p></li>
+</ol>
+
+<h3 id="b.-create-an-alias">B. Create an Alias</h3>
+
+<p>To create an alias, complete the following steps:</p>
+
+<ol>
+<li>Select the <strong>Aliases</strong> tab.</li>
+<li><p>In the SQuirreL toolbar, select <strong>Aliases &gt; New Alias</strong>. The Add Alias dialog box appears.</p>
+
+<p><img src="../../../img/19.png" alt=""></p></li>
+<li><p>Enter the following information:</p>
+
+<p><table class="confluenceTable"><tbody>
+<tr><td valign="top"><p><strong>Option</strong></p></td><td valign="top"><p><strong>Description</strong></p></td></tr>
+<tr><td valign="top"><p>Alias Name</p></td><td valign="top"><p>A unique name for the Drill JDBC Driver alias.</p></td></tr>
+<tr><td valign="top"><p>Driver</p></td><td valign="top"><p>Select the Drill JDBC Driver.</p></td></tr>
+<tr><td valign="top"><p>URL</p></td><td valign="top"><p>Enter the connection URL with the name of the Drill directory stored in ZooKeeper and the cluster ID:</p><p><code>jdbc:drill:zk=&lt;<em>zookeeper_quorum</em>&gt;/&lt;<em>drill_directory_in_zookeeper</em>&gt;/&lt;<em>cluster_ID</em>&gt;;schema=&lt;<em>schema_to_use_as_default</em>&gt;</code></p>
+<p><strong>The following examples show URLs for Drill installed on a single node:</strong><br /><code>jdbc:drill:zk=10.10.100.56:5181/drill/demo_mapr_com-drillbits;schema=hive</code><br /><code>jdbc:drill:zk=10.10.100.24:2181/drill/drillbits1;schema=hive</code></p>
+<div><strong>The following example shows a URL for Drill installed in distributed mode with a connection to a ZooKeeper quorum:</strong></div>
+<div><code>jdbc:drill:zk=10.10.100.30:5181,10.10.100.31:5181,10.10.100.32:5181/drill/drillbits1;schema=hive</code></div>
+<div class="aui-message warning shadowed information-macro">
+<span class="aui-icon icon-warning"></span>
+<div class="message-content">
+<ul><li>Including a default schema is optional.</li><li>The default ZooKeeper port is 2181. In a MapR cluster, the ZooKeeper port is 5181.</li><li>The Drill directory stored in ZooKeeper is <code>/drill</code>.</li><li>The Drill default cluster ID is <code>drillbits1</code>.</li></ul>
+</div>
+</div>
+</td></tr>
+<tr><td valign="top"><p>User Name</p></td><td valign="top"><p>admin</p></td></tr>
+<tr><td valign="top"><p>Password</p></td><td valign="top"><p>admin</p></td></tr>
+</tbody></table></p></li>
+<li><p>Click <strong>OK</strong>. The Connect to: dialog box appears.  </p>
+
+<p><img src="../../../img/30.png?version=1&amp;modificationDate=1410385290359&amp;api=v2" alt=""></p></li>
+<li><p>Click <strong>Connect.</strong> SQuirreL displays a message stating that the connection is successful.<br>
+<img src="../../../img/53.png?version=1&amp;modificationDate=1410385313418&amp;api=v2" alt=""></p></li>
+<li><p>Click <strong>OK</strong>. SQuirreL displays a series of tabs.</p></li>
+</ol>
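+
+<p>The alias URL format from the table above can also be assembled and used from plain JDBC code. The following is a minimal Java sketch, not part of the Drill distribution: the class <code>DrillJdbcSketch</code> and its method names are hypothetical, while the driver class name, the <code>admin</code> credentials, and the sample query are the ones shown in this document. It assumes the Drill JDBC <code>JAR</code> is on the class path and that the cluster is reachable.</p>

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

// Hypothetical helper for illustration; names are not part of Drill.
final class DrillJdbcSketch {

    /** Builds jdbc:drill:zk=quorum/drillDir/clusterId, with an optional ;schema= suffix. */
    static String buildUrl(List<String> zkHosts, String drillDir, String clusterId, String schema) {
        StringBuilder url = new StringBuilder("jdbc:drill:zk=")
                .append(String.join(",", zkHosts)) // host:port pairs of the ZooKeeper quorum
                .append('/').append(drillDir)
                .append('/').append(clusterId);
        if (schema != null && !schema.isEmpty()) {
            url.append(";schema=").append(schema); // the default schema is optional
        }
        return url.toString();
    }

    /** Connects with the credentials used in this document and prints the first column of each row. */
    static void runSampleQuery(String url) throws SQLException, ClassNotFoundException {
        Class.forName("org.apache.drill.jdbc.Driver"); // driver class named in Step 3
        try (Connection conn = DriverManager.getConnection(url, "admin", "admin");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM cp.`employee.json`")) {
            while (rs.next()) {
                System.out.println(rs.getObject(1));
            }
        }
    }
}
```

+<p>For a single-node installation, pass a one-element host list and the default cluster ID <code>drillbits1</code>; passing <code>null</code> as the schema leaves the optional <code>;schema=</code> part off the URL.</p>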
+
+<h2 id="step-4:-running-a-drill-query-from-squirrel">Step 4: Running a Drill Query from SQuirreL</h2>
+
+<p>Once you have SQuirreL successfully connected to your cluster through the
+Drill JDBC Driver, you can issue queries from the SQuirreL client. You can run
+a test query on some sample data included in the Drill installation to try out
+SQuirreL with Drill.</p>
+
+<p>To query sample data with SQuirreL, complete the following steps:</p>
+
+<ol>
+<li>Click the <img src="http://doc.mapr.com/download/attachments/26986731/image2014-9-10%2014%3A43%3A14.png?version=1&amp;modificationDate=1410385394576&amp;api=v2" alt=""> tab.</li>
+<li><p>Enter the following query in the query box:<br>
+<code>SELECT * FROM cp.`employee.json`;</code><br>
+Example:<br>
+<img src="../../../img/11.png?version=1&amp;modificationDate=1410385451811&amp;api=v2" alt=""></p></li>
+<li><p>Press <strong>Ctrl+Enter</strong> to run the query. The following query results display:<br>
+<img src="../../../img/42.png?version=1&amp;modificationDate=1410385482574&amp;api=v2" alt=""></p></li>
+</ol>
+
+<p>You have successfully run a Drill query from the SQuirreL client.</p>
+</div>
+
+
+<div id="footer" class="mw">
+<div class="wrapper">
+Copyright © 2012-2014 The Apache Software Foundation, licensed under the Apache License, Version 2.0.<br>
+Apache and the Apache feather logo are trademarks of The Apache Software Foundation. Other names appearing on the site may be trademarks of their respective owners.<br/><br/>
+</div>
+</div>
+
+<script>
+(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+ga('create', 'UA-53379651-1', 'auto');
+ga('send', 'pageview');
+</script>
+
+</body>
+</html>

Added: drill/site/trunk/content/drill/docs/using-odbc-to-access-apache-drill-from-bi-tools/index.html
URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/docs/using-odbc-to-access-apache-drill-from-bi-tools/index.html?rev=1651949&view=auto
==============================================================================
--- drill/site/trunk/content/drill/docs/using-odbc-to-access-apache-drill-from-bi-tools/index.html (added)
+++ drill/site/trunk/content/drill/docs/using-odbc-to-access-apache-drill-from-bi-tools/index.html Thu Jan 15 05:11:44 2015
@@ -0,0 +1,112 @@
+<!DOCTYPE html>
+<html>
+
+<head>
+
+<meta charset="UTF-8">
+
+
+<title>Using ODBC to Access Apache Drill from BI Tools - Apache Drill</title>
+
+<link href="/css/syntax.css" rel="stylesheet" type="text/css">
+<link href="/css/style.css" rel="stylesheet" type="text/css">
+<link href="/css/arrows.css" rel="stylesheet" type="text/css">
+<link href="/css/button.css" rel="stylesheet" type="text/css">
+
+<link rel="shortcut icon" href="/favicon.ico" type="image/x-icon">
+<link rel="icon" href="/favicon.ico" type="image/x-icon">
+
+<script language="javascript" type="text/javascript" src="/js/lib/jquery-1.11.1.min.js"></script>
+<script language="javascript" type="text/javascript" src="/js/lib/jquery.easing.1.3.js"></script>
+<script language="javascript" type="text/javascript" src="/js/modernizr.custom.js"></script>
+<script language="javascript" type="text/javascript" src="/js/script.js"></script>
+
+</head>
+
+<body onResize="resized();">
+
+<div class="bui"></div>
+
+<div id="search">
+<input type="text" placeholder="Enter search term here">
+</div>
+
+<div id="menu" class="mw">
+<ul>
+  <li class="logo"><a href="/"></a></li>
+  <li>
+    <a href="/overview/">Documentation</a>
+    <ul>
+      <li><a href="/overview/">Overview&nbsp;&nbsp;&nbsp;&nbsp;</a></li>
+      <li><a href="https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes" target="_blank">Drill in 10 Minutes</a></li>
+      <li><a href="/why/">Why Drill? &nbsp;&nbsp;&nbsp;&nbsp;</a></li>
+      <li><a href="/architecture/">Architecture</a></li>
+    </ul>
+  </li>
+  <li>
+    <a href="/community/">Community</a>
+    <ul>
+      <li><a href="/team/">Team</a></li>
+      <li><a href="/community/#events">Events and Meetups</a></li>
+      <li><a href="/community/#mailinglists">Mailing Lists</a></li>
+      <li><a href="/community/#getinvolved">Get Involved</a></li>
+      <li><a href="https://issues.apache.org/jira/browse/DRILL/" target="_blank">Issue Tracker</a></li>
+      <li><a href="https://github.com/apache/drill" target="_blank">GitHub</a></li>
+    </ul>
+  </li>
+  <li><a href="/faq/">FAQ</a></li>
+  <li><a href="/blog/">Blog</a></li>
+  <li style="width:30px; padding-left: 2px; padding-right:10px"><a href="https://twitter.com/apachedrill" target="_blank"><img src="/images/twitterbw.png" alt="" align="center" width="22" style="padding: 0px 10px 1px 0px;"></a> </li>
+  <li class="l"><span>&nbsp;</span></li>
+  <li class="d"><a href="/download/">Download</a></li>
+</ul>
+</div>
+
+<div class="int_title">
+<h1>Using ODBC to Access Apache Drill from BI Tools</h1>
+
+</div>
+
+<div class="int_text" align="left"><p>MapR provides ODBC drivers for Windows, Mac OS X, and Linux. It is recommended
+that you install the latest version of Apache Drill with the latest version of
+the Drill ODBC driver.</p>
+
+<p>For example, if you have Apache Drill 0.5 and a Drill ODBC driver installed on
+your machine, and then you upgrade to Apache Drill 0.6, do not assume that the
+Drill ODBC driver installed on your machine will work with the new version of
+Apache Drill. Install the latest available Drill ODBC driver to ensure that
+the two components work together.</p>
+
+<p>You can access the latest Drill ODBC drivers in the following location:</p>
+
+<p><a href="http://package.mapr.com/tools/MapR-ODBC/MapR_Drill/MapRDrill_odbc/">http://package.mapr.com/tools/MapR-ODBC/MapR_Drill/MapRDrill_odbc/</a></p>
+
+<p>Refer to the following documents for driver installation and configuration
+information, as well as examples for connecting to BI tools:</p>
+
+<ul>
+<li><a href="/confluence/display/DRILL/Using+the+MapR+ODBC+Driver+on+Windows">Using the MapR ODBC Driver on Windows</a></li>
+<li><a href="/confluence/display/DRILL/Using+the+MapR+Drill+ODBC+Driver+on+Linux+and+Mac+OS+X">Using the MapR Drill ODBC Driver on Linux and Mac OS X</a></li>
+</ul>
+</div>
+
+
+<div id="footer" class="mw">
+<div class="wrapper">
+Copyright © 2012-2014 The Apache Software Foundation, licensed under the Apache License, Version 2.0.<br>
+Apache and the Apache feather logo are trademarks of The Apache Software Foundation. Other names appearing on the site may be trademarks of their respective owners.<br/><br/>
+</div>
+</div>
+
+<script>
+(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+ga('create', 'UA-53379651-1', 'auto');
+ga('send', 'pageview');
+</script>
+
+</body>
+</html>

Added: drill/site/trunk/content/drill/docs/value-vectors/index.html
URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/docs/value-vectors/index.html?rev=1651949&view=auto
==============================================================================
--- drill/site/trunk/content/drill/docs/value-vectors/index.html (added)
+++ drill/site/trunk/content/drill/docs/value-vectors/index.html Thu Jan 15 05:11:44 2015
@@ -0,0 +1,260 @@
+<!DOCTYPE html>
+<html>
+
+<head>
+
+<meta charset="UTF-8">
+
+
+<title>Value Vectors - Apache Drill</title>
+
+<link href="/css/syntax.css" rel="stylesheet" type="text/css">
+<link href="/css/style.css" rel="stylesheet" type="text/css">
+<link href="/css/arrows.css" rel="stylesheet" type="text/css">
+<link href="/css/button.css" rel="stylesheet" type="text/css">
+
+<link rel="shortcut icon" href="/favicon.ico" type="image/x-icon">
+<link rel="icon" href="/favicon.ico" type="image/x-icon">
+
+<script language="javascript" type="text/javascript" src="/js/lib/jquery-1.11.1.min.js"></script>
+<script language="javascript" type="text/javascript" src="/js/lib/jquery.easing.1.3.js"></script>
+<script language="javascript" type="text/javascript" src="/js/modernizr.custom.js"></script>
+<script language="javascript" type="text/javascript" src="/js/script.js"></script>
+
+</head>
+
+<body onResize="resized();">
+
+<div class="bui"></div>
+
+<div id="search">
+<input type="text" placeholder="Enter search term here">
+</div>
+
+<div id="menu" class="mw">
+<ul>
+  <li class="logo"><a href="/"></a></li>
+  <li>
+    <a href="/overview/">Documentation</a>
+    <ul>
+      <li><a href="/overview/">Overview&nbsp;&nbsp;&nbsp;&nbsp;</a></li>
+      <li><a href="https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes" target="_blank">Drill in 10 Minutes</a></li>
+      <li><a href="/why/">Why Drill? &nbsp;&nbsp;&nbsp;&nbsp;</a></li>
+      <li><a href="/architecture/">Architecture</a></li>
+    </ul>
+  </li>
+  <li>
+    <a href="/community/">Community</a>
+    <ul>
+      <li><a href="/team/">Team</a></li>
+      <li><a href="/community/#events">Events and Meetups</a></li>
+      <li><a href="/community/#mailinglists">Mailing Lists</a></li>
+      <li><a href="/community/#getinvolved">Get Involved</a></li>
+      <li><a href="https://issues.apache.org/jira/browse/DRILL/" target="_blank">Issue Tracker</a></li>
+      <li><a href="https://github.com/apache/drill" target="_blank">GitHub</a></li>
+    </ul>
+  </li>
+  <li><a href="/faq/">FAQ</a></li>
+  <li><a href="/blog/">Blog</a></li>
+  <li style="width:30px; padding-left: 2px; padding-right:10px"><a href="https://twitter.com/apachedrill" target="_blank"><img src="/images/twitterbw.png" alt="" align="center" width="22" style="padding: 0px 10px 1px 0px;"></a> </li>
+  <li class="l"><span>&nbsp;</span></li>
+  <li class="d"><a href="/download/">Download</a></li>
+</ul>
+</div>
+
+<div class="int_title">
+<h1>Value Vectors</h1>
+
+</div>
+
+<div class="int_text" align="left"><p>This document defines the data structures required for passing sequences of
+columnar data between <a href="https://docs.google.com/a/maprtech.com/document/d/1zaxkcrK9mYyfpGwX1kAV80z0PCi8abefL45zOzb97dI/edit#bookmark=id.iip15ful18mm">Operators</a>.</p>
+
+<h1 id="goals">Goals</h1>
+
+<h4 id="support-operators-written-in-multiple-language">Support Operators Written in Multiple Languages</h4>
+
+<p>ValueVectors should support operators written in C/C++/Assembly. To support
+this, the underlying ByteBuffer will not require modification when passed
+through the JNI interface. The ValueVector will be considered immutable once
+constructed. Endianness has not yet been considered.</p>
+
+<h4 id="access">Access</h4>
+
+<p>Reading a random element from a ValueVector must be a constant time operation.
+To accommodate this, elements are identified by their offset from the start of the
+buffer. Repeated, nullable and variable width ValueVectors utilize an
+additional fixed width value vector to index each element. Write access is not
+supported once the ValueVector has been constructed by the RecordBatch.</p>
+
+<h4 id="efficient-subsets-of-value-vectors">Efficient Subsets of Value Vectors</h4>
+
+<p>When an operator returns a subset of values from a ValueVector, it should
+reuse the original ValueVector. To accomplish this, a level of indirection is
+introduced to skip over certain values in the vector. This level of
+indirection is a sequence of offsets which reference an offset in the original
+ValueVector and the count of subsequent values which are to be included in the
+subset.</p>
+
+<h4 id="pooled-allocation">Pooled Allocation</h4>
+
+<p>ValueVectors utilize one or more buffers under the covers. These buffers will
+be drawn from a pool. Value vectors are themselves created and destroyed as a
+schema changes during the course of record iteration.</p>
+
+<h4 id="homogenous-value-types">Homogeneous Value Types</h4>
+
+<p>Each value in a Value Vector is of the same type. The <a href="https://docs.google.com/a/maprtech.com/document/d/1zaxkcrK9mYyfpGwX1kAV80z0PCi8abefL45zOzb97dI/edit#bookmark=kix.s2xuoqnr8obe">Record Batch</a> implementation is responsible for
+creating a new Value Vector any time there is a change in schema.</p>
+
+<h1 id="definitions">Definitions</h1>
+
+<h4 id="data-types">Data Types</h4>
+
+<p>The canonical source for value type definitions is the <a href="http://bit.ly/15JO9bC">Drill
+Datatypes</a> document. The individual types are listed
+under the ‘Basic Data Types’ tab, while the value vector types can be found
+under the ‘Value Vectors’ tab.</p>
+
+<h4 id="operators">Operators</h4>
+
+<p>An operator is responsible for transforming a stream of fields. It operates on
+Record Batches or constant values.</p>
+
+<h4 id="record-batch">Record Batch</h4>
+
+<p>A set of field values for some range of records. The batch may be composed of
+Value Vectors, in which case each batch consists of exactly one schema.</p>
+
+<h4 id="value-vector">Value Vector</h4>
+
+<p>A value vector comprises one or more contiguous buffers: one that
+stores a sequence of values, and zero or more that store any metadata
+associated with the ValueVector.</p>
+
+<h1 id="data-structure">Data Structure</h1>
+
+<p>A ValueVector stores values in a ByteBuf, which is a contiguous region of
+memory. Additional levels of indirection are used to support variable value
+widths, nullable values, repeated values and selection vectors. These levels
+of indirection are primarily lookup tables which consist of one or more fixed
+width ValueVectors which may be combined (e.g. for nullable, variable width
+values). A fixed width ValueVector of non-nullable, non-repeatable values does
+not require an indirect lookup; elements can be accessed directly by
+multiplying position by stride.</p>
+
+<h4 id="fixed-width-values">Fixed Width Values</h4>
+
+<p>Fixed width ValueVectors simply contain a packed sequence of values. Random
+access is supported by reading the element at position Index from
+ByteBuf[0] + Index * Stride, where Index is 0-based and Stride is the width of
+one value in bytes. The following illustrates the underlying buffer of
+INT4 values [1 .. 6]:</p>
+
+<p><img src="../../img/value1.png" alt="image"></p>
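+
+<p>The position-times-stride lookup can be sketched in Java as follows. This is only an illustration under stated assumptions, not Drill's ValueVector code: the class <code>FixedWidthSketch</code> is a hypothetical name, and a plain <code>java.nio.ByteBuffer</code> stands in for the ByteBuf.</p>

```java
import java.nio.ByteBuffer;

// Hypothetical illustration; not Drill's ValueVector implementation.
final class FixedWidthSketch {
    static final int STRIDE = 4; // bytes per INT4 value

    /** Packs the values back to back into one contiguous buffer. */
    static ByteBuffer pack(int... values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * STRIDE);
        for (int v : values) {
            buf.putInt(v);
        }
        return buf.asReadOnlyBuffer(); // no write access after construction
    }

    /** Constant-time read: byte position = index * stride. */
    static int get(ByteBuffer buf, int index) {
        return buf.getInt(index * STRIDE);
    }

    public static void main(String[] args) {
        ByteBuffer buf = pack(1, 2, 3, 4, 5, 6);
        System.out.println(get(buf, 2)); // reads the third element
    }
}
```

+<p>Returning a read-only view mirrors the goal that a ValueVector is not modified once constructed.</p>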
+
+<h4 id="nullable-values">Nullable Values</h4>
+
+<p>Nullable values are represented by a vector of bit values. Each bit in the
+vector corresponds to an element in the ValueVector. If the bit is not set,
+the value is NULL. Otherwise the value is retrieved from the underlying
+buffer. The following illustrates a NullableValueVector of INT4 values 2, 3
+and 6:</p>
+
+<p><img src="../../img/value2.png" alt=""></p>
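+
+<p>A bit vector alongside the value buffer can be sketched as below. The class <code>NullableInt4Sketch</code> is a hypothetical name, <code>java.util.BitSet</code> stands in for the vector of bit values, and the positions chosen for the values 2, 3 and 6 are an assumption made for the example rather than a reading of the figure.</p>

```java
import java.util.BitSet;

// Hypothetical illustration; not Drill's NullableValueVector implementation.
final class NullableInt4Sketch {
    private final BitSet isSet = new BitSet(); // one bit per element; unset means NULL
    private final int[] values;                // underlying fixed width values

    private NullableInt4Sketch(int size) {
        values = new int[size];
    }

    /** Builds the vector once; a null argument marks a NULL element. */
    static NullableInt4Sketch of(Integer... input) {
        NullableInt4Sketch v = new NullableInt4Sketch(input.length);
        for (int i = 0; i < input.length; i++) {
            if (input[i] != null) {
                v.values[i] = input[i];
                v.isSet.set(i);
            }
        }
        return v;
    }

    /** Returns null when the bit is not set, otherwise the stored value. */
    Integer get(int index) {
        return isSet.get(index) ? values[index] : null;
    }

    public static void main(String[] args) {
        // Assumed layout: NULL, 2, 3, NULL, NULL, 6
        NullableInt4Sketch v = of(null, 2, 3, null, null, 6);
        System.out.println(v.get(0) + " " + v.get(1));
    }
}
```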
+
+
+<h4 id="repeated-values">Repeated Values</h4>
+
+<p>A repeated ValueVector is used for elements which can contain multiple values
+(e.g. a JSON array). A table of offset and count pairs is used to represent
+each repeated element in the ValueVector. A count of zero means the element
+has no values (note the offset field is unused in this case). The following
+illustrates three fields; one with two values, one with no values, and one
+with a single value:</p>
+
+<p><img src="../../img/value3.png" alt=""></p>
+
+<p>ValueVector Representation of the equivalent JSON:</p>
+
+<p>x:[1, 2]</p>
+
+<p>x:[ ]</p>
+
+<p>x:[3]</p>
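+
+<p>The offset and count table for these three elements can be sketched as follows. The class <code>RepeatedInt4Sketch</code> and its field names are hypothetical, and offsets are expressed as value indexes rather than byte positions to keep the example short.</p>

```java
import java.util.Arrays;

// Hypothetical illustration; not Drill's repeated ValueVector implementation.
final class RepeatedInt4Sketch {
    private final int[] values;  // all values, stored contiguously
    private final int[] offsets; // index of the first value of each element
    private final int[] counts;  // number of values in each element

    RepeatedInt4Sketch(int[] values, int[] offsets, int[] counts) {
        this.values = values;
        this.offsets = offsets;
        this.counts = counts;
    }

    /** x:[1, 2], x:[ ], x:[3] from the JSON above. */
    static RepeatedInt4Sketch example() {
        return new RepeatedInt4Sketch(
                new int[] {1, 2, 3},
                new int[] {0, 0, 2},  // the middle offset is unused because its count is zero
                new int[] {2, 0, 1});
    }

    /** Returns the values of one repeated element. */
    int[] get(int element) {
        int count = counts[element];
        if (count == 0) {
            return new int[0]; // the offset is ignored for empty elements
        }
        int start = offsets[element];
        return Arrays.copyOfRange(values, start, start + count);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(example().get(0)));
    }
}
```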
+
+<h4 id="variable-width-values">Variable Width Values</h4>
+
+<p>Variable width values are stored contiguously in a ByteBuf. Each element is
+represented by an entry in a fixed width ValueVector of offsets. The length of
+an element is deduced by subtracting its offset from the offset of the
+following entry. Because of this, the offset table always contains one more
+entry than the total number of elements, with the last entry pointing to the
+end of the buffer.</p>
+
+<p><img src="../../img/value4.png" alt=""></p>
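+
+<p>The offset table with its extra trailing entry can be sketched as below. The class <code>VarWidthSketch</code> is a hypothetical name, a <code>byte[]</code> stands in for the ByteBuf, and UTF-8 strings are used as the variable width values purely for illustration.</p>

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not Drill's variable width ValueVector implementation.
final class VarWidthSketch {
    private final byte[] data;   // all values, stored contiguously
    private final int[] offsets; // n + 1 entries; the last one points to the end of data

    private VarWidthSketch(byte[] data, int[] offsets) {
        this.data = data;
        this.offsets = offsets;
    }

    static VarWidthSketch of(String... strings) {
        byte[][] encoded = new byte[strings.length][];
        int[] offsets = new int[strings.length + 1];
        int total = 0;
        for (int i = 0; i < strings.length; i++) {
            encoded[i] = strings[i].getBytes(StandardCharsets.UTF_8);
            offsets[i] = total;
            total += encoded[i].length;
        }
        offsets[strings.length] = total; // end-of-buffer entry
        byte[] data = new byte[total];
        for (int i = 0; i < strings.length; i++) {
            System.arraycopy(encoded[i], 0, data, offsets[i], encoded[i].length);
        }
        return new VarWidthSketch(data, offsets);
    }

    /** Length = offset of the following entry minus this element's offset. */
    int length(int index) {
        return offsets[index + 1] - offsets[index];
    }

    String get(int index) {
        return new String(data, offsets[index], length(index), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        VarWidthSketch v = of("a", "", "drill");
        System.out.println(v.get(2) + " has length " + v.length(2));
    }
}
```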
+
+<h4 id="repeated-map-vectors">Repeated Map Vectors</h4>
+
+<p>A repeated map vector contains one or more maps (akin to an array of objects
+in JSON). The values of each field in the map are stored contiguously within a
+ByteBuf. To access a specific record, a lookup table of count and offset pairs
+is used. This lookup table points to the first repeated field in each column,
+while the count indicates the maximum number of elements for the column. The
+following example illustrates a RepeatedMap with two records; one with two
+objects, and one with a single object:</p>
+
+<p><img src="../../img/value5.png" alt=""></p>
+
+<p>ValueVector representation of the equivalent JSON:</p>
+
+<p>x: [ {name:'Sam', age:1}, {name:'Max', age:2} ]</p>
+
+<p>x: [ {name:'Joe', age:3} ]</p>
+
+<h4 id="selection-vectors">Selection Vectors</h4>
+
+<p>A Selection Vector represents a subset of a ValueVector. It is implemented
+with a list of offsets which identify each element in the ValueVector to be
+included in the SelectionVector. In the case of a fixed width ValueVector, the
+offsets reference the underlying ByteBuf. In the case of a nullable, repeated
+or variable width ValueVector, the offset references the corresponding lookup
+table. The following illustrates a SelectionVector of INT4 (fixed width)
+values 2, 3 and 5 from the original vector of [1 .. 6]:</p>
+
+<p><img src="../../img/value6.png" alt=""></p>
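+
+<p>For the fixed width case, a selection over the original buffer can be sketched as follows. The class <code>SelectionVectorSketch</code> is a hypothetical name, a <code>java.nio.ByteBuffer</code> stands in for the ByteBuf, and the offsets are assumed to be byte offsets into that buffer.</p>

```java
import java.nio.ByteBuffer;

// Hypothetical illustration; not Drill's SelectionVector implementation.
final class SelectionVectorSketch {
    static final int STRIDE = 4; // bytes per INT4 value

    /** Builds the original vector of INT4 values [1 .. 6]. */
    static ByteBuffer oneToSix() {
        ByteBuffer buf = ByteBuffer.allocate(6 * STRIDE);
        for (int v = 1; v <= 6; v++) {
            buf.putInt(v);
        }
        return buf.asReadOnlyBuffer();
    }

    /** Reads the selected values without copying or modifying the original buffer. */
    static int[] select(ByteBuffer buf, int[] byteOffsets) {
        int[] selected = new int[byteOffsets.length];
        for (int i = 0; i < byteOffsets.length; i++) {
            selected[i] = buf.getInt(byteOffsets[i]);
        }
        return selected;
    }

    public static void main(String[] args) {
        // Values 2, 3 and 5 sit at byte offsets 4, 8 and 16.
        int[] selected = select(oneToSix(), new int[] {4, 8, 16});
        System.out.println(java.util.Arrays.toString(selected));
    }
}
```

+<p>For a nullable, repeated or variable width ValueVector, the same list would index the corresponding lookup table instead of the buffer.</p>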
+
+<p>The following illustrates the same ValueVector with nullable fields:</p>
+
+<p><img src="../../img/value7.png" alt=""></p>
+</div>
+
+
+<div id="footer" class="mw">
+<div class="wrapper">
+Copyright © 2012-2014 The Apache Software Foundation, licensed under the Apache License, Version 2.0.<br>
+Apache and the Apache feather logo are trademarks of The Apache Software Foundation. Other names appearing on the site may be trademarks of their respective owners.<br/><br/>
+</div>
+</div>
+
+<script>
+(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+ga('create', 'UA-53379651-1', 'auto');
+ga('send', 'pageview');
+</script>
+
+</body>
+</html>

Added: drill/site/trunk/content/drill/docs/wikipedia-edit-history/index.html
URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/docs/wikipedia-edit-history/index.html?rev=1651949&view=auto
==============================================================================
--- drill/site/trunk/content/drill/docs/wikipedia-edit-history/index.html (added)
+++ drill/site/trunk/content/drill/docs/wikipedia-edit-history/index.html Thu Jan 15 05:11:44 2015
@@ -0,0 +1,195 @@
+<!DOCTYPE html>
+<html>
+
+<head>
+
+<meta charset="UTF-8">
+
+
+<title>Wikipedia Edit History - Apache Drill</title>
+
+<link href="/css/syntax.css" rel="stylesheet" type="text/css">
+<link href="/css/style.css" rel="stylesheet" type="text/css">
+<link href="/css/arrows.css" rel="stylesheet" type="text/css">
+<link href="/css/button.css" rel="stylesheet" type="text/css">
+
+<link rel="shortcut icon" href="/favicon.ico" type="image/x-icon">
+<link rel="icon" href="/favicon.ico" type="image/x-icon">
+
+<script language="javascript" type="text/javascript" src="/js/lib/jquery-1.11.1.min.js"></script>
+<script language="javascript" type="text/javascript" src="/js/lib/jquery.easing.1.3.js"></script>
+<script language="javascript" type="text/javascript" src="/js/modernizr.custom.js"></script>
+<script language="javascript" type="text/javascript" src="/js/script.js"></script>
+
+</head>
+
+<body onResize="resized();">
+
+<div class="bui"></div>
+
+<div id="search">
+<input type="text" placeholder="Enter search term here">
+</div>
+
+<div id="menu" class="mw">
+<ul>
+  <li class="logo"><a href="/"></a></li>
+  <li>
+    <a href="/overview/">Documentation</a>
+    <ul>
+      <li><a href="/overview/">Overview&nbsp;&nbsp;&nbsp;&nbsp;</a></li>
+      <li><a href="https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes" target="_blank">Drill in 10 Minutes</a></li>
+      <li><a href="/why/">Why Drill? &nbsp;&nbsp;&nbsp;&nbsp;</a></li>
+      <li><a href="/architecture/">Architecture</a></li>
+    </ul>
+  </li>
+  <li>
+    <a href="/community/">Community</a>
+    <ul>
+      <li><a href="/team/">Team</a></li>
+      <li><a href="/community/#events">Events and Meetups</a></li>
+      <li><a href="/community/#mailinglists">Mailing Lists</a></li>
+      <li><a href="/community/#getinvolved">Get Involved</a></li>
+      <li><a href="https://issues.apache.org/jira/browse/DRILL/" target="_blank">Issue Tracker</a></li>
+      <li><a href="https://github.com/apache/drill" target="_blank">GitHub</a></li>
+    </ul>
+  </li>
+  <li><a href="/faq/">FAQ</a></li>
+  <li><a href="/blog/">Blog</a></li>
+  <li style="width:30px; padding-left: 2px; padding-right:10px"><a href="https://twitter.com/apachedrill" target="_blank"><img src="/images/twitterbw.png" alt="" align="center" width="22" style="padding: 0px 10px 1px 0px;"></a> </li>
+  <li class="l"><span>&nbsp;</span></li>
+  <li class="d"><a href="/download/">Download</a></li>
+</ul>
+</div>
+
+<div class="int_title">
+<h1>Wikipedia Edit History</h1>
+
+</div>
+
+<div class="int_text" align="left"><h1 id="quick-stats">Quick Stats</h1>
+
+<p>The Wikipedia Edit History is a public dump of the website made available by
+the Wikimedia Foundation. You can find details
+<a href="http://en.wikipedia.org/wiki/Wikipedia:Database_download">here</a>. The dumps
+are available in SQL or XML format. You can find the entire schema drawn
+together in this great <a href="http://upload.wikimedia.org/wikipedia/commons/thumb/4/42/MediaWiki_1.20_%2844edaa2%29_database_schema.svg/2193px-MediaWiki_1.20_%2844edaa2%29_database_schema.svg.png">diagram</a>.</p>
+
+<h1 id="approach">Approach</h1>
+
+<p>The <em>main</em> distribution files are:</p>
+
+<ul>
+<li>Current Pages: As of January 2013 this SQL dump was 9.0GB in its compressed format.</li>
+<li>Complete Archive: This is what we actually want, but at a size of multiple terabytes, it clearly exceeds the storage available at home.</li>
+</ul>
+
+<p>To have some real historic data, it is recommended to download a <em>Special
+Export</em> using this
+<a href="http://en.wikipedia.org/w/index.php?title=Special:Export">link</a>. Using this
+tool you can generate a category-specific XML dump and configure various export
+options. There are some limits, such as a maximum of 1000 revisions per export,
+but otherwise this should work out just fine.</p>
+
+<p><img src="../../img/Overview.png" alt=""></p>
+
+<p>The entities used in the query use cases.</p>
+
+<h1 id="use-cases">Use Cases</h1>
+
+<h2 id="select-change-volume-based-on-time">Select Change Volume Based on Time</h2>
+
+<p><strong>Query</strong></p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">select rev.::parent.title, rev.::parent.id, sum(rev.text.bytes)
+from mediawiki.page.revision as rev
+where rev.timestamp.between(?, ?) 
+group by rev.::parent;
+</code></pre></div>
+<p><em>Explanation</em>: This is my attempt at mixing records and structures. The <code>from</code>
+statement refers to <code>mediawiki</code> as a record type / row, but also mixes in
+structural information, i.e. <code>page.revision</code>, internal to the record. The
+query now uses <code>page.revision</code> as the base for all other statements, in this case
+the <code>select</code>, <code>where</code> and <code>group by</code>. The <code>where</code> statement again uses a
+JSON-like expression to state that the timestamp must be between two values;
+parameters are written as question marks, similar to JDBC. The <code>group by</code>
+statement instructs the query to aggregate results based on the parent of a
+<code>revision</code>, in this case a <code>page</code>. The <code>::parent</code> syntax is borrowed from
+XPath. As we are aggregating on <code>page</code>, it is safe to select the <code>title</code> and
+<code>id</code> from the element in the <code>select</code>. We also use an aggregation function to
+add up the number of bytes changed in the given time frame; this should be
+self-explanatory.</p>
+
+<p><em>Discussion</em>:</p>
+
+<ul>
+<li>I am not very satisfied using the <code>::</code> syntax, as it is <em>ugly</em>. We probably won&#39;t need that many axis specifiers, e.g. we don&#39;t need any attribute specifiers, but for now, I could not think of anything better.</li>
+<li>Using an <code>as</code> expression in the <code>from</code> statement is optional; without it, you would simply have to replace all references to <code>rev</code> with <code>revision</code>.</li>
+<li>I am not sure if this is desired, but you cannot see at first glance where the <em>hierarchical</em> stuff starts. This may be confusing to an RDBMS purist; at least it was for me at the beginning. But now I think this strikes the right mix between verbosity and elegance.</li>
+<li>I assume we would need some good indexing, but this should be achievable. We would need to translate the relative index <code>rev.timestamp</code> to a record-absolute index <code>$.mediawiki.page.revision.timestamp</code>. What is unclear to me now is whether the index would point to the record or to some kind of record substructure.</li>
+</ul>
+
+<h2 id="select-change-volume-aggregated-on-time">Select Change Volume Aggregated on Time</h2>
+
+<p><strong>Query</strong></p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">select rev.::parent.title, rev.::parent.id, sum(rev.text.bytes), rev.timestamp.monthYear()
+from mediawiki.page.revision as rev
+where rev.timestamp.between(?, ?) 
+group by rev.::parent, rev.timestamp.monthYear()
+order by rev.::parent.id, rev.timestamp.monthYear();
+</code></pre></div>
+<p><em>Explanation</em>: This is a refinement of the previous query. In this case we are
+again returning a flat list, but are using an additional scalar result and
+<code>group</code> statement. In the previous example we were returning one result per
+found page; now we are returning one result per page and month of changes.
+<code>Order by</code> is nothing special in this case.</p>
+
+<p><em>Discussion</em>:</p>
+
+<ul>
+<li>I always considered MySQL confusing in its use of implicit group by statements, as I prefer fail-fast mechanisms. Hence I would opt for explicit <code>group by</code> operators.</li>
+<li>I would not provide implicit nodes into the records, i.e. if you want some attribute of a timestamp, call a function and do not expect an automatically added element. So we want <code>rev.timestamp.monthYear()</code> and not <code>rev.timestamp.monthYear</code>. The latter may be quite confusing, especially if we have heterogeneous record structures. We might even go ahead and support namespaces for custom, experimental features like <code>rev.timestamp.custom.maya:doomsDay()</code>.</li>
+</ul>
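+
+<p>To make the intended result of the query above concrete, the same
+aggregation can be written out by hand in Python. This is only a sketch of the
+semantics, not of any Drill feature: the flattened sample revisions, their
+field names and the helper <code>change_volume_by_month</code> are all
+hypothetical.</p>

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical, already-flattened revisions; the field names are illustrative.
REVISIONS = [
    {"page_id": 1, "title": "Drill", "timestamp": "2013-01-05T10:00:00Z", "bytes": 100},
    {"page_id": 1, "title": "Drill", "timestamp": "2013-01-20T08:30:00Z", "bytes": 50},
    {"page_id": 1, "title": "Drill", "timestamp": "2013-02-02T12:00:00Z", "bytes": 70},
    {"page_id": 2, "title": "Hive", "timestamp": "2013-01-11T09:15:00Z", "bytes": 30},
    {"page_id": 2, "title": "Hive", "timestamp": "2014-06-01T00:00:00Z", "bytes": 999},
]

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def change_volume_by_month(revisions, start, end):
    """Sum revision bytes per (page id, title, year-month) inside [start, end]."""
    totals = defaultdict(int)
    for rev in revisions:
        stamp = datetime.strptime(rev["timestamp"], TIMESTAMP_FORMAT)
        if start <= stamp <= end:
            key = (rev["page_id"], rev["title"], stamp.strftime("%Y-%m"))
            totals[key] += rev["bytes"]
    # Sorting the keys orders the rows by page id, title, then month.
    return [key + (total,) for key, total in sorted(totals.items())]


rows = change_volume_by_month(
    REVISIONS, datetime(2013, 1, 1), datetime(2013, 12, 31, 23, 59, 59)
)
```

+<p>Each row corresponds to one page and one month, which mirrors the explicit
+<code>group by</code> on the parent and on <code>monthYear()</code>.</p>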
+
+<h2 id="select-change-volume-based-on-contributor">Select Change Volume Based on Contributor</h2>
+
+<p><strong>Query</strong></p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">select ctbr.username, ctbr.ip, ctbr.userid, sum(ctbr::parent.bytes) as bytesContributed
+from mediawiki.page..contributor as ctbr
+group by ctbr.canonize()
+order by bytesContributed;
+</code></pre></div>
+<p><em>Explanation</em>: This query looks quite similar to the previous queries, but I
+added this one nonetheless, as it hints at an aggregation which may span
+multiple records. The previous examples were based on pages, which are unique
+to a record, whereas the contributor may appear many times in many different
+records.</p>
+
+<p><em>Discussion</em>:</p>
+
+<ul>
+<li>I have added the <code>..</code> operator in this example. Besides being syntactic sugar, it also allows us to search for <code>revision</code> and <code>upload</code>, which are both children of <code>page</code> and may both have a <code>contributor</code>. The more RDBMS-like alternative would be a <code>union</code>, but this was not natural enough.</li>
+<li>I am sure the <code>ctbr.canonize()</code> will cause lots of discussions :-). The thing is that a contributor may repeat itself in many different records, and we don&#39;t really have an id. If you look at the wikimedia XSD, all three attributes are optional, and the data says the same, so we cannot simply say <code>ctbr.userid</code>. Hence the canonize function should create a scalar value containing all available information of the node in a canonical form.</li>
+<li>Last but not least, I always hated that MySQL would not be able to reuse column definitions from the <code>select</code> statement in the <code>order</code> statements. So I added to my wishlist that the <code>bytesContributed</code> definition is reusable.</li>
+</ul>
+</div>
+
+
+<div id="footer" class="mw">
+<div class="wrapper">
+Copyright © 2012-2014 The Apache Software Foundation, licensed under the Apache License, Version 2.0.<br>
+Apache and the Apache feather logo are trademarks of The Apache Software Foundation. Other names appearing on the site may be trademarks of their respective owners.<br/><br/>
+</div>
+</div>
+
+<script>
+(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+ga('create', 'UA-53379651-1', 'auto');
+ga('send', 'pageview');
+</script>
+
+</body>
+</html>

Added: drill/site/trunk/content/drill/docs/workspaces/index.html
URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/docs/workspaces/index.html?rev=1651949&view=auto
==============================================================================
--- drill/site/trunk/content/drill/docs/workspaces/index.html (added)
+++ drill/site/trunk/content/drill/docs/workspaces/index.html Thu Jan 15 05:11:44 2015
@@ -0,0 +1,166 @@
+<!DOCTYPE html>
+<html>
+
+<head>
+
+<meta charset="UTF-8">
+
+
+<title>Workspaces - Apache Drill</title>
+
+<link href="/css/syntax.css" rel="stylesheet" type="text/css">
+<link href="/css/style.css" rel="stylesheet" type="text/css">
+<link href="/css/arrows.css" rel="stylesheet" type="text/css">
+<link href="/css/button.css" rel="stylesheet" type="text/css">
+
+<link rel="shortcut icon" href="/favicon.ico" type="image/x-icon">
+<link rel="icon" href="/favicon.ico" type="image/x-icon">
+
+<script language="javascript" type="text/javascript" src="/js/lib/jquery-1.11.1.min.js"></script>
+<script language="javascript" type="text/javascript" src="/js/lib/jquery.easing.1.3.js"></script>
+<script language="javascript" type="text/javascript" src="/js/modernizr.custom.js"></script>
+<script language="javascript" type="text/javascript" src="/js/script.js"></script>
+
+</head>
+
+<body onResize="resized();">
+
+<div class="bui"></div>
+
+<div id="search">
+<input type="text" placeholder="Enter search term here">
+</div>
+
+<div id="menu" class="mw">
+<ul>
+  <li class="logo"><a href="/"></a></li>
+  <li>
+    <a href="/overview/">Documentation</a>
+    <ul>
+      <li><a href="/overview/">Overview&nbsp;&nbsp;&nbsp;&nbsp;</a></li>
+      <li><a href="https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes" target="_blank">Drill in 10 Minutes</a></li>
+      <li><a href="/why/">Why Drill? &nbsp;&nbsp;&nbsp;&nbsp;</a></li>
+      <li><a href="/architecture/">Architecture</a></li>
+    </ul>
+  </li>
+  <li>
+    <a href="/community/">Community</a>
+    <ul>
+      <li><a href="/team/">Team</a></li>
+      <li><a href="/community/#events">Events and Meetups</a></li>
+      <li><a href="/community/#mailinglists">Mailing Lists</a></li>
+      <li><a href="/community/#getinvolved">Get Involved</a></li>
+      <li><a href="https://issues.apache.org/jira/browse/DRILL/" target="_blank">Issue Tracker</a></li>
+      <li><a href="https://github.com/apache/drill" target="_blank">GitHub</a></li>
+    </ul>
+  </li>
+  <li><a href="/faq/">FAQ</a></li>
+  <li><a href="/blog/">Blog</a></li>
+  <li style="width:30px; padding-left: 2px; padding-right:10px"><a href="https://twitter.com/apachedrill" target="_blank"><img src="/images/twitterbw.png" alt="" align="center" width="22" style="padding: 0px 10px 1px 0px;"></a> </li>
+  <li class="l"><span>&nbsp;</span></li>
+  <li class="d"><a href="/download/">Download</a></li>
+</ul>
+</div>
+
+<div class="int_title">
+<h1>Workspaces</h1>
+
+</div>
+
+<div class="int_text" align="left"><p>When you register an instance of a file system data source, you can configure
+one or more workspaces for the instance. A workspace is a directory within the
+file system that you define. Drill searches the workspace to locate data when
+you run a query.</p>
+
+<p>Each workspace that you register defines a schema that you can connect to and
+query. Configuring workspaces is useful when you want to run multiple queries
+on files or tables in a specific directory. You cannot create workspaces for
+<code>hive</code> and <code>hbase</code> instances, though Hive databases show up as workspaces in
+Drill.</p>
+
+<p>The following example shows an instance of a file type storage plugin with a
+workspace named <code>json</code> configured to point Drill to the
+<code>/users/max/drill/json/</code> directory in the local file system <code>(dfs)</code>:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">{
+  &quot;type&quot; : &quot;file&quot;,
+  &quot;enabled&quot; : true,
+  &quot;connection&quot; : &quot;file:///&quot;,
+  &quot;workspaces&quot; : {
+    &quot;json&quot; : {
+      &quot;location&quot; : &quot;/users/max/drill/json/&quot;,
+      &quot;writable&quot; : false,
+      &quot;storageformat&quot; : &quot;json&quot;
+   } 
+},
+</code></pre></div>
+<p><strong>Note:</strong> The <code>connection</code> parameter in the configuration above is &quot;<code>file:///</code>&quot;, connecting Drill to the local file system (<code>dfs</code>). To connect to a Hadoop or MapR file system, the <code>connection</code> parameter would be &quot;<code>hdfs:///</code>&quot; or &quot;<code>maprfs:///</code>&quot;, respectively.</p>
+
+<p>To query a file in the example <code>json</code> workspace, you can issue the <code>USE</code>
+command to tell Drill to use the <code>json</code> workspace configured in the <code>dfs</code>
+instance for each query that you issue:</p>
+
+<p><strong>Example</strong></p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">USE dfs.json;
+SELECT * FROM dfs.json.`donuts.json` WHERE type=&#39;frosted&#39;;
+</code></pre></div>
+<p>If the <code>json</code> workspace did not exist, the query would have to include the
+full path to the <code>donuts.json</code> file:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">SELECT * FROM dfs.`/users/max/drill/json/donuts.json` WHERE type=&#39;frosted&#39;;
+</code></pre></div>
+<p>Using a workspace alleviates the need to repeatedly enter the directory path
+in subsequent queries on the directory.</p>
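+
+<p>The relationship between the two queries above can be pictured with a small
+Python sketch. It is not how Drill resolves paths internally; the
+<code>WORKSPACES</code> table and the <code>resolve</code> helper are hypothetical
+and only mirror the <code>location</code> value from the example configuration.</p>

```python
import posixpath

# Hypothetical registry mirroring the "workspaces" section of the example configuration.
WORKSPACES = {"json": "/users/max/drill/json/"}


def resolve(file_name, workspace=None):
    """Return the full path of a file, optionally relative to a workspace."""
    if workspace is None:
        return file_name
    return posixpath.join(WORKSPACES[workspace], file_name)


with_workspace = resolve("donuts.json", workspace="json")
without_workspace = resolve("/users/max/drill/json/donuts.json")
```

+<p>Both calls name the same file; the workspace simply supplies the directory
+prefix once instead of in every query.</p>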
+
+<h3 id="default-workspaces">Default Workspaces</h3>
+
+<p>Each <code>file</code> and <code>hive</code> instance includes a <code>default</code> workspace. The <code>default</code>
+workspace points to the file system or to the Hive metastore. When you query
+files and tables in the <code>file</code> or <code>hive</code> <code>default</code> workspaces, you can omit the
+workspace name from the query.</p>
+
+<p>For example, you can issue a query on a Hive table in the <code>default</code> workspace
+using either of the following formats and get the same results:</p>
+
+<p><strong>Example</strong></p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">SELECT * FROM hive.customers LIMIT 10;
+SELECT * FROM hive.`default`.customers LIMIT 10;
+</code></pre></div>
+<p><strong>Note:</strong> <code>default</code> is a reserved word. You must enclose reserved words in backticks.</p>
+
+<p>Because HBase instances do not have workspaces, you can use the following
+format to query a table in HBase:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">SELECT * FROM hbase.customers LIMIT 10;
+</code></pre></div>
+<p>After you register a data source as a storage plugin instance with Drill, and
+optionally configure workspaces, you can query the data source.</p>
+
+<p>Click any of the following links to learn how to register a data source as a
+storage plugin instance:</p>
+
+<ul>
+<li><a href="/confluence/display/DRILL/Registering+a+File+System">Registering a File System</a></li>
+<li><a href="/confluence/display/DRILL/Registering+HBase">Registering HBase</a></li>
+<li><a href="/confluence/display/DRILL/Registering+Hive">Registering Hive</a></li>
+<li><a href="/confluence/display/DRILL/Drill+Default+Input+Format">Drill Default Input Format</a></li>
+</ul>
+</div>
+
+
+<div id="footer" class="mw">
+<div class="wrapper">
+Copyright © 2012-2014 The Apache Software Foundation, licensed under the Apache License, Version 2.0.<br>
+Apache and the Apache feather logo are trademarks of The Apache Software Foundation. Other names appearing on the site may be trademarks of their respective owners.<br/><br/>
+</div>
+</div>
+
+<script>
+(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+ga('create', 'UA-53379651-1', 'auto');
+ga('send', 'pageview');
+</script>
+
+</body>
+</html>

Modified: drill/site/trunk/content/drill/faq/index.html
URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/faq/index.html?rev=1651949&r1=1651948&r2=1651949&view=diff
==============================================================================
--- drill/site/trunk/content/drill/faq/index.html (original)
+++ drill/site/trunk/content/drill/faq/index.html Thu Jan 15 05:11:44 2015
@@ -67,7 +67,7 @@
 
 </div>
 
-<div class="int_text" align="left"><h2>What use cases should I consider using Drill for?</h2>
+<div class="int_text" align="left"><h2 id="what-use-cases-should-i-consider-using-drill-for?">What use cases should I consider using Drill for?</h2>
 
 <p>Drill provides low latency SQL queries on large-scale datasets. Example use cases for Drill include</p>
 
@@ -80,11 +80,11 @@
 
 <p>We expect Drill to be used in many more use cases where low latency is required.</p>
 
-<h2>Does Drill replace Hive for batch processing? What about my OLTP applications?</h2>
+<h2 id="does-drill-replace-hive-for-batch-processing?-what-about-my-oltp-applications?">Does Drill replace Hive for batch processing? What about my OLTP applications?</h2>
 
 <p>Drill complements batch-processing frameworks such as Hive, Pig, and MapReduce to support low latency queries. At this point, Drill is not an optimal choice for OLTP/operational applications that require sub-second response times.</p>
 
-<h2>There are lots of SQL on Hadoop technologies out there. How is Drill different?</h2>
+<h2 id="there-are-lots-of-sql-on-hadoop-technologies-out-there.-how-is-drill-different?">There are lots of SQL on Hadoop technologies out there. How is Drill different?</h2>
 
 <p>Drill takes a different approach to SQL-on-Hadoop than Hive and other related technologies. The goal for Drill is to bring the SQL ecosystem and performance of the relational systems to Hadoop-scale data without compromising on the flexibility of Hadoop/NoSQL systems. Drill provides a flexible query environment for users with the key capabilities as below.</p>
 
@@ -95,11 +95,11 @@
 <li>Extensibility to go beyond Hadoop environments</li>
 </ul>
 
-<h2>What is self-describing data?</h2>
+<h2 id="what-is-self-describing-data?">What is self-describing data?</h2>
 
 <p>Self-describing data is data in which the schema is specified as part of the data itself. File formats such as Parquet, JSON, ProtoBuf, XML, AVRO and NoSQL databases are all examples of self-describing data. Some of these data formats are also dynamic and complex, in that every record in the data can have its own set of columns/attributes and each column can be semi-structured/nested.</p>
 
-<h2>How does Drill support queries on self-describing data?</h2>
+<h2 id="how-does-drill-support-queries-on-self-describing-data?">How does Drill support queries on self-describing data?</h2>
 
 <p>Drill enables queries on self-describing data using the fundamental architectural foundations:</p>
 
@@ -110,11 +110,11 @@
 
 <p>Together with the dynamic data discovery and a flexible data model that can handle complex data types, Drill allows users to get fast and complete value from all their data.</p>
 
-<h2>But I already have schemas defined in Hive metastore? Can I use that with Drill?</h2>
+<h2 id="but-i-already-have-schemas-defined-in-hive-metastore?-can-i-use-that-with-drill?">But I already have schemas defined in Hive metastore? Can I use that with Drill?</h2>
 
 <p>Yes, Hive also serves as data source for Drill. So you can simply point to the Hive metastore from Drill and start performing low latency queries on Hive tables with no modifications.</p>
 
-<h2>Is Drill trying to be &quot;anti-schema&quot; or &quot;anti-DBA&quot;?</h2>
+<h2 id="is-drill-trying-to-be-&quot;anti-schema&quot;-or-&quot;anti-dba&quot;?">Is Drill trying to be &quot;anti-schema&quot; or &quot;anti-DBA&quot;?</h2>
 
 <p>Of course not! Central EDW schemas work great if data models are not changing often and the value of the data is well understood and ready to be operationalized for regular reporting purposes. However, during the data exploration and discovery phase, rigid modeling requirements pose challenges and delay value from data, especially in Hadoop/NoSQL environments where the data is highly complex, dynamic and evolving fast. A few challenges include</p>
 
@@ -127,7 +127,7 @@
 
 <p>Drill is all about flexibility. The flexible schema management capabilities in Drill lets users explore the data in its native format as it comes in directly and create models/structure if needed in Hive metastore or using the CREATE TABLE/CREATE VIEW syntax within Drill.</p>
 
-<h2>What does a Drill query look like?</h2>
+<h2 id="what-does-a-drill-query-look-like?">What does a Drill query look like?</h2>
 
 <p>Drill uses a de-centralized metadata model and relies on its storage plugins to provide with the metadata. Drill supports queries on file system (distributed and local), HBase and Hive tables. There is a storage plugin associated with each data source that is supported by Drill.</p>
 
@@ -135,19 +135,19 @@
 
 <p><img src="/images/overview-img1.png" alt=""></p>
 
-<h2>Can I connect to Drill from my BI tools (Tableau, MicroStrategy, etc.)?</h2>
+<h2 id="can-i-connect-to-drill-from-my-bi-tools-(tableau,-microstrategy,-etc.)?">Can I connect to Drill from my BI tools (Tableau, MicroStrategy, etc.)?</h2>
 
 <p>Yes, Drill provides JDBC/ODBC drivers for integrating with BI/SQL based tools.</p>
 
-<h2>What SQL functionality can Drill support?</h2>
+<h2 id="what-sql-functionality-can-drill-support?">What SQL functionality can Drill support?</h2>
 
 <p>Drill provides ANSI standard SQL (not SQL &quot;Like&quot; or Hive QL) with support for all key analytics functionality such as SQL data types, joins, aggregations, filters, sort, sub-queries (including correlated), joins in where clause etc. <a href="https://cwiki.apache.org/confluence/display/DRILL/SQL+Overview">Click here</a> for reference on SQL functionality in Drill.</p>
 
-<h2>What Hadoop distributions does Drill work with?</h2>
+<h2 id="what-hadoop-distributions-does-drill-work-with?">What Hadoop distributions does Drill work with?</h2>
 
 <p>Drill is not designed with a particular Hadoop distribution in mind, and we expect it to work with all Hadoop distributions that support the Hadoop 2.3.x+ API. We have validated it so far with Apache Hadoop/MapR/CDH/Amazon EMR distributions (Amazon EMR requires a custom configuration; contact <a href="mailto:drill-user@incubator.apache.org">drill-user@incubator.apache.org</a> with questions).</p>
 
-<h2>How does Drill achieve performance?</h2>
+<h2 id="how-does-drill-achieve-performance?">How does Drill achieve performance?</h2>
 
 <p>Drill is built from the ground up for performance on large-scale datasets. The key architectural components that help in achieving performance include:</p>
 
@@ -159,23 +159,23 @@
 <li>Optimistic/pipelined execution</li>
 </ul>
 
-<h2>Does Drill support multi-tenant/high concurrency environments?</h2>
+<h2 id="does-drill-support-multi-tenant/high-concurrency-environments?">Does Drill support multi-tenant/high concurrency environments?</h2>
 
 <p>Drill is built to support several hundred queries at any given point. Clients can submit requests to any node running the Drillbit service in the cluster (there is no master-slave concept). To support more users, you simply have to add more nodes to the cluster.</p>
 
-<h2>Do I need to load data into Drill to start querying it?</h2>
+<h2 id="do-i-need-to-load-data-into-drill-to-start-querying-it?">Do I need to load data into Drill to start querying it?</h2>
 
 <p>No. Drill can query data &quot;in situ&quot;.</p>
 
-<h2>What is the best way to get started with Drill?</h2>
+<h2 id="what-is-the-best-way-to-get-started-with-drill?">What is the best way to get started with Drill?</h2>
 
 <p>The best way to get started is to just try it out. It just takes a few minutes even if you do not have a cluster. Here is a good place to start: <a href="https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes">Apache Drill in 10 minutes</a>.</p>
 
-<h2>How can I ask questions and provide feedback?</h2>
+<h2 id="how-can-i-ask-questions-and-provide-feedback?">How can I ask questions and provide feedback?</h2>
 
 <p>Please post your questions and feedback on <a href="mailto:drill-user@incubator.apache.org">drill-user@incubator.apache.org</a>. We are happy to have you try out Drill and help with any questions!</p>
 
-<h2>How can I contribute to Drill?</h2>
+<h2 id="how-can-i-contribute-to-drill?">How can I contribute to Drill?</h2>
 
 <p>Please refer to the <a href="/community/#getinvolved">Get Involved</a> page on how to get involved with Drill.</p>
 </div>

Modified: drill/site/trunk/content/drill/feed.xml
URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/feed.xml?rev=1651949&r1=1651948&r2=1651949&view=diff
==============================================================================
--- drill/site/trunk/content/drill/feed.xml (original)
+++ drill/site/trunk/content/drill/feed.xml Thu Jan 15 05:11:44 2015
@@ -6,8 +6,8 @@
 </description>
     <link>/</link>
     <atom:link href="/feed.xml" rel="self" type="application/rss+xml"/>
-    <pubDate>Thu, 08 Jan 2015 09:46:31 -0800</pubDate>
-    <lastBuildDate>Thu, 08 Jan 2015 09:46:31 -0800</lastBuildDate>
+    <pubDate>Wed, 14 Jan 2015 21:01:22 -0800</pubDate>
+    <lastBuildDate>Wed, 14 Jan 2015 21:01:22 -0800</lastBuildDate>
     <generator>Jekyll v2.5.1</generator>
     
       <item>
@@ -54,7 +54,7 @@ Jacques Nadeau&lt;/p&gt;
 
 &lt;p&gt;This is by no means intended to be an exhaustive list of everything that will be added to Drill in 2015. With Drill&amp;#39;s rapidly expanding community, I anticipate that you&amp;#39;ll see a whole lot more.&lt;/p&gt;
 
-&lt;h2&gt;Flexible Access Control&lt;/h2&gt;
+&lt;h2 id=&quot;flexible-access-control&quot;&gt;Flexible Access Control&lt;/h2&gt;
 
 &lt;p&gt;Many organizations are now interested in providing Drill as a service to their users, supporting many users, groups and organizations with a single cluster. To do so, they need to be able to control who can access what data. Today&amp;#39;s volume and variety of data requires a new approach to access control. For example, it is becoming impractical for organizations to manage a standalone, centralized repository of permissions for every column/row of every table. Drill&amp;#39;s virtual datasets (views) provide a more scalable solution to access control:&lt;/p&gt;
 
@@ -63,7 +63,7 @@ Jacques Nadeau&lt;/p&gt;
 &lt;li&gt;A virtual dataset is owned by a specific user and can only &amp;quot;select&amp;quot; data that the owner has access to. The data sources (HDFS, HBase, MongoDB, etc.) are responsible for access control decisions. Users and administrators do not need to define separate permissions inside Drill or utilize yet another centralized permission repository, such as Sentry and Ranger.&lt;/li&gt;
 &lt;/ul&gt;
 
-&lt;h2&gt;JSON in Any Shape or Form&lt;/h2&gt;
+&lt;h2 id=&quot;json-in-any-shape-or-form&quot;&gt;JSON in Any Shape or Form&lt;/h2&gt;
 
 &lt;p&gt;When data is &lt;strong&gt;Big&lt;/strong&gt; (as in Big Data), it is painful to copy and transform it. Users should be able to explore the raw data without (or at least prior to) transforming it into another format. Drill is designed to enable in-situ analytics. Just point it at a file or directory and run the queries.&lt;/p&gt;
 
@@ -100,11 +100,11 @@ Jacques Nadeau&lt;/p&gt;
 &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
 &lt;p&gt;While this works today, the dataset is technically a single JSON document, so Drill ends up reading the entire dataset into memory. We&amp;#39;re developing a FLATTEN-pushdown mechanism that will enable the JSON reader to emit the individual records into the downstream operators, thereby making this work with datasets of arbitrary size. Once that&amp;#39;s implemented, users will be able to explore any JSON-based dataset in-situ (ie, without having to transform it).&lt;/p&gt;
 
-&lt;h2&gt;Full SQL&lt;/h2&gt;
+&lt;h2 id=&quot;full-sql&quot;&gt;Full SQL&lt;/h2&gt;
 
 &lt;p&gt;Unlike the majority of SQL engines for Hadoop and NoSQL databases, which support SQL-like languages (HiveQL, CQL, etc.), Drill is designed from the ground up to be compliant with ANSI SQL. We simply started with a real SQL parser (Apache Calcite, previously known as Optiq). We&amp;#39;re currently implementing the remaining SQL constructs, and plan to support the full TPC-DS suite (with no query modifications) in 2015. Full SQL support makes BI tools work better, and enables users who are proficient with SQL to leverage their existing knowledge and skills.&lt;/p&gt;
 
-&lt;h2&gt;New Data Sources&lt;/h2&gt;
+&lt;h2 id=&quot;new-data-sources&quot;&gt;New Data Sources&lt;/h2&gt;
 
 &lt;p&gt;Drill is a standalone, distributed SQL engine. It has a pluggable architecture that allows it to support multiple data sources. Drill 0.6 includes storage plugins for:&lt;/p&gt;
 
@@ -127,7 +127,7 @@ Jacques Nadeau&lt;/p&gt;
 
 &lt;p&gt;If you&amp;#39;re interested in implementing a new storage plugin, I would encourage you to reach out to the Drill developer community on &lt;a href=&quot;mailto:dev@drill.apache.org&quot;&gt;dev@drill.apache.org&lt;/a&gt;. I&amp;#39;m looking forward to publishing an example of a single-query join across 10 data sources.&lt;/p&gt;
 
-&lt;h2&gt;Drill/Spark Integration&lt;/h2&gt;
+&lt;h2 id=&quot;drill/spark-integration&quot;&gt;Drill/Spark Integration&lt;/h2&gt;
 
 &lt;p&gt;We&amp;#39;re seeing growing interest in Spark as an execution engine for data pipelines, providing an alternative to MapReduce. The Drill community is working on integrating Drill and Spark to address a few new use cases:&lt;/p&gt;
 
@@ -143,7 +143,7 @@ Jacques Nadeau&lt;/p&gt;
 &lt;li&gt;&lt;p&gt;Use Drill to query Spark RDDs. Analysts will be able to use BI tools like MicroStrategy, Spotfire and Tableau to query in-memory data in Spark. In addition, Spark developers will be able to embed Drill execution in a Spark data pipeline, thereby enjoying the power of Drill&amp;#39;s schema-free, columnar execution engine.&lt;/p&gt;&lt;/li&gt;
 &lt;/ul&gt;
 
-&lt;h2&gt;Operational Enhancements&lt;/h2&gt;
+&lt;h2 id=&quot;operational-enhancements&quot;&gt;Operational Enhancements&lt;/h2&gt;
 
 &lt;p&gt;As we continue with our monthly releases and march towards the 1.0 release early next year, we&amp;#39;re focused on improving Drill&amp;#39;s speed and scalability. We&amp;#39;ll also enhance Drill&amp;#39;s multi-tenancy with more advanced workload management.&lt;/p&gt;
 
@@ -153,7 +153,7 @@ Jacques Nadeau&lt;/p&gt;
 &lt;li&gt;&lt;strong&gt;Workload management&lt;/strong&gt;: A single cluster is often shared among many users and groups, and everyone expects answers in real-time. Workload management prioritizes the allocation of resources to ensure that the most important workloads get done first so that business demands can be met. Administrators need to be able to assign priorities and quotas at a fine granularity. We&amp;#39;re working on enhancing Drill&amp;#39;s workload management to provide these capabilities while providing tight integration with YARN and Mesos.&lt;/li&gt;
 &lt;/ul&gt;
 
-&lt;h2&gt;We Would Love to Hear From You!&lt;/h2&gt;
+&lt;h2 id=&quot;we-would-love-to-hear-from-you!&quot;&gt;We Would Love to Hear From You!&lt;/h2&gt;
 
 &lt;p&gt;Are there other features you would like to see in Drill? We would love to hear from you:&lt;/p&gt;
 
@@ -188,7 +188,7 @@ Tomer Shiran&lt;/p&gt;
     &lt;span class=&quot;_description&quot;&gt;Join us on Twitter for a one-hour, live SQL-on-Hadoop Q&amp;amp;A. Use the &lt;strong&gt;hashtag #DrillQA&lt;/strong&gt; so the panelists can engage with your questions and comments. Apache Drill committers Tomer Shiran, Jacques Nadeau, and Ted Dunning, as well as Tableau Product Manager Jeff Feng and Data Scientist Dr. Kirk Borne will be on hand to answer your questions.&lt;/span&gt;
     &lt;span class=&quot;_location&quot;&gt;Twitter: #DrillQA&lt;/span&gt;
     &lt;span class=&quot;_organizer&quot;&gt;Tomer Shiran&lt;/span&gt;
-    &lt;span class=&quot;_organizer_email&quot;&gt;tshiran@apache.org&lt;/span&gt;
+    &lt;span class=&quot;_organizer_email&quot;&gt;&lt;a href=&quot;mailto:tshiran@apache.org&quot;&gt;tshiran@apache.org&lt;/a&gt;&lt;/span&gt;
     &lt;span class=&quot;_all_day_event&quot;&gt;false&lt;/span&gt;
     &lt;span class=&quot;_date_format&quot;&gt;MM-DD-YYYY&lt;/span&gt;
 &lt;/a&gt;&lt;/p&gt;
@@ -203,23 +203,23 @@ Tomer Shiran&lt;/p&gt;
 
 &lt;p&gt;Apache Drill committers Tomer Shiran, Jacques Nadeau, and Ted Dunning, as well as Tableau Product Manager Jeff Feng and Data Scientist Dr. Kirk Borne will be on hand to answer your questions.&lt;/p&gt;
 
-&lt;h4&gt;Tomer Shiran, Apache Drill Founder (@tshiran)&lt;/h4&gt;
+&lt;h4 id=&quot;tomer-shiran,-apache-drill-founder-(@tshiran)&quot;&gt;Tomer Shiran, Apache Drill Founder (@tshiran)&lt;/h4&gt;
 
 &lt;p&gt;Tomer Shiran is the founder of Apache Drill, and a PMC member and committer on the project. He is VP Product Management at MapR, responsible for product strategy, roadmap and new feature development. Prior to MapR, Tomer held numerous product management and engineering roles at Microsoft, most recently as the product manager for Microsoft Internet Security &amp;amp; Acceleration Server (now Microsoft Forefront). He is the founder of two websites that have served tens of millions of users, and received coverage in prestigious publications such as The New York Times, USA Today and The Times of London. Tomer is also the author of a 900-page programming book. He holds an MS in Computer Engineering from Carnegie Mellon University and a BS in Computer Science from Technion - Israel Institute of Technology.&lt;/p&gt;
 
-&lt;h4&gt;Jeff Feng, Product Manager Tableau Software (@jtfeng)&lt;/h4&gt;
+&lt;h4 id=&quot;jeff-feng,-product-manager-tableau-software-(@jtfeng)&quot;&gt;Jeff Feng, Product Manager Tableau Software (@jtfeng)&lt;/h4&gt;
 
 &lt;p&gt;Jeff Feng is a Product Manager at Tableau and leads their Big Data product roadmap &amp;amp; strategic vision.  In his role, he focuses on joint technology integration and partnership efforts with a number of Hadoop, NoSQL and web application partners in helping users see and understand their data.&lt;/p&gt;
 
-&lt;h4&gt;Ted Dunning, Apache Drill Committer (@Ted_Dunning)&lt;/h4&gt;
+&lt;h4 id=&quot;ted-dunning,-apache-drill-comitter-(@ted_dunning)&quot;&gt;Ted Dunning, Apache Drill Committer (@Ted_Dunning)&lt;/h4&gt;
 
 &lt;p&gt;Ted Dunning is Chief Applications Architect at MapR Technologies and a committer and PMC member of the Apache Mahout, Apache ZooKeeper, and Apache Drill projects, as well as a mentor for Apache Storm. He contributed to Mahout clustering, classification and matrix decomposition algorithms and helped expand the new version of the Mahout Math library. Ted was the chief architect behind the MusicMatch (now Yahoo Music) and Veoh recommendation systems, built fraud detection systems for ID Analytics (LifeLock), and has been issued 24 patents to date. Ted has a PhD in computing science from the University of Sheffield. When he’s not doing data science, he plays guitar and mandolin.&lt;/p&gt;
 
-&lt;h4&gt;Jacques Nadeau, Vice President, Apache Drill (@intjesus)&lt;/h4&gt;
+&lt;h4 id=&quot;jacques-nadeau,-vice-president,-apache-drill-(@intjesus)&quot;&gt;Jacques Nadeau, Vice President, Apache Drill (@intjesus)&lt;/h4&gt;
 
 &lt;p&gt;Jacques Nadeau leads Apache Drill development efforts at MapR Technologies. He is an industry veteran with over 15 years of big data and analytics experience. Most recently, he was cofounder and CTO of search engine startup YapMap. Before that, he was director of new product engineering with Quigo (contextual advertising, acquired by AOL in 2007). He also built the Avenue A | Razorfish analytics data warehousing system and associated services practice (acquired by Microsoft).&lt;/p&gt;
 
-&lt;h4&gt;Dr. Kirk Borne, George Mason University (@KirkDBorne)&lt;/h4&gt;
+&lt;h4 id=&quot;dr.-kirk-borne,-george-mason-university-(@kirkdborne)&quot;&gt;Dr. Kirk Borne, George Mason University (@KirkDBorne)&lt;/h4&gt;
 
 &lt;p&gt;Dr. Kirk Borne is a Transdisciplinary Data Scientist and an Astrophysicist. He is Professor of Astrophysics and Computational Science in the George Mason University School of Physics, Astronomy, and Computational Sciences. He has been at Mason since 2003, where he teaches and advises students in the graduate and undergraduate Computational Science, Informatics, and Data Science programs. Previously, he spent nearly 20 years in positions supporting NASA projects, including an assignment as NASA&amp;#39;s Data Archive Project Scientist for the Hubble Space Telescope, and as Project Manager in NASA&amp;#39;s Space Science Data Operations Office. He has extensive experience in big data and data science, including expertise in scientific data mining and data systems. He has published over 200 articles (research papers, conference papers, and book chapters), and given over 200 invited talks at conferences and universities worldwide.&lt;/p&gt;
 </description>
@@ -249,11 +249,11 @@ Tomer Shiran&lt;/p&gt;
 
 &lt;p&gt;Consult the &lt;a href=&quot;https://cwiki.apache.org/confluence/display/DRILL/Architectural+Overview&quot;&gt;Architectural Overview&lt;/a&gt; for a refresher on the architecture of Drill.&lt;/p&gt;
 
-&lt;h3&gt;Prerequisites&lt;/h3&gt;
+&lt;h3 id=&quot;prerequisites&quot;&gt;Prerequisites&lt;/h3&gt;
 
 &lt;p&gt;These steps assume you have a &lt;a href=&quot;https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes&quot;&gt;typical Drill cluster and ZooKeeper quorum&lt;/a&gt; configured and running.  To access data in S3, you will need an S3 bucket configured and have the required Amazon security credentials in your possession.  An &lt;a href=&quot;http://blogs.aws.amazon.com/security/post/Tx1R9KDN9ISZ0HF/Where-s-my-secret-access-key&quot;&gt;Amazon blog post&lt;/a&gt; has more information on how to get these from your account.&lt;/p&gt;
 
-&lt;h3&gt;Configuration Steps&lt;/h3&gt;
+&lt;h3 id=&quot;configuration-steps&quot;&gt;Configuration Steps&lt;/h3&gt;
 
 &lt;p&gt;To connect Drill to S3, all of the drillbit nodes will need to access code in JetS3t, an open-source library for working with Amazon S3.  As of this writing, 0.9.2 is the latest version but you might want to check &lt;a href=&quot;https://jets3t.s3.amazonaws.com/toolkit/toolkit.html&quot;&gt;the main page&lt;/a&gt; to see if anything has been updated.  Be sure to get version 0.9.2 or later as earlier versions have a bug relating to reading Parquet data.&lt;/p&gt;
 &lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;wget http://bitbucket.org/jmurty/jets3t/downloads/jets3t-0.9.2.zip
@@ -321,9 +321,9 @@ cp jets3t-0.9.2/jars/jets3t-0.9.2.jar &l
 
 &lt;p&gt;In this post I wanted to reflect on the past and future of Drill.&lt;/p&gt;
 
-&lt;h2&gt;Why We Started Drill&lt;/h2&gt;
+&lt;h2 id=&quot;why-we-started-drill&quot;&gt;Why We Started Drill&lt;/h2&gt;
 
-&lt;h3&gt;The Evolution of Application Development and Data&lt;/h3&gt;
+&lt;h3 id=&quot;the-evolution-of-application-development-and-data&quot;&gt;The Evolution of Application Development and Data&lt;/h3&gt;
 
 &lt;p&gt;Over the last decade, organizations have been striving to become more agile and data-driven, seeking to gain competitive advantage in their markets. This trend has led to dramatic changes in the way applications are built and delivered, and in the type and volume of data that is being leveraged.&lt;/p&gt;
 
@@ -331,11 +331,11 @@ cp jets3t-0.9.2/jars/jets3t-0.9.2.jar &l
 
 &lt;p&gt;&lt;strong&gt;Data&lt;/strong&gt;: In previous decades, data was measured in MBs or GBs, and it was highly structured and denormalized. Today&amp;#39;s data is often measured in TBs or PBs, and it tends to be multi-structured — a combination of unstructured, semi-structured and structured. The data comes from many different sources, including a variety of applications, devices and services, and its structure changes much more frequently.&lt;/p&gt;
 
-&lt;h3&gt;A New Generation of Datastores&lt;/h3&gt;
+&lt;h3 id=&quot;a-new-generation-of-datastores&quot;&gt;A New Generation of Datastores&lt;/h3&gt;
 
 &lt;p&gt;The relational database, which was invented in 1970, was not designed for these new processes and data volumes and structures. As a result, a new generation of datastores has emerged, including HDFS, NoSQL (HBase, MongoDB, etc.) and search (Elasticsearch, Solr). These systems are schema-free (also known as &amp;quot;dynamic schema&amp;quot;). Applications, as opposed to DBAs, control the data structure, enabling more agility and flexibility. For example, an application developer can independently evolve the data structure with each application release (which could be daily or weekly) without filing a ticket with IT and waiting for the schema of the database to be modified.&lt;/p&gt;
 
-&lt;h2&gt;The Need for a New Query Engine&lt;/h2&gt;
+&lt;h2 id=&quot;the-need-for-a-new-query-engine&quot;&gt;The Need for a New Query Engine&lt;/h2&gt;
 
 &lt;p&gt;With data increasingly being stored in schema-free datastores (HDFS, HBase, MongoDB, etc.) and a variety of cloud services, users need a way to explore and analyze this data, and a way to visualize it with BI tools (reports, dashboards, etc.). In 2012 we decided to embark on a journey to create the world&amp;#39;s next-generation SQL engine. We had several high-level requirements in mind:&lt;/p&gt;
 
@@ -350,7 +350,7 @@ cp jets3t-0.9.2/jars/jets3t-0.9.2.jar &l
 
 &lt;p&gt;After almost two years of research and development, we released Drill 0.4 in August, and have continued with monthly releases since then.&lt;/p&gt;
 
-&lt;h2&gt;What&amp;#39;s Next&lt;/h2&gt;
+&lt;h2 id=&quot;what&amp;#39;s-next&quot;&gt;What&amp;#39;s Next&lt;/h2&gt;
 
 &lt;p&gt;Graduating to a top-level project is a significant milestone, but it&amp;#39;s really just the beginning of the journey. In fact, we&amp;#39;re currently wrapping up Drill 0.7, which includes hundreds of fixes and enhancements, and we expect to release that in the next couple weeks.&lt;/p&gt;
 
@@ -385,9 +385,9 @@ Tomer Shiran&lt;/p&gt;
 &lt;li&gt;Optimizations&lt;/li&gt;
 &lt;/ul&gt;
 
-&lt;h2&gt;Drill and MongoDB Setup (Standalone/Replicated/Sharded)&lt;/h2&gt;
+&lt;h2 id=&quot;drill-and-mongodb-setup-(standalone/replicated/sharded)&quot;&gt;Drill and MongoDB Setup (Standalone/Replicated/Sharded)&lt;/h2&gt;
 
-&lt;h3&gt;Standalone&lt;/h3&gt;
+&lt;h3 id=&quot;standalone&quot;&gt;Standalone&lt;/h3&gt;
 
 &lt;ul&gt;
 &lt;li&gt;Start &lt;code&gt;mongod&lt;/code&gt; process (&lt;a href=&quot;http://docs.mongodb.org/manual/installation/&quot;&gt;Install MongoDB&lt;/a&gt;)&lt;/li&gt;
@@ -406,7 +406,7 @@ Tomer Shiran&lt;/p&gt;
 
 &lt;p&gt;&lt;img src=&quot;/static/sql-on-mongodb/standalone.png&quot; alt=&quot;Drill on MongoDB in standalone mode&quot;&gt;&lt;/p&gt;
 
-&lt;h3&gt;Replica Set&lt;/h3&gt;
+&lt;h3 id=&quot;replica-set&quot;&gt;Replica Set&lt;/h3&gt;
 
 &lt;ul&gt;
 &lt;li&gt;Start &lt;code&gt;mongod&lt;/code&gt; processes in replication mode&lt;/li&gt;
@@ -426,7 +426,7 @@ Tomer Shiran&lt;/p&gt;
 
 &lt;p&gt;In replicated mode, whichever drillbit receives the query connects to the nearest &lt;code&gt;mongod&lt;/code&gt; (local &lt;code&gt;mongod&lt;/code&gt;) to read the data.&lt;/p&gt;
 
-&lt;h3&gt;Sharded/Sharded with Replica Set&lt;/h3&gt;
+&lt;h3 id=&quot;sharded/sharded-with-replica-set&quot;&gt;Sharded/Sharded with Replica Set&lt;/h3&gt;
 
 &lt;ul&gt;
 &lt;li&gt;Start Mongo processes in sharded mode&lt;/li&gt;
@@ -446,7 +446,7 @@ Tomer Shiran&lt;/p&gt;
 
 &lt;p&gt;In sharded mode, the drillbit first connects to the &lt;code&gt;mongos&lt;/code&gt; server to get the shard information.&lt;/p&gt;
 
-&lt;h2&gt;Endpoint Assignments&lt;/h2&gt;
+&lt;h2 id=&quot;endpoint-assignments&quot;&gt;Endpoint Assignments&lt;/h2&gt;
 
 &lt;p&gt;Drill is designed to maximize data locality:&lt;/p&gt;
 
@@ -456,7 +456,7 @@ Tomer Shiran&lt;/p&gt;
 &lt;li&gt;When some drillbits and shards are colocated while others run on different machines, partial data locality is achieved.&lt;/li&gt;
 &lt;/ul&gt;
 
-&lt;h2&gt;Running Queries&lt;/h2&gt;
+&lt;h2 id=&quot;running-queries&quot;&gt;Running Queries&lt;/h2&gt;
 
 &lt;p&gt;Here is a simple exercise that provides steps for creating an &lt;code&gt;empinfo&lt;/code&gt; collection in an &lt;code&gt;employee&lt;/code&gt; database in Mongo that you can query using Drill:&lt;/p&gt;
 
@@ -491,7 +491,7 @@ mongoimport --host localhost --db employ
 &lt;p&gt;To set &lt;code&gt;store.mongo.all_text_mode = true&lt;/code&gt;, execute the following command in sqlline:&lt;/p&gt;
 &lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;alter&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;session&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;set&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;store&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mongo&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;all_text_mode&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;true&lt;/span&gt;
 &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;h2&gt;Securely Accessing MongoDB&lt;/h2&gt;
+&lt;h2 id=&quot;securely-accessing-mongodb&quot;&gt;Securely Accessing MongoDB&lt;/h2&gt;
 
 &lt;p&gt;Create two databases, emp and zips. For each database, create a user with read privileges. As an example, for the zips database, create a user “apache” with read privileges. For the emp database, create a user “drill” with read privileges.&lt;/p&gt;
 
@@ -511,7 +511,7 @@ mongoimport --host localhost --db employ
 &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
 &lt;p&gt;&lt;em&gt;Note&lt;/em&gt;: The security patch may be included in the next release. Check &lt;a href=&quot;https://issues.apache.org/jira/browse/DRILL-1502&quot;&gt;DRILL-1502&lt;/a&gt; for status.&lt;/p&gt;
 
-&lt;h2&gt;Optimizations&lt;/h2&gt;
+&lt;h2 id=&quot;optimizations&quot;&gt;Optimizations&lt;/h2&gt;
 
 &lt;p&gt;The MongoDB storage plugin supports predicate pushdown and projection pushdown. As of now, predicate pushdown is implemented for the following filters: &lt;code&gt;&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;gt;=&lt;/code&gt;, &lt;code&gt;&amp;lt;&lt;/code&gt;, &lt;code&gt;&amp;lt;=&lt;/code&gt;, &lt;code&gt;==&lt;/code&gt;, &lt;code&gt;!=&lt;/code&gt;, &lt;code&gt;isNull&lt;/code&gt; and &lt;code&gt;isNotNull&lt;/code&gt;.&lt;/p&gt;
 


