Return-Path: X-Original-To: apmail-drill-commits-archive@www.apache.org Delivered-To: apmail-drill-commits-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id EFA5E10137 for ; Thu, 27 Nov 2014 01:30:57 +0000 (UTC) Received: (qmail 76027 invoked by uid 500); 27 Nov 2014 01:30:57 -0000 Delivered-To: apmail-drill-commits-archive@drill.apache.org Received: (qmail 75997 invoked by uid 500); 27 Nov 2014 01:30:57 -0000 Mailing-List: contact commits-help@drill.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: commits@drill.apache.org Delivered-To: mailing list commits@drill.apache.org Received: (qmail 75987 invoked by uid 99); 27 Nov 2014 01:30:57 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 27 Nov 2014 01:30:57 +0000 X-ASF-Spam-Status: No, hits=-2000.0 required=5.0 tests=ALL_TRUSTED X-Spam-Check-By: apache.org Received: from [140.211.11.4] (HELO eris.apache.org) (140.211.11.4) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 27 Nov 2014 01:30:29 +0000 Received: from eris.apache.org (localhost [127.0.0.1]) by eris.apache.org (Postfix) with ESMTP id 02F842388CA2; Thu, 27 Nov 2014 01:28:57 +0000 (UTC) Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Subject: svn commit: r1641986 [2/2] - in /drill/site/trunk/content/drill: ./ architecture/ blog/ blog/2014/ blog/2014/11/ blog/2014/11/19/ blog/2014/11/19/sql-on-mongodb/ community/ css/ download/ faq/ overview/ static/ static/sql-on-mongodb/ team/ why/ Date: Thu, 27 Nov 2014 01:28:56 -0000 To: commits@drill.apache.org From: tshiran@apache.org X-Mailer: svnmailer-1.0.9 Message-Id: <20141127012857.02F842388CA2@eris.apache.org> X-Virus-Checked: Checked by ClamAV on apache.org Added: drill/site/trunk/content/drill/feed.xml URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/feed.xml?rev=1641986&view=auto ============================================================================== --- drill/site/trunk/content/drill/feed.xml (added) +++ drill/site/trunk/content/drill/feed.xml Thu Nov 27 01:28:55 2014 @@ -0,0 +1,172 @@ + + + + Apache Drill - Self-Service Data Exploration + Write an awesome description for your new site here. You can edit this line in _config.yml. It will appear in your document head meta (for Google search results) and in your feed.xml site description. + + / + + Wed, 26 Nov 2014 17:22:07 -0800 + Wed, 26 Nov 2014 17:22:07 -0800 + Jekyll v2.5.1 + + + SQL on MongoDB + <p>One of the many interesting and unique features about Drill is its ability to support multiple query languages, data formats, and data sources, as well as cross-platform querying (such as joining HBase tables with your Mongo collections) using ANSI SQL.</p> + +<p>As of now, Drill supports multiple storage plugins, including HDFS, HBase, Hive, and LocalFileSystem. Since Drill is optimized for nested data, we realized that a Mongo storage plugin would be a useful feature.</p> + +<p>So, recently Kamesh and I (we have an avid interest in all things Mongo) contributed the MongoDB storage plugin feature to the Apache Drill project. As part of this blog post, we would like to provide instructions on how to use this plugin, which has been included in the <a href="http://incubator.apache.org/drill/download/">Drill 0.6 release</a>.</p> + +<p>The instructions are divided into the following subtopics:</p> + +<ul> +<li>Drill and Mongo setup (standalone/replicated/sharded)</li> +<li>Running queries</li> +<li>Securely accessing MongoDB</li> +<li>Optimizations</li> +</ul> + +<h2>Drill and MongoDB Setup (Standalone/Replicated/Sharded)</h2> + +<h3>Standalone</h3> + +<ul> +<li>Start <code>mongod</code> process (<a href="http://docs.mongodb.org/manual/installation/">Install MongoDB</a>)</li> +<li>Start Drill in embedded mode (<a href="https://cwiki.apache.org/confluence/display/DRILL/Installing+Drill+in+Embedded+Mode">Installing Drill in Embedded Mode</a> &amp; <a href="https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=44994063">Starting/Stopping Drill</a>) </li> +<li>Access the Web UI through the local drillbit: <a href="http://localhost:8047/">http://localhost:8047/</a></li> +<li><p>Enable the Mongo storage plugin and update its configuration:</p> +<div class="highlight"><pre><code class="language-json" data-lang="json"><span class="p">{</span> + <span class="nt">&quot;type&quot;</span><span class="p">:</span> <span class="s2">&quot;mongo&quot;</span><span class="p">,</span> + <span class="nt">&quot;connection&quot;</span><span class="p">:</span> <span class="s2">&quot;mongodb://localhost:27017&quot;</span><span class="p">,</span> + <span class="nt">&quot;enabled&quot;</span><span class="p">:</span> <span class="kc">true</span> +<span class="p">}</span> +</code></pre></div></li> +</ul> + +<p>By default, <code>mongod</code> listens on port 27017.</p> + +<p><img src="/static/sql-on-mongodb/standalone.png" alt="Drill on MongoDB in standalone mode"></p> + +<h3>Replica Set</h3> + +<ul> +<li>Start <code>mongod</code> processes in replication mode</li> +<li>Start Drill in distributed mode (<a href="https://cwiki.apache.org/confluence/display/DRILL/Installing+Drill+in+Distributed+Mode">Installing Drill in Distributed Mode</a> &amp; <a href="https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=44994063">Starting/Stopping Drill</a>)</li> +<li>Access the Web UI through any drillbit: <a href="http://drillbit2:8047">http://drillbit2:8047</a></li> +<li><p>Enable the Mongo storage plugin and update its configuration:</p> +<div class="highlight"><pre><code class="language-json" data-lang="json"><span class="p">{</span> + <span class="nt">&quot;type&quot;</span><span class="p">:</span> <span class="s2">&quot;mongo&quot;</span><span class="p">,</span> + <span class="nt">&quot;connection&quot;</span><span class="p">:</span> <span class="s2">&quot;mongodb://&lt;host1&gt;:&lt;port1&gt;,&lt;host2&gt;:&lt;port2&gt;&quot;</span><span class="p">,</span> + <span class="nt">&quot;enabled&quot;</span><span class="p">:</span> <span class="kc">true</span> +<span class="p">}</span> +</code></pre></div> +<p>Where <code>host1</code> and <code>host2</code> are <code>mongod</code> hostnames in the replica set.</p></li> +</ul> + +<p><img src="/static/sql-on-mongodb/replicated.png" alt="Drill on MongoDB in replicated mode"></p> + +<p>In replicated mode, whichever drillbit receives the query connects to the nearest <code>mongod</code> (local <code>mongod</code>) to read the data.</p> + +<h3>Sharded/Sharded with Replica Set</h3> + +<ul> +<li>Start Mongo processes in sharded mode</li> +<li>Start Drill in distributed mode (<a href="https://cwiki.apache.org/confluence/display/DRILL/Installing+Drill+in+Distributed+Mode">Installing Drill in Distributed Mode</a> &amp; <a href="https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=44994063">Starting/Stopping Drill</a>)</li> +<li>Access the Web UI through any drillbit: <a href="http://drillbit3:8047">http://drillbit3:8047</a></li> +<li><p>Enable the Mongo storage plugin and update its configuration:</p> +<div class="highlight"><pre><code class="language-json" data-lang="json"><span class="p">{</span> + <span class="nt">&quot;type&quot;</span><span class="p">:</span> <span class="s2">&quot;mongo&quot;</span><span class="p">,</span> + <span class="nt">&quot;connection&quot;</span><span class="p">:</span> <span class="s2">&quot;mongodb://&lt;host1&gt;:&lt;port1&gt;,&lt;host2&gt;:&lt;port2&gt;&quot;</span><span class="p">,</span> + <span class="nt">&quot;enabled&quot;</span><span class="p">:</span> <span class="kc">true</span> +<span class="p">}</span> +</code></pre></div> +<p>Where <code>host1</code> and <code>host2</code> are the <code>mongos</code> hostnames.</p></li> +</ul> + +<p><img src="/static/sql-on-mongodb/sharded.png" alt="Drill on MongoDB in sharded mode"></p> + +<p>In sharded mode, drillbit first connects to the <code>mongos</code> server to get the shard information.</p> + +<h2>Endpoint Assignments</h2> + +<p>Drill is designed to maximize data locality:</p> + +<ul> +<li>When drillbits and shards run together on the same machines, each drillbit (endpoint) will read the chunks from the local shard. That is, all the chunks from a shard will be assigned to its local drillbit. This is known as data locality, and is the ideal scenario.</li> +<li>When all drillbits and shards are running on different machines, chunks will be assigned to drillbits in a round-robin fashion. In this case there is no data locality.</li> +<li>When some of drillbits and shards are colocated, and some of them are running on different machines, partial data locality is achieved.</li> +</ul> + +<h2>Running Queries</h2> + +<p>Here is a simple exercise that provides steps for creating an <code>empinfo</code> collection in an <code>employee</code> database in Mongo that you can query using Drill:</p> + +<ol> +<li>Download <a href="http://media.mongodb.org/zips.json">zips.json</a> and the <a href="/static/sql-on-mongodb/empinfo.json">empinfo.json</a> dataset referenced at the end of blog.</li> +<li><p>Import the zips.json and empinfo.json files into Mongo using the following command: </p> +<div class="highlight"><pre><code class="language-bash" data-lang="bash">mongoimport --host localhost --db <span class="nb">test</span> --collection zips &lt; zips.json +mongoimport --host localhost --db employee --collection empinfo &lt; empinfo.json +</code></pre></div></li> +<li><p>Issue the following queries either from sqlline (Drill’s shell) or from the Drill Web UI to get corresponding results from the Mongo collection. </p> + +<ul> +<li>To issue queries from the web UI, open the Drill web UI and go to Query tab. </li> +<li><p>To issue queries from sqlline, connect to sqlline using the following command: </p> +<div class="highlight"><pre><code class="language-bash" data-lang="bash">&lt;DRILLHOME&gt;/bin/sqlline -u jdbc:drill:zk<span class="o">=</span>zkhost:2181 -n admin -p admin +</code></pre></div></li> +</ul></li> +<li><p>Queries:</p> +<div class="highlight"><pre><code class="language-sql" data-lang="sql"><span class="k">SELECT</span> <span class="n">first_name</span><span class="p">,</span> <span class="n">last_name</span><span class="p">,</span> <span class="n">position_id</span> +<span class="k">FROM</span> <span class="n">mongo</span><span class="p">.</span><span class="n">employee</span><span class="p">.</span><span class="o">`</span><span class="n">empinfo</span><span class="o">`</span> +<span class="k">WHERE</span> <span class="n">employee_id</span> <span class="o">=</span> <span class="mi">1107</span> <span class="k">AND</span> <span class="n">position_id</span> <span class="o">=</span> <span class="mi">17</span> <span class="k">AND</span> <span class="n">last_name</span> <span class="o">=</span> <span class="s1">&#39;Yonce&#39;</span><span class="p">;</span> + +<span class="k">SELECT</span> <span class="n">city</span><span class="p">,</span> <span class="k">sum</span><span class="p">(</span><span class="n">pop</span><span class="p">)</span> +<span class="k">FROM</span> <span class="n">mongo</span><span class="p">.</span><span class="n">test</span><span class="p">.</span><span class="o">`</span><span class="n">zips</span><span class="o">`</span> <span class="n">zipcodes</span> +<span class="k">WHERE</span> <span class="k">state</span> <span class="k">IS</span> <span class="k">NOT</span> <span class="k">NULL</span> <span class="k">GROUP</span> <span class="k">BY</span> <span class="n">city</span> +<span class="k">ORDER</span> <span class="k">BY</span> <span class="k">sum</span><span class="p">(</span><span class="n">pop</span><span class="p">)</span> <span class="k">DESC</span> <span class="k">LIMIT</span> <span class="mi">1</span><span class="p">;</span> +</code></pre></div></li> +</ol> + +<p><em>Note</em>: If a field contains a mixture of different data types across different records, such as both int and decimal values, then queries fail unless <code>store.mongo.all_text_mode = true</code> and aggregations fail in that case. For more information refer to <a href="https://issues.apache.org/jira/browse/DRILL-1475">DRILL-1475</a> and <a href="https://issues.apache.org/jira/browse/DRILL-1460">DRILL-1460</a>.</p> + +<p>To set <code>store.mongo.all_text_mode = true</code>, execute the following command in sqlline:</p> +<div class="highlight"><pre><code class="language-sql" data-lang="sql"><span class="k">alter</span> <span class="k">session</span> <span class="k">set</span> <span class="n">store</span><span class="p">.</span><span class="n">mongo</span><span class="p">.</span><span class="n">all_text_mode</span> <span class="o">=</span> <span class="k">true</span> +</code></pre></div> +<h2>Securely Accessing MongoDB</h2> + +<p>Create two databases, emp and zips. For each database, create a user with read privileges. As an example, for the zips database, create a user “apache” with read privileges. For the emp database, create a user “drill” with read privileges.</p> + +<p>The apache user will be able to query the zips database by using the following storage plugin configuration:</p> +<div class="highlight"><pre><code class="language-json" data-lang="json"><span class="p">{</span> + <span class="nt">&quot;type&quot;</span><span class="p">:</span> <span class="s2">&quot;mongo&quot;</span><span class="p">,</span> + <span class="nt">&quot;connection&quot;</span><span class="p">:</span> <span class="s2">&quot;mongodb://apache:apache@localhost:27017/zips&quot;</span><span class="p">,</span> + <span class="nt">&quot;enabled&quot;</span><span class="p">:</span> <span class="kc">true</span> +<span class="p">}</span> +</code></pre></div> +<p>The <code>drill</code> user will be able to query the <code>emp</code> database by using the following storage plugin configuration:</p> +<div class="highlight"><pre><code class="language-json" data-lang="json"><span class="p">{</span> + <span class="nt">&quot;type&quot;</span><span class="p">:</span> <span class="s2">&quot;mongo&quot;</span><span class="p">,</span> + <span class="nt">&quot;connection&quot;</span><span class="p">:</span> <span class="s2">&quot;mongodb://drill:drill@localhost:27017/emp&quot;</span><span class="p">,</span> + <span class="nt">&quot;enabled&quot;</span><span class="p">:</span> <span class="kc">true</span> +<span class="p">}</span> +</code></pre></div> +<p><em>Note</em>: The security patch may be included in next release. Check <a href="https://issues.apache.org/jira/browse/DRILL-1502">DRILL-1502</a> for status.</p> + +<h2>Optimizations</h2> + +<p>The MongoDB storage plugin supports predicate pushdown and projection pushdown. As of now, predicate pushdown is implemented for the following filters: <code>&gt;</code>, <code>&gt;=</code>, <code>&lt;</code>, <code>&lt;=</code>, <code>==</code>, <code>!=</code>, <code>isNull</code> and <code>isNotNull</code>.</p> + +<p>We are excited about the release of the MongoDB storage plugin, and we believe that Drill is the perfect SQL query tool for MongoDB.</p> + + Wed, 19 Nov 2014 05:50:01 -0800 + /blog/2014/11/19/sql-on-mongodb/ + /blog/2014/11/19/sql-on-mongodb/ + + + blog + + + + + Modified: drill/site/trunk/content/drill/index.html URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/index.html?rev=1641986&r1=1641985&r2=1641986&view=diff ============================================================================== --- drill/site/trunk/content/drill/index.html (original) +++ drill/site/trunk/content/drill/index.html Thu Nov 27 01:28:55 2014 @@ -8,18 +8,18 @@ Apache Drill - Self-Service Data Exploration - - - - - - - - - - - - + + + + + + + + + + + + @@ -33,32 +33,32 @@ @@ -77,7 +77,7 @@

Apache Drill

Self-Service Data Exploration

- NOW READY FOR DOWNLOAD + NOW READY FOR DOWNLOAD
Modified: drill/site/trunk/content/drill/overview/index.html URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/overview/index.html?rev=1641986&r1=1641985&r2=1641986&view=diff ============================================================================== --- drill/site/trunk/content/drill/overview/index.html (original) +++ drill/site/trunk/content/drill/overview/index.html Thu Nov 27 01:28:55 2014 @@ -8,18 +8,18 @@ Overview - Apache Drill - - - - - - - - - - - - + + + + + + + + + + + + @@ -33,32 +33,32 @@ Added: drill/site/trunk/content/drill/static/sql-on-mongodb/empinfo.json URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/static/sql-on-mongodb/empinfo.json?rev=1641986&view=auto ============================================================================== --- drill/site/trunk/content/drill/static/sql-on-mongodb/empinfo.json (added) +++ drill/site/trunk/content/drill/static/sql-on-mongodb/empinfo.json Thu Nov 27 01:28:55 2014 @@ -0,0 +1,18 @@ +{"employee_id":1101,"full_name":"Steve Eurich","first_name":"Steve","last_name":"Eurich","position_id":16,"position":"Store T","isFTE":true} +{"employee_id":1102,"full_name":"Mary Pierson","first_name":"Mary","last_name":"Pierson","position_id":16,"position":"Store T","isFTE":true} +{"employee_id":1103,"full_name":"Leo Jones","first_name":"Leo","last_name":"Jones","position_id":16,"position":"Store Tem","isFTE":true} +{"employee_id":1104,"full_name":"Nancy Beatty","first_name":"Nancy","last_name":"Beatty","position_id":16,"position":"Store T","isFTE":false} +{"employee_id":1105,"full_name":"Clara McNight","first_name":"Clara","last_name":"McNight","position_id":16,"position":"Store","isFTE":true} +{"employee_id":1106,"full_name":"Marcella Isaacs","first_name":"Marcella","last_name":"Isaacs","position_id":17,"position":"Stor","isFTE":false} +{"employee_id":1107,"full_name":"Charlotte Yonce","first_name":"Charlotte","last_name":"Yonce","position_id":17,"position":"Stor","isFTE":true} +{"employee_id":1108,"full_name":"Benjamin Foster","first_name":"Benjamin","last_name":"Foster","position_id":17,"position":"Stor","isFTE":false} +{"employee_id":1109,"full_name":"John Reed","first_name":"John","last_name":"Reed","position_id":17,"position":"Store Per","isFTE":false} +{"employee_id":1110,"full_name":"Lynn Kwiatkowski","first_name":"Lynn","last_name":"Kwiatkowski","position_id":17,"position":"St","isFTE":true} +{"employee_id":1111,"full_name":"Donald Vann","first_name":"Donald","last_name":"Vann","position_id":17,"position":"Store Per","isFTE":false} +{"employee_id":1112,"full_name":"William Smith","first_name":"William","last_name":"Smith","position_id":17,"position":"St","isFTE":true} +{"employee_id":1113,"full_name":"Amy Hensley","first_name":"Amy","last_name":"Hensley","position_id":17,"position":"Store Pe","isFTE":false} +{"employee_id":1114,"full_name":"Judy Owens","first_name":"Judy","last_name":"Owens","position_id":17,"position":"Store Per","isFTE":true} +{"employee_id":1115,"full_name":"Frederick Castillo","first_name":"Frederick","last_name":"Castillo","position_id":17,"position":"S","isFTE":false} +{"employee_id":1116,"full_name":"Phil Munoz","first_name":"Phil","last_name":"Munoz","position_id":17,"position":"Store Per","isFTE":false} +{"employee_id":1117,"full_name":"Lori Lightfoot","first_name":"Lori","last_name":"Lightfoot","position_id":17,"position":"Store","isFTE":true} +{"employee_id":1,"full_name":"Kumar","first_name":"Anil","last_name":"B","position_id":19,"position":"Store","isFTE":true} \ No newline at end of file Added: drill/site/trunk/content/drill/static/sql-on-mongodb/replicated.png URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/static/sql-on-mongodb/replicated.png?rev=1641986&view=auto ============================================================================== Binary file - no diff available. Propchange: drill/site/trunk/content/drill/static/sql-on-mongodb/replicated.png ------------------------------------------------------------------------------ svn:mime-type = application/octet-stream Added: drill/site/trunk/content/drill/static/sql-on-mongodb/sharded.png URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/static/sql-on-mongodb/sharded.png?rev=1641986&view=auto ============================================================================== Binary file - no diff available. Propchange: drill/site/trunk/content/drill/static/sql-on-mongodb/sharded.png ------------------------------------------------------------------------------ svn:mime-type = application/octet-stream Added: drill/site/trunk/content/drill/static/sql-on-mongodb/standalone.png URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/static/sql-on-mongodb/standalone.png?rev=1641986&view=auto ============================================================================== Binary file - no diff available. Propchange: drill/site/trunk/content/drill/static/sql-on-mongodb/standalone.png ------------------------------------------------------------------------------ svn:mime-type = application/octet-stream Modified: drill/site/trunk/content/drill/team/index.html URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/team/index.html?rev=1641986&r1=1641985&r2=1641986&view=diff ============================================================================== --- drill/site/trunk/content/drill/team/index.html (original) +++ drill/site/trunk/content/drill/team/index.html Thu Nov 27 01:28:55 2014 @@ -8,18 +8,18 @@ Team - Apache Drill - - - - - - - - - - - - + + + + + + + + + + + + @@ -33,32 +33,32 @@ @@ -74,7 +74,7 @@

Current list of Drill committers:

-Jacques Nadeau    <jacques at apache dot org> +Jacques Nadeau    <jacques at apache dot org>    (PMC Chair and VP Apache Drill)
Tomer Shiran    <tshiran at apache dot org>
Ted Dunning    <tdunning at apache dot org>
Jason Frantz    <jason at apache dot org> Modified: drill/site/trunk/content/drill/why/index.html URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/why/index.html?rev=1641986&r1=1641985&r2=1641986&view=diff ============================================================================== --- drill/site/trunk/content/drill/why/index.html (original) +++ drill/site/trunk/content/drill/why/index.html Thu Nov 27 01:28:55 2014 @@ -8,18 +8,18 @@ Why Drill - Apache Drill - - - - - - - - - - - - + + + + + + + + + + + + @@ -33,32 +33,32 @@