drill-commits mailing list archives

From tshi...@apache.org
Subject svn commit: r1644262 - in /drill/site/trunk/content/drill: ./ blog/ blog/2014/12/02/drill-top-level-project/ blog/2014/12/09/ blog/2014/12/09/running-sql-queries-on-amazon-s3/
Date Wed, 10 Dec 2014 00:24:19 GMT
Author: tshiran
Date: Wed Dec 10 00:24:19 2014
New Revision: 1644262

URL: http://svn.apache.org/r1644262
Log:
S3 blog post

Added:
    drill/site/trunk/content/drill/blog/2014/12/09/
    drill/site/trunk/content/drill/blog/2014/12/09/running-sql-queries-on-amazon-s3/
    drill/site/trunk/content/drill/blog/2014/12/09/running-sql-queries-on-amazon-s3/index.html
Modified:
    drill/site/trunk/content/drill/blog/2014/12/02/drill-top-level-project/index.html
    drill/site/trunk/content/drill/blog/index.html
    drill/site/trunk/content/drill/feed.xml
    drill/site/trunk/content/drill/index.html

Modified: drill/site/trunk/content/drill/blog/2014/12/02/drill-top-level-project/index.html
URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/blog/2014/12/02/drill-top-level-project/index.html?rev=1644262&r1=1644261&r2=1644262&view=diff
==============================================================================
--- drill/site/trunk/content/drill/blog/2014/12/02/drill-top-level-project/index.html (original)
+++ drill/site/trunk/content/drill/blog/2014/12/02/drill-top-level-project/index.html Wed
Dec 10 00:24:19 2014
@@ -68,7 +68,7 @@
     <h1 class="post-title">Apache Drill Graduates to a Top-Level Project</h1>
     <p class="post-meta"><strong>Date:</strong> Dec 2, 2014
 
-<br/><strong>Authors:</strong> Tomer Shiran, Apache Drill Founder and PMC
member
+<br/><strong>Authors:</strong> Tomer Shiran, Apache Drill Founder, PMC
Member and Committer
 </p>
   </header>
 

Added: drill/site/trunk/content/drill/blog/2014/12/09/running-sql-queries-on-amazon-s3/index.html
URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/blog/2014/12/09/running-sql-queries-on-amazon-s3/index.html?rev=1644262&view=auto
==============================================================================
--- drill/site/trunk/content/drill/blog/2014/12/09/running-sql-queries-on-amazon-s3/index.html
(added)
+++ drill/site/trunk/content/drill/blog/2014/12/09/running-sql-queries-on-amazon-s3/index.html
Wed Dec 10 00:24:19 2014
@@ -0,0 +1,186 @@
+<!DOCTYPE html>
+<html>
+
+<head>
+
+<meta charset="UTF-8">
+
+
+<title>Running SQL Queries on Amazon S3 - Apache Drill</title>
+
+<link href="/css/syntax.css" rel="stylesheet" type="text/css">
+<link href="/css/style.css" rel="stylesheet" type="text/css">
+<link href="/css/arrows.css" rel="stylesheet" type="text/css">
+<link href="/css/button.css" rel="stylesheet" type="text/css">
+
+<link rel="shortcut icon" href="/favicon.ico" type="image/x-icon">
+<link rel="icon" href="/favicon.ico" type="image/x-icon">
+
+<script language="javascript" type="text/javascript" src="/js/lib/jquery-1.11.1.min.js"></script>
+<script language="javascript" type="text/javascript" src="/js/lib/jquery.easing.1.3.js"></script>
+<script language="javascript" type="text/javascript" src="/js/modernizr.custom.js"></script>
+<script language="javascript" type="text/javascript" src="/js/script.js"></script>
+
+</head>
+
+<body onResize="resized();">
+
+<div class="bui"></div>
+
+<div id="search">
+<input type="text" placeholder="Enter search term here">
+</div>
+
+<div id="menu" class="mw">
+<ul>
+  <li class="logo"><a href="/"></a></li>
+  <li>
+    <a href="/overview/">Documentation</a>
+    <ul>
+      <li><a href="/overview/">Overview&nbsp;&nbsp;&nbsp;&nbsp;</a></li>
+      <li><a href="https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes"
target="_blank">Drill in 10 Minutes</a></li>
+      <li><a href="/why/">Why Drill? &nbsp;&nbsp;&nbsp;&nbsp;</a></li>
+      <li><a href="/architecture/">Architecture</a></li>
+    </ul>
+  </li>
+  <li>
+    <a href="/community/">Community</a>
+    <ul>
+      <li><a href="/team/">Team</a></li>
+      <li><a href="/community/#events">Events and Meetups</a></li>
+      <li><a href="/community/#mailinglists">Mailing Lists</a></li>
+      <li><a href="/community/#getinvolved">Get Involved</a></li>
+      <li><a href="https://issues.apache.org/jira/browse/DRILL/" target="_blank">Issue
Tracker</a></li>
+      <li><a href="https://github.com/apache/drill" target="_blank">GitHub</a></li>
+    </ul>
+  </li>
+  <li><a href="/faq/">FAQ</a></li>
+  <li><a href="/blog/">Blog</a></li>
+  <li style="width:30px; padding-left: 2px; padding-right:10px"><a href="https://twitter.com/apachedrill"
target="_blank"><img src="/images/twitterbw.png" alt="" align="center" width="22" style="padding:
0px 10px 1px 0px;"></a> </li>
+  <li class="l"><span>&nbsp;</span></li>
+  <li class="d"><a href="/download/">Download</a></li>
+</ul>
+</div>
+
+<div class="post int_text">
+
+  <header class="post-header">
+    <h1 class="post-title">Running SQL Queries on Amazon S3</h1>
+    <p class="post-meta"><strong>Date:</strong> Dec 9, 2014
+
+<br/><strong>Authors:</strong> Nick Amato, MapR Technologies
+</p>
+  </header>
+
+  <article class="post-content">
+    <h1>Running SQL Queries on Amazon S3</h1>
+
+<p>The functionality and sheer usefulness of Drill are growing fast.  If you&#39;re
a user of some of the popular BI tools out there like Tableau or SAP Lumira, now is a good
time to take a look at how Drill can make your life easier, especially if you&#39;re
faced with the task of quickly getting a handle on large sets of unstructured data.  With
schema generated on the fly, you can save a lot of time and headaches by running SQL queries
on the data where it rests without knowing much about columns or formats.  There&#39;s
even more good news:  Drill also works with data stored in the cloud.  With a few simple steps,
you can configure the S3 storage plugin for Drill and be off to the races running queries.
In this post we&#39;ll look at how to configure Drill to access data stored in an S3
bucket.</p>
+
+<p>If you&#39;re more of a visual person, you can skip this article entirely and
<a href="https://www.youtube.com/watch?v=w8gZ2nn_ZUQ">go straight to a video</a>
I put together that walks through an end-to-end example with Tableau.  This example is easily
extended to other BI tools, as the steps are identical on the Drill side.</p>
+
+<p>At a high level, configuring Drill to access S3 bucket data is accomplished with
the following steps on each node running a drillbit.</p>
+
+<ul>
+<li>Download and install the <a href="http://www.jets3t.org/">JetS3t</a>
JAR files and enable them.</li>
+<li>Add your S3 credentials in the relevant XML configuration file.</li>
+<li>Configure and enable the S3 storage plugin through the Drill web interface.</li>
+<li>Connect your BI tool of choice and query away.</li>
+</ul>
+
+<p>Consult the <a href="https://cwiki.apache.org/confluence/display/DRILL/Architectural+Overview">Architectural
Overview</a> for a refresher on the architecture of Drill.</p>
+
+<h3>Prerequisites</h3>
+
+<p>These steps assume you have a <a href="https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes">typical
Drill cluster and ZooKeeper quorum</a> configured and running.  To access data in S3,
you will need an S3 bucket configured and have the required Amazon security credentials in
your possession.  An <a href="http://blogs.aws.amazon.com/security/post/Tx1R9KDN9ISZ0HF/Where-s-my-secret-access-key">Amazon
blog post</a> has more information on how to get these from your account.</p>
+
+<h3>Configuration Steps</h3>
+
+<p>To connect Drill to S3, all of the drillbit nodes will need access to code in the
open-source JetS3t library.  As of this writing, 0.9.2 is the latest version, but you
might want to check <a href="https://jets3t.s3.amazonaws.com/toolkit/toolkit.html">the
main page</a> to see if anything has been updated.  Be sure to get version 0.9.2 or
later, as earlier versions have a bug relating to reading Parquet data.</p>
+<div class="highlight"><pre><code class="language-bash" data-lang="bash">wget
http://bitbucket.org/jmurty/jets3t/downloads/jets3t-0.9.2.zip
+unzip jets3t-0.9.2.zip
+cp jets3t-0.9.2/jars/jets3t-0.9.2.jar <span class="nv">$DRILL_HOME</span>/jars/3rdparty
+</code></pre></div>
+<p>Next, enable the plugin by editing the file:</p>
+<div class="highlight"><pre><code class="language-bash" data-lang="bash"><span
class="nv">$DRILL_HOME</span>/bin/hadoop_excludes.txt
+</code></pre></div>
+<p>and removing the line <code>jets3t</code>.</p>
+
+<p>Drill will need to know your S3 credentials in order to access data there. These
credentials will need to be placed in the core-site.xml file for your installation.  If you
already have a core-site.xml file configured for your environment, add the following parameters
to it; otherwise, create the file from scratch.  If you do end up creating it from scratch,
you will need to wrap these parameters with <code>&lt;configuration&gt;</code>
and <code>&lt;/configuration&gt;</code>.</p>
+<div class="highlight"><pre><code class="language-xml" data-lang="xml"><span
class="nt">&lt;property&gt;</span>
+  <span class="nt">&lt;name&gt;</span>fs.s3.awsAccessKeyId<span class="nt">&lt;/name&gt;</span>
+  <span class="nt">&lt;value&gt;</span>ID<span class="nt">&lt;/value&gt;</span>
+<span class="nt">&lt;/property&gt;</span>
+
+<span class="nt">&lt;property&gt;</span>
+  <span class="nt">&lt;name&gt;</span>fs.s3.awsSecretAccessKey<span
class="nt">&lt;/name&gt;</span>
+  <span class="nt">&lt;value&gt;</span>SECRET<span class="nt">&lt;/value&gt;</span>
+<span class="nt">&lt;/property&gt;</span>
+
+<span class="nt">&lt;property&gt;</span>
+  <span class="nt">&lt;name&gt;</span>fs.s3n.awsAccessKeyId<span class="nt">&lt;/name&gt;</span>
+  <span class="nt">&lt;value&gt;</span>ID<span class="nt">&lt;/value&gt;</span>
+<span class="nt">&lt;/property&gt;</span>
+
+<span class="nt">&lt;property&gt;</span>
+  <span class="nt">&lt;name&gt;</span>fs.s3n.awsSecretAccessKey<span
class="nt">&lt;/name&gt;</span>
+  <span class="nt">&lt;value&gt;</span>SECRET<span class="nt">&lt;/value&gt;</span>
+<span class="nt">&lt;/property&gt;</span>
+</code></pre></div>
+<p>The steps so far give Drill enough information to connect to the S3 service.  Remember,
you have to do this on all the nodes running a drillbit.</p>
+
+<p>Next, let&#39;s go into the Drill web interface and enable the S3 storage plugin.
 In this case you only need to connect to <strong>one</strong> of the nodes because
Drill&#39;s configuration is synchronized across the cluster.  Complete the following
steps:</p>
+
+<ol>
+<li>Point your browser to <code>http://&lt;host&gt;:8047</code></li>
+<li>Select the &#39;Storage&#39; tab.</li>
+<li>A good starting configuration for S3 can be entirely the same as the <code>dfs</code>
plugin, except that the connection parameter is changed to <code>s3://&lt;bucket&gt;</code>.
So first select the <code>Update</code> button for <code>dfs</code>,
then select the text area and copy its contents to the clipboard (on Windows, Ctrl-A followed by Ctrl-C works).</li>
+<li>Press <code>Back</code>, then create a new plugin by typing a name
into the <code>New Storage Plugin</code> field and pressing <code>Create</code>.
You can choose any name, but a good convention is to use <code>s3-&lt;bucketname&gt;</code>
so you can easily identify it later.</li>
+<li>In the configuration area, paste the configuration you just grabbed from &#39;dfs&#39;.
 Change the line <code>connection: &quot;file:///&quot;</code> to <code>connection:
&quot;s3://&lt;bucket&gt;&quot;</code>.</li>
+<li>Click <code>Update</code>.  You should see a message that indicates
success.</li>
+</ol>
+
+<p>At this point you can run queries on the data directly and you have a couple of
options on how you want to access it.  You can use Drill Explorer and create a custom view
(based on an SQL query) that you can then access in Tableau or other BI tools, or just use
Drill directly from within the tool.</p>
+
+<p>You may want to check out the <a href="http://www.youtube.com/watch?v=jNUsprJNQUg">Tableau
demo</a>.</p>
+
+<p>With just a few lines of configuration, you&#39;ve opened up the vast world
of data available in the Amazon cloud and reduced the amount of work you have to do in advance
to access data stored there with SQL.  There are even some <a href="https://aws.amazon.com/datasets">public
datasets</a> available directly on S3 that are great for experimentation.</p>
+
+<p>Happy Drilling!</p>
+
+  </article>
+ <div id="disqus_thread"></div>
+    <script type="text/javascript">
+        /* * * CONFIGURATION VARIABLES: EDIT BEFORE PASTING INTO YOUR WEBPAGE * * */
+        var disqus_shortname = 'drill'; // required: replace example with your forum shortname
+
+        /* * * DON'T EDIT BELOW THIS LINE * * */
+        (function() {
+            var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async
= true;
+            dsq.src = '//' + disqus_shortname + '.disqus.com/embed.js';
+            (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
+        })();
+    </script>
+    <noscript>Please enable JavaScript to view the <a href="http://disqus.com/?ref_noscript">comments
powered by Disqus.</a></noscript>
+    
+</div>
+
+
+<div id="footer" class="mw">
+<div class="wrapper">
+Copyright © 2012-2014 The Apache Software Foundation, licensed under the Apache License,
Version 2.0.<br>
+Apache and the Apache feather logo are trademarks of The Apache Software Foundation. Other
names appearing on the site may be trademarks of their respective owners.<br/><br/>
+</div>
+</div>
+
+<script>
+(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+ga('create', 'UA-53379651-1', 'auto');
+ga('send', 'pageview');
+</script>
+
+</body>
+</html>

Modified: drill/site/trunk/content/drill/blog/index.html
URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/blog/index.html?rev=1644262&r1=1644261&r2=1644262&view=diff
==============================================================================
--- drill/site/trunk/content/drill/blog/index.html (original)
+++ drill/site/trunk/content/drill/blog/index.html Wed Dec 10 00:24:19 2014
@@ -68,6 +68,8 @@
 </div>
 
 <div class="int_text" align="left"><!-- previously: site.posts -->
+<p><a class="post-link" href="/blog/2014/12/09/running-sql-queries-on-amazon-s3/">Running
SQL Queries on Amazon S3</a> (Dec 9, 2014)<br/>Drill enables you to run SQL queries
directly on data in S3. There's no need to ingest the data into a managed cluster or transform
the data. This is a step-by-step tutorial on how to use Drill with S3.</p>
+<!-- previously: site.posts -->
 <p><a class="post-link" href="/blog/2014/12/02/drill-top-level-project/">Apache
Drill Graduates to a Top-Level Project</a> (Dec 2, 2014)<br/>Drill has graduated
to a Top-Level Project at Apache. This marks a significant accomplishment for the Drill community,
which now includes dozens of developers working at a variety of companies.</p>
 <!-- previously: site.posts -->
 <p><a class="post-link" href="/blog/2014/11/19/sql-on-mongodb/">SQL on MongoDB</a>
(Nov 19, 2014)<br/>The MongoDB storage plugin for Drill enables analytical queries on
MongoDB databases. Drill's schema-free JSON data model is a natural fit for MongoDB's data
model.</p>

Modified: drill/site/trunk/content/drill/feed.xml
URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/feed.xml?rev=1644262&r1=1644261&r2=1644262&view=diff
==============================================================================
--- drill/site/trunk/content/drill/feed.xml (original)
+++ drill/site/trunk/content/drill/feed.xml Wed Dec 10 00:24:19 2014
@@ -6,11 +6,96 @@
 </description>
     <link>/</link>
     <atom:link href="/feed.xml" rel="self" type="application/rss+xml"/>
-    <pubDate>Tue, 02 Dec 2014 06:43:12 -0800</pubDate>
-    <lastBuildDate>Tue, 02 Dec 2014 06:43:12 -0800</lastBuildDate>
+    <pubDate>Tue, 09 Dec 2014 16:22:33 -0800</pubDate>
+    <lastBuildDate>Tue, 09 Dec 2014 16:22:33 -0800</lastBuildDate>
     <generator>Jekyll v2.5.1</generator>
     
       <item>
+        <title>Running SQL Queries on Amazon S3</title>
+        <description>&lt;h1&gt;Running SQL Queries on Amazon S3&lt;/h1&gt;
+
+&lt;p&gt;The functionality and sheer usefulness of Drill are growing fast.  If you&amp;#39;re
a user of some of the popular BI tools out there like Tableau or SAP Lumira, now is a good
time to take a look at how Drill can make your life easier, especially if you&amp;#39;re
faced with the task of quickly getting a handle on large sets of unstructured data.  With
schema generated on the fly, you can save a lot of time and headaches by running SQL queries
on the data where it rests without knowing much about columns or formats.  There&amp;#39;s
even more good news:  Drill also works with data stored in the cloud.  With a few simple steps,
you can configure the S3 storage plugin for Drill and be off to the races running queries.
In this post we&amp;#39;ll look at how to configure Drill to access data stored in an
S3 bucket.&lt;/p&gt;
+
+&lt;p&gt;If you&amp;#39;re more of a visual person, you can skip this article
entirely and &lt;a href=&quot;https://www.youtube.com/watch?v=w8gZ2nn_ZUQ&quot;&gt;go
straight to a video&lt;/a&gt; I put together that walks through an end-to-end example
with Tableau.  This example is easily extended to other BI tools, as the steps are identical
on the Drill side.&lt;/p&gt;
+
+&lt;p&gt;At a high level, configuring Drill to access S3 bucket data is accomplished
with the following steps on each node running a drillbit.&lt;/p&gt;
+
+&lt;ul&gt;
+&lt;li&gt;Download and install the &lt;a href=&quot;http://www.jets3t.org/&quot;&gt;JetS3t&lt;/a&gt;
JAR files and enable them.&lt;/li&gt;
+&lt;li&gt;Add your S3 credentials in the relevant XML configuration file.&lt;/li&gt;
+&lt;li&gt;Configure and enable the S3 storage plugin through the Drill web interface.&lt;/li&gt;
+&lt;li&gt;Connect your BI tool of choice and query away.&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;Consult the &lt;a href=&quot;https://cwiki.apache.org/confluence/display/DRILL/Architectural+Overview&quot;&gt;Architectural
Overview&lt;/a&gt; for a refresher on the architecture of Drill.&lt;/p&gt;
+
+&lt;h3&gt;Prerequisites&lt;/h3&gt;
+
+&lt;p&gt;These steps assume you have a &lt;a href=&quot;https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes&quot;&gt;typical
Drill cluster and ZooKeeper quorum&lt;/a&gt; configured and running.  To access data
in S3, you will need an S3 bucket configured and have the required Amazon security credentials
in your possession.  An &lt;a href=&quot;http://blogs.aws.amazon.com/security/post/Tx1R9KDN9ISZ0HF/Where-s-my-secret-access-key&quot;&gt;Amazon
blog post&lt;/a&gt; has more information on how to get these from your account.&lt;/p&gt;
+
+&lt;h3&gt;Configuration Steps&lt;/h3&gt;
+
+&lt;p&gt;To connect Drill to S3, all of the drillbit nodes will need access to code
in the open-source JetS3t library.  As of this writing, 0.9.2 is the latest version,
but you might want to check &lt;a href=&quot;https://jets3t.s3.amazonaws.com/toolkit/toolkit.html&quot;&gt;the
main page&lt;/a&gt; to see if anything has been updated.  Be sure to get version 0.9.2
or later, as earlier versions have a bug relating to reading Parquet data.&lt;/p&gt;
+&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot;
data-lang=&quot;bash&quot;&gt;wget http://bitbucket.org/jmurty/jets3t/downloads/jets3t-0.9.2.zip
+unzip jets3t-0.9.2.zip
+cp jets3t-0.9.2/jars/jets3t-0.9.2.jar &lt;span class=&quot;nv&quot;&gt;$DRILL_HOME&lt;/span&gt;/jars/3rdparty
+&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
+&lt;p&gt;Next, enable the plugin by editing the file:&lt;/p&gt;
+&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot;
data-lang=&quot;bash&quot;&gt;&lt;span class=&quot;nv&quot;&gt;$DRILL_HOME&lt;/span&gt;/bin/hadoop_excludes.txt
+&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
+&lt;p&gt;and removing the line &lt;code&gt;jets3t&lt;/code&gt;.&lt;/p&gt;
+
+&lt;p&gt;Drill will need to know your S3 credentials in order to access data there.
These credentials will need to be placed in the core-site.xml file for your installation.
If you already have a core-site.xml file configured for your environment, add the following
parameters to it; otherwise, create the file from scratch.  If you do end up creating it from
scratch, you will need to wrap these parameters with &lt;code&gt;&amp;lt;configuration&amp;gt;&lt;/code&gt;
and &lt;code&gt;&amp;lt;/configuration&amp;gt;&lt;/code&gt;.&lt;/p&gt;
+&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;
data-lang=&quot;xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;property&amp;gt;&lt;/span&gt;
+  &lt;span class=&quot;nt&quot;&gt;&amp;lt;name&amp;gt;&lt;/span&gt;fs.s3.awsAccessKeyId&lt;span
class=&quot;nt&quot;&gt;&amp;lt;/name&amp;gt;&lt;/span&gt;
+  &lt;span class=&quot;nt&quot;&gt;&amp;lt;value&amp;gt;&lt;/span&gt;ID&lt;span
class=&quot;nt&quot;&gt;&amp;lt;/value&amp;gt;&lt;/span&gt;
+&lt;span class=&quot;nt&quot;&gt;&amp;lt;/property&amp;gt;&lt;/span&gt;
+
+&lt;span class=&quot;nt&quot;&gt;&amp;lt;property&amp;gt;&lt;/span&gt;
+  &lt;span class=&quot;nt&quot;&gt;&amp;lt;name&amp;gt;&lt;/span&gt;fs.s3.awsSecretAccessKey&lt;span
class=&quot;nt&quot;&gt;&amp;lt;/name&amp;gt;&lt;/span&gt;
+  &lt;span class=&quot;nt&quot;&gt;&amp;lt;value&amp;gt;&lt;/span&gt;SECRET&lt;span
class=&quot;nt&quot;&gt;&amp;lt;/value&amp;gt;&lt;/span&gt;
+&lt;span class=&quot;nt&quot;&gt;&amp;lt;/property&amp;gt;&lt;/span&gt;
+
+&lt;span class=&quot;nt&quot;&gt;&amp;lt;property&amp;gt;&lt;/span&gt;
+  &lt;span class=&quot;nt&quot;&gt;&amp;lt;name&amp;gt;&lt;/span&gt;fs.s3n.awsAccessKeyId&lt;span
class=&quot;nt&quot;&gt;&amp;lt;/name&amp;gt;&lt;/span&gt;
+  &lt;span class=&quot;nt&quot;&gt;&amp;lt;value&amp;gt;&lt;/span&gt;ID&lt;span
class=&quot;nt&quot;&gt;&amp;lt;/value&amp;gt;&lt;/span&gt;
+&lt;span class=&quot;nt&quot;&gt;&amp;lt;/property&amp;gt;&lt;/span&gt;
+
+&lt;span class=&quot;nt&quot;&gt;&amp;lt;property&amp;gt;&lt;/span&gt;
+  &lt;span class=&quot;nt&quot;&gt;&amp;lt;name&amp;gt;&lt;/span&gt;fs.s3n.awsSecretAccessKey&lt;span
class=&quot;nt&quot;&gt;&amp;lt;/name&amp;gt;&lt;/span&gt;
+  &lt;span class=&quot;nt&quot;&gt;&amp;lt;value&amp;gt;&lt;/span&gt;SECRET&lt;span
class=&quot;nt&quot;&gt;&amp;lt;/value&amp;gt;&lt;/span&gt;
+&lt;span class=&quot;nt&quot;&gt;&amp;lt;/property&amp;gt;&lt;/span&gt;
+&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
+&lt;p&gt;The steps so far give Drill enough information to connect to the S3 service.
 Remember, you have to do this on all the nodes running drillbit.&lt;/p&gt;
+
+&lt;p&gt;Next, let&amp;#39;s go into the Drill web interface and enable the S3
storage plugin.  In this case you only need to connect to &lt;strong&gt;one&lt;/strong&gt;
of the nodes because Drill&amp;#39;s configuration is synchronized across the cluster.
 Complete the following steps:&lt;/p&gt;
+
+&lt;ol&gt;
+&lt;li&gt;Point your browser to &lt;code&gt;http://&amp;lt;host&amp;gt;:8047&lt;/code&gt;&lt;/li&gt;
+&lt;li&gt;Select the &amp;#39;Storage&amp;#39; tab.&lt;/li&gt;
+&lt;li&gt;A good starting configuration for S3 can be entirely the same as the &lt;code&gt;dfs&lt;/code&gt;
plugin, except that the connection parameter is changed to &lt;code&gt;s3://&amp;lt;bucket&amp;gt;&lt;/code&gt;.
So first select the &lt;code&gt;Update&lt;/code&gt; button for &lt;code&gt;dfs&lt;/code&gt;,
then select the text area and copy its contents to the clipboard (on Windows, Ctrl-A followed by Ctrl-C works).&lt;/li&gt;
+&lt;li&gt;Press &lt;code&gt;Back&lt;/code&gt;, then create a new
plugin by typing a name into the &lt;code&gt;New Storage Plugin&lt;/code&gt; field
and pressing &lt;code&gt;Create&lt;/code&gt;.  You can choose any name, but
a good convention is to use &lt;code&gt;s3-&amp;lt;bucketname&amp;gt;&lt;/code&gt;
so you can easily identify it later.&lt;/li&gt;
+&lt;li&gt;In the configuration area, paste the configuration you just grabbed from
&amp;#39;dfs&amp;#39;.  Change the line &lt;code&gt;connection: &amp;quot;file:///&amp;quot;&lt;/code&gt;
to &lt;code&gt;connection: &amp;quot;s3://&amp;lt;bucket&amp;gt;&amp;quot;&lt;/code&gt;.&lt;/li&gt;
+&lt;li&gt;Click &lt;code&gt;Update&lt;/code&gt;.  You should see
a message that indicates success.&lt;/li&gt;
+&lt;/ol&gt;
+
+&lt;p&gt;At this point you can run queries on the data directly and you have a couple
of options on how you want to access it.  You can use Drill Explorer and create a custom view
(based on an SQL query) that you can then access in Tableau or other BI tools, or just use
Drill directly from within the tool.&lt;/p&gt;
+
+&lt;p&gt;You may want to check out the &lt;a href=&quot;http://www.youtube.com/watch?v=jNUsprJNQUg&quot;&gt;Tableau
demo&lt;/a&gt;.&lt;/p&gt;
+
+&lt;p&gt;With just a few lines of configuration, you&amp;#39;ve just opened the
vast world of data available in the Amazon cloud and reduced the amount of work you have to
do in advance to access data stored there with SQL.  There are even some &lt;a href=&quot;https://aws.amazon.com/datasets&quot;&gt;public
datasets&lt;/a&gt; available directly on S3 that are great for experimentation.&lt;/p&gt;
+
+&lt;p&gt;Happy Drilling!&lt;/p&gt;
+</description>
+        <pubDate>Tue, 09 Dec 2014 10:50:01 -0800</pubDate>
+        <link>/blog/2014/12/09/running-sql-queries-on-amazon-s3/</link>
+        <guid isPermaLink="true">/blog/2014/12/09/running-sql-queries-on-amazon-s3/</guid>
+        
+        
+        <category>blog</category>
+        
+      </item>
+    
+      <item>
         <title>Apache Drill Graduates to a Top-Level Project</title>
         <description>&lt;p&gt;The Apache Software Foundation has just announced
that it has promoted Drill to a top-level project at Apache, similar to other well-known projects
like Apache Hadoop and httpd (the world&amp;#39;s most popular Web server). This marks
a significant accomplishment for the Drill community, and I wanted to personally thank everyone
who has contributed to the project. It takes many people, and countless hours, to develop
something as complex and innovative as Drill.&lt;/p&gt;
 

Modified: drill/site/trunk/content/drill/index.html
URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/index.html?rev=1644262&r1=1644261&r2=1644262&view=diff
==============================================================================
--- drill/site/trunk/content/drill/index.html (original)
+++ drill/site/trunk/content/drill/index.html Wed Dec 10 00:24:19 2014
@@ -126,7 +126,7 @@
   <h1>Compatibility with existing SQL environments<br>and Apache Hive deployments</h1>
   <br><br>
   <img src="images/home-img3.jpg" width="380" alt="Compatibility with existing SQL environments
and Apache Hive deployments">
-  <p>With Drill, businesses can minimize switching costs and learning curves for users
with the familiar ANSI SQL syntax. Analysts can continue to use familiar BI/analytics tools
that assume and auto-generate ANSI SQL code to interact with Hadoop data by leveraging the
standard JDBC/ODBC interfaces that Drill exposes. Users can also plug-and-play with Hive environments
to enable ad-hoc low latency queries on existing Hive tables and reuse Hive's metadata, hundreds
of file formats and UDFs out-of-the-box.</p>
+  <p>With Drill, businesses can minimize switching costs and learning curves for users
with the familiar ANSI SQL syntax. Analysts can continue to use familiar BI/analytics tools
that assume and auto-generate ANSI SQL code to interact with Hadoop data by leveraging the
standard JDBC/ODBC interfaces that Drill exposes. Users can also plug-and-play with Hive environments
to enable ad-hoc low latency queries on existing Hive tables and reuse Hive's metadata, hundreds
of file formats and UDFs out of the box.</p>
 </div>
 
 


