drill-commits mailing list archives

From tshi...@apache.org
Subject svn commit: r1646095 - in /drill/site/trunk/content/drill: blog/2014/12/16/ blog/2014/12/16/whats-coming-in-2015/ blog/2014/12/16/whats-coming-in-2015/index.html blog/index.html css/style.css download/index.html feed.xml
Date Tue, 16 Dec 2014 21:38:02 GMT
Author: tshiran
Date: Tue Dec 16 21:38:01 2014
New Revision: 1646095

URL: http://svn.apache.org/r1646095
Log:
Fixed download buttons. Added blog post on 2015 plans.

Added:
    drill/site/trunk/content/drill/blog/2014/12/16/
    drill/site/trunk/content/drill/blog/2014/12/16/whats-coming-in-2015/
    drill/site/trunk/content/drill/blog/2014/12/16/whats-coming-in-2015/index.html
Modified:
    drill/site/trunk/content/drill/blog/index.html
    drill/site/trunk/content/drill/css/style.css
    drill/site/trunk/content/drill/download/index.html
    drill/site/trunk/content/drill/feed.xml

Added: drill/site/trunk/content/drill/blog/2014/12/16/whats-coming-in-2015/index.html
URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/blog/2014/12/16/whats-coming-in-2015/index.html?rev=1646095&view=auto
==============================================================================
--- drill/site/trunk/content/drill/blog/2014/12/16/whats-coming-in-2015/index.html (added)
+++ drill/site/trunk/content/drill/blog/2014/12/16/whats-coming-in-2015/index.html Tue Dec 16 21:38:01 2014
@@ -0,0 +1,240 @@
+<!DOCTYPE html>
+<html>
+
+<head>
+
+<meta charset="UTF-8">
+
+
+<title>What's Coming in 2015? - Apache Drill</title>
+
+<link href="/css/syntax.css" rel="stylesheet" type="text/css">
+<link href="/css/style.css" rel="stylesheet" type="text/css">
+<link href="/css/arrows.css" rel="stylesheet" type="text/css">
+<link href="/css/button.css" rel="stylesheet" type="text/css">
+
+<link rel="shortcut icon" href="/favicon.ico" type="image/x-icon">
+<link rel="icon" href="/favicon.ico" type="image/x-icon">
+
+<script language="javascript" type="text/javascript" src="/js/lib/jquery-1.11.1.min.js"></script>
+<script language="javascript" type="text/javascript" src="/js/lib/jquery.easing.1.3.js"></script>
+<script language="javascript" type="text/javascript" src="/js/modernizr.custom.js"></script>
+<script language="javascript" type="text/javascript" src="/js/script.js"></script>
+
+</head>
+
+<body onResize="resized();">
+
+<div class="bui"></div>
+
+<div id="search">
+<input type="text" placeholder="Enter search term here">
+</div>
+
+<div id="menu" class="mw">
+<ul>
+  <li class="logo"><a href="/"></a></li>
+  <li>
+    <a href="/overview/">Documentation</a>
+    <ul>
+      <li><a href="/overview/">Overview&nbsp;&nbsp;&nbsp;&nbsp;</a></li>
+      <li><a href="https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes" target="_blank">Drill in 10 Minutes</a></li>
+      <li><a href="/why/">Why Drill? &nbsp;&nbsp;&nbsp;&nbsp;</a></li>
+      <li><a href="/architecture/">Architecture</a></li>
+    </ul>
+  </li>
+  <li>
+    <a href="/community/">Community</a>
+    <ul>
+      <li><a href="/team/">Team</a></li>
+      <li><a href="/community/#events">Events and Meetups</a></li>
+      <li><a href="/community/#mailinglists">Mailing Lists</a></li>
+      <li><a href="/community/#getinvolved">Get Involved</a></li>
+      <li><a href="https://issues.apache.org/jira/browse/DRILL/" target="_blank">Issue Tracker</a></li>
+      <li><a href="https://github.com/apache/drill" target="_blank">GitHub</a></li>
+    </ul>
+  </li>
+  <li><a href="/faq/">FAQ</a></li>
+  <li><a href="/blog/">Blog</a></li>
+  <li style="width:30px; padding-left: 2px; padding-right:10px"><a href="https://twitter.com/apachedrill" target="_blank"><img src="/images/twitterbw.png" alt="" align="center" width="22" style="padding: 0px 10px 1px 0px;"></a> </li>
+  <li class="l"><span>&nbsp;</span></li>
+  <li class="d"><a href="/download/">Download</a></li>
+</ul>
+</div>
+
+<div class="post int_text">
+
+  <header class="post-header">
+    <h1 class="post-title">What's Coming in 2015?</h1>
+    <p class="post-meta">
+
+<strong>Authors:</strong> Tomer Shiran, Apache Drill Founder, PMC Member and Committer
+<br/><strong>Date:</strong> Dec 16, 2014
+</p>
+  </header>
+  <div class="addthis_sharing_toolbox"></div>
+
+  <article class="post-content">
+    <p>2014 was an exciting year for the Drill community. In August we made Drill available for download, and last week the Apache Software Foundation promoted Drill to a top-level project. Many of you have asked me what&#39;s coming next, so I decided to sit down and outline some of the interesting initiatives that the Drill community is currently working on:</p>
+
+<ul>
+<li>Flexible Access Control</li>
+<li>JSON in Any Shape or Form</li>
+<li>Full SQL</li>
+<li>New Data Sources</li>
+<li>Drill/Spark Integration</li>
+<li>Operational Enhancements: Speed, Scalability and Workload Management</li>
+</ul>
+
+<p>This is by no means intended to be an exhaustive list of everything that will be added to Drill in 2015. With Drill&#39;s rapidly expanding community, I anticipate that you&#39;ll see a whole lot more.</p>
+
+<h2>Flexible Access Control</h2>
+
+<p>Many organizations are now interested in providing Drill as a service to their users, supporting many users, groups and organizations with a single cluster. To do so, they need to be able to control who can access what data. Today&#39;s volume and variety of data require a new approach to access control. For example, it is becoming impractical for organizations to manage a standalone, centralized repository of permissions for every column/row of every table. Drill&#39;s virtual datasets (views) provide a more scalable solution to access control:</p>
+
+<ul>
+<li>The user creates a virtual dataset (<code>CREATE VIEW vd AS SELECT ...</code>), selecting the data to be exposed/shared. The virtual dataset is defined as a SQL statement. For example, a virtual dataset may represent only the records that were created in the last 30 days and don&#39;t have the <code>restricted</code> flag. It could even mask some columns. Drill&#39;s virtual datasets (just the SQL statement) are stored as files in the file system, so users can leverage file system permissions to control who can access the virtual dataset, without granting access to the source data.</li>
+<li>A virtual dataset is owned by a specific user and can only &quot;select&quot; data that the owner has access to. The data sources (HDFS, HBase, MongoDB, etc.) are responsible for access control decisions. Users and administrators do not need to define separate permissions inside Drill or utilize yet another centralized permission repository, such as Sentry or Ranger.</li>
+</ul>
+
+<h2>JSON in Any Shape or Form</h2>
+
+<p>When data is <strong>Big</strong> (as in Big Data), it is painful to copy and transform it. Users should be able to explore the raw data without (or at least prior to) transforming it into another format. Drill is designed to enable in-situ analytics. Just point it at a file or directory and run the queries.</p>
+
+<p>JSON has emerged as the most common self-describing format, and Drill is able to query JSON files out of the box. Drill currently assumes that the JSON documents (or records) are stored sequentially in a file:</p>
+<div class="highlight"><pre><code class="language-json" data-lang="json"><span class="p">{</span> <span class="nt">&quot;name&quot;</span><span class="p">:</span> <span class="s2">&quot;Lee&quot;</span><span class="p">,</span> <span class="nt">&quot;yelping_since&quot;</span><span class="p">:</span> <span class="s2">&quot;2012-02&quot;</span> <span class="p">}</span>
+<span class="p">{</span> <span class="nt">&quot;name&quot;</span><span class="p">:</span> <span class="s2">&quot;Matthew&quot;</span><span class="p">,</span> <span class="nt">&quot;yelping_since&quot;</span><span class="p">:</span> <span class="s2">&quot;2011-12&quot;</span> <span class="p">}</span>
+<span class="p">{</span> <span class="nt">&quot;name&quot;</span><span class="p">:</span> <span class="s2">&quot;Jasmine&quot;</span><span class="p">,</span> <span class="nt">&quot;yelping_since&quot;</span><span class="p">:</span> <span class="s2">&quot;2010-09&quot;</span> <span class="p">}</span>
+</code></pre></div>
+<p>However, many JSON-based datasets, ranging from <a href="http://data.gov">data.gov</a> (government) datasets to Twitter API responses, are not organized as simple sequences of JSON documents. In some cases the actual records are listed as elements of an internal array inside a single JSON document. For example, consider the following file, which technically consists of a single JSON document, but really contains three records (under the <code>data.records</code> field):</p>
+<div class="highlight"><pre><code class="language-json" data-lang="json"><span class="p">{</span>
+  <span class="nt">&quot;metadata&quot;</span><span class="p">:</span> <span class="err">...</span><span class="p">,</span>
+  <span class="nt">&quot;data&quot;</span><span class="p">:</span> <span class="p">{</span>
+    <span class="nt">&quot;records&quot;</span><span class="p">:</span> <span class="p">[</span>
+      <span class="p">{</span> <span class="nt">&quot;name&quot;</span><span class="p">:</span> <span class="s2">&quot;Lee&quot;</span><span class="p">,</span> <span class="nt">&quot;yelping_since&quot;</span><span class="p">:</span> <span class="s2">&quot;2012-02&quot;</span> <span class="p">},</span>
+      <span class="p">{</span> <span class="nt">&quot;name&quot;</span><span class="p">:</span> <span class="s2">&quot;Matthew&quot;</span><span class="p">,</span> <span class="nt">&quot;yelping_since&quot;</span><span class="p">:</span> <span class="s2">&quot;2011-12&quot;</span> <span class="p">},</span>
+      <span class="p">{</span> <span class="nt">&quot;name&quot;</span><span class="p">:</span> <span class="s2">&quot;Jasmine&quot;</span><span class="p">,</span> <span class="nt">&quot;yelping_since&quot;</span><span class="p">:</span> <span class="s2">&quot;2010-09&quot;</span> <span class="p">}</span>
+    <span class="p">]</span>
+  <span class="p">}</span>
+<span class="p">}</span>
+</code></pre></div>
+<p>The <code>FLATTEN</code> function in Drill 0.7+ takes an array and converts each item into a top-level record:</p>
+<div class="highlight"><pre><code class="language-sql" data-lang="sql"><span class="k">SELECT</span> <span class="n">FLATTEN</span><span class="p">(</span><span class="k">data</span><span class="p">.</span><span class="n">records</span><span class="p">)</span> <span class="k">FROM</span> <span class="n">dfs</span><span class="p">.</span><span class="n">tmp</span><span class="p">.</span><span class="o">`</span><span class="n">foo</span><span class="p">.</span><span class="n">json</span><span class="o">`</span><span class="p">;</span>
+</code></pre></div>
+<p>You can use this as an inner query (or inside a view):</p>
+<div class="highlight"><pre><code class="language-sql" data-lang="sql"><span class="o">&gt;</span> <span class="k">SELECT</span> <span class="n">t</span><span class="p">.</span><span class="n">record</span><span class="p">.</span><span class="n">name</span> <span class="k">AS</span> <span class="n">name</span>
+  <span class="k">FROM</span> <span class="p">(</span><span class="k">SELECT</span> <span class="n">FLATTEN</span><span class="p">(</span><span class="k">data</span><span class="p">.</span><span class="n">records</span><span class="p">)</span> <span class="k">AS</span> <span class="n">record</span> <span class="k">FROM</span> <span class="n">dfs</span><span class="p">.</span><span class="n">tmp</span><span class="p">.</span><span class="o">`</span><span class="n">test</span><span class="o">/</span><span class="n">foo</span><span class="p">.</span><span class="n">json</span><span class="o">`</span><span class="p">)</span> <span class="n">t</span><span class="p">;</span>
+<span class="o">+</span><span class="c1">------------+</span>
+<span class="o">|</span>    <span class="n">name</span>    <span class="o">|</span>
+<span class="o">+</span><span class="c1">------------+</span>
+<span class="o">|</span> <span class="n">Lee</span>        <span class="o">|</span>
+<span class="o">|</span> <span class="n">Matthew</span>    <span class="o">|</span>
+<span class="o">|</span> <span class="n">Jasmine</span>    <span class="o">|</span>
+<span class="o">+</span><span class="c1">------------+</span>
+</code></pre></div>
+<p>While this works today, the dataset is technically a single JSON document, so Drill ends up reading the entire dataset into memory. We&#39;re developing a FLATTEN-pushdown mechanism that will enable the JSON reader to emit the individual records into the downstream operators, thereby making this work with datasets of arbitrary size. Once that&#39;s implemented, users will be able to explore any JSON-based dataset in-situ (i.e., without having to transform it).</p>
+
+<h2>Full SQL</h2>
+
+<p>Unlike the majority of SQL engines for Hadoop and NoSQL databases, which support SQL-like languages (HiveQL, CQL, etc.), Drill is designed from the ground up to be compliant with ANSI SQL. We simply started with a real SQL parser (Apache Calcite, previously known as Optiq). We&#39;re currently implementing the remaining SQL constructs, and plan to support the full TPC-DS suite (with no query modifications) in 2015. Full SQL support makes BI tools work better, and enables users who are proficient with SQL to leverage their existing knowledge and skills.</p>
+
+<h2>New Data Sources</h2>
+
+<p>Drill is a standalone, distributed SQL engine. It has a pluggable architecture that allows it to support multiple data sources. Drill 0.6 includes storage plugins for:</p>
+
+<ul>
+<li><a href="https://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html">Hadoop File System</a> implementations (local file system, HDFS, MapR-FS, Amazon S3, etc.)</li>
+<li>HBase and MapR-DB</li>
+<li>MongoDB</li>
+<li>Hive Metastore (query any dataset that is registered in Hive Metastore)</li>
+</ul>
+
+<p>A single query can join data from different systems. For example, a query can join user profiles in MongoDB with log files in Hadoop, or datasets in multiple Hadoop clusters.</p>
+
+<p>I&#39;m eager to see what storage plugins the community develops over the next 12 months. In the last few weeks alone, developers in the community have expressed their desire (on the <a href="mailto:dev@drill.apache.org">public list</a>) to develop additional storage plugins for the following data sources:</p>
+
+<ul>
+<li>Cassandra</li>
+<li>Solr</li>
+<li>JDBC (any RDBMS, including Oracle, MySQL, PostgreSQL and SQL Server)</li>
+</ul>
+
+<p>If you&#39;re interested in implementing a new storage plugin, I would encourage you to reach out to the Drill developer community on <a href="mailto:dev@drill.apache.org">dev@drill.apache.org</a>. I&#39;m looking forward to publishing an example of a single-query join across 10 data sources.</p>
+
+<h2>Drill/Spark Integration</h2>
+
+<p>We&#39;re seeing growing interest in Spark as an execution engine for data pipelines, providing an alternative to MapReduce. The Drill community is working on integrating Drill and Spark to address a few new use cases:</p>
+
+<ul>
+<li><p>Use a Drill query (or view) as the input to Spark. Drill is a powerful engine for extracting and pre-processing data from various data sources, thereby reducing development time and effort. Here&#39;s an example:</p>
+<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span class="n">sc</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">SparkContext</span><span class="o">(</span><span class="n">conf</span><span class="o">)</span>
+<span class="k">val</span> <span class="n">result</span> <span class="k">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">drillRDD</span><span class="o">(</span><span class="s">&quot;SELECT * FROM dfs.root.`path/to/logs` l, mongo.mydb.users u WHERE l.user_id = u.id GROUP BY ...&quot;</span><span class="o">)</span>
+<span class="k">val</span> <span class="n">formatted</span> <span class="k">=</span> <span class="n">result</span><span class="o">.</span><span class="n">map</span> <span class="o">{</span> <span class="n">r</span> <span class="k">=&gt;</span>
+  <span class="k">val</span> <span class="o">(</span><span class="n">first</span><span class="o">,</span> <span class="n">last</span><span class="o">,</span> <span class="n">visits</span><span class="o">)</span> <span class="k">=</span> <span class="o">(</span><span class="n">r</span><span class="o">.</span><span class="n">name</span><span class="o">.</span><span class="n">first</span><span class="o">,</span> <span class="n">r</span><span class="o">.</span><span class="n">name</span><span class="o">.</span><span class="n">last</span><span class="o">,</span> <span class="n">r</span><span class="o">.</span><span class="n">visits</span><span class="o">)</span>
+  <span class="n">s</span><span class="s">&quot;$first $last $visits&quot;</span>
+<span class="o">}</span>
+</code></pre></div></li>
+<li><p>Use Drill to query Spark RDDs. Analysts will be able to use BI tools like MicroStrategy, Spotfire and Tableau to query in-memory data in Spark. In addition, Spark developers will be able to embed Drill execution in a Spark data pipeline, thereby enjoying the power of Drill&#39;s schema-free, columnar execution engine.</p></li>
+</ul>
+
+<h2>Operational Enhancements</h2>
+
+<p>As we continue with our monthly releases and march towards the 1.0 release early next year, we&#39;re focused on improving Drill&#39;s speed and scalability. We&#39;ll also enhance Drill&#39;s multi-tenancy with more advanced workload management.</p>
+
+<ul>
+<li><strong>Speed</strong>: Drill is already extremely fast, and we&#39;re going to make it even faster over the next few months. With that said, we think that improving user productivity and time-to-insight is as important as shaving a few milliseconds off a query&#39;s runtime.</li>
+<li><strong>Scalability</strong>: To date we&#39;ve focused mainly on clusters of up to a couple hundred nodes. We&#39;re currently working to support clusters with thousands of nodes. We&#39;re also improving concurrency to better support deployments in which hundreds of analysts or developers are running queries at the same time.</li>
+<li><strong>Workload management</strong>: A single cluster is often shared among many users and groups, and everyone expects answers in real time. Workload management prioritizes the allocation of resources to ensure that the most important workloads get done first so that business demands can be met. Administrators need to be able to assign priorities and quotas at a fine granularity. We&#39;re working on enhancing Drill&#39;s workload management to provide these capabilities while providing tight integration with YARN and Mesos.</li>
+</ul>
+
+<h2>We Would Love to Hear From You!</h2>
+
+<p>Are there other features you would like to see in Drill? We would love to hear from you:</p>
+
+<ul>
+<li>Drill users: <a href="mailto:user@drill.apache.org">user@drill.apache.org</a></li>
+<li>Drill developers: <a href="mailto:dev@drill.apache.org">dev@drill.apache.org</a></li>
+<li>Me: <a href="mailto:tshiran@apache.org">tshiran@apache.org</a></li>
+</ul>
+
+<p>Happy Drilling!<br>
+Tomer Shiran</p>
+
+  </article>
+ <div id="disqus_thread"></div>
+    <script type="text/javascript">
+        /* * * CONFIGURATION VARIABLES: EDIT BEFORE PASTING INTO YOUR WEBPAGE * * */
+        var disqus_shortname = 'drill'; // required: replace example with your forum shortname
+
+        /* * * DON'T EDIT BELOW THIS LINE * * */
+        (function() {
+            var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true;
+            dsq.src = '//' + disqus_shortname + '.disqus.com/embed.js';
+            (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
+        })();
+    </script>
+    <noscript>Please enable JavaScript to view the <a href="http://disqus.com/?ref_noscript">comments powered by Disqus.</a></noscript>
+    
+</div>
+<script type="text/javascript" src="//s7.addthis.com/js/300/addthis_widget.js#pubid=ra-548b2caa33765e8d" async="async"></script>
+
+
+<div id="footer" class="mw">
+<div class="wrapper">
+Copyright © 2012-2014 The Apache Software Foundation, licensed under the Apache License, Version 2.0.<br>
+Apache and the Apache feather logo are trademarks of The Apache Software Foundation. Other names appearing on the site may be trademarks of their respective owners.<br/><br/>
+</div>
+</div>
+
+<script>
+(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+ga('create', 'UA-53379651-1', 'auto');
+ga('send', 'pageview');
+</script>
+
+</body>
+</html>

Modified: drill/site/trunk/content/drill/blog/index.html
URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/blog/index.html?rev=1646095&r1=1646094&r2=1646095&view=diff
==============================================================================
--- drill/site/trunk/content/drill/blog/index.html (original)
+++ drill/site/trunk/content/drill/blog/index.html Tue Dec 16 21:38:01 2014
@@ -68,6 +68,8 @@
 </div>
 
 <div class="int_text" align="left"><!-- previously: site.posts -->
+<p><a class="post-link" href="/blog/2014/12/16/whats-coming-in-2015/">What's Coming in 2015?</a> (Dec 16, 2014)<br/>Drill is now a top-level project, and the community is expanding rapidly. Find out more about some of the new features planned for 2015.</p>
+<!-- previously: site.posts -->
 <p><a class="post-link" href="/blog/2014/12/11/apache-drill-qa-panelist-spotlight/">Apache Drill Q&A Panelist Spotlight</a> (Dec 11, 2014)<br/>Join us on Twitter for a live Q&A on Wednesday, December 17.</p>
 <!-- previously: site.posts -->
 <p><a class="post-link" href="/blog/2014/12/09/running-sql-queries-on-amazon-s3/">Running SQL Queries on Amazon S3</a> (Dec 9, 2014)<br/>Drill enables you to run SQL queries directly on data in S3. There's no need to ingest the data into a managed cluster or transform the data. This is a step-by-step tutorial on how to use Drill with S3.</p>

Modified: drill/site/trunk/content/drill/css/style.css
URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/css/style.css?rev=1646095&r1=1646094&r2=1646095&view=diff
==============================================================================
--- drill/site/trunk/content/drill/css/style.css (original)
+++ drill/site/trunk/content/drill/css/style.css Tue Dec 16 21:38:01 2014
@@ -788,7 +788,7 @@ div.download table a {
 	background-size:16px auto;
 	background-position:17px center;
 	background-repeat:no-repeat;
-	padding:0 35px 0 45px;
+	padding:10px 35px 10px 45px;
 	line-height:40px;
 	font-size:12px;
 	font-weight:normal;
@@ -806,12 +806,21 @@ div.download table a.dl:hover {
 }
 
 div.download table a.find {
-	background-color:#1a6bc7;
+	background-color:#4aaf4c;
 	background-image:url(../images/btn-lens.png);
 }
 
 div.download table a.find:hover {
-	background-color:#145aa8;
+	background-color:#348436;
+}
+
+div.download table a.tutorial {
+    background-color:#1a6bc7;
+    background-image:url(../images/btn-lens.png);
+}
+
+div.download table a.tutorial:hover {
+    background-color:#145aa8;
 }
 
 p.info {

Modified: drill/site/trunk/content/drill/download/index.html
URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/download/index.html?rev=1646095&r1=1646094&r2=1646095&view=diff
==============================================================================
--- drill/site/trunk/content/drill/download/index.html (original)
+++ drill/site/trunk/content/drill/download/index.html Tue Dec 16 21:38:01 2014
@@ -73,15 +73,15 @@
 
   <table>
     <tr>
-      <td><a href="http://www.apache.org/dyn/closer.cgi/drill/drill-0.6.0-incubating/apache-drill-0.6.0-incubating.tar.gz" class="find" id="apachemirror" style="background-color: #4aaf4c;">FIND AN APACHE MIRROR</a></td>
+      <td><a href="http://www.apache.org/dyn/closer.cgi/drill/drill-0.6.0-incubating/apache-drill-0.6.0-incubating.tar.gz" class="find" id="apachemirror">FIND AN APACHE MIRROR</a></td>
       <td><a href="http://getdrill.org/drill/download/apache-drill-0.6.0-incubating.tar.gz" rel="nofollow" class="dl" id="directdownload">DIRECT FILE DOWNLOAD</a></td>
       <td><a href="http://doc.mapr.com/display/MapR/Step+1.+Install+the+MapR+Drill+ODBC+Driver" rel="nofollow" class="dl">ODBC DRIVERS FOR DRILL*</a></td>
     </tr>
   </table>
 
   <p style="margin-top:1px; padding-top:1px;">
-    <strong>Release Notes: </strong><a href="https://cwiki.apache.org/confluence/display/DRILL/Release+Notes">&nbsp;Click here</a> &nbsp;&nbsp;|&nbsp;&nbsp;
-    <strong>Fork Drill 0.6 on GitHub: </strong><a href="https://github.com/apache/drill/tree/0.6.0-incubating" rel="nofollow">&nbsp;Click here</a>
+    <strong>Release Notes: </strong><a href="https://cwiki.apache.org/confluence/display/DRILL/Release+Notes"> Click here</a> &nbsp;&nbsp;|&nbsp;&nbsp;
+    <strong>Fork Drill 0.6 on GitHub: </strong><a href="https://github.com/apache/drill/tree/0.6.0-incubating" rel="nofollow">Click here</a>
   </p>
 
   <br>
@@ -91,7 +91,7 @@
   <table>
     <tr>
       <td>&nbsp;</td>
-      <td style="padding-left: 38px"><a href="https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+Tutorial" rel="nofollow" target="_blank" class="find">DRILL TUTORIAL</a></td>
+      <td style="padding-left: 38px"><a href="https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+Tutorial" rel="nofollow" target="_blank" class="tutorial">DRILL TUTORIAL</a></td>
       <td>&nbsp;</td>
     </tr>
     <tr>

Modified: drill/site/trunk/content/drill/feed.xml
URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/feed.xml?rev=1646095&r1=1646094&r2=1646095&view=diff
==============================================================================
--- drill/site/trunk/content/drill/feed.xml (original)
+++ drill/site/trunk/content/drill/feed.xml Tue Dec 16 21:38:01 2014
@@ -6,11 +6,147 @@
 </description>
     <link>/</link>
     <atom:link href="/feed.xml" rel="self" type="application/rss+xml"/>
-    <pubDate>Fri, 12 Dec 2014 10:44:56 -0800</pubDate>
-    <lastBuildDate>Fri, 12 Dec 2014 10:44:56 -0800</lastBuildDate>
+    <pubDate>Tue, 16 Dec 2014 13:37:17 -0800</pubDate>
+    <lastBuildDate>Tue, 16 Dec 2014 13:37:17 -0800</lastBuildDate>
     <generator>Jekyll v2.5.1</generator>
     
       <item>
+        <title>What&#39;s Coming in 2015?</title>
+        <description>&lt;p&gt;2014 was an exciting year for the Drill community. In August we made Drill available for download, and last week the Apache Software Foundation promoted Drill to a top-level project. Many of you have asked me what&amp;#39;s coming next, so I decided to sit down and outline some of the interesting initiatives that the Drill community is currently working on:&lt;/p&gt;
+
+&lt;ul&gt;
+&lt;li&gt;Flexible Access Control&lt;/li&gt;
+&lt;li&gt;JSON in Any Shape or Form&lt;/li&gt;
+&lt;li&gt;Full SQL&lt;/li&gt;
+&lt;li&gt;New Data Sources&lt;/li&gt;
+&lt;li&gt;Drill/Spark Integration&lt;/li&gt;
+&lt;li&gt;Operational Enhancements: Speed, Scalability and Workload Management&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;This is by no means intended to be an exhaustive list of everything that will be added to Drill in 2015. With Drill&amp;#39;s rapidly expanding community, I anticipate that you&amp;#39;ll see a whole lot more.&lt;/p&gt;
+
+&lt;h2&gt;Flexible Access Control&lt;/h2&gt;
+
+&lt;p&gt;Many organizations are now interested in providing Drill as a service to their users, supporting many users, groups and organizations with a single cluster. To do so, they need to be able to control who can access what data. Today&amp;#39;s volume and variety of data require a new approach to access control. For example, it is becoming impractical for organizations to manage a standalone, centralized repository of permissions for every column/row of every table. Drill&amp;#39;s virtual datasets (views) provide a more scalable solution to access control:&lt;/p&gt;
+
+&lt;ul&gt;
+&lt;li&gt;The user creates a virtual dataset (&lt;code&gt;CREATE VIEW vd AS SELECT ...&lt;/code&gt;), selecting the data to be exposed/shared. The virtual dataset is defined as a SQL statement. For example, a virtual dataset may represent only the records that were created in the last 30 days and don&amp;#39;t have the &lt;code&gt;restricted&lt;/code&gt; flag. It could even mask some columns. Drill&amp;#39;s virtual datasets (just the SQL statement) are stored as files in the file system, so users can leverage file system permissions to control who can access the virtual dataset, without granting access to the source data.&lt;/li&gt;
+&lt;li&gt;A virtual dataset is owned by a specific user and can only &amp;quot;select&amp;quot; data that the owner has access to. The data sources (HDFS, HBase, MongoDB, etc.) are responsible for access control decisions. Users and administrators do not need to define separate permissions inside Drill or utilize yet another centralized permission repository, such as Sentry or Ranger.&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;h2&gt;JSON in Any Shape or Form&lt;/h2&gt;
+
+&lt;p&gt;When data is &lt;strong&gt;Big&lt;/strong&gt; (as in Big Data), it is painful to copy and transform it. Users should be able to explore the raw data without (or at least prior to) transforming it into another format. Drill is designed to enable in-situ analytics. Just point it at a file or directory and run the queries.&lt;/p&gt;
+
+&lt;p&gt;JSON has emerged as the most common self-describing format, and Drill is able to query JSON files out of the box. Drill currently assumes that the JSON documents (or records) are stored sequentially in a file:&lt;/p&gt;
+&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-json&quot; data-lang=&quot;json&quot;&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;&amp;quot;name&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;Lee&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;&amp;quot;yelping_since&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;2012-02&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
+&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;&amp;quot;name&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;Matthew&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;&amp;quot;yelping_since&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;2011-12&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
+&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;&amp;quot;name&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;Jasmine&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;&amp;quot;yelping_since&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;2010-09&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
+&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
+&lt;p&gt;However, many JSON-based datasets, ranging from &lt;a href=&quot;http://data.gov&quot;&gt;data.gov&lt;/a&gt; (government) datasets to Twitter API responses, are not organized as simple sequences of JSON documents. In some cases the actual records are listed as elements of an internal array inside a single JSON document. For example, consider the following file, which technically consists of a single JSON document, but really contains three records (under the &lt;code&gt;data.records&lt;/code&gt; field):&lt;/p&gt;
+&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-json&quot; data-lang=&quot;json&quot;&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
+  &lt;span class=&quot;nt&quot;&gt;&amp;quot;metadata&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;...&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
+  &lt;span class=&quot;nt&quot;&gt;&amp;quot;data&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
+    &lt;span class=&quot;nt&quot;&gt;&amp;quot;records&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;
+      &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;&amp;quot;name&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;Lee&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;&amp;quot;yelping_since&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;2012-02&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;},&lt;/span&gt;
+      &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;&amp;quot;name&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;Matthew&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;&amp;quot;yelping_since&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;2011-12&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;},&lt;/span&gt;
+      &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;&amp;quot;name&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;Jasmine&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;&amp;quot;yelping_since&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;2010-09&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
+    &lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
+  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
+&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
+&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
+&lt;p&gt;The &lt;code&gt;FLATTEN&lt;/code&gt; function in Drill 0.7+ takes an array and converts each item into a top-level record:&lt;/p&gt;
+&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FLATTEN&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;records&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dfs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tmp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;foo&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;json&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
+&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
+&lt;p&gt;You can use this as an inner query (or inside a view):&lt;/p&gt;
+&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;record&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;
+  &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FLATTEN&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;records&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;record&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dfs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tmp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;test&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;foo&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;json&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
+&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;------------+&lt;/span&gt;
+&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;    &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
+&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;------------+&lt;/span&gt;
+&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Lee&lt;/span&gt;        &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
+&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Matthew&lt;/span&gt;    &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
+&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Jasmine&lt;/span&gt;    &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
+&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;------------+&lt;/span&gt;
+&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
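+&lt;p&gt;As a rough sketch of the view variant mentioned above, the flattening step could be wrapped once and then queried like a regular table. The view name &lt;code&gt;records_view&lt;/code&gt; is hypothetical, and the sketch assumes that views can be created in the &lt;code&gt;dfs.tmp&lt;/code&gt; workspace and that the file is referenced through a table alias:&lt;/p&gt;
+&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;-- Hypothetical view name; assumes dfs.tmp is a writable workspace.
+CREATE VIEW dfs.tmp.records_view AS
+  SELECT FLATTEN(f.data.records) AS record FROM dfs.tmp.`test/foo.json` f;
+
+-- Each element of data.records is exposed as one row of the view.
+SELECT v.record.name AS name FROM dfs.tmp.records_view v;
+&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;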
+&lt;p&gt;While this works today, the dataset is technically a single JSON document, so Drill ends up reading the entire dataset into memory. We&amp;#39;re developing a FLATTEN-pushdown mechanism that will enable the JSON reader to emit the individual records to the downstream operators, thereby making this work with datasets of arbitrary size. Once that&amp;#39;s implemented, users will be able to explore any JSON-based dataset in-situ (i.e., without having to transform it).&lt;/p&gt;
+
+&lt;h2&gt;Full SQL&lt;/h2&gt;
+
+&lt;p&gt;Unlike the majority of SQL engines for Hadoop and NoSQL databases, which support SQL-like languages (HiveQL, CQL, etc.), Drill is designed from the ground up to be compliant with ANSI SQL. We simply started with a real SQL parser (Apache Calcite, previously known as Optiq). We&amp;#39;re currently implementing the remaining SQL constructs, and plan to support the full TPC-DS suite (with no query modifications) in 2015. Full SQL support makes BI tools work better, and enables users who are proficient with SQL to leverage their existing knowledge and skills.&lt;/p&gt;
+
+&lt;h2&gt;New Data Sources&lt;/h2&gt;
+
+&lt;p&gt;Drill is a standalone, distributed SQL engine. It has a pluggable architecture that allows it to support multiple data sources. Drill 0.6 includes storage plugins for:&lt;/p&gt;
+
+&lt;ul&gt;
+&lt;li&gt;&lt;a href=&quot;https://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html&quot;&gt;Hadoop File System&lt;/a&gt; implementations (local file system, HDFS, MapR-FS, Amazon S3, etc.)&lt;/li&gt;
+&lt;li&gt;HBase and MapR-DB&lt;/li&gt;
+&lt;li&gt;MongoDB&lt;/li&gt;
+&lt;li&gt;Hive Metastore (query any dataset that is registered in Hive Metastore)&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;A single query can join data from different systems. For example, a query can join user profiles in MongoDB with log files in Hadoop, or datasets in multiple Hadoop clusters.&lt;/p&gt;
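+&lt;p&gt;As an illustrative sketch only, such a cross-system join might look like the following. The table paths and join columns mirror the Spark example later in this post; the aggregate is an assumption added for illustration:&lt;/p&gt;
+&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;-- Join log files in the file system with user profiles in MongoDB.
+-- The per-user count is a hypothetical aggregate, not taken from the post.
+SELECT u.id, COUNT(*) AS visits
+  FROM dfs.root.`path/to/logs` l
+  JOIN mongo.mydb.users u ON l.user_id = u.id
+ GROUP BY u.id;
+&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;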
+
+&lt;p&gt;I&amp;#39;m eager to see what storage plugins the community develops over the next 12 months. In the last few weeks alone, developers in the community have expressed their desire (on the &lt;a href=&quot;mailto:dev@drill.apache.org&quot;&gt;public list&lt;/a&gt;) to develop additional storage plugins for the following data sources:&lt;/p&gt;
+
+&lt;ul&gt;
+&lt;li&gt;Cassandra&lt;/li&gt;
+&lt;li&gt;Solr&lt;/li&gt;
+&lt;li&gt;JDBC (any RDBMS, including Oracle, MySQL, PostgreSQL and SQL Server)&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;If you&amp;#39;re interested in implementing a new storage plugin, I would encourage you to reach out to the Drill developer community on &lt;a href=&quot;mailto:dev@drill.apache.org&quot;&gt;dev@drill.apache.org&lt;/a&gt;. I&amp;#39;m looking forward to publishing an example of a single-query join across 10 data sources.&lt;/p&gt;
+
+&lt;h2&gt;Drill/Spark Integration&lt;/h2&gt;
+
+&lt;p&gt;We&amp;#39;re seeing growing interest in Spark as an execution engine for data pipelines, providing an alternative to MapReduce. The Drill community is working on integrating Drill and Spark to address a few new use cases:&lt;/p&gt;
+
+&lt;ul&gt;
+&lt;li&gt;&lt;p&gt;Use a Drill query (or view) as the input to Spark. Drill is a powerful engine for extracting and pre-processing data from various data sources, thereby reducing development time and effort. Here&amp;#39;s an example:&lt;/p&gt;
+&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot; data-lang=&quot;scala&quot;&gt;&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sc&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;SparkContext&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;conf&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
+&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sc&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;drillRDD&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;SELECT * FROM dfs.root.`path/to/logs` l, mongo.mydb.users u WHERE l.user_id = u.id GROUP BY ...&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
+&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;formatted&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;map&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&amp;gt;&lt;/span&gt;
+  &lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;first&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;last&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;visits&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;first&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;last&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;visits&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
+  &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;$first $last $visits&amp;quot;&lt;/span&gt;
+&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
+&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/li&gt;
+&lt;li&gt;&lt;p&gt;Use Drill to query Spark RDDs. Analysts will be able to use BI tools like MicroStrategy, Spotfire and Tableau to query in-memory data in Spark. In addition, Spark developers will be able to embed Drill execution in a Spark data pipeline, thereby enjoying the power of Drill&amp;#39;s schema-free, columnar execution engine.&lt;/p&gt;&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;h2&gt;Operational Enhancements&lt;/h2&gt;
+
+&lt;p&gt;As we continue with our monthly releases and march towards the 1.0 release early next year, we&amp;#39;re focused on improving Drill&amp;#39;s speed and scalability. We&amp;#39;ll also enhance Drill&amp;#39;s multi-tenancy with more advanced workload management.&lt;/p&gt;
+
+&lt;ul&gt;
+&lt;li&gt;&lt;strong&gt;Speed&lt;/strong&gt;: Drill is already extremely fast, and we&amp;#39;re going to make it even faster over the next few months. With that said, we think that improving user productivity and time-to-insight is as important as shaving a few milliseconds off a query&amp;#39;s runtime.&lt;/li&gt;
+&lt;li&gt;&lt;strong&gt;Scalability&lt;/strong&gt;: To date we&amp;#39;ve focused mainly on clusters of up to a couple hundred nodes. We&amp;#39;re currently working to support clusters with thousands of nodes. We&amp;#39;re also improving concurrency to better support deployments in which hundreds of analysts or developers are running queries at the same time.&lt;/li&gt;
+&lt;li&gt;&lt;strong&gt;Workload management&lt;/strong&gt;: A single cluster is often shared among many users and groups, and everyone expects answers in real time. Workload management prioritizes the allocation of resources so that the most important workloads finish first and business demands are met. Administrators need to be able to assign priorities and quotas at a fine granularity. We&amp;#39;re working on enhancing Drill&amp;#39;s workload management to provide these capabilities and to integrate tightly with YARN and Mesos.&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;h2&gt;We Would Love to Hear From You!&lt;/h2&gt;
+
+&lt;p&gt;Are there other features you would like to see in Drill? We would love to hear from you:&lt;/p&gt;
+
+&lt;ul&gt;
+&lt;li&gt;Drill users: &lt;a href=&quot;mailto:user@drill.apache.org&quot;&gt;user@drill.apache.org&lt;/a&gt;&lt;/li&gt;
+&lt;li&gt;Drill developers: &lt;a href=&quot;mailto:dev@drill.apache.org&quot;&gt;dev@drill.apache.org&lt;/a&gt;&lt;/li&gt;
+&lt;li&gt;Me: &lt;a href=&quot;mailto:tshiran@apache.org&quot;&gt;tshiran@apache.org&lt;/a&gt;&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;Happy Drilling!&lt;br&gt;
+Tomer Shiran&lt;/p&gt;
+</description>
+        <pubDate>Tue, 16 Dec 2014 00:00:00 -0800</pubDate>
+        <link>/blog/2014/12/16/whats-coming-in-2015/</link>
+        <guid isPermaLink="true">/blog/2014/12/16/whats-coming-in-2015/</guid>
+        
+        
+        <category>blog</category>
+        
+      </item>
+    
+      <item>
         <title>Apache Drill Q&amp;A Panelist Spotlight</title>
         <description>&lt;script type=&quot;text/javascript&quot; src=&quot;https://addthisevent.com/libs/1.5.8/ate.min.js&quot;&gt;&lt;/script&gt;
 


