drill-commits mailing list archives

From tshi...@apache.org
Subject svn commit: r1655190 - in /drill/site/trunk/content/drill: blog/2015/ blog/2015/01/ blog/2015/01/27/ blog/2015/01/27/schema-free-json-data-infrastructure/ blog/2015/01/27/schema-free-json-data-infrastructure/index.html blog/index.html feed.xml
Date Tue, 27 Jan 2015 23:28:48 GMT
Author: tshiran
Date: Tue Jan 27 23:28:48 2015
New Revision: 1655190

URL: http://svn.apache.org/r1655190
Log:
New blog post

Added:
    drill/site/trunk/content/drill/blog/2015/
    drill/site/trunk/content/drill/blog/2015/01/
    drill/site/trunk/content/drill/blog/2015/01/27/
    drill/site/trunk/content/drill/blog/2015/01/27/schema-free-json-data-infrastructure/
    drill/site/trunk/content/drill/blog/2015/01/27/schema-free-json-data-infrastructure/index.html
Modified:
    drill/site/trunk/content/drill/blog/index.html
    drill/site/trunk/content/drill/feed.xml

Added: drill/site/trunk/content/drill/blog/2015/01/27/schema-free-json-data-infrastructure/index.html
URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/blog/2015/01/27/schema-free-json-data-infrastructure/index.html?rev=1655190&view=auto
==============================================================================
--- drill/site/trunk/content/drill/blog/2015/01/27/schema-free-json-data-infrastructure/index.html
(added)
+++ drill/site/trunk/content/drill/blog/2015/01/27/schema-free-json-data-infrastructure/index.html
Tue Jan 27 23:28:48 2015
@@ -0,0 +1,172 @@
+<!DOCTYPE html>
+<html>
+
+<head>
+
+<meta charset="UTF-8">
+
+
+<title>Schema-free JSON Data Infrastructure - Apache Drill</title>
+
+<link href="/css/syntax.css" rel="stylesheet" type="text/css">
+<link href="/css/style.css" rel="stylesheet" type="text/css">
+<link href="/css/arrows.css" rel="stylesheet" type="text/css">
+<link href="/css/button.css" rel="stylesheet" type="text/css">
+
+<link rel="shortcut icon" href="/favicon.ico" type="image/x-icon">
+<link rel="icon" href="/favicon.ico" type="image/x-icon">
+
+<script language="javascript" type="text/javascript" src="/js/lib/jquery-1.11.1.min.js"></script>
+<script language="javascript" type="text/javascript" src="/js/lib/jquery.easing.1.3.js"></script>
+<script language="javascript" type="text/javascript" src="/js/modernizr.custom.js"></script>
+<script language="javascript" type="text/javascript" src="/js/script.js"></script>
+
+</head>
+
+<body onResize="resized();">
+
+<div class="bui"></div>
+
+<div id="search">
+<input type="text" placeholder="Enter search term here">
+</div>
+
+<div id="menu" class="mw">
+<ul>
+  <li class="logo"><a href="/"></a></li>
+  <li>
+    <a href="/overview/">Documentation</a>
+    <ul>
+      <li><a href="/overview/">Overview&nbsp;&nbsp;&nbsp;&nbsp;</a></li>
+      <li><a href="https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes"
target="_blank">Drill in 10 Minutes</a></li>
+      <li><a href="/why/">Why Drill? &nbsp;&nbsp;&nbsp;&nbsp;</a></li>
+      <li><a href="/architecture/">Architecture</a></li>
+    </ul>
+  </li>
+  <li>
+    <a href="/community/">Community</a>
+    <ul>
+      <li><a href="/team/">Team</a></li>
+      <li><a href="/community/#events">Events and Meetups</a></li>
+      <li><a href="/community/#mailinglists">Mailing Lists</a></li>
+      <li><a href="/community/#getinvolved">Get Involved</a></li>
+      <li><a href="https://issues.apache.org/jira/browse/DRILL/" target="_blank">Issue
Tracker</a></li>
+      <li><a href="https://github.com/apache/drill" target="_blank">GitHub</a></li>
+    </ul>
+  </li>
+  <li><a href="/faq/">FAQ</a></li>
+  <li><a href="/blog/">Blog</a></li>
+  <li style="width:30px; padding-left: 2px; padding-right:10px"><a href="https://twitter.com/apachedrill"
target="_blank"><img src="/images/twitterbw.png" alt="" align="center" width="22" style="padding:
0px 10px 1px 0px;"></a> </li>
+  <li class="l"><span>&nbsp;</span></li>
+  <li class="d"><a href="/download/">Download</a></li>
+</ul>
+</div>
+
+<div class="post int_text">
+
+  <header class="post-header">
+    <h1 class="post-title">Schema-free JSON Data Infrastructure</h1>
+    <p class="post-meta">
+    
+      
+      
+      <strong>Author:</strong> Tomer Shiran (Founder, PMC Member and Committer,
Apache Drill)
+    
+<br/><strong>Date:</strong> Jan 27, 2015
+</p>
+  </header>
+  <div class="addthis_sharing_toolbox"></div>
+
+  <article class="post-content">
+    <p>JSON has emerged in recent years as the de facto standard data exchange format.
It is being used everywhere. Front-end Web applications use JSON to maintain data and communicate
with back-end applications. Web APIs are JSON-based (e.g., <a href="https://dev.twitter.com/rest/public">Twitter
REST APIs</a>, <a href="http://developers.marketo.com/documentation/rest/">Marketo
REST APIs</a>, <a href="https://developer.github.com/v3/">GitHub API</a>).
It&#39;s the format of choice for public datasets, operational log files and more.</p>
+
+<h1 id="why-is-json-a-convenient-data-exchange-format?">Why is JSON a Convenient Data
Exchange Format?</h1>
+
+<p>While I won&#39;t dive into the historical roots of JSON (JavaScript Object
Notation, <a href="http://en.wikipedia.org/wiki/JSON#JavaScript_eval.28.29"><code>eval()</code></a>,
etc.), I do want to highlight several attributes of JSON that make it a convenient data exchange
format:</p>
+
+<ul>
+<li><strong>JSON is self-describing</strong>. You can look at a JSON document
and understand what it represents. The field names are included in the document. You don&#39;t
need an external schema or definition to interpret JSON-encoded data. This makes life easier
for anyone who wants to deal with the data, and it also means that a collection of JSON documents
represents what many people call a &quot;schema-less dataset&quot; (where structure
can evolve, and different records can have different fields).</li>
+<li><strong>JSON is simple</strong>. Other self-describing formats such
as XML are much more complicated. A JSON document is made up of arrays and maps (or objects,
in JSON terminology), and that&#39;s about it.</li>
+<li><strong>JSON can naturally represent real-world objects</strong>. Try
representing your application&#39;s <code>Customer</code> object (with the
person&#39;s address, order history, etc.) in a CSV file or a relational database. It&#39;s
hard. In fact, ORM systems were invented to help alleviate this issue.</li>
+<li><strong>JSON libraries are available in virtually every programming language</strong>.
Take a look at <a href="http://www.json.org/">the list of supported languages on JSON.org</a>.
I counted 15 languages that start with the letters A, B or C.</li>
+<li><strong>JSON is idiomatic in dynamically typed languages</strong>. Many
dynamically typed languages, such as Python, Ruby and JavaScript, have data structures that are
similar to JSON objects, making it very natural to handle JSON data in those languages. For
example, a Python dictionary looks just like a JSON object. This makes it easy for developers
to utilize JSON in their applications.</li>
+</ul>
+
+<h1 id="json-data-infrastructure">JSON Data Infrastructure</h1>
+
+<p>Traditional data infrastructure, such as relational databases, has some features
that make it easier to store and process JSON-encoded data. For example, Oracle has <a
href="https://docs.oracle.com/database/121/ADXDB/json.htm">a JSON data type and a set of
functions for handling JSON data</a>.</p>
+
+<p>However, a new class of data infrastructure is providing a much more seamless experience
via a full-fledged JSON data model. For example:</p>
+
+<ul>
+<li>Drill is a SQL engine in which each record is conceptually a JSON document.</li>
+<li>Elasticsearch is a search engine in which each indexed document is conceptually
a JSON document.</li>
+<li>MongoDB is an operational database in which each record is conceptually a JSON
document.</li>
+</ul>
+
+<p>These systems view JSON as a data model as opposed to one of many data types, realizing
that JSON offers a simple way to represent real-world objects.</p>
+
+<table><thead>
+<tr>
+<th></th>
+<th>Traditional Infrastructure</th>
+<th>JSON Infrastructure</th>
+</tr>
+</thead><tbody>
+<tr>
+<td><strong>Examples:</strong></td>
+<td>Oracle, SQL Server</td>
+<td>Drill, Elasticsearch, MongoDB</td>
+</tr>
+<tr>
+<td><strong>Record:</strong></td>
+<td>Tuple</td>
+<td>JSON document</td>
+</tr>
+<tr>
+<td><strong>Variable schema:</strong></td>
+<td>No</td>
+<td>Yes</td>
+</tr>
+</tbody></table>
+
+<p>If you happen to be in the Bay Area tomorrow, please join Gaurav Gupta (VP Product
Management, Elasticsearch), Paul Pedersen (Deputy CTO, MongoDB), Robert Greene (Senior Principal
Product Manager, Oracle), Sukanta Ganguly (VP Solutions Architecture, Aerospike) and me for
a panel moderated by Gartner&#39;s Nick Heudecker on this new world of schema-free JSON.
Check out <a href="http://www.meetup.com/SF-Bay-Areas-Big-Data-Think-Tank/">The Hive
Big Data Think Tank</a> for more information.</p>
+
+  </article>
+ <div id="disqus_thread"></div>
+    <script type="text/javascript">
+        /* * * CONFIGURATION VARIABLES: EDIT BEFORE PASTING INTO YOUR WEBPAGE * * */
+        var disqus_shortname = 'drill'; // required: replace example with your forum shortname
+
+        /* * * DON'T EDIT BELOW THIS LINE * * */
+        (function() {
+            var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async
= true;
+            dsq.src = '//' + disqus_shortname + '.disqus.com/embed.js';
+            (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
+        })();
+    </script>
+    <noscript>Please enable JavaScript to view the <a href="http://disqus.com/?ref_noscript">comments
powered by Disqus.</a></noscript>
+    
+</div>
+<script type="text/javascript" src="//s7.addthis.com/js/300/addthis_widget.js#pubid=ra-548b2caa33765e8d"
async="async"></script>
+
+
+<div id="footer" class="mw">
+<div class="wrapper">
+Copyright © 2012-2014 The Apache Software Foundation, licensed under the Apache License,
Version 2.0.<br>
+Apache and the Apache feather logo are trademarks of The Apache Software Foundation. Other
names appearing on the site may be trademarks of their respective owners.<br/><br/>
+</div>
+</div>
+
+<script>
+(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+ga('create', 'UA-53379651-1', 'auto');
+ga('send', 'pageview');
+</script>
+
+</body>
+</html>

Modified: drill/site/trunk/content/drill/blog/index.html
URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/blog/index.html?rev=1655190&r1=1655189&r2=1655190&view=diff
==============================================================================
--- drill/site/trunk/content/drill/blog/index.html (original)
+++ drill/site/trunk/content/drill/blog/index.html Tue Jan 27 23:28:48 2015
@@ -68,6 +68,8 @@
 </div>
 
 <div class="int_text" align="left"><!-- previously: site.posts -->
+<p><a class="post-link" href="/blog/2015/01/27/schema-free-json-data-infrastructure/">Schema-free
JSON Data Infrastructure</a> (Jan 27, 2015)<br/>JSON has emerged as the de facto
standard data exchange format. Data infrastructure technologies such as Apache Drill, MongoDB
and Elasticsearch are embracing JSON as their native data models, bringing game-changing ease-of-use
and agility to developers and analysts.</p>
+<!-- previously: site.posts -->
 <p><a class="post-link" href="/blog/2014/12/23/drill-0.7-released/">Drill 0.7
Released</a> (Dec 23, 2014)<br/>The community has just released Drill 0.7, which
includes 228 resolved JIRAs and numerous enhancements.</p>
 <!-- previously: site.posts -->
 <p><a class="post-link" href="/blog/2014/12/16/whats-coming-in-2015/">What's
Coming in 2015?</a> (Dec 16, 2014)<br/>Drill is now a top-level project, and the
community is expanding rapidly. Find out more about some of the new features planned for 2015.</p>

Modified: drill/site/trunk/content/drill/feed.xml
URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/feed.xml?rev=1655190&r1=1655189&r2=1655190&view=diff
==============================================================================
--- drill/site/trunk/content/drill/feed.xml (original)
+++ drill/site/trunk/content/drill/feed.xml Tue Jan 27 23:28:48 2015
@@ -6,11 +6,76 @@
 </description>
     <link>/</link>
     <atom:link href="/feed.xml" rel="self" type="application/rss+xml"/>
-    <pubDate>Wed, 14 Jan 2015 21:01:22 -0800</pubDate>
-    <lastBuildDate>Wed, 14 Jan 2015 21:01:22 -0800</lastBuildDate>
+    <pubDate>Tue, 27 Jan 2015 15:28:02 -0800</pubDate>
+    <lastBuildDate>Tue, 27 Jan 2015 15:28:02 -0800</lastBuildDate>
     <generator>Jekyll v2.5.1</generator>
     
       <item>
+        <title>Schema-free JSON Data Infrastructure</title>
+        <description>&lt;p&gt;JSON has emerged in recent years as the de facto
standard data exchange format. It is being used everywhere. Front-end Web applications use
JSON to maintain data and communicate with back-end applications. Web APIs are JSON-based
(e.g., &lt;a href=&quot;https://dev.twitter.com/rest/public&quot;&gt;Twitter
REST APIs&lt;/a&gt;, &lt;a href=&quot;http://developers.marketo.com/documentation/rest/&quot;&gt;Marketo
REST APIs&lt;/a&gt;, &lt;a href=&quot;https://developer.github.com/v3/&quot;&gt;GitHub
API&lt;/a&gt;). It&amp;#39;s the format of choice for public datasets, operational
log files and more.&lt;/p&gt;
+
+&lt;h1 id=&quot;why-is-json-a-convenient-data-exchange-format?&quot;&gt;Why
is JSON a Convenient Data Exchange Format?&lt;/h1&gt;
+
+&lt;p&gt;While I won&amp;#39;t dive into the historical roots of JSON (JavaScript
Object Notation, &lt;a href=&quot;http://en.wikipedia.org/wiki/JSON#JavaScript_eval.28.29&quot;&gt;&lt;code&gt;eval()&lt;/code&gt;&lt;/a&gt;,
etc.), I do want to highlight several attributes of JSON that make it a convenient data exchange
format:&lt;/p&gt;
+
+&lt;ul&gt;
+&lt;li&gt;&lt;strong&gt;JSON is self-describing&lt;/strong&gt;. You
can look at a JSON document and understand what it represents. The field names are included
in the document. You don&amp;#39;t need an external schema or definition to interpret
JSON-encoded data. This makes life easier for anyone who wants to deal with the data, and
it also means that a collection of JSON documents represents what many people call a &amp;quot;schema-less
dataset&amp;quot; (where structure can evolve, and different records can have different
fields).&lt;/li&gt;
+&lt;li&gt;&lt;strong&gt;JSON is simple&lt;/strong&gt;. Other self-describing
formats such as XML are much more complicated. A JSON document is made up of arrays and maps
(or objects, in JSON terminology), and that&amp;#39;s about it.&lt;/li&gt;
+&lt;li&gt;&lt;strong&gt;JSON can naturally represent real-world objects&lt;/strong&gt;.
Try representing your application&amp;#39;s &lt;code&gt;Customer&lt;/code&gt;
object (with the person&amp;#39;s address, order history, etc.) in a CSV file or a relational
database. It&amp;#39;s hard. In fact, ORM systems were invented to help alleviate this
issue.&lt;/li&gt;
+&lt;li&gt;&lt;strong&gt;JSON libraries are available in virtually every programming
language&lt;/strong&gt;. Take a look at &lt;a href=&quot;http://www.json.org/&quot;&gt;the
list of supported languages on JSON.org&lt;/a&gt;. I counted 15 languages that start
with the letters A, B or C.&lt;/li&gt;
+&lt;li&gt;&lt;strong&gt;JSON is idiomatic in dynamically typed languages&lt;/strong&gt;.
Many dynamically typed languages, such as Python, Ruby and JavaScript, have data structures that
are similar to JSON objects, making it very natural to handle JSON data in those languages.
For example, a Python dictionary looks just like a JSON object. This makes it easy for developers
to utilize JSON in their applications.&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;h1 id=&quot;json-data-infrastructure&quot;&gt;JSON Data Infrastructure&lt;/h1&gt;
+
+&lt;p&gt;Traditional data infrastructure, such as relational databases, has some
features that make it easier to store and process JSON-encoded data. For example, Oracle has
&lt;a href=&quot;https://docs.oracle.com/database/121/ADXDB/json.htm&quot;&gt;a
JSON data type and a set of functions for handling JSON data&lt;/a&gt;.&lt;/p&gt;
+
+&lt;p&gt;However, a new class of data infrastructure is providing a much more seamless
experience via a full-fledged JSON data model. For example:&lt;/p&gt;
+
+&lt;ul&gt;
+&lt;li&gt;Drill is a SQL engine in which each record is conceptually a JSON document.&lt;/li&gt;
+&lt;li&gt;Elasticsearch is a search engine in which each indexed document is conceptually
a JSON document.&lt;/li&gt;
+&lt;li&gt;MongoDB is an operational database in which each record is conceptually
a JSON document.&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;These systems view JSON as a data model as opposed to one of many data types,
realizing that JSON offers a simple way to represent real-world objects.&lt;/p&gt;
+
+&lt;table&gt;&lt;thead&gt;
+&lt;tr&gt;
+&lt;th&gt;&lt;/th&gt;
+&lt;th&gt;Traditional Infrastructure&lt;/th&gt;
+&lt;th&gt;JSON Infrastructure&lt;/th&gt;
+&lt;/tr&gt;
+&lt;/thead&gt;&lt;tbody&gt;
+&lt;tr&gt;
+&lt;td&gt;&lt;strong&gt;Examples:&lt;/strong&gt;&lt;/td&gt;
+&lt;td&gt;Oracle, SQL Server&lt;/td&gt;
+&lt;td&gt;Drill, Elasticsearch, MongoDB&lt;/td&gt;
+&lt;/tr&gt;
+&lt;tr&gt;
+&lt;td&gt;&lt;strong&gt;Record:&lt;/strong&gt;&lt;/td&gt;
+&lt;td&gt;Tuple&lt;/td&gt;
+&lt;td&gt;JSON document&lt;/td&gt;
+&lt;/tr&gt;
+&lt;tr&gt;
+&lt;td&gt;&lt;strong&gt;Variable schema:&lt;/strong&gt;&lt;/td&gt;
+&lt;td&gt;No&lt;/td&gt;
+&lt;td&gt;Yes&lt;/td&gt;
+&lt;/tr&gt;
+&lt;/tbody&gt;&lt;/table&gt;
+
+&lt;p&gt;If you happen to be in the Bay Area tomorrow, please join Gaurav Gupta (VP
Product Management, Elasticsearch), Paul Pedersen (Deputy CTO, MongoDB), Robert Greene (Senior
Principal Product Manager, Oracle), Sukanta Ganguly (VP Solutions Architecture, Aerospike)
and me for a panel moderated by Gartner&amp;#39;s Nick Heudecker on this new world of
schema-free JSON. Check out &lt;a href=&quot;http://www.meetup.com/SF-Bay-Areas-Big-Data-Think-Tank/&quot;&gt;The
Hive Big Data Think Tank&lt;/a&gt; for more information.&lt;/p&gt;
+</description>
+        <pubDate>Tue, 27 Jan 2015 00:50:01 -0800</pubDate>
+        <link>/blog/2015/01/27/schema-free-json-data-infrastructure/</link>
+        <guid isPermaLink="true">/blog/2015/01/27/schema-free-json-data-infrastructure/</guid>
+        
+        
+        <category>blog</category>
+        
+      </item>
+    
+      <item>
         <title>Drill 0.7 Released</title>
         <description>&lt;p&gt;I&amp;#39;m excited to announce that the
community has just released Drill 0.7, which includes &lt;a href=&quot;https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&amp;amp;version=12327473&quot;&gt;228
resolved JIRAs&lt;/a&gt; and numerous enhancements such as: &lt;/p&gt;
 


