drill-commits mailing list archives

From build...@apache.org
Subject svn commit: r928467 - in /websites/staging/drill/trunk/content: ./ drill/download.html drill/top-10-reasons-for-using-drill.html
Date Sun, 09 Nov 2014 07:16:19 GMT
Author: buildbot
Date: Sun Nov  9 07:16:18 2014
New Revision: 928467

Staging update by buildbot for drill

    websites/staging/drill/trunk/content/   (props changed)

Propchange: websites/staging/drill/trunk/content/
--- cms:source-revision (original)
+++ cms:source-revision Sun Nov  9 07:16:18 2014
@@ -1 +1 @@

Modified: websites/staging/drill/trunk/content/drill/download.html
--- websites/staging/drill/trunk/content/drill/download.html (original)
+++ websites/staging/drill/trunk/content/drill/download.html Sun Nov  9 07:16:18 2014
@@ -67,7 +67,7 @@
         <div class="int_text download">
-            <h2>The latest release is Drill 0.6.0, released November 7, 2014</h2>
+            <h2>The latest release is Drill 0.6.0, released November 1, 2014</h2>

Modified: websites/staging/drill/trunk/content/drill/top-10-reasons-for-using-drill.html
--- websites/staging/drill/trunk/content/drill/top-10-reasons-for-using-drill.html (original)
+++ websites/staging/drill/trunk/content/drill/top-10-reasons-for-using-drill.html Sun Nov  9 07:16:18 2014
@@ -92,45 +92,39 @@ font-family: Consolas, "Liberation Mono"
            <!-- Blog -->
-<p>There are several options available for SQL-on-Hadoop today. What makes Drill different?
-<p>Here are the top 10 reasons why Drill is a valuable and innovative technology in
your toolset for interactive data exploration on big data</p>
-<div align="center">
-<p><img alt="Apache Drill" src="https://www.mapr.com/sites/default/files/blogimages/Apache-Drill.png"
style="height:39px; width:551px"></p>
-<p style="margin-left:40px"><img alt="quick and easy ramp up for apache drill" src="https://www.mapr.com/sites/default/files/blogimages/Quick-Easy-Ramp-Up-2.png"
style="height:329px; width:550px; padding-right:35px"></p>
-<h2>1. Quick and easy ramp up</h2>
-<p>First and foremost, it takes just minutes to start working with Apache Drill. Install
it on a local Windows or Mac machine and do queries right away - you don't even need Hadoop.</p><p>Here
are three simple steps to run your first query with Drill.</p>
+<h2>1. Get started in minutes</h2>
+<p>It takes only a couple of minutes to start working with Drill. Untar it on your Mac
or Windows laptop and run a query on a local file. No need to set up any infrastructure. No
need to define schemas. Just point at the data and drill!</p>
-// Install, launch SQLLine CLI and query a JSON file on local file system
-$ tar -xvf apache-drill-0.5.0-incubating.tar  
-$ apache-drill-0.5.0-incubating/bin/sqlline -u jdbc:drill:zk=local
-0: jdbc:drill:zk=local> SELECT * FROM cp.`employee.json` limit 5;
+$ tar -xvf apache-drill-0.6.0-incubating.tar.gz
+$ apache-drill-0.6.0-incubating/bin/sqlline -u jdbc:drill:zk=local
+0: jdbc:drill:zk=local> SELECT * FROM dfs.root.`path/to/employee.json` limit 5;
 | employee_id | full_name        | first_name | last_name  | position_id | position_title
      |  store_id  | department_id | birt 
 | 1           | Sheri Nowmer     | Sheri      | Nowmer     | 1           | President    
       | 0          | 1             | 19   
 | 2           | Derrick Whelply  | Derrick    | Whelply    | 2           | VP Country Manager
  | 0          | 1             |
 | 4           | Michael Spence   | Michael    | Spence     | 2           | VP Country Manager
  | 0          | 1             |
 | 5           | Maya Gutierrez   | Maya       | Gutierrez  | 2           | VP Country Manager
  | 0          | 1             |
 | 6           | Roberta Damstra  | Roberta    | Damstra    | 3           | VP Information
Systems | 0        | 2             |
+<h2>2. Schema-free JSON model</h2>
+<p>Drill is the world's first and only distributed SQL engine that doesn't require
schemas. It shares the same schema-free JSON model as MongoDB and Elasticsearch. Instead of
spending weeks or months defining schemas, transforming data (ETL) and maintaining those schemas,
simply point Drill at your data (file, directory, HBase table, etc.) and run your queries.
Drill automatically understands the structure of the data. Drill's self-service approach reduces
the burden on IT and increases the productivity and agility of analysts and developers.</p>
+<h2>3. Query complex, semi-structured data in situ</h2><p>Drill's schema-free
JSON model allows you to query complex, semi-structured data in situ. No need to flatten or
transform the data prior to or during query execution. Drill also provides intuitive extensions
to SQL to work with nested data. Here's a simple query on a JSON file demonstrating how to
access nested elements and arrays:</p>
+SELECT * FROM (SELECT t.trans_id,
+                      t.trans_info.prod_id[0] AS prod_id,
+                      t.trans_info.purch_flag AS purchased
+               FROM `clicks/clicks.json` t) sq
+WHERE sq.prod_id BETWEEN 700 AND 750 AND
+      sq.purchased = 'true'
+ORDER BY sq.prod_id;
-<h2>2. Supports ANSI SQL - as you know it</h2><p>Apache Drill is compatible
with ANSI SQL standards. This means that users don't need to learn a new query language or
know the nuances of "SQL Like" to work with Drill or migrate existing workloads to Drill.
 </p><p>Drill supports SQL 2003 syntax and provides all the key SQL data types
(such as DATE, INTERVAL, TIMESTAMP, VARCHAR, DECIMAL) and query constructs (such as correlated
sub-queries, joins in WHERE clause) to provide a smooth and familiar analytics experience.
 </p><p>Here is an example of a TPC-H standard query that runs in Drill "as is".
+<h2>4. Real SQL - not "SQL-like"</h2>
+<p>Drill supports the standard SQL:2003 syntax. No need to learn a new "SQL-like" language
or struggle with a semi-functional BI tool. Drill supports many data types including DATE,
INTERVAL, TIMESTAMP, VARCHAR and DECIMAL, as well as complex query constructs such as correlated
sub-queries and joins in WHERE clauses. Here is an example of a TPC-H standard query that
runs in Drill "as is":</p>
 # TPC-H query 4
 SELECT  o.o_orderpriority, count(*) AS order_count
@@ -146,62 +140,32 @@ WHERE o.o_orderdate >= date '1996-10-01'
       ORDER BY o.o_orderpriority;
-<h2>3. Works with your BI tools</h2><p>Apache Drill integrates with the
BI/SQL tools such as Tableau, MicroStrategy, Pentaho and Jaspersoft using JDBC/ODBC drivers.
This means that users can now use the same BI/Analytics tools they are deeply familiar with in
order to perform proactive business intelligence using more raw data, up-to-date data and
new types of data available in Hadoop/NoSQL stores at a significantly lower cost and rapid time
to market.  </p><p>Here is a quick look at the Drill ODBC Driver DSN UI - Drill
explorer - a data exploration environment to understand Drill data and create views along
with a BI visualization using Drill as a data source.  </p><p style="margin-left:40px"><img
alt="MapR Drill ODBC Driver DSN Setup" src="https://www.mapr.com/sites/default/files/blogimages/MapR-Drill-ODBC-Driver-DSN-Setup.png"
style="height:498px; width:450px"></p><p style="margin-left:40px"><img alt="data
exploration environment" src="https://www.mapr.com/sites/default/files/blogimages/Data-exploration-enviroment.png" style="height:354px; width:600px"></p><p style="margin-left:40px"><img
alt="Tableau example" src="https://www.mapr.com/sites/default/files/blogimages/Tableau-example.png"
style="height:583px; width:600px"></p><h2>4. Supports self-describing data
with no ETL</h2><p>Self-describing data is where schema is specified as part of
the data itself. File formats such as Parquet, JSON, Protobuf, XML, Avro and NoSQL databases
are all examples of self-describing data. Some of these data formats are also dynamic and
complex in that every record in the data can have its own set of columns/attributes and each
column can be semi-structured/nested.  </p><p>Think about a JSON document with
multiple levels of nesting and optional/repeated elements at each level or a wide HBase table
with 100s-1000s of columns with varying schema across rows. How about third party data that
you are looking to leverage in BI/Analytics, but you have no control on how schemas will evolve?
   </p><p>Drill supports querying self-describing data without defining and managing
any centralized schema definitions in Hive metastore. Schema is discovered dynamically on
the fly when the queries come in.  </p><p>Dynamic schema discovery with no upfront
modeling/schema management means that companies now can eliminate time delays of weeks/months
of ETL before data is available to users for data exploration. Users can get more up-to-date/real-time
data in order to make informed and timely decisions.  </p><p>Here are a few quick
examples on querying files and directories using Drill.  </p>
-//clicks.json is a file and logs is a partitioned directory by year & month on Hadoop
+<h2>5. Leverage standard BI tools</h2>
+<p>Drill works with standard BI tools. You can keep using the tools you love, such
as Tableau, MicroStrategy, QlikView and Excel. No need to introduce yet another visualization
or dashboard tool. Combine a self-service BI tool with the only self-service SQL engine to
enable true self-service data exploration.</p>
-0: jdbc:drill:> select * from `clicks/clicks.json` limit 2;
-0: jdbc:drill:> select cust_id, dir1 month_no, count(*) month_count from logs 
-where dir0=2014 group by cust_id, dir1 order by cust_id, month_no limit 10;
-<h2>5. Handles Complex Data Types</h2><p>Drill comes with a flexible JSON-like
data model to natively query and process complex/multi-structured data. The data doesn't need
to be flattened or transformed either at the design time or runtime providing high performance
for queries on complex data. Drill provides intuitive extensions to SQL to work with nested
data using MAP and ARRAY data types.  </p><p>Here is an example indicating how
Drill queries a JSON file and accesses the nested maps and array fields.  </p>
+<h2>6. Interactive queries on Hive tables</h2><p>Apache Drill lets you
leverage your investments in Hive. You can run interactive queries with Drill on your Hive
tables and access all Hive input/output formats (including custom SerDes). You can join tables
associated with different Hive metastores, and you can join a Hive table with an HBase table
or a directory of log files. Here's a simple query in Drill on a Hive table:</p>
-// prod_id is an array field in clicks.json file  
-select * from (select t.trans_id, t.trans_info.prod_id[0] as prodid,
-t.trans_info.purch_flag as purchased
-from `clicks/clicks.json` t) sq
-where sq.prodid between 700 and 750 and sq.purchased='true' order by sq.prodid;
+SELECT `month`, state, sum(order_total) AS sales
+FROM hive.orders 
+GROUP BY `month`, state
-<h2>6. Plays Well with Hive</h2><p>Apache Drill lets you reuse investments
made in existing Hive deployments. You can do queries on Hive tables and access 100+ Hive
input/output formats (including custom serdes) with no re-work. Drill serves as a complement
to Hive deployments by offering low latency queries.</p><p>Here is what a sample Hive
storage plugin configuration looks like in Drill, followed by a query on a Hive table.  </p>
+<h2>7. Access multiple data sources</h2><p>Drill is designed with extensibility
in mind. It provides out-of-the-box connectivity to file systems (local or distributed file
systems such as S3, HDFS and MapR-FS), HBase and Hive. You can implement a storage plugin
to make Drill work with any other data source. Drill can combine data from multiple data sources
on the fly in a single query, with no centralized metadata definitions. Here's a query that
combines data from a Hive table, an HBase table (view) and a JSON file:</p>
-//Storage plugin configuration for Hive
- "type": "hive",
- "enabled": true,
- "configProps": {
-   "hive.metastore.uris": "thrift://localhost:9083",
-   "hive.metastore.sasl.enabled": "false"
- }
-//Query on a Hive table 'orders'
-0: jdbc:drill:> select `month`, state, sum(order_total) as sales from hive.orders 
-group by `month`, state order by 3 desc limit 5;
+SELECT custview.membership, sum(orders.order_total) AS sales
+FROM hive.orders, custview, dfs.`clicks/clicks.json` c 
+WHERE orders.cust_id = custview.cust_id AND orders.cust_id = c.user_info.cust_id 
+GROUP BY custview.membership
+<h2>8. User-Defined Functions (UDFs)</h2><p>Drill exposes a simple and
high-performance Java API to build custom functions (UDFs and UDAFs) so that you can add your
own business logic. If you have already built UDFs in Hive, you can reuse them with Drill
with no modifications. Refer to <a href="https://cwiki.apache.org/confluence/display/DRILL/Develop+Custom+Functions">Developing
Custom Functions</a> for more information.</p>
+<h2>9. High performance</h2><p>Drill is designed from the ground up for
high throughput and low latency. It doesn't use a general-purpose execution engine like MapReduce,
Tez or Spark. As a result, Drill is able to deliver its unparalleled flexibility (schema-free
JSON model) without compromising performance. Drill's optimizer leverages rule- and cost-based
techniques, as well as data locality and operator push-down (the ability to push down query
fragments into the back-end data sources). Drill also provides a columnar and vectorized execution
engine, resulting in higher memory and CPU efficiency.</p>
-<h2>7. Works with Hadoop and Beyond</h2><p>Drill is designed with extensibility
in mind. It provides out-of-the-box connectivity to file systems (local or distributed file
systems such as S3, HDFS, MapR-FS), HBase, or Hive. The storage plugin interface is extensible
to other NoSQL stores (such as Couchbase, Elasticsearch, MongoDB) or relational databases
(such as Postgres, MySQL, etc.) or your own custom store. Drill can also combine data from
all these data sources in a single query on the fly without any central metadata definitions.</p><p>Here
is an example Drill query that combines data from Hive, HBase and JSON. </p>
-// Hive table 'orders', HBase view 'custview' and JSON file 'clicks.json' are joined together
-select custview.membership, sum(orders.order_total) 
-as sales from hive.orders, custview, dfs.`/mapr/demo.mapr.com/data/nested/clicks/clicks.json` c
-where orders.cust_id=custview.cust_id and orders.cust_id=c.user_info.cust_id 
-group by custview.membership order by 2;
-<h2>8. Ease of UDFs</h2><p>Drill exposes an easy and high performance Java
API to build custom functions (UDFs and UDAFs) and extend SQL for the data and the business
logic that is specific to your organization. If you have already built UDFs in Hive, you can
reuse them with Drill with no modifications. Refer to <a href="https://cwiki.apache.org/confluence/display/DRILL/Develop+Custom+Functions">Developing
Custom Functions</a> for more information.  </p><h2>9. Provides low latency
queries</h2><p>Drill is built from the ground up for short and low-latency queries
on large datasets. Drill doesn't use MapReduce; instead it comes with a distributed SQL MPP
engine to execute queries in parallel on a cluster. Any of the Drillbits (core service in
Drill) is capable of receiving requests from users. The optimizer in Drill is sophisticated
and leverages various rule-based and cost-based techniques, optimization capabilities of
the data sources, along with data locality to determine the most efficient query plan and
then distribute the execution across multiple nodes in the cluster.
Drill also provides a columnar and vectorized execution engine to offer high memory and CPU
efficiencies along with rapid performance for a wide variety of analytic queries.  </p><h2>10.
Supports large datasets</h2><p>Drill is built to scale to big data needs and is
not restricted by memory available on the cluster nodes. For performance, Drill tries to do
query execution in-memory when possible, using an optimistic/pipelined model and spills to
disk only if the working dataset doesn't fit in memory.  </p><p>For more examples
on how to use Drill, download  <a href="https://www.mapr.com/products/mapr-sandbox-hadoop/download-sandbox-drill">Apache
Drill sandbox</a>  and try out the  <a href="http://doc.mapr.com/display/MapR/Apache+Drill+Tutorial">sandbox
+<h2>10. Scales from a single laptop to a 1000-node cluster</h2><p>Drill
is available as a simple download you can run on your laptop. When you're ready to analyze
larger datasets, simply deploy Drill on your Hadoop cluster (up to 1000 commodity servers).
Drill leverages the aggregate memory in the cluster to execute queries using an optimistic
pipelined model, and automatically spills to disk when the working set doesn't fit in memory.</p>
 						<!-- Last Line -->
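For readers new to Drill's nested-data syntax, the reason-3 query on `clicks/clicks.json` (`t.trans_info.prod_id[0]`, `t.trans_info.purch_flag`) can be approximated in plain Python. This is a minimal sketch, not Drill code. The sample records and the helper `first_or_none` are hypothetical, and treating an out-of-range array index as NULL (`None`) is an assumption about Drill's behavior, not something stated in this commit. In the sketch, map access becomes a dict lookup, array indexing becomes list indexing, `BETWEEN` is inclusive, and `ORDER BY` becomes a sort on `prod_id`.

```python
# Hypothetical sample rows shaped like the clicks.json records the query assumes:
# each row has a trans_id and a nested trans_info map holding a prod_id array
# and a purch_flag string. All values are invented for illustration.
records = [
    {"trans_id": 31920, "trans_info": {"prod_id": [174, 2], "purch_flag": "false"}},
    {"trans_id": 31026, "trans_info": {"prod_id": [7], "purch_flag": "false"}},
    {"trans_id": 33848, "trans_info": {"prod_id": [712, 42], "purch_flag": "true"}},
    {"trans_id": 32383, "trans_info": {"prod_id": [], "purch_flag": "true"}},
    {"trans_id": 37838, "trans_info": {"prod_id": [748], "purch_flag": "true"}},
]


def first_or_none(values):
    """Hypothetical helper mimicking prod_id[0].

    Assumption: a missing array element behaves like SQL NULL (None here).
    """
    return values[0] if values else None


def run_query(rows):
    # Inner SELECT (subquery sq): project trans_id plus two nested fields.
    sq = [
        {
            "trans_id": r["trans_id"],
            "prod_id": first_or_none(r.get("trans_info", {}).get("prod_id", [])),
            "purchased": r.get("trans_info", {}).get("purch_flag"),
        }
        for r in rows
    ]
    # WHERE: BETWEEN 700 AND 750 is inclusive; a NULL prod_id never matches.
    filtered = [
        s
        for s in sq
        if s["prod_id"] is not None
        and 700 <= s["prod_id"] <= 750
        and s["purchased"] == "true"
    ]
    # ORDER BY sq.prod_id
    return sorted(filtered, key=lambda s: s["prod_id"])


if __name__ == "__main__":
    for row in run_query(records):
        print(row)
```

With the hypothetical records above, only the rows with `prod_id` 712 and 748 pass the filter, printed in that order.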
