drill-commits mailing list archives

From bridg...@apache.org
Subject [1/5] drill git commit: remove avro, not in mapr
Date Wed, 01 Jul 2015 01:33:23 GMT
Repository: drill
Updated Branches:
  refs/heads/gh-pages 8587b7f31 -> 3a9872370


remove avro, not in mapr

forgot to remove this file

remove leftover

wordsmithing


Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/9784cfc1
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/9784cfc1
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/9784cfc1

Branch: refs/heads/gh-pages
Commit: 9784cfc17b712342c3c57eed55837ea80db24a48
Parents: 8587b7f
Author: Kristine Hahn <khahn@maprtech.com>
Authored: Tue Jun 30 11:26:41 2015 -0700
Committer: Kristine Hahn <khahn@maprtech.com>
Committed: Tue Jun 30 11:54:39 2015 -0700

----------------------------------------------------------------------
 _docs/img/18.png                                | Bin 18137 -> 66175 bytes
 .../040-using-tibco-spotfire-with-drill.md      |  50 -------------------
 .../sql-commands/035-partition-by-clause.md     |  49 ++++++++++++++----
 3 files changed, 39 insertions(+), 60 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/drill/blob/9784cfc1/_docs/img/18.png
----------------------------------------------------------------------
diff --git a/_docs/img/18.png b/_docs/img/18.png
index ac5b802..ebc7d81 100644
Binary files a/_docs/img/18.png and b/_docs/img/18.png differ

http://git-wip-us.apache.org/repos/asf/drill/blob/9784cfc1/_docs/odbc-jdbc-interfaces/using-drill-with-bi-tools/040-using-tibco-spotfire-with-drill.md
----------------------------------------------------------------------
diff --git a/_docs/odbc-jdbc-interfaces/using-drill-with-bi-tools/040-using-tibco-spotfire-with-drill.md b/_docs/odbc-jdbc-interfaces/using-drill-with-bi-tools/040-using-tibco-spotfire-with-drill.md
deleted file mode 100644
index 6f991de..0000000
--- a/_docs/odbc-jdbc-interfaces/using-drill-with-bi-tools/040-using-tibco-spotfire-with-drill.md
+++ /dev/null
@@ -1,50 +0,0 @@
----
-title: "Using Tibco Spotfire Desktop with Drill"
-parent: "Using Drill with BI Tools"
----
-Tibco Spotfire Desktop is a powerful analytic tool that enables SQL statements when connecting to data sources. Spotfire Desktop can utilize the powerful query capabilities of Apache Drill to query complex data structures. Use the MapR Drill ODBC Driver to configure Tibco Spotfire Desktop with Apache Drill.
-
-To use Spotfire Desktop with Apache Drill, complete the following steps:
-
-1.  Install the Drill ODBC Driver from MapR.
-2.	Configure the Spotfire Desktop data connection for Drill.
-
-----------
-
-
-### Step 1: Install and Configure the MapR Drill ODBC Driver 
-
-Drill uses standard ODBC connectivity to provide easy data exploration capabilities on complex, schema-less data sets. Verify that the ODBC driver version that you download correlates with the Apache Drill version that you use. Ideally, you should upgrade to the latest version of Apache Drill and the MapR Drill ODBC Driver. 
-
-Complete the following steps to install and configure the driver:
-
-1.    Download the 64-bit MapR Drill ODBC Driver for Windows from the following location:<br> [http://package.mapr.com/tools/MapR-ODBC/MapR_Drill/](http://package.mapr.com/tools/MapR-ODBC/MapR_Drill/)
-**Note:** Spotfire Desktop 6.5.1 utilizes the 64-bit ODBC driver.
-2.    Complete steps 2-8 under on the following page to install the driver:<br> 
-[http://drill.apache.org/docs/step-1-install-the-mapr-drill-odbc-driver-on-windows/](http://drill.apache.org/docs/step-1-install-the-mapr-drill-odbc-driver-on-windows/)
-3.    Complete the steps on the following page to configure the driver:<br>
-[http://drill.apache.org/docs/step-2-configure-odbc-connections-to-drill-data-sources/](http://drill.apache.org/docs/step-2-configure-odbc-connections-to-drill-data-sources/)
-
-----------
-
-
-### Step 2: Configure the Spotfire Desktop Data Connection for Drill 
-Complete the following steps to configure a Drill data connection: 
-
-1. Select the **Add Data Connection** option or click the Add Data Connection button in the menu bar, as shown in the image below:![](http://i.imgur.com/p3LNNBs.png)
-2. When the dialog window appears, click the **Add** button, and select **Other/Database** from the dropdown list.![](http://i.imgur.com/u1g9kaT.png)
-3. In the Open Database window that appears, select **Odbc Data Provider** and then click **Configure**. ![](http://i.imgur.com/8Gu0GAZ.png)
-4. In the Configure Data Source Connection window that appears, select the Drill DSN that you configured in the ODBC administrator, and enter the relevant credentials for Drill.<br> ![](http://i.imgur.com/Yd6BKls.png) 
-5. Click **OK** to continue. The Spotfire Desktop queries the Drill metadata for available schemas, tables, and views. You can navigate the schemas in the left-hand column. After you select a specific view or table, the relevant SQL displays in the right-hand column. 
-![](http://i.imgur.com/wNBDs5q.png)
-6. Optionally, you can modify the SQL to work best with Drill. Simply change the schema.table.* notation in the SELECT statement to simply * or the relevant column names that are needed. 
-Note that Drill has certain reserved keywords that you must put in back ticks [ ` ] when needed. See [Drill Reserved Keywords](http://drill.apache.org/docs/reserved-keywords/).
-7. Once the SQL is complete, provide a name for the Data Source and click **OK**. Spotfire Desktop queries Drill and retrieves the data for analysis. You can use the functionality of Spotfire Desktop to work with the data.
-![](http://i.imgur.com/j0MWorh.png)
-
-**NOTE:** You can use the SQL statement column to query data and complex structures that do not display in the left-hand schema column. A good example is JSON files in the file system.
-
-**SQL Example:**<br>
-SELECT t.trans_id, t.`date`, t.user_info.cust_id as cust_id, t.user_info.device as device FROM dfs.clicks.`/clicks/clicks.campaign.json` t
-
-----------

http://git-wip-us.apache.org/repos/asf/drill/blob/9784cfc1/_docs/sql-reference/sql-commands/035-partition-by-clause.md
----------------------------------------------------------------------
diff --git a/_docs/sql-reference/sql-commands/035-partition-by-clause.md b/_docs/sql-reference/sql-commands/035-partition-by-clause.md
index aa9a0ba..3214c05 100644
--- a/_docs/sql-reference/sql-commands/035-partition-by-clause.md
+++ b/_docs/sql-reference/sql-commands/035-partition-by-clause.md
@@ -2,12 +2,14 @@
 title: "PARTITION BY Clause"
 parent: "SQL Commands"
 ---
-You can take advantage of automatic partitioning in Drill 1.1 by using the PARTITION BY clause in the CTAS command.
+In Drill 1.1, using the PARTITION BY clause in the CTAS command automatically partitions data, which Drill [prunes]({{site.baseurl}}/docs/partition-pruning/) when you query the data, improving performance.  
 
 ## Syntax
 
     [ PARTITION_BY ( column_name[, . . .] ) ] 
 
+The PARTITION BY clause partitions the data by the first column_name, and then subpartitions the data by the next column_name, and so on. 
+
 Only the Parquet storage format is supported for automatic partitioning. Before using CTAS, [set the `store.format` option]({{site.baseurl}}/docs/create-table-as-ctas/#setting-the-storage-format) for the table to Parquet.
 
 When the base table in the SELECT statement is schema-less, include columns in the PARTITION BY clause in the table's column list, or use a select all (SELECT *) statement:  
@@ -47,7 +49,10 @@ Each line in the TSV file has the following structure:
 
 For example, lines 1722089 and 1722090 in the file contain this data:
 
-<table ><tbody><tr><th >ngram</th><th >year</th><th colspan="1" >match_count</th><th >volume_count</th></tr><tr><td ><p class="p1">Zoological Journal of the Linnean</p></td><td >2007</td><td colspan="1" >284</td><td >101</td></tr><tr><td colspan="1" ><p class="p1">Zoological Journal of the Linnean</p></td><td colspan="1" >2008</td><td colspan="1" >257</td><td colspan="1" >87</td></tr></tbody></table>
+| ngram                             | year | match_count | volume_count |
+|-----------------------------------|------|-------------|--------------|
+| Zoological Journal of the Linnean | 2007 | 284         | 101          |
+| Zoological Journal of the Linnean | 2008 | 257         | 87           |
   
 In 2007, "Zoological Journal of the Linnean" occurred 284 times overall in 101
 distinct books of the Google sample.
@@ -103,7 +108,7 @@ a file to have this extension.
 		0_0_11.parquet	0_0_3.parquet	0_0_48.parquet	0_0_66.parquet
 		0_0_12.parquet	0_0_30.parquet	0_0_49.parquet	0_0_67.parquet
         . . .  
-7. Query the `by_yr` directory to check the data partitioning.  
+7. Query the `by_yr` directory to see how the data appears.  
    `SELECT * FROM by_yr LIMIT 100`;  
    The output looks something like this:
 
@@ -114,19 +119,43 @@ a file to have this extension.
         | 1737  | Zones_NOUN of_ADP the_DET Earth_NOUN ,_.                   | 2          |
         . . .
         | 1737  | Zobah , David slew of                                      | 1          |
-        | 1966  | zones_NOUN of_ADP the_DET medulla_NOUN ._.                 | 3          |
-        | 1966  | zone_NOUN is_VERB more_ADV or_CONJ less_ADJ                | 1          |
-        . . .
-        +-------+------------------------+-------------+
-        |  yr   |         ngram          | occurrances |
-        +-------+------------------------+-------------+
         | 1966  | zone by virtue of the  | 1           |
         +-------+------------------------+-------------+
         100 rows selected (2.184 seconds)
   Files are partitioned by year. The output is not expected to be in perfect sorted order because Drill reads files sequentially. 
+8. Distributed mode: Query the data to find all ngrams in 1993.
+
+        SELECT * FROM by_yr WHERE yr=1993;
+        +-------+-------------------------------------------------------------+--------------+
+        |  yr   |                            ngram                            | occurrances  |
+        +-------+-------------------------------------------------------------+--------------+
+        | 1993  | zoom out , click the                                        | 1            |
+        | 1993  | zones on earth . _END_                                      | 4            |
+        . . .
+        | 1993  | zoology at Oxford University ,                              | 5            |
+        | 1993  | zones_NOUN ,_. based_VERB mainly_ADV on_ADP  | 2           |
+        +-------+----------------------------------------------+-------------+
+        31,100 rows selected (5.45 seconds)
+
+    Drill performs partition pruning when you query partitioned data, which improves performance.
+9. Distributed mode: Query the unpartitioned data to compare performance with the query of the partitioned data in the previous step.
+
+        SELECT * FROM `/googlebooks-eng-all-5gram-20120701-zo.tsv` WHERE (columns[1] = '1993');
+
+        SELECT * FROM `googlebooks-eng-all-5gram-20120701-zo.tsv` WHERE (columns[1] = '1993');
+        +--------------------------------------------------------------------------------+
+        |                                    columns                                     |
+        +--------------------------------------------------------------------------------+
+        | ["Zone , the government of","1993","1","1"]                                   
|
+        | ["Zone : Fragments for a","1993","7","7"]                                     
|
+        . . .
+        | ["zooxanthellae_NOUN and_CONJ the_DET evolution_NOUN of_ADP","1993","4","3"]  |
+        +-------------------------------------------------------------------------------+
+        31,100 rows selected (8.389 seconds)
 
-## Other Examples
+    The more data you query, the greater the performance benefit from partition pruning.

 
+## Other Examples
 
     USE cp;
 	CREATE TABLE mytable1 PARTITION BY (r_regionkey) AS 

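A minimal sketch of the CTAS partitioning workflow the updated page describes: set the storage format to Parquet, then create a table with PARTITION BY, which partitions by the first column and subpartitions by the second. The `dfs.tmp` workspace, the `nation_by_region` table name, and the bundled cp.`tpch/nation.parquet` sample are illustrative assumptions, not taken from the doc.

    -- Minimal sketch, assuming Drill 1.1+ and a writable dfs.tmp workspace.
    -- Automatic partitioning writes Parquet, so set the storage format first.
    ALTER SESSION SET `store.format` = 'parquet';

    USE dfs.tmp;

    -- Partitions by n_regionkey first, then subpartitions by n_nationkey.
    CREATE TABLE nation_by_region
    PARTITION BY (n_regionkey, n_nationkey)
    AS SELECT n_regionkey, n_nationkey, n_name, n_comment
       FROM cp.`tpch/nation.parquet`;

Queries that filter on n_regionkey should then let Drill prune partitions, as in the distributed-mode comparison added in steps 8 and 9 above.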

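For the schema-less case the page notes (partition columns must appear in the table's column list, or the statement must use SELECT *), a hedged sketch against the tutorial's TSV: the column list and columns[] mapping follow the sample output shown in the hunk, but the doc's exact CTAS statement is not visible in this diff, and the INT casts are assumptions.

    -- Sketch only: assumes the current workspace is writable and contains the downloaded TSV.
    -- The TSV is schema-less, so the partition column (yr) is named in the table's column list.
    CREATE TABLE by_yr (yr, ngram, occurrances)
    PARTITION BY (yr)
    AS SELECT CAST(columns[1] AS INT) AS yr,
              columns[0]              AS ngram,
              CAST(columns[2] AS INT) AS occurrances
       FROM `googlebooks-eng-all-5gram-20120701-zo.tsv`;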