drill-commits mailing list archives

From tshi...@apache.org
Subject [06/12] drill git commit: Chris Westin's Fixes for the Tutorials TOC and Drill in 10 Minutes pull request
Date Fri, 22 May 2015 18:30:56 GMT
Chris Westin's Fixes for the Tutorials TOC and Drill in 10 Minutes pull request


Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/72b6615b
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/72b6615b
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/72b6615b

Branch: refs/heads/gh-pages
Commit: 72b6615b8e88845615111a055be7dc7cb994f2ea
Parents: 5c42816
Author: Kristine Hahn <khahn@maprtech.com>
Authored: Thu May 21 18:43:09 2015 -0700
Committer: Kristine Hahn <khahn@maprtech.com>
Committed: Thu May 21 18:43:09 2015 -0700

----------------------------------------------------------------------
 ...20-installing-drill-on-linux-and-mac-os-x.md |  7 ++-
 .../020-querying-parquet-files.md               | 27 ++--------
 _docs/tutorials/020-drill-in-10-minutes.md      | 40 +++++---------
 .../030-analyzing-the-yelp-academic-dataset.md  | 56 ++++++++------------
 4 files changed, 44 insertions(+), 86 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/drill/blob/72b6615b/_docs/install/installing-drill-in-embedded-mode/020-installing-drill-on-linux-and-mac-os-x.md
----------------------------------------------------------------------
diff --git a/_docs/install/installing-drill-in-embedded-mode/020-installing-drill-on-linux-and-mac-os-x.md b/_docs/install/installing-drill-in-embedded-mode/020-installing-drill-on-linux-and-mac-os-x.md
index 51c80fe..492c218 100755
--- a/_docs/install/installing-drill-in-embedded-mode/020-installing-drill-on-linux-and-mac-os-x.md
+++ b/_docs/install/installing-drill-in-embedded-mode/020-installing-drill-on-linux-and-mac-os-x.md
@@ -6,9 +6,12 @@ First, check that you [meet the prerequisites]({{site.baseurl}}/docs/embedded-mo
 
 Complete the following steps to install Drill:  
 
-1. Issue the following command in a terminal to download the latest, stable version of Apache Drill to a directory on your machine, or download Drill from the [Drill web site](http://getdrill.org/drill/download/apache-drill-1.0.0.tar.gz):
+1. In a terminal windows, change to the directory where you want to install Drill.
 
-        wget http://getdrill.org/drill/download/apache-drill-1.0.0.tar.gz  
+2. one of the following two commands (some systems will have wget, and some will have curl) to download the latest version of Apache Drill, or download Drill from the [Drill web site](http://getdrill.org/drill/download/apache-drill-1.0.0.tar.gz):
+
+   * `wget http://getdrill.org/drill/download/apache-drill-1.0.0.tar.gz`  
+   *  `curl -o apache-drill-1.0.0.tar.gz http://getdrill.org/drill/download/apache-drill-1.0.0.tar.gz`
 
 
 2. Copy the downloaded file to the directory where you want to install Drill. 
 

http://git-wip-us.apache.org/repos/asf/drill/blob/72b6615b/_docs/query-data/query-a-file-system/020-querying-parquet-files.md
----------------------------------------------------------------------
diff --git a/_docs/query-data/query-a-file-system/020-querying-parquet-files.md b/_docs/query-data/query-a-file-system/020-querying-parquet-files.md
index 3731f65..df14662 100644
--- a/_docs/query-data/query-a-file-system/020-querying-parquet-files.md
+++ b/_docs/query-data/query-a-file-system/020-querying-parquet-files.md
@@ -12,20 +12,9 @@ The examples assume that Drill was [installed in embedded mode]({{ site.baseurl
 
 ## Region File
 
-To view the data in the `region.parquet` file, issue the query appropriate for
-your operating system:
-
-  * Linux  
-    
-        SELECT * FROM dfs.`/opt/drill/apache-drill-1.0.0/sample-data/region.parquet`;
+To view the data in the `region.parquet` file, issue the following query:
 
-  * Mac OS X  
-        
-        SELECT * FROM dfs.`/Users/max/drill/apache-drill-1.0.0/sample-data/region.parquet`;
-
-  * Windows  
-    
-        SELECT * FROM dfs.`C:\drill\apache-drill-1.0.0\sample-data\region.parquet`;
+        SELECT * FROM dfs.`<path-to-installation>/apache-drill-<version>\sample-data\region.parquet`;
 
 The query returns the following results:
 
@@ -49,17 +38,7 @@ systems.
 To view the data in the `nation.parquet` file, issue the query appropriate for
 your operating system:
 
-  * Linux  
-  
-        SELECT * FROM dfs.`/opt/drill/apache-drill-1.0.0/sample-data/nation.parquet`;
-
-  * Mac OS X  
-
-        SELECT * FROM dfs.`/Users/max/drill/apache-drill-1.0.0-incubating/sample-data/nation.parquet`;
-
-  * Windows  
-
-        SELECT * FROM dfs.`C:\drill\apache-drill-1.0.0-incubating\sample-data\nation.parquet`;
+        SELECT * FROM dfs.`<path-to-installation>/apache-drill-<version>/apache-drill-1.0.0/sample-data/nation.parquet`;
 
 The query returns the following results:
 

http://git-wip-us.apache.org/repos/asf/drill/blob/72b6615b/_docs/tutorials/020-drill-in-10-minutes.md
----------------------------------------------------------------------
diff --git a/_docs/tutorials/020-drill-in-10-minutes.md b/_docs/tutorials/020-drill-in-10-minutes.md
index a0b6203..4ce8848 100755
--- a/_docs/tutorials/020-drill-in-10-minutes.md
+++ b/_docs/tutorials/020-drill-in-10-minutes.md
@@ -43,13 +43,16 @@ The output looks something like this:
 
 Complete the following steps to install Drill:  
 
-1. Issue the following command in a terminal to download the latest version of Apache Drill to a directory on your machine, or download Drill from the [Drill web site](http://getdrill.org/drill/download/apache-drill-1.0.0.tar.gz):
+1. In a terminal windows, change to the directory where you want to install Drill.
 
-        wget http://getdrill.org/drill/download/apache-drill-1.0.0.tar.gz  
+2. To download the latest version of Apache Drill, download Drill from the [Drill web site](http://getdrill.org/drill/download/apache-drill-1.0.0.tar.gz)or run one of the following commands, depending on which you have installed on your system:
 
-2. Copy the downloaded file to the directory where you want to install Drill. 
+   * `wget http://getdrill.org/drill/download/apache-drill-1.0.0.tar.gz`  
+   *  `curl -o apache-drill-1.0.0.tar.gz http://getdrill.org/drill/download/apache-drill-1.0.0.tar.gz`
 
 
-3. Extract the contents of the Drill tar.gz file. Use sudo if necessary:  
+3. Copy the downloaded file to the directory where you want to install Drill. 
+
+4. Extract the contents of the Drill tar.gz file. Use sudo if necessary:  
 
         sudo tar -xvzf apache-drill-1.0.0.tar.gz  
 
@@ -107,7 +110,7 @@ Issue the following command when you want to exit the Drill shell:
 
 ## Query Sample Data
 
-Your Drill installation includes a `sample-date` directory with JSON and
+Your Drill installation includes a `sample-data` directory with JSON and
 Parquet files that you can query. The local file system on your machine is
 configured as the `dfs` storage plugin instance by default when you install
 Drill in embedded mode. For more information about storage plugin
@@ -120,7 +123,7 @@ Use SQL syntax to query the sample `JSON` and `Parquet` files in the `sample-dat
 A sample JSON file, `employee.json`, contains fictitious employee data.
 
 To view the data in the `employee.json` file, submit the following SQL query
-to Drill:
+to Drill, using the [cp (classpath) storage plugin]({{site.baseurl}}/docs/storage-plugin-registration/) to point to the file.
     
     0: jdbc:drill:zk=local> SELECT * FROM cp.`employee.json` LIMIT 3;
 
@@ -146,20 +149,12 @@ If you followed the Apache Drill in 10 Minutes instructions to install Drill
 in embedded mode, the path to the parquet file varies between operating
 systems.
 
-{% include startnote.html %}When you enter the query, include the version of Drill that you are currently running.{% include endnote.html %} 
+{% include startnote.html %}Substitute your installation path and the Drill version in the angle-bracketed locations when you enter the query.{% include endnote.html %} 
 
 To view the data in the `region.parquet` file, issue the query appropriate for
 your operating system:
 
-* Linux  
-
-        SELECT * FROM dfs.`/opt/drill/apache-drill-<version>/sample-data/region.parquet`;
-* Mac OS X
-  
-        SELECT * FROM dfs.`/Users/max/drill/apache-drill-<version>/sample-data/region.parquet`;
-* Windows  
-        
-        SELECT * FROM dfs.`C:\drill\apache-drill-<version>\sample-data\region.parquet`;
+        SELECT * FROM dfs.`<path-to-installation>/apache-drill-<version>/sample-data/region.parquet`;
 
 The query returns the following results:
 
@@ -180,21 +175,12 @@ If you followed the Apache Drill in 10 Minutes instructions to install Drill
 in embedded mode, the path to the parquet file varies between operating
 systems.
 
-{% include startnote.html %}When you enter the query, include the version of Drill that you are currently running.{% include endnote.html %}
+{% include startnote.html %}Substitute your installation path and the Drill version in the angle-bracketed locations when you enter the query{% include endnote.html %}
 
 To view the data in the `nation.parquet` file, issue the query appropriate for
 your operating system:
 
-* Linux  
-
-          SELECT * FROM dfs.`/opt/drill/apache-drill-<version>/sample-data/nation.parquet`;
-* Mac OS X
-  
-          SELECT * FROM dfs.`/Users/max/drill/apache-drill-<version>/sample-data/nation.parquet`;
-
-* Windows 
- 
-          SELECT * FROM dfs.`C:\drill\apache-drill-<version>\sample-data\nation.parquet`;
+          SELECT * FROM dfs.`<path-to-installation>/apache-drill-<version>/sample-data/nation.parquet`;
 
 The query returns the following results:
 

http://git-wip-us.apache.org/repos/asf/drill/blob/72b6615b/_docs/tutorials/030-analyzing-the-yelp-academic-dataset.md
----------------------------------------------------------------------
diff --git a/_docs/tutorials/030-analyzing-the-yelp-academic-dataset.md b/_docs/tutorials/030-analyzing-the-yelp-academic-dataset.md
index 1eb2728..c16964b 100644
--- a/_docs/tutorials/030-analyzing-the-yelp-academic-dataset.md
+++ b/_docs/tutorials/030-analyzing-the-yelp-academic-dataset.md
@@ -5,9 +5,9 @@ parent: "Tutorials"
 Apache Drill is one of the fastest growing open source projects, with the community making rapid progress with monthly releases. The key difference is Drill’s agility and flexibility.
 Along with meeting the table stakes for SQL-on-Hadoop, which is to achieve low
 latency performance at scale, Drill allows users to analyze the data without
-any ETL or up-front schema definitions. The data could be in any file format
-such as text, JSON, or Parquet. Data could have simple types such as string,
-integer, dates, or more complex multi-structured data, such as nested maps and
+any ETL or up-front schema definitions. The data can be in any file format
+such as text, JSON, or Parquet. Data can have simple types such as strings,
+integers, dates, or more complex multi-structured data, such as nested maps and
 arrays. Data can exist in any file system, local or distributed, such as HDFS,
 MapR FS, or S3. Drill, has a “no schema” approach, which enables you to get
 value from your data in just a few minutes.
@@ -23,25 +23,15 @@ example is downloadable from [Yelp](http://www.yelp.com/dataset_challenge)
 
 ### Step 1: Download Apache Drill onto your local machine
 
-[http://drill.apache.org/download/](http://drill.apache.org/download/)
+To experiment with Drill locally, follow the installation instructions in [Drill in 10 Minutes]({{site.baseurl}}/docs/drill-in-10-minutes/).
 
-You can also [in Drill in distributed mode]({{ site.baseurl }}/docs/installing-drill-in-distributed-mode) if you
+Alternatively, you can [install Drill in distributed mode]({{ site.baseurl }}/docs/installing-drill-in-distributed-mode) if you
 want to scale your environment.
 
-### Step 2 : Open the Drill tar file
-
-    tar -xvf apache-drill-0.1.0.tar.gz
-
-### Step 3: Start the Drill shell.
-
-    bin/drill-embedded
-
-That’s it! You are now ready explore the data.
-
 Let’s try out some SQL examples to understand how Drill makes the raw data
 analysis extremely easy.
 
-{% include startnote.html %}You need to substitute your local path to the Yelp data set in the FROM clause of each query you run.{% include endnote.html %}
+{% include startnote.html %}You need to substitute your local path to the Yelp data set in the angle-bracketed portion of the FROM clause of each query you run.{% include endnote.html %}
 
 ----------
 
@@ -52,7 +42,7 @@ analysis extremely easy.
     0: jdbc:drill:zk=local> !set maxwidth 10000
 
     0: jdbc:drill:zk=local> select * from
-        dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json`
+        dfs.`<path-to-yelp-dataset>/yelp/yelp_academic_dataset_business.json`
         limit 1;
 
     +------------------------+----------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+--------------------------------+---------+--------------+-------------------+-------------+-------+-------+-----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+---------------+
@@ -70,7 +60,7 @@ You can directly query self-describing files such as JSON, Parquet, and text. Th
 #### Total reviews in the data set
 
     0: jdbc:drill:zk=local> select sum(review_count) as totalreviews 
-    from dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json`;
+    from dfs.`/<path-to-yelp-dataset>/yelp/yelp_academic_dataset_business.json`;
 
     +--------------+
     | totalreviews |
@@ -81,7 +71,7 @@ You can directly query self-describing files such as JSON, Parquet, and text. Th
 #### Top states and cities in total number of reviews
 
     0: jdbc:drill:zk=local> select state, city, count(*) totalreviews 
-    from dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json` 
+    from dfs.`/<path-to-yelp-dataset>/yelp/yelp_academic_dataset_business.json` 
     group by state, city order by count(*) desc limit 10;
 
     +------------+------------+--------------+
@@ -102,7 +92,7 @@ You can directly query self-describing files such as JSON, Parquet, and text. Th
 #### Average number of reviews per business star rating
 
     0: jdbc:drill:zk=local> select stars,trunc(avg(review_count)) reviewsavg 
-    from dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json`
+    from dfs.`/<path-to-yelp-dataset>/yelp/yelp_academic_dataset_business.json`
     group by stars order by stars desc;
 
     +------------+------------+
@@ -122,7 +112,7 @@ You can directly query self-describing files such as JSON, Parquet, and text. Th
 #### Top businesses with high review counts (> 1000)
 
     0: jdbc:drill:zk=local> select name, state, city, `review_count` from
-    dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json`
+    dfs.`/<path-to-yelp-dataset>/yelp/yelp_academic_dataset_business.json`
     where review_count > 1000 order by `review_count` desc limit 10;
 
     +-------------------------------+-------------+------------+---------------+
@@ -145,7 +135,7 @@ You can directly query self-describing files such as JSON, Parquet, and text. Th
     0: jdbc:drill:zk=local> select b.name, b.hours.Saturday.`open`,
     b.hours.Saturday.`close`  
     from
-    dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json`
+    dfs.`/<path-to-yelp-dataset>/yelp/yelp_academic_dataset_business.json`
     b limit 10;
 
     +----------------------------+------------+------------+
@@ -184,7 +174,7 @@ the data).
 
 Then, query the attribute’s data.
 
-    0: jdbc:drill:zk=local> select attributes from dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json` limit 10;
+    0: jdbc:drill:zk=local> select attributes from dfs.`/<path-to-yelp-dataset>/yelp/yelp_academic_dataset_business.json` limit 10;
 
     +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
     |                                                     attributes                                                                                                                    |
@@ -217,7 +207,7 @@ on data.
 
 #### Number of restaurants in the data set
 
-    0: jdbc:drill:zk=local> select count(*) as TotalRestaurants from dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json` where true=repeated_contains(categories,'Restaurants');
+    0: jdbc:drill:zk=local> select count(*) as TotalRestaurants from dfs.`/<path-to-yelp-dataset>/yelp/yelp_academic_dataset_business.json` where true=repeated_contains(categories,'Restaurants');
     +------------------+
     | TotalRestaurants |
     +------------------+
@@ -226,7 +216,7 @@ on data.
 
 #### Top restaurants in number of reviews
 
-    0: jdbc:drill:zk=local> select name,state,city,`review_count` from dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json` where true=repeated_contains(categories,'Restaurants') order by `review_count` desc limit 10;
+    0: jdbc:drill:zk=local> select name,state,city,`review_count` from dfs.`/<path-to-yelp-dataset>/yelp/yelp_academic_dataset_business.json` where true=repeated_contains(categories,'Restaurants') order by `review_count` desc limit 10;
 
     +------------------------+-------+-----------+--------------+
     |          name          | state |    city   | review_count |
@@ -245,7 +235,7 @@ on data.
 
 #### Top restaurants in number of listed categories
 
-    0: jdbc:drill:zk=local> select name,repeated_count(categories) as categorycount, categories from dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json` where true=repeated_contains(categories,'Restaurants') order by repeated_count(categories) desc limit 10;
+    0: jdbc:drill:zk=local> select name,repeated_count(categories) as categorycount, categories from dfs.`/<path-to-yelp-dataset>/yelp/yelp_academic_dataset_business.json` where true=repeated_contains(categories,'Restaurants') order by repeated_count(categories) desc limit 10;
 
     +---------------------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
     | name                            | categorycount | categories                                                                                                                                        |
@@ -267,7 +257,7 @@ on data.
 #### Top first categories in number of review counts
 
     0: jdbc:drill:zk=local> select categories[0], count(categories[0]) as categorycount
-    from dfs.`/users/nrentachintala/Downloads/yelp_academic_dataset_business.json` 
+    from dfs.`/<path-to-yelp-dataset>/yelp_academic_dataset_business.json` 
     group by categories[0] 
     order by count(categories[0]) desc limit 10;
 
@@ -291,7 +281,7 @@ on data.
 #### Take a look at the contents of the Yelp reviews dataset.
 
     0: jdbc:drill:zk=local> select * 
-    from dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_review.json` limit 1;
+    from dfs.`/<path-to-yelp-dataset>/yelp/yelp_academic_dataset_review.json` limit 1;
     +---------------------------------+------------------------+------------------------+-------+------------+----------------------------------------------------------------------+--------+------------------------+
     | votes                           | user_id                | review_id              | stars | date       | text                                                                 | type   | business_id            |
     +---------------------------------+------------------------+------------------------+-------+------------+----------------------------------------------------------------------+--------+------------------------+
@@ -305,9 +295,9 @@ review_count to the Yelp review data, which holds additional details on each
 of the reviews themselves.
 
     0: jdbc:drill:zk=local> Select b.name 
-    from dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json` b
+    from dfs.`/<path-to-yelp-dataset>/yelp/yelp_academic_dataset_business.json` b 
     where b.business_id in (SELECT r.business_id 
-    FROM dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_review.json` r
+    FROM dfs.`/<path-to-yelp-dataset>/yelp/yelp_academic_dataset_review.json` r
     GROUP BY r.business_id having sum(r.votes.cool) > 2000 
     order by sum(r.votes.cool)  desc);
     +-------------------------------+
@@ -329,7 +319,7 @@ instead of in a logical view, you can use CREATE TABLE AS SELECT syntax.
 
     0: jdbc:drill:zk=local> create or replace view dfs.tmp.businessreviews as 
     Select b.name,b.stars,b.state,b.city,r.votes.funny,r.votes.useful,r.votes.cool, r.`date`
-    from dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json` b, dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_review.json` r 
+    from dfs.`/<path-to-yelp-dataset>/yelp/yelp_academic_dataset_business.json` b, dfs.`/<path-to-yelp-dataset>/yelp/yelp_academic_dataset_review.json` r 
     where r.business_id=b.business_id
     +------------+-----------------------------------------------------------------+
     |     ok     |                           summary                               |
@@ -346,7 +336,7 @@ Let’s get the total number of records from the view.
     | 1125458    |
     +------------+
 
-In addition to these queries, you can get many more deeper insights using
+In addition to these queries, you can get many deep insights using
 Drill’s [SQL functionality]({{ site.baseurl }}/docs/sql-reference). If you are not comfortable with writing queries manually, you
 can use a BI/Analytics tools such as Tableau/MicroStrategy to query raw
 files/Hive/HBase data or Drill-created views directly using Drill [ODBC/JDBC
@@ -363,7 +353,7 @@ data so you can apply even deeper SQL functionality. Here is a sample query:
 #### Get a flattened list of categories for each business
 
     0: jdbc:drill:zk=local> select name, flatten(categories) as category 
-    from dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json` limit 20;
+    from dfs.`/<path-to-yelp-dataset>/yelp/yelp_academic_dataset_business.json`  limit 20;
     +-----------------------------+---------------------------------+
     | name                        | category                        |
     +-----------------------------+---------------------------------+

