drill-commits mailing list archives

From bridg...@apache.org
Subject [10/12] drill git commit: DRILL-2397, new data types doc, misc other fixes
Date Tue, 17 Mar 2015 21:02:51 GMT
DRILL-2397, new data types doc, misc other fixes


Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/df1b7e5a
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/df1b7e5a
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/df1b7e5a

Branch: refs/heads/gh-pages
Commit: df1b7e5a9397b02a880230e6dc51a07f2b1ff997
Parents: eca98c7
Author: Kristine Hahn <khahn@maprtech.com>
Authored: Tue Mar 10 18:13:08 2015 -0700
Committer: Bridget Bevens <bbevens@maprtech.com>
Committed: Wed Mar 11 18:00:37 2015 -0700

----------------------------------------------------------------------
 _docs/005-connect.md                            |   6 +-
 _docs/connect/001-plugin-reg.md                 |   4 +-
 _docs/connect/002-plugin-conf.md                |  39 +++--
 _docs/data-sources/001-hive-types.md            |  31 ++--
 _docs/data-sources/003-parquet-ref.md           |  12 +-
 _docs/data-sources/004-json-ref.md              |   3 +-
 _docs/img/Hbase_Browse.png                      | Bin 147495 -> 148451 bytes
 _docs/img/connect-plugin.png                    | Bin 0 -> 24774 bytes
 _docs/query/004-query-complex.md                |   2 +-
 _docs/sql-ref/001-data-types.md                 | 163 ++++++++++++-------
 _docs/sql-ref/002-lexical-structure.md          |  28 ++--
 _docs/sql-ref/004-functions.md                  |   2 +-
 .../data-types/002-disparate-data-types.md      |  77 +++++++++
 _docs/tutorial/002-get2kno-sb.md                |   2 +-
 14 files changed, 246 insertions(+), 123 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/drill/blob/df1b7e5a/_docs/005-connect.md
----------------------------------------------------------------------
diff --git a/_docs/005-connect.md b/_docs/005-connect.md
index 3c60b2d..69bb407 100644
--- a/_docs/005-connect.md
+++ b/_docs/005-connect.md
@@ -1,7 +1,8 @@
 ---
 title: "Connect to a Data Source"
 ---
-A storage plugin is an interface for connecting to a data source to read and write data.
Apache Drill connects to a data source, such as a file on the file system or a Hive metastore,
through a storage plugin. When you execute a query, Drill gets the plugin name you provide
in FROM clause of your query. 
+A storage plugin is an interface for connecting to a data source to read and write data.
Apache Drill connects to a data source, such as a file on the file system or a Hive metastore,
through a storage plugin. When you execute a query, Drill gets the plugin name you provide
in the FROM clause of your query or from the default you specify in the USE <plugin name>
command that precedes the query.
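The USE command mentioned above sets the default plugin for subsequent queries; a hedged sketch, where the `dfs` plugin and the file path are illustrative:

```sql
-- Set dfs as the default storage plugin for this session
USE dfs;

-- The FROM clause can now omit the plugin name prefix
SELECT * FROM `/Users/drilluser/sample.json`;
```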
 
 In addition to the connection string, the storage plugin configures the workspace and file
formats for reading and writing data, as described in subsequent sections. 
 
@@ -18,8 +19,7 @@ A storage plugin provides the following information to Drill:
   * Interfaces that Drill can use to read from and write to data sources
   * A set of storage plugin optimization rules that assist with efficient and faster execution
of Drill queries, such as pushdowns, statistics, and partition awareness
 
-A storage plugin performs scanner and writer functions, and informs the metadata
-repository of any known metadata, such as:
+A storage plugin performs scanner and writer functions, and informs the metadata repository
of any known metadata. The metadata repository is a database created to store metadata. The
metadata is information about the structures that contain the actual data, such as:
 
   * Schema
   * File size

http://git-wip-us.apache.org/repos/asf/drill/blob/df1b7e5a/_docs/connect/001-plugin-reg.md
----------------------------------------------------------------------
diff --git a/_docs/connect/001-plugin-reg.md b/_docs/connect/001-plugin-reg.md
index 8def0bb..b3c5cdc 100644
--- a/_docs/connect/001-plugin-reg.md
+++ b/_docs/connect/001-plugin-reg.md
@@ -2,14 +2,14 @@
 title: "Storage Plugin Registration"
 parent: "Connect to a Data Source"
 ---
-You connect Drill to a file system, Hive, HBase, or other data source using storage plugins.
Drill includes a number storage plugins in the installation. On the Storage tab of the Web
UI, you can view, create, reconfigure, and register a storage plugin. To open the Storage
tab, go to `http://<IP address>:8047/storage`:
+You connect Drill to a file system, Hive, HBase, or other data source using storage plugins.
Drill includes a number of storage plugins in the installation. On the Storage tab of the
Web UI, you can view, create, reconfigure, and register a storage plugin. To open the Storage
tab, go to `http://<IP address>:8047/storage`, where IP address is the address of any one of
the installed Drillbits:
 
 ![drill-installed plugins]({{ site.baseurl }}/docs/img/plugin-default.png)
 
 The Drill installation registers the `cp`, `dfs`, `hbase`, `hive`, and `mongo` storage plugins
instances by default.
 
 * `cp`
-  Points to a JAR file in the Drill classpath that contains sample data that you can query.

+  Points to a JAR file in the Drill classpath that contains the Transaction Processing Performance
Council (TPC) benchmark schema TPC-H that you can query. 
 * `dfs`
   Points to the local file system on your machine, but you can configure this instance to
 point to any distributed file system, such as a Hadoop or S3 file system. 
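As a quick check of the `cp` plugin described above, the bundled TPC-H sample data can be queried directly; a hedged sketch (the `tpch/nation.parquet` path inside the classpath JAR is an assumption and may vary by Drill version):

```sql
-- Query the TPC-H nation table bundled on the Drill classpath
SELECT * FROM cp.`tpch/nation.parquet` LIMIT 3;
```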

http://git-wip-us.apache.org/repos/asf/drill/blob/df1b7e5a/_docs/connect/002-plugin-conf.md
----------------------------------------------------------------------
diff --git a/_docs/connect/002-plugin-conf.md b/_docs/connect/002-plugin-conf.md
index 632ed3f..caef2ad 100644
--- a/_docs/connect/002-plugin-conf.md
+++ b/_docs/connect/002-plugin-conf.md
@@ -3,8 +3,8 @@ title: "Storage Plugin Configuration"
 parent: "Connect to a Data Source"
 ---
 When you add or update storage plugin instances on one Drill node in a Drill
-cluster, Drill broadcasts the information to all of the other Drill nodes so
-they all have identical storage plugin configurations. You do not need to
+cluster, Drill broadcasts the information to all of the other Drill nodes
+so that all nodes have identical storage plugin configurations. You do not need to
 restart any of the Drillbits when you add or update a storage plugin instance.
 
 Use the Drill Web UI to update or add a new storage plugin. Launch a web browser, go to:
`http://<IP address of the sandbox>:8047`, and then go to the Storage tab. 
@@ -21,6 +21,11 @@ name. Names are case-sensitive.
 Click Update to reconfigure an existing, enabled storage plugin.
 
 ## Storage Plugin Attributes
+The following diagram of the dfs storage plugin briefly describes options you configure in
a typical storage plugin configuration:
+
+![dfs plugin]({{ site.baseurl }}/docs/img/connect-plugin.png)
+
+The following table describes the attributes you configure for storage plugins in more detail
than the diagram. 
 
 <table>
   <tr>
@@ -33,72 +38,74 @@ Click Update to reconfigure an existing, enabled storage plugin.
     <td>"type"</td>
     <td>"file"<br>"hbase"<br>"hive"<br>"mongo"</td>
     <td>yes</td>
-    <td>The storage plugin type name supported by Drill</td>
+    <td>The storage plugin type name supported by Drill.</td>
   </tr>
   <tr>
     <td>"enabled"</td>
     <td>true<br>false</td>
     <td>yes</td>
-    <td>The state of the storage plugin</td>
+    <td>The state of the storage plugin.</td>
   </tr>
   <tr>
     <td>"connection"</td>
     <td>"classpath:///"<br>"file:///"<br>"mongodb://localhost:27017/"<br>"maprfs:///"</td>
     <td>implementation-dependent</td>
-    <td>The type of distributed filesystem. Drill can work with any distributed system,
such as HDFS and S3, or JSON or CSV file in your file system.</td>
+    <td>The type of distributed file system. Drill can work with any distributed system,
such as HDFS and S3, or files in your file system.</td>
   </tr>
   <tr>
     <td>"workspaces"</td>
     <td>null<br>"logs"</td>
     <td>no</td>
-    <td>One or more arbitrary workspace names, enclosed in double quotation marks.
Workspace names are case sensitive.</td>
+    <td>One or more unique workspace names, enclosed in double quotation marks. If
a workspace is defined more than once, the latest one overrides the previous ones. Not used
with the hive, hbase, or mongo storage plugins.</td>
   </tr>
   <tr>
     <td>"workspaces". . . "location"</td>
     <td>"location": "/"<br>"location": "/tmp"</td>
     <td>no</td>
-    <td>An alias for a path to a directory, used to shorten references to files in
the a query. </td>
+    <td>The path to a directory on the file system.</td>
   </tr>
   <tr>
     <td>"workspaces". . . "writable"</td>
     <td>true<br>false</td>
-    <td>yes</td>
-    <td>Indicates whether or not users can write data to this location</td>
+    <td>no</td>
+    <td>Indicates whether or not users can write data to this workspace.</td>
   </tr>
   <tr>
     <td>"workspaces". . . "defaultInputFormat"</td>
     <td>null<br>"parquet"<br>"csv"<br>"json"</td>
-    <td>yes</td>
-    <td>The format of data written by executing CREATE TABLE AS and CREATE VIEW commands</td>
+    <td>no</td>
+    <td>The format of data Drill reads by default, regardless of extension. Parquet
is the default.</td>
   </tr>
   <tr>
     <td>"formats"</td>
     <td>"psv"<br>"csv"<br>"tsv"<br>"parquet"<br>"json"<br>"maprdb"</td>
     <td>yes</td>
-    <td>One or more file formats of data read by Drill. Drill can detect some file
formats based on the file extension or the first few bits of data within the file, but needs
format information for others. </td>
+    <td>One or more file formats of data Drill can read. Drill can implicitly detect
some file formats based on the file extension or the first few bits of data within the file,
but you need to configure an option for others.</td>
   </tr>
   <tr>
     <td>"formats" . . . "type"</td>
     <td>"text"<br>"parquet"<br>"json"<br>"maprdb"</td>
-    <td>no</td>
-    <td>The type of data in the file. Although Drill can work with different file types
in the same directory, restricting a Drill workspace to one file type prevents confusion.
</td>
+    <td>yes</td>
+    <td>The type of the format specified. For example, you can define two formats,
csv and psv, both as type "text" but with different delimiters. Drill enables the maprdb plugin
if you define the maprdb type.</td>
   </tr>
   <tr>
     <td>formats . . . "extensions"</td>
     <td>["csv"]</td>
     <td>format-dependent</td>
-    <td>The extension of the file.</td>
+    <td>The extensions of the files that Drill can read.</td>
   </tr>
   <tr>
     <td>"formats" . . . "delimiter"</td>
     <td>"\t"<br>","</td>
     <td>format-dependent</td>
-    <td>The delimiter used to separate columns in text files such as CSV.</td>
+    <td>The delimiter used to separate columns in text files such as CSV. Specify a
non-printable delimiter in the storage plugin config in the form \uXXXX, where XXXX
is the four-digit hexadecimal ASCII code for the character.</td>
   </tr>
 </table>
 
 The configuration of other attributes, such as `size.calculator.enabled` in the hbase plugin
and `configProps` in the hive plugin, are implementation-dependent and beyond the scope of
this document.
 
+Although Drill can work with different file types in the same directory, restricting a Drill
workspace to one file type prevents confusion.
+
 ## Case-sensitive Names
 As previously mentioned, workspace and storage plugin names are case-sensitive. For example,
the following query uses a storage plugin name `dfs` and a workspace name `clicks`. When you
refer to `dfs.clicks` in an SQL statement, use the defined case:
 

http://git-wip-us.apache.org/repos/asf/drill/blob/df1b7e5a/_docs/data-sources/001-hive-types.md
----------------------------------------------------------------------
diff --git a/_docs/data-sources/001-hive-types.md b/_docs/data-sources/001-hive-types.md
index c6cdb90..a4721bf 100644
--- a/_docs/data-sources/001-hive-types.md
+++ b/_docs/data-sources/001-hive-types.md
@@ -5,7 +5,7 @@ parent: "Data Sources and File Formats"
 Using Drill you can read tables created in Hive that use data types compatible with Drill.
Drill currently does not support writing Hive tables. The following table shows Drill support
for Hive primitive types:
 <table>
   <tr>
-    <th>SQL Type</th>
+    <th>Supported SQL Type</th>
     <th>Hive Type</th>
     <th>Drill Description</th>
   </tr>
@@ -80,25 +80,16 @@ Using Drill you can read tables created in Hive that use data types compatible
w
     <td>Conventional UNIX Epoch timestamp.</td>
   </tr>
   <tr>
-    <td>N/A</td>
+    <td>None</td>
     <td>TIMESTAMP</td>
     <td>JDBC timestamp in yyyy-mm-dd hh:mm:ss format</td>
   </tr>
   <tr>
-    <td>TIMESTAMPTZ</td>
-    <td>N/A</td>
-    <td>Hours ahead of or behind Coordinated Universal Time (UTC) or regional hours
and minutes</td>
-  </tr>
-  <tr>
-    <td>N/A</td>
+    <td>None</td>
     <td>STRING</td>
     <td>Binary string (16)</td>
   </tr>
-  <tr>
-    <td>BINARY</td>
-    <td>BINARY</td>
-    <td>Binary string</td>
-  </tr>
+  
   <tr>
     <td>VARCHAR</td>
     <td>VARCHAR</td>
@@ -120,7 +111,7 @@ The Hive version used in MapR supports the Hive timestamp in Unix Epoch
format.
 ## Type Mapping Example
 This example demonstrates the mapping of Hive data types to Drill data types. Using a CSV
that has the following contents, you create a Hive table having values of different supported
types:
 
-     100005,true,3.5,-1231.4,3.14,42,"SomeText",2015-03-25,2015-03-25 01:23:15 
+     8223372036854775807,true,3.5,-1231.4,3.14,42,"SomeText",2015-03-25,2015-03-25 01:23:15

 
 The example assumes that the CSV resides on the MapR file system (MapRFS) in the Drill sandbox:
`/mapr/demo.mapr.com/data/`
  
@@ -144,7 +135,7 @@ You check that Hive mapped the data from the CSV to the typed values as
as expec
 
     hive> SELECT * FROM types_demo;
     OK
-    100005	true	3.5	-1231.4	3.14	42	"SomeText"	2015-03-25   2015-03-25 01:23:15
+    8223372036854775807	true	3.5	-1231.4	3.14	42	"SomeText"	2015-03-25   2015-03-25 01:23:15
     Time taken: 0.524 seconds, Fetched: 1 row(s)
 
 In Drill, you use the Hive storage plugin that has the following definition.
@@ -171,11 +162,11 @@ Using the Hive storage plugin connects Drill to the Hive metastore containing
th
 The data in the Hive table shows the expected values.
 	
 	0: jdbc:drill:> SELECT * FROM hive.`types_demo`;
-	+--------+------+------+---------+------+----+------------+------------+-----------+
-	|   a    |   b  |  c   |     d   |  e   | f  |     g      |     h      |     i     |
-	+------------+---------+---------+------+----+------------+------------+-----------+
-	| 100005 | true | 3.50 | -1231.4 | 3.14 | 42 | "SomeText" | 2015-03-25 | 2015-03-25 01:23:15.0
|
-	+--------+------+------+---------+------+----+------------+------------+-----------+
+	+---------------------+------+------+---------+------+----+------------+------------+------------------------+
+	|          a          |  b   |  c   |    d    |  e   | f  |     g      |     h      |           i            |
+	+---------------------+------+------+---------+------+----+------------+------------+------------------------+
+	| 8223372036854775807 | true | 3.50 | -1231.4 | 3.14 | 42 | "SomeText" | 2015-03-25 | 2015-03-25 01:23:15.0  |
+	+---------------------+------+------+---------+------+----+------------+------------+------------------------+
 	1 row selected (1.262 seconds)
 	
 To validate that Drill interprets the timestamp in column i correctly, use the extract function
to extract part of the date:
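The extract step described above can look like the following sketch, assuming the `types_demo` table from the example, where column `i` holds the timestamp:

```sql
-- Extract the year portion of the timestamp in column i (illustrative)
SELECT EXTRACT(year FROM i) FROM hive.`types_demo`;
```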

http://git-wip-us.apache.org/repos/asf/drill/blob/df1b7e5a/_docs/data-sources/003-parquet-ref.md
----------------------------------------------------------------------
diff --git a/_docs/data-sources/003-parquet-ref.md b/_docs/data-sources/003-parquet-ref.md
index 71ed851..4f04463 100644
--- a/_docs/data-sources/003-parquet-ref.md
+++ b/_docs/data-sources/003-parquet-ref.md
@@ -16,13 +16,13 @@ Apache Drill includes the following support for Parquet:
 * Querying self-describing data in files or NoSQL databases without having to define and
manage schema overlay definitions in centralized metastores
 * Creating Parquet files from other file formats, such as JSON, without any set up
 * Generating Parquet files that have evolving or changing schemas and querying the data on
the fly
-* Handling Parquet scalar and complex data types, such as maps and arrays
+* Handling Parquet data types
 
 ### Reading Parquet Files
 When a read of Parquet data occurs, Drill loads only the necessary columns of data, which
reduces I/O. Reading only a small piece of the Parquet data from a data file or table, Drill
can examine and analyze all values for a column across multiple files. You can create a Drill
table from one format and store the data in another format, including Parquet.
 
 ### Writing Parquet Files
-CTAS can use any data source provided by the storage plugin. To write Parquet data using
the CTAS command, set the session store.format option as shown in the next section. Alternatively,
configure the storage plugin to point to the directory containing the Parquet files.
+CREATE TABLE AS SELECT (CTAS) can use any data source provided by the storage plugin. To
write Parquet data using the CTAS command, set the session store.format option as shown in
the next section. Alternatively, configure the storage plugin to point to the directory containing
the Parquet files.
 
 Although the data resides in a single table, Parquet output generally consists of multiple
files that resemble MapReduce output having numbered file names,  such as 0_0_0.parquet in
a directory.
 
@@ -56,7 +56,7 @@ The following general process converts a file from JSON to Parquet:
 * Create or use an existing storage plugin that specifies the storage location of the Parquet
file, mutability of the data, and supported file formats.
 * Take a look at the JSON data. 
 * Create a table that selects the JSON file.
-* In the CTAS command, cast JSON string data to corresponding SQL types.
+* In the CTAS command, cast JSON string data to corresponding [SQL types](/docs/json-data-model/data-type-mapping).
 
 ### Example: Read JSON, Write Parquet
 This example demonstrates a storage plugin definition, a sample row of data from a JSON file,
and a Drill query that writes the JSON input to Parquet output. 
@@ -65,7 +65,7 @@ This example demonstrates a storage plugin definition, a sample row of data
from
 You can use the default dfs storage plugin installed with Drill for reading and writing Parquet
files. The storage plugin needs to configure the writable option of the workspace to true,
so Drill can write the Parquet output. The dfs storage plugin defines the tmp writable workspace,
which you can use in the CTAS command to create a Parquet table.
 
 #### Sample Row of JSON Data
-A JSON file contains data consisting of strings, typical of JSON data. The following example
shows one row of the JSON file:
+A JSON file called sample.json contains data consisting of strings, typical of JSON data.
The following example shows one row of the JSON file:
 
         {"trans_id":0,"date":"2013-07-26","time":"04:56:59","amount":80.5,"user_info":
           {"cust_id":28,"device":"WEARABLE2","state":"mt"
@@ -80,7 +80,7 @@ A JSON file contains data consisting of strings, typical of JSON data. The
follo
               
 
 #### CTAS Query      
-The following example shows a CTAS query that creates a table from JSON data. The command
casts the date, time, and amount strings to SQL types DATE, TIME, and DOUBLE. String-to-VARCHAR
casting of the other strings occurs automatically.
+The following example shows a CTAS query that creates a table from JSON data shown in the
last example. The command casts the date, time, and amount strings to SQL types DATE, TIME,
and DOUBLE. String-to-VARCHAR casting of the other strings occurs automatically.
 
     CREATE TABLE dfs.tmp.sampleparquet AS 
     (SELECT trans_id, 
@@ -90,7 +90,7 @@ The following example shows a CTAS query that creates a table from JSON
data. Th
     user_info, marketing_info, trans_info 
     FROM dfs.`/Users/drilluser/sample.json`);
         
-The CTAS query did not specify a file name extension, so Drill used the default parquet file
name extension. The output is a Parquet file:
+The CTAS query does not specify a file name extension for the output. Drill creates a parquet
file by default, as indicated by the file name in the output:
 
     +------------+---------------------------+
     |  Fragment  | Number of records written |

http://git-wip-us.apache.org/repos/asf/drill/blob/df1b7e5a/_docs/data-sources/004-json-ref.md
----------------------------------------------------------------------
diff --git a/_docs/data-sources/004-json-ref.md b/_docs/data-sources/004-json-ref.md
index f865a8a..65b5b7c 100644
--- a/_docs/data-sources/004-json-ref.md
+++ b/_docs/data-sources/004-json-ref.md
@@ -70,7 +70,7 @@ When you set this option, Drill reads all data from the JSON files as VARCHAR.
A
 * Cast JSON numeric values to [SQL types](/docs/data-types), such as BIGINT, DECIMAL, FLOAT,
INTEGER, and SMALLINT.
 * Cast JSON strings to [Drill Date/Time Data Type Formats](/docs/supported-date-time-data-type-formats).
 
-Drill uses [map and array data types](/docs/data-types) internally for reading and writing
complex and nested data structures from JSON. You can cast data in a map or array of data
to return a value from the structure, as shown in [“Create a view on a MapR-DB table”]
(/docs/lession-2-run-queries-with-ansi-sql). “Query Complex Data” shows how to access
nested arrays, for example.
+Drill uses [map and array data types](/docs/data-types) internally for reading complex and
nested data structures from JSON. You can cast data in a map or array of data to return a
value from the structure, as shown in [“Create a view on a MapR-DB table”](/docs/lession-2-run-queries-with-ansi-sql).
“Query Complex Data” shows how to access nested arrays.
 
 ## Reading JSON
 To read JSON data using Drill, use a [file system storage plugin](/docs/connect-to-a-data-source)
that defines the JSON format. You can use the `dfs` storage plugin, which includes the definition.

@@ -501,3 +501,4 @@ Workaround: Query each file individually.
 
 
 
+

http://git-wip-us.apache.org/repos/asf/drill/blob/df1b7e5a/_docs/img/Hbase_Browse.png
----------------------------------------------------------------------
diff --git a/_docs/img/Hbase_Browse.png b/_docs/img/Hbase_Browse.png
index 729e0f8..014ceca 100644
Binary files a/_docs/img/Hbase_Browse.png and b/_docs/img/Hbase_Browse.png differ

http://git-wip-us.apache.org/repos/asf/drill/blob/df1b7e5a/_docs/img/connect-plugin.png
----------------------------------------------------------------------
diff --git a/_docs/img/connect-plugin.png b/_docs/img/connect-plugin.png
new file mode 100644
index 0000000..d47bee3
Binary files /dev/null and b/_docs/img/connect-plugin.png differ

http://git-wip-us.apache.org/repos/asf/drill/blob/df1b7e5a/_docs/query/004-query-complex.md
----------------------------------------------------------------------
diff --git a/_docs/query/004-query-complex.md b/_docs/query/004-query-complex.md
index 537d7b4..96c06eb 100644
--- a/_docs/query/004-query-complex.md
+++ b/_docs/query/004-query-complex.md
@@ -43,7 +43,7 @@ The examples in this section operate on JSON data files. In order to write
 your own queries, you need to be aware of the basic data types in these files:
 
   * string (all data inside double quotes), such as `"0001"` or `"Cake"`
-  * numeric types: integers, decimals, and floats, such as `0.55` or `10`
+  * number: integers, decimals, and floats, such as `0.55` or `10`
   * null values
   * boolean values: true, false
 

http://git-wip-us.apache.org/repos/asf/drill/blob/df1b7e5a/_docs/sql-ref/001-data-types.md
----------------------------------------------------------------------
diff --git a/_docs/sql-ref/001-data-types.md b/_docs/sql-ref/001-data-types.md
index 491898a..094fd87 100644
--- a/_docs/sql-ref/001-data-types.md
+++ b/_docs/sql-ref/001-data-types.md
@@ -2,76 +2,119 @@
 title: "Data Types"
 parent: "SQL Reference"
 ---
-You can use the following SQL data types in your Drill queries:
+You can use the following SQL data types in Drill queries:
+
+<table>
+  <tr>
+    <th>SQL Type</th>
+    <th>Drill Description</th>
+  </tr>
+  <tr>
+    <td>BIGINT</td>
+    <td>8-byte signed integer</td>
+  </tr>
+  <tr>
+    <td>BOOLEAN</td>
+    <td>TRUE (1) or FALSE (0)</td>
+  </tr>
+  <tr>
+    <td>DATE</td>
+    <td>Years, months, and days in the form YYYY-MM-DD</td>
+  </tr>
+  <tr>
+    <td>DECIMAL</td>
+    <td>38-digit precision</td>
+  </tr>
+  <tr>
+    <td>FLOAT</td>
+    <td>4-byte single precision floating point number</td>
+  </tr>
+  <tr>
+    <td>DOUBLE</td>
+    <td>8-byte double precision floating point number</td>
+  </tr>
+  <tr>
+    <td>INTEGER</td>
+    <td>4-byte signed integer</td>
+  </tr>
+  <tr>
+    <td>INTERVAL</td>
+    <td>Integer fields representing a period of time depending on the type of interval</td>
+  </tr>
+  <tr>
+    <td>INTERVALDAY</td>
+    <td>A period of time in days, hours, minutes, seconds, and milliseconds</td>
+  </tr>
+  <tr>
+    <td>INTERVALYEAR</td>
+    <td>A period of time in years and months</td>
+  </tr>
+  <tr>
+    <td>SMALLINT</td>
+    <td>2-byte signed integer. Supported in Drill 0.9 and later.</td>
+  </tr>
+  <tr>
+    <td>TIME</td>
+    <td>Hours, minutes, and seconds on a 24-hour basis</td>
+  </tr>
+  <tr>
+    <td>TIMESTAMP</td>
+    <td>Conventional UNIX Epoch timestamp.</td>
+  </tr>
+  <tr>
+    <td>VARCHAR</td>
+    <td>Variable-length character string</td>
+  </tr>
+</table>
 
+## Complex Data Types
 
-#### Character
-
-  * VARCHAR/CHAR 
-
-#### Date/Time
-
-  * DATE
-  * INTERVAL
-    * Interval Year (stores year and month)
-    * Interval Day (stores day, hour, minute, seconds, and milliseconds)
-  * TIME
-  * TIMESTAMP
-
-Refer to [Supported Date/Time Data Type formats](/docs/supported-date-time-data-type-formats/).
+Complex and nested data structures in JSON and Parquet files are of map and array types.

 
-#### Integer
+* A map is a set of name/value pairs. 
+  A value in a map can be a scalar type, such as string or int, or a complex type, such as
an array or another map. 
+* An array is a repeated list of values.
+  A value in an array can be a scalar type, such as string or int, or an array can be a complex
type, such as a map or another array.
 
-  * BIGINT
-  * INT
-  * SMALLINT
+Drill reads/writes maps and arrays from/to JSON and Parquet files. In Drill, you do not cast
a map or array to another type. 
 
-#### Numeric
 
-  * DECIMAL
-  * FLOAT 
-  * DOUBLE PRECISION (FLOAT 8)
-  * REAL (FLOAT 4) 
+The following example shows a JSON map having scalar values:
 
-#### Boolean
+    phoneNumber: 
+    { 
+      areaCode: 622, 
+      number: 1567845
+    }
 
-Values are FALSE or TRUE.
+The following example shows a JSON map having an array as a value:
 
-## Complex Data Types
+    { citiesLived : 
+      [ 
+        { place : Los Angeles,       
+          yearsLived : [ 1989, 1993, 1998, 2002]     
+        } 
+      ] 
+    }
 
-Drill provides map and array data types to work with complex and nested data
-structures. For analysis of complex data, a more modern JSON-style approach to
-writing queries is more effective than using standard SQL functions.
+The following example shows a JSON array having scalar values:
 
-The following table provides descriptions and examples of the complex data
-types:
+    yearsLived: [1990, 1993, 1998, 2008]
 
-<table><tbody>
-  <tr><th>Data Type</th>
-  <th>Description</th>
-  <th>Example</th></tr>
-    <tr>
-      <td valign="top">Map</td>
-      <td valign="top">A map is a set of name/value pairs. </br>
-      A value in an map can be a scalar type, </br>
-      such as string or int, or a map can be a </br>
-      complex type, such as an array or another map.</td>
-      <td valign="top">Map with scalar type values:</br><code>&nbsp;&nbsp;&quot;phoneNumber&quot;:
{ &quot;areaCode&quot;: &quot;622&quot;, &quot;number&quot;: &quot;1567845&quot;}</code></br>Map
with complex type value:<code></br>&nbsp;&nbsp;{ &quot;citiesLived&quot;
: [ { &quot;place&quot; : &quot;Los Angeles&quot;,</br>        
-      &nbsp;&nbsp;&nbsp;&nbsp;&quot;yearsLived&quot; : [ &quot;1989&quot;,</br>
-      &nbsp;&nbsp;&nbsp;&nbsp;            &quot;1993&quot;,</br>
           
-      &nbsp;&nbsp;&nbsp;&nbsp;&quot;1998&quot;,</br>      
     
-      &nbsp;&nbsp;&nbsp;&nbsp;&quot;2002&quot;</br>
-      &nbsp;&nbsp;&nbsp;&nbsp;          ]</br>      
-      &nbsp;&nbsp;
-      &nbsp;} ] }</code></td>
-    </tr>
-    <tr>
-      <td valign="top">Array</td>
-      <td valign="top">An array is a repeated list of values. </br>
-      A value in an array can be a scalar type, </br>
-      such as string or int, or an array can be a</br> 
-      complex type, such as a map or another array.</td>
-      <td valign="top">Array with scalar values:</br><code>&nbsp;&nbsp;&quot;yearsLived&quot;:
[&quot;1990&quot;, &quot;1993&quot;, &quot;1998&quot;, &quot;2008&quot;]</code></br>Array
with complex type values:</br><code>&nbsp;&nbsp;&quot;children&quot;:</br>&nbsp;&nbsp;[
{ &quot;age&quot; : &quot;10&quot;, </br>   &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&quot;gender&quot;
: &quot;Male&quot;,</br>    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&quot;name&quot;
: &quot;Earl&quot;</br> &nbsp;&nbsp;&nbsp;&nbsp; }, </br>
&nbsp;&nbsp;&nbsp;&nbsp;{ &quot;age&quot; : &quot;6&quot;,</br>
   &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&quot;gender&quot;
: &quot;Male&quot;,</br>    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&quot;name&quot;
: &quot;Sam&quot;</br>  &nbsp;&nbsp;&nbsp;&nbsp;},</br>
 &nbsp;&nbsp;&nbsp;&nbsp;{ &quot;age&quot; : &quot;8&quot;,</br>
   &nbsp;&nbsp;&nbsp;&nbsp;&quot;gender&quot; : &quot;Male&quot;,
 </br>  &nbsp;&nbsp;&nbsp;&nbsp;&quot;name&quot; : &quot;Kit&quot;
</br> &nbsp;&nbsp;&nbsp;&nbsp
 ;}</br>&nbsp;&nbsp;]</code></td>
-    </tr>
-  </tbody></table>
+The following example shows a JSON array having complex type values:
 
+    children: 
+      [ 
+        { age : 10,
+          gender : Male,
+          name : Earl
+        },
+        { age : 6,
+          gender : Male,
+          name : Sam
+        },
+        { age : 8,
+          gender : Male,
+          name : Kit
+        }
+      ]
+  
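Queries can drill into such structures with array indexes and dotted names; a hedged sketch over a hypothetical file `people.json` holding the arrays above (file name and path are illustrative):

```sql
-- Access a map member by name and a nested array element by index
SELECT t.children[0].name, t.citiesLived[0].yearsLived[1]
FROM dfs.`/tmp/people.json` t;
```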

http://git-wip-us.apache.org/repos/asf/drill/blob/df1b7e5a/_docs/sql-ref/002-lexical-structure.md
----------------------------------------------------------------------
diff --git a/_docs/sql-ref/002-lexical-structure.md b/_docs/sql-ref/002-lexical-structure.md
index 971bb9f..5115ab8 100644
--- a/_docs/sql-ref/002-lexical-structure.md
+++ b/_docs/sql-ref/002-lexical-structure.md
@@ -3,10 +3,13 @@ title: "Lexical Structure"
 parent: "SQL Reference"
 ---
 
-A SQL statement used in Drill can include the following parts:
+A SQL statement used in Drill can include one or more of the following parts:
 
-* [Storage plugin and workspace references](/docs/lexical-structure#storage-plugin-and-workspace-references)
-* Literal values
+* Clause, such as FROM
+* Command, such as SELECT 
+* Expression, a combination of one or more values, operators, and SQL functions that evaluates
to a value. For example, users.firstname is a period expression
+* Function, scalar and aggregate, such as sum
+* Literal value
 
   * [Boolean](/docs/lexical-structure#boolean)
   * [Identifier](/docs/lexical-structure#identifier)
@@ -14,11 +17,12 @@ A SQL statement used in Drill can include the following parts:
   * [Numeric constant](/docs/lexical-structure#numeric-constant)
   * [String](/docs/lexical-structure#string)
 
-* Expressions, such as t.metric * 3.1415927
-* Functions, such as count(*)
-* Names of commands and clauses, such as `SELECT * FROM myfile WHERE a > b`.
+* Operator, such as [NOT] IN, LIKE, and AND
+* Predicate, such as a > b in `SELECT * FROM myfile WHERE a > b`.
+* [Storage plugin and workspace reference](/docs/lexical-structure#storage-plugin-and-workspace-references)
+* Whitespace
 
-The upper/lowercase sensitivity of the parts differ.
+The upper/lowercase sensitivity of the parts differs.
 
 ## Case-sensitivity
 
@@ -30,7 +34,7 @@ The sys.options table name and values are case-sensitive. The following
query wo
 
     SELECT * FROM sys.options where NAME like '%parquet%';
 
-When using the ALTER command, specify the name lower case. For example:
+When using the ALTER command, specify the name in lower case. For example:
 
     ALTER SESSION  set `store.parquet.compression`='snappy';
 
@@ -40,7 +44,7 @@ Storage plugin and workspace names are case-sensitive. The case of the name
used
 
     SELECT * FROM dfs.`/Users/drilluser/ticket_sales.json`;
 
-## Literal values
+## Literal Values
 
 This section describes how to construct literals.
 
@@ -53,7 +57,7 @@ An identifier is a letter followed by any sequence of letters, digits, or
the un
 * Keywords
 * Identifiers that SQL cannot parse. 
 
-For example, you enclose the SQL keywords date and time in back ticks in the example in section,
["Example: Read JSON, Write Parquet"](/docs/parquet-format#example-read-json-write-parquet):
+For example, enclose the SQL keywords date and time in back ticks when referring to column names, but not when referring to data types:
 
     CREATE TABLE dfs.tmp.sampleparquet AS 
     (SELECT trans_id, 
@@ -65,7 +69,7 @@ For example, you enclose the SQL keywords date and time in back ticks in
the exa
 
 Table and column names are case-insensitive. Use back ticks to enclose names that contain
special characters. Special characters are those other than the 52 Latin alphabet characters.
For example, space and @ are special characters. 
 
-The following example shows the keyword Year and the column alias "Customer Number" enclosed
in back ticks. The column alias contains the special space character.
+The following example shows the keyword Year enclosed in back ticks. Because the column alias "Customer Number" contains the special space character, the alias is also enclosed in back ticks:
 
     SELECT extract(year from transdate) AS `Year`, t.user_info.cust_id AS `Customer Number`
FROM dfs.tmp.`sampleparquet` t;
 
@@ -130,7 +134,7 @@ Strings are characters enclosed in single quotation marks. To use a single
quota
     +------------+
     2 rows selected (0.053 seconds)
 
-To refer to the string Martha's Vineyard in a query, use single quotation marks to enclose
the string. and escape the apostophe using a single quotation mark:
+To refer to the string Martha's Vineyard in a query, use single quotation marks to enclose the string and escape the apostrophe using a single quotation mark:
 
     SELECT * FROM dfs.`/Users/drilluser/vitalstat.json` t 
     WHERE t.source = 'Martha''s Vineyard';

http://git-wip-us.apache.org/repos/asf/drill/blob/df1b7e5a/_docs/sql-ref/004-functions.md
----------------------------------------------------------------------
diff --git a/_docs/sql-ref/004-functions.md b/_docs/sql-ref/004-functions.md
index c405a58..03cd6ae 100644
--- a/_docs/sql-ref/004-functions.md
+++ b/_docs/sql-ref/004-functions.md
@@ -40,7 +40,7 @@ Function| Return Type
 char_length(string) or character_length(string)| int  
 concat(str "any" [, str "any" [, ...] ])| text
 convert_from(string text, src_encoding name)| text 
-convert_to(string text, dest_encoding name)| 
+convert_to(string text, dest_encoding name)| byte array
 initcap(string)| text
 left(str text, n int)| text
 length(string)| int

http://git-wip-us.apache.org/repos/asf/drill/blob/df1b7e5a/_docs/sql-ref/data-types/002-disparate-data-types.md
----------------------------------------------------------------------
diff --git a/_docs/sql-ref/data-types/002-disparate-data-types.md b/_docs/sql-ref/data-types/002-disparate-data-types.md
new file mode 100644
index 0000000..be9dbc4
--- /dev/null
+++ b/_docs/sql-ref/data-types/002-disparate-data-types.md
@@ -0,0 +1,77 @@
+---
+title: "Handling Disparate Data Types"
+parent: "Data Types"
+---
+[Previous](/docs/supported-date-time-data-type-formats)<code>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code>[Back
to Table of Contents](/docs)<code>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code>[Next](/docs/lexical-structure)
+
+The file format of the data and planned queries determine the casting or converting required, if any. When Drill reads schema-less data into SQL tables for querying, you might need to cast one data type to another explicitly. In some cases, Drill converts schema-less data to typed data implicitly. In this case, you do not need to cast. Drill does not implicitly cast HBase binary data. You use convert_to and convert_from functions to work with HBase data in Drill.
+
+With respect to data types, Drill treats data from these sources as follows:
+
+* HBase
+  No implicit casting to SQL types. Convert data to appropriate types as shown in ["Querying HBase"](/docs/querying-hbase/).
+* Hive
+  Implicitly casts Hive types to SQL types as shown in the Hive [type mapping example](/docs/hive-to-drill-data-type-mapping#type-mapping-example).
+* JSON
+  Implicitly casts JSON data to SQL types as shown in the [SQL and JSON type mapping table](/docs/json-data-model#data-type-mapping) of the JSON Data Model documentation.
+* MapR-DB
+  Implicitly casts MapR-DB data to SQL types when you use the maprdb format for reading MapR-DB data. The dfs storage plugin defines the format when you install Drill from the mapr-drill package on a MapR node.
+* Parquet
+  Implicitly casts Parquet data to the SQL types shown in [SQL Data Types to Parquet](/docs/parquet-format/sql-data-types-to-parquet).

+* Text: CSV, TSV, and other text
+  Implicitly casts all textual data to varchar. 
+
+Drill supports a number of functions to cast and convert compatible data types:
+
+* cast
+  Casts textual data from one data type to another.
+* convert_to and convert_from
+  Converts binary data from one data type to another.
+* to_char
+  Converts a timestamp, interval, integer, real/double precision, or decimal to a string.
+* to_date
+  Converts a string to a date.
+* to_number
+  Converts a string to a decimal.
+* to_timestamp
+  Converts a string to a timestamp.
+
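+For example, the to_date and to_number functions take a string value and a format pattern. The following sketch assumes a hypothetical file sample.json containing string columns d and n:
+
+    SELECT to_date(t.d, 'yyyy-MM-dd'), to_number(t.n, '######')
+    FROM dfs.`/Users/drilluser/sample.json` t;
+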
+### Using Cast
+
+Embed a cast function in a query using this syntax:
+
+    cast (<expression> AS <data type>)
+
+* expression
+  An entity that has a single data value, such as a column name, of the data type you want to cast to a different type
+* data type
+  The target data type, such as INTEGER or DATE
+
+Example: Inspect integer data and cast data to a decimal
+
+    SELECT c_row, c_int FROM mydata WHERE c_row = 9;
+    c_row | c_int
+    ------+------------
+        9 | -2147483648
+    (1 row)
+
+    SELECT c_row, cast(c_int as decimal(28,8)) FROM mydata WHERE c_row = 9;
+    c_row | c_int
+    ------+---------------------
+    9     | -2147483648.00000000
+    (1 row)
+
+If the SELECT statement includes a WHERE clause that compares a column of an unknown data type, cast both the value of the column and the comparison value in the WHERE clause. For example:
+
+    SELECT c_row, cast(c_int as decimal(28,8)) FROM mydata WHERE cast(c_int as decimal(28,8)) > -3.0
+
+Although you can use cast to handle binary data, convert_to and convert_from are recommended for use with binary data.
+
+### Using convert_to and convert_from
+
+To query HBase data in Drill, convert every column of an HBase table to/from byte arrays from/to an [SQL data type](/docs/data-types/) that Drill supports when writing/reading data. For examples of how to use these functions, see ["Convert and Cast Functions"](/docs/sql-functions#convert-and-cast-functions).
+
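+For example, the following sketch decodes UTF-8 encoded HBase bytes into readable values (the students table and its column names are hypothetical):
+
+    SELECT convert_from(row_key, 'UTF8') AS studentid,
+           convert_from(t.account.name, 'UTF8') AS name
+    FROM hbase.students t;
+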
+## Handling Textual Data
+In a textual file, such as CSV, Drill interprets every field as a varchar, as previously mentioned. In addition to using the cast function, you can also use [to_char](link), [to_date](link), [to_number](link), and [to_timestamp](link). If the SELECT statement includes a WHERE clause that compares a column of an unknown data type, cast both the value of the column and the comparison value in the WHERE clause.
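+
+For example, Drill exposes the fields of a CSV file as the varchar array `columns`, whose elements you can cast or convert (the file name is an assumption for illustration):
+
+    SELECT cast(columns[0] AS INTEGER) AS id,
+           to_date(columns[1], 'yyyy-MM-dd') AS start_date
+    FROM dfs.`/Users/drilluser/sample.csv`;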
+
+

http://git-wip-us.apache.org/repos/asf/drill/blob/df1b7e5a/_docs/tutorial/002-get2kno-sb.md
----------------------------------------------------------------------
diff --git a/_docs/tutorial/002-get2kno-sb.md b/_docs/tutorial/002-get2kno-sb.md
index 3850daf..ee2b5dd 100644
--- a/_docs/tutorial/002-get2kno-sb.md
+++ b/_docs/tutorial/002-get2kno-sb.md
@@ -24,7 +24,7 @@ In this tutorial you query a number of data sets, including Hive and HBase,
and
 
 ## Storage Plugin Overview
 This section describes storage plugins included in the sandbox. For general information about
Drill storage plugins, see ["Connect to a Data Source"](/docs/connect-to-data-sources).
-Take a look at the pre-configured storage plugins for the sandbox by opening the Storage
tab in the Drill Web UI. Launch a web browser and go to: `http://<IP address of the sandbox>:8047/storage`.
For example:
+Take a look at the pre-configured storage plugins for the sandbox by opening the Storage tab in the Drill Web UI. Launch a web browser and go to: `http://<IP address>:8047/storage`. For example:
 
     http://localhost:8047/storage
 

