drill-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From bridg...@apache.org
Subject drill git commit: DRILL-2429
Date Tue, 17 Mar 2015 01:09:27 GMT
Repository: drill
Updated Branches:
  refs/heads/gh-pages-master df1b7e5a9 -> feaa579e4


DRILL-2429


Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/feaa579e
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/feaa579e
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/feaa579e

Branch: refs/heads/gh-pages-master
Commit: feaa579e43f5c9704deb9a406e073df36f8de191
Parents: df1b7e5
Author: Kristine Hahn <khahn@maprtech.com>
Authored: Mon Mar 16 17:42:09 2015 -0700
Committer: Bridget Bevens <bbevens@maprtech.com>
Committed: Mon Mar 16 18:08:31 2015 -0700

----------------------------------------------------------------------
 _docs/005-connect.md                            |   2 +-
 _docs/008-sql-ref.md                            |   4 +-
 _docs/connect/006-reg-hive.md                   |   2 +-
 _docs/data-sources/001-hive-types.md            |  34 +-
 _docs/data-sources/004-json-ref.md              |  12 +-
 _docs/sql-ref/001-data-types.md                 |  70 ++--
 _docs/sql-ref/data-types/001-date.md            | 206 ++++++------
 .../data-types/002-disparate-data-types.md      | 316 ++++++++++++++++---
 8 files changed, 450 insertions(+), 196 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/drill/blob/feaa579e/_docs/005-connect.md
----------------------------------------------------------------------
diff --git a/_docs/005-connect.md b/_docs/005-connect.md
index 69bb407..53de0a5 100644
--- a/_docs/005-connect.md
+++ b/_docs/005-connect.md
@@ -4,7 +4,7 @@ title: "Connect to a Data Source"
 A storage plugin is an interface for connecting to a data source to read and write data.
Apache Drill connects to a data source, such as a file on the file system or a Hive metastore,
through a storage plugin. When you execute a query, Drill gets the plugin name you provide
in FROM clause of your query or from the default you specify in the USE.<plugin name>
command that precedes the query.
 . 
 
-In addition to the connection string, the storage plugin configures the workspace and file
formats for reading and writing data, as described in subsequent sections. 
+In addition to the connection string, the storage plugin configures the workspace and file
formats for reading data, as described in subsequent sections. 
 
 ## Storage Plugins Internals
 The following image represents the storage plugin layer between Drill and a

http://git-wip-us.apache.org/repos/asf/drill/blob/feaa579e/_docs/008-sql-ref.md
----------------------------------------------------------------------
diff --git a/_docs/008-sql-ref.md b/_docs/008-sql-ref.md
index 81bcbab..61dd988 100644
--- a/_docs/008-sql-ref.md
+++ b/_docs/008-sql-ref.md
@@ -5,10 +5,10 @@ Drill supports the ANSI standard for SQL. You can use SQL to query your
Hive,
 HBase, and distributed file system data sources. Drill can discover the form
 of the data when you submit a query. You can query text files and nested data
 formats, such as JSON and Parquet. Drill provides special operators and
-functions that you can use to _drill down _into nested data formats.
+functions that you can use to drill down into nested data formats.
 
 Drill queries do not require information about the data that you are trying to
 access, regardless of its source system or its schema and data types. The
-sweet spot for Apache Drill is a SQL query workload against "complex data":
+sweet spot for Apache Drill is a SQL query workload against *complex data*:
 data made up of various types of records and fields, rather than data in a
 recognizable relational form (discrete rows and columns).

http://git-wip-us.apache.org/repos/asf/drill/blob/feaa579e/_docs/connect/006-reg-hive.md
----------------------------------------------------------------------
diff --git a/_docs/connect/006-reg-hive.md b/_docs/connect/006-reg-hive.md
index 03a252a..cf9b72a 100644
--- a/_docs/connect/006-reg-hive.md
+++ b/_docs/connect/006-reg-hive.md
@@ -8,7 +8,7 @@ storage plugin instance for a Hive data source, provide a unique name for
the
 instance, and identify the type as “`hive`”. You must also provide the
 metastore connection information.
 
-Drill supports Hive 0.13. To access Hive tables
+Drill supports Hive 1.0. To access Hive tables
 using custom SerDes or InputFormat/OutputFormat, all nodes running Drillbits
 must have the SerDes or InputFormat/OutputFormat `JAR` files in the 
 `<drill_installation_directory>/jars/3rdparty` folder.

http://git-wip-us.apache.org/repos/asf/drill/blob/feaa579e/_docs/data-sources/001-hive-types.md
----------------------------------------------------------------------
diff --git a/_docs/data-sources/001-hive-types.md b/_docs/data-sources/001-hive-types.md
index a4721bf..8482133 100644
--- a/_docs/data-sources/001-hive-types.md
+++ b/_docs/data-sources/001-hive-types.md
@@ -7,15 +7,15 @@ Using Drill you can read tables created in Hive that use data types compatible
w
   <tr>
     <th>Supported SQL Type</th>
     <th>Hive Type</th>
-    <th>Drill Description</th>
+    <th>Description</th>
   </tr>
   <tr>
-    <td>BIGINT</td>
+    <td>bigint</td>
     <td>BIGINT</td>
     <td>8-byte signed integer</td>
   </tr>
   <tr>
-    <td>BOOLEAN</td>
+    <td>boolean</td>
     <td>BOOLEAN</td>
     <td>TRUE (1) or FALSE (0)</td>
   </tr>
@@ -25,62 +25,62 @@ Using Drill you can read tables created in Hive that use data types compatible
w
     <td>Same as Varchar but having a fixed-length max 255</td>
   </tr>
   <tr>
-    <td>DATE</td>
+    <td>date</td>
     <td>DATE</td>
     <td>Years months and days in the form in the form YYYY-­MM-­DD</td>
   </tr>
   <tr>
-    <td>DECIMAL</td>
+    <td>decimal</td>
     <td>DECIMAL</td>
     <td>38-digit precision</td>
   </tr>
   <tr>
-    <td>FLOAT</td>
+    <td>float</td>
     <td>FLOAT</td>
     <td>4-byte single precision floating point number</td>
   </tr>
   <tr>
-    <td>DOUBLE</td>
+    <td>double</td>
     <td>DOUBLE</td>
     <td>8-byte double precision floating point number</td>
   </tr>
   <tr>
-    <td>INTEGER</td>
+    <td>integer</td>
     <td>INT</td>
     <td>4-byte signed integer</td>
   </tr>
   <tr>
-    <td>INTERVAL</td>
+    <td>interval</td>
     <td>N/A</td>
     <td>Integer fields representing a period of time depending on the type of interval</td>
   </tr>
   <tr>
-    <td>INTERVALDAY</td>
+    <td>intervalday</td>
     <td>N/A</td>
     <td>Integer fields representing a day</td>
   </tr>
   <tr>
-    <td>INTERVALYEAR</td>
+    <td>intervalyear</td>
     <td>N/A</td>
     <td>Integer fields representing a year</td>
   </tr>
   <tr>
-    <td>SMALLINT</td>
+    <td>smallint</td>
     <td>SMALLINT</td>
     <td>2-byte signed integer</td>
   </tr>
   <tr>
-    <td>TIME</td>
+    <td>time</td>
     <td>N/A</td>
     <td>Hours minutes seconds 24-hour basis</td>
   </tr>
   <tr>
-    <td>TIMESTAMP</td>
     <td>N/A</td>
+    <td>TIMESTAMP</td>
     <td>Conventional UNIX Epoch timestamp.</td>
   </tr>
   <tr>
-    <td>None</td>
+    <td>timestamp</td>
     <td>TIMESTAMP</td>
     <td>JDBC timestamp in yyyy-mm-dd hh:mm:ss format</td>
   </tr>
@@ -91,14 +91,14 @@ Using Drill you can read tables created in Hive that use data types compatible
w
   </tr>
   
   <tr>
-    <td>VARCHAR</td>
+    <td>varchar</td>
     <td>VARCHAR</td>
     <td>Character string variable length</td>
   </tr>
 </table>
 
 ## Unsupported Types
-The following Hive types are not supported:
+Drill does not support the following Hive types:
 
 * LIST
 * MAP

http://git-wip-us.apache.org/repos/asf/drill/blob/feaa579e/_docs/data-sources/004-json-ref.md
----------------------------------------------------------------------
diff --git a/_docs/data-sources/004-json-ref.md b/_docs/data-sources/004-json-ref.md
index 65b5b7c..bf4c5d7 100644
--- a/_docs/data-sources/004-json-ref.md
+++ b/_docs/data-sources/004-json-ref.md
@@ -28,7 +28,7 @@ JSON data consists of the following types:
 * Value: a string, number, true, false, null
 * Whitespace: used between tokens
 
-The following table shows SQL and JSON data mapping: 
+The following table shows SQL-JSON data type mapping, assuming you use the default `all_text_mode`
option setting, false: 
 
 <table>
   <tr>
@@ -37,22 +37,22 @@ The following table shows SQL and JSON data mapping:
     <th>Description</th>
   </tr>
   <tr>
-    <td>BOOLEAN</td>
+    <td>boolean</td>
     <td>Boolean</td>
     <td>True or false</td>
   </tr>
   <tr>
-    <td>BIGINT</td>
+    <td>bigint</td>
     <td>Numeric</td>
     <td>Number having no decimal point in JSON, 8-byte signed integer in Drill</td>
   </tr>
    <tr>
-    <td>DOUBLE</td>
+    <td>double</td>
     <td>Numeric</td>
     <td>Number having a decimal point in JSON, 8-byte double precision floating point
number in Drill</td>
   </tr>
   <tr>
-    <td>VARCHAR</td>
+    <td>varchar</td>
     <td>String</td>
     <td>Character string of variable length</td>
   </tr>
@@ -61,7 +61,7 @@ The following table shows SQL and JSON data mapping:
 JSON does not enforce types or distinguish between integers and floating point values. When
reading numerical values from a JSON file, Drill distinguishes integers from floating point
numbers by the presence or lack of a decimal point. If some numbers in a JSON map or array
appear with and without a decimal point, such as 0 and 0.0, Drill throws a schema change error.
 
 ### Handling Type Differences
-Use all text mode to prevent the schema change error described in the previous section. Set
the `store.json.all_text_mode` property to true.
+Use the all text mode to prevent the schema change error described in the previous section.
Set the `store.json.all_text_mode` property to true.
 
     ALTER SYSTEM SET `store.json.all_text_mode` = true;
 

http://git-wip-us.apache.org/repos/asf/drill/blob/feaa579e/_docs/sql-ref/001-data-types.md
----------------------------------------------------------------------
diff --git a/_docs/sql-ref/001-data-types.md b/_docs/sql-ref/001-data-types.md
index 094fd87..bcb506c 100644
--- a/_docs/sql-ref/001-data-types.md
+++ b/_docs/sql-ref/001-data-types.md
@@ -1,8 +1,8 @@
 ---
-title: "Data Types"
+title: "Data Type Casting"
 parent: "SQL Reference"
 ---
-You can use the following SQL data types in Drill queries:
+The following table lists the type arguments you can use with the cast function:
 
 <table>
   <tr>
@@ -15,69 +15,99 @@ You can use the following SQL data types in Drill queries:
   </tr>
   <tr>
     <td>BOOLEAN</td>
-    <td>TRUE (1) or FALSE (0)</td>
+    <td>True or false</td>
   </tr>
   <tr>
     <td>DATE</td>
-    <td>Years months and days in the form in the form YYYY-­MM-­DD</td>
+    <td>Years, months, and days in YYYY-­MM-­DD format</td>
   </tr>
   <tr>
-    <td>DECIMAL</td>
-    <td>38-digit precision</td>
+    <td>NUMERIC, DECIMAL, or DEC(p,s)</td>
+    <td>38-digit precision number. Same as numeric(p,s) where precision is p, and scale
is s. Example: decimal(6,2) has 4 digits before the decimal point and 2 digits after the decimal
point.</td>
   </tr>
   <tr>
     <td>FLOAT</td>
     <td>4-byte single precision floating point number</td>
   </tr>
   <tr>
-    <td>DOUBLE</td>
-    <td>8-byte double precision floating point number</td>
+    <td>DOUBLE, DOUBLE PRECISION</td>
+    <td>8-byte double precision floating point number. </td>
   </tr>
   <tr>
-    <td>INTEGER</td>
+    <td>INTEGER, INT</td>
     <td>4-byte signed integer</td>
   </tr>
   <tr>
     <td>INTERVAL</td>
-    <td>Integer fields representing a period of time depending on the type of interval</td>
+    <td>Integer fields representing a period of time in years, months, day,s hours,
minutes, seconds and optional milliseconds using ISO 8601 format.</td>
   </tr>
   <tr>
     <td>INTERVALDAY</td>
-    <td>Integer fields representing a day</td>
+    <td>A simple version of the interval type expressing a period of time in days,
hours, minutes, and seconds only.</td>
   </tr>
   <tr>
     <td>INTERVALYEAR</td>
-    <td>Integer fields representing a year</td>
+    <td>A simple version of interval representing a period of time in years and months
only.</td>
   </tr>
   <tr>
     <td>SMALLINT</td>
-    <td>2-byte signed integer. Supported in Drill 0.9 and later.</td>
+    <td>2-byte signed integer. Supported in Drill 0.9 and later./td>
   </tr>
   <tr>
     <td>TIME</td>
-    <td>Hours minutes seconds 24-hour basis</td>
+    <td>Hours, minutes, seconds in the form HH:mm:ss, 24-hour based</td>
   </tr>
   <tr>
     <td>TIMESTAMP</td>
-    <td>Conventional UNIX Epoch timestamp.</td>
+    <td>JDBC timestamp in year, month, date hour, minute, second, and optional milliseconds
format: yyyy-MM-dd HH:mm:ss.SSS</td>
   </tr>
   <tr>
-    <td>VARCHAR</td>
-    <td>Character string variable length</td>
+    <td>CHARACTER VARYING, CHARACTER, CHAR, or VARCHAR</td>
+    <td>Character string variable length. </td>
   </tr>
 </table>
 
+DATE, TIME, and TIMESTAMP store values in Coordinated Universal Time (UTC). Currently, Drill
does not support casting a TIMESTAMP with time zone, but you can use the TO_TIMESTAMP function
(link to example) in a query to use time stamp data having a time zone.
+
+## Compatibility with Data Sources
+
+The following sections describe the data type mapping between Drill and supported data sources.
+
+* HBase  
+  None. You need to convert data as shown in ["Querying HBase."](/docs/querying-hbase/).
+* Hive  
+  ["Hive-to-Drill Data Type Mapping"](/docs/hive-to-drill-data-type-mapping).
+* JSON  
+  [SQL-JSON data type mapping](/docs/json-data-model#data-type-mapping) or to varchar in
all text mode
+* MapR-DB  
+  [The maprdb format](/docs/mapr-db-format) for reading (only).
+* Parquet  
+  [SQL Data Types to Parquet](/docs/parquet-format/sql-data-types-to-parquet). 
+* Text: CSV, TSV, and other text  
+  Implicitly casts all textual data to VARCHAR. 
+
+Depending on the data format, you might need to [cast or convert](/docs/handling-disparate-data-types)
data types to/from these SQL types when Drill reads/writes data.
+
+## Guidelines for Using Float and Double
+
+FLOAT and DOUBLE yield approximate results. These are variable-precision numeric types. Drill
does not cast/convert all values precisely to the internal format, but instead stores approximations.
Slight differences can occur in the value stored and retrieved. The following guidelines are
recommended:
+
+* For conversions involving monetary calculations, for example, that require precise results
use the decimal type instead of float or double.
+* For complex calculations or mission-critical applications, especially those involving infinity
and underflow situations, carefully consider the limitations of type casting that involves
FLOAT or DOUBLE.
+* Equality comparisons between floating-point values can produce unexpected results.
+
+Values of FLOAT and DOUBLE that are less than the lowest value in the range (more negative)
cause an error. Rounding can occur if the precision of an input number is too high. 
+
 ## Complex Data Types
 
-Complex and nested data structures in JSON and Parquet files are of map and array types.

+Drill extends SQL to supoort complex and nested data structures in JSON and Parquet files.
Drill reads/writes maps and arrays from/to JSON and Parquet files.  
 
 * A map is a set of name/value pairs. 
   A value in a map can be a scalar type, such as string or int, or a complex type, such as
an array or another map. 
 * An array is a repeated list of values.
   A value in an array can be a scalar type, such as string or int, or an array can be a complex
type, such as a map or another array.
 
-Drill reads/writes maps and arrays from/to JSON and Parquet files. In Drill, you do not cast
a map or array to another type. 
-
+In Drill, you do not cast a map or array to another type. 
 
 The following example shows a JSON map having scalar values:
 

http://git-wip-us.apache.org/repos/asf/drill/blob/feaa579e/_docs/sql-ref/data-types/001-date.md
----------------------------------------------------------------------
diff --git a/_docs/sql-ref/data-types/001-date.md b/_docs/sql-ref/data-types/001-date.md
index ef20bc2..ff87acc 100644
--- a/_docs/sql-ref/data-types/001-date.md
+++ b/_docs/sql-ref/data-types/001-date.md
@@ -1,148 +1,128 @@
 ---
-title: "Supported Date/Time Data Type Formats"
-parent: "Data Types"
+title: "Date/Time Formats"
+parent: "Data Type Casting"
 ---
-You must use supported `date` and `time` formats when you `SELECT` date and
-time literals or when you `CAST()` from `VARCHAR `to `date` and `time` data
-types. Apache Drill currently supports specific formats for the following
-`date` and `time` data types:
+Using familiar date and time formats, listed in the [SQL data types table](/docs/data-types),
you can construct query date and time data. You need to cast textual data to date and time
data types. The format of date, time, and timestamp text in a textual data source needs to
match the SQL query format for successful casting. 
 
-  * Date
-  * Timestamp
-  * Time
-  * Interval
-    * Interval Year
-    * Interval Day
-  * Literal
+## Date, Time, and Timestamp
 
-The following query provides an example of how to `SELECT` a few of the
-supported date and time formats as literals:
+Before running a query, you can check the formatting of your dates and times. First, create
a dummy JSON file to use in the FROM clause for testing queries as shown in the following
examples. 
+    {"dummy" : "data"}. 
 
-    select date '2008-2-23', timestamp '2008-1-23 14:24:23', time '10:20:30' from dfs.`/tmp/input.json`;
+Next, use the following literals in a SELECT statement. 
 
-The following query provides an example where `VARCHAR` data in a file is
-`CAST()` to supported `date` and `time` formats:
+* `date`
+* `time`
+* `timestamp`
 
-    select cast(col_A as date), cast(col_B as timestamp), cast(col_C as time) from dfs.`/tmp/dates.json`;
+You can also use `interval` as a literal in a SELECT statement. This usage is covered later.
 
-`Date`, `timestamp`, and `time` data types store values in `UTC`. Currently,
-Apache Drill does not support `timestamp` with time zone.
+    SELECT date '2010-2-15' FROM dfs.`/Users/drilluser/apache-drill-0.8.0/dummy.json`;
+    +------------+
+    |   EXPR$0   |
+    +------------+
+    | 2010-02-15 |
+    +------------+
+    1 row selected (0.083 seconds)
 
-## Date
+    SELECT time '15:20:30' from dfs.`/Users/drilluser/apache-drill-0.8.0/dummy.json`;
+    +------------+
+    |   EXPR$0   |
+    +------------+
+    | 15:20:30   |
+    +------------+
+    1 row selected (0.067 seconds)
 
-Drill supports the `date` data type in the following format:
+    SELECT timestamp '2015-03-11 6:50:08' FROM dfs.`/Users/drilluser/apache-drill-0.8.0/dummy.json`;
+    +------------+
+    |   EXPR$0   |
+    +------------+
+    | 2015-03-11 06:50:08.0 |
+    +------------+
+    1 row selected (0.071 seconds)
 
-    YYYY-MM-DD (year-month-date)
+## INTERVAL Type
 
-The following table provides some examples for the `date` data type:
+The INTERVAL type represents a period of time. Use ISO 8601 syntax to format a value of this
type:
 
-  | Use | Example |
-  | --- | ------- |
-  |Literal| `select date ‘2008-2-23’ from dfs.`/tmp/input.json`;`|
-  |`JSON` input | `{"date_col" : "2008-2-23"} 
-  | `CAST` from `VARCHAR`| `` select CAST(date_col as date) as CAST_DATE from dfs.`/tmp/input.json`;
``|
+    P [qty] Y [qty] M [qty] D T [qty] H [qty] M [qty] S
 
-## Timestamp
-
-Drill supports the `timestamp` data type in the following format:
-
-    yyyy-MM-dd HH:mm:ss.SSS (year-month-date hour:minute:sec.milliseconds)
-
-The following table provides some examples for the `timestamp` data type:
+    P [qty] D T [qty] H [qty] M [qty] S
 
-<table>
- <tbody>
-  <tr>
-   <th>Use</th>
-   <th>CAST Example</th>
-  </tr>
-  <tr>
-   <td valign="top">Literal</td>
-   <td valign="top"><code><span style="color: rgb(0,0,0);">select timestamp
‘2008-2-23 10:20:30.345’, timestamp ‘2008-2-23 10:20:30’ from dfs.`/tmp/input.json`;</span></code>
-   </td></tr>
-  <tr>
-   <td colspan="1" valign="top"><code>JSON</code> Input</td>
-   <td colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">{“timestamp_col”:
“2008-2-23 15:20:30.345”}<br /></span><span style="color: rgb(0,0,0);">{“timestamp_col”:
“2008-2-23 10:20:30”}</span></code><span style="color: rgb(0,0,0);">The
fractional millisecond component is optional.</span></td>
-   </tr>
-   <tr>
-    <td colspan="1" valign="top"><code>CAST</code> from <code>VARCHAR</code></td>
-    <td colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">select
cast(timestamp_col as timestamp) from dfs.`/tmp/input.json`; </span></code></td>
-   </tr>
-  </tbody>
- </table>
+    P [qty] Y [qty] M
 
-## Time
+where:
 
-Drill supports the `time` data type in the following format:
+* P (Period) marks the beginning of a period of time.
+* Y follows a number of years.
+* M follows a number of months.
+* D follows a number of days.
+* H follows a number of hours 0-24.
+* M follows a number of minutes.
+* S follows a number of seconds and optional milliseconds to the right of a decimal point
 
-    HH:mm:ss.SSS (hour:minute:sec.milliseconds)
 
-The following table provides some examples for the `time` data type:
+INTERVALYEAR (Year, Month) and INTERVALDAY (Day, Hours, Minutes, Seconds, Milliseconds) are
a simpler version of INTERVAL with a subset of the fields.  You do not need to specify all
fields.
 
-<table><tbody><tr>
-  <th>Use</th>
-  <th>Example</th>
-  </tr>
-  <tr>
-   <td valign="top">Literal</td>
-   <td valign="top"><code><span style="color: rgb(0,0,0);">select time
‘15:20:30’, time ‘10:20:30.123’ from dfs.`/tmp/input.json`;</span></code></td>
-  </tr>
-  <tr>
-  <td colspan="1" valign="top"><code>JSON</code> Input</td>
-  <td colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">{“time_col”
: “10:20:30.999”}<br /></span><span style="color: rgb(0,0,0);">{“time_col”:
“10:20:30”}</span></code></td>
- </tr>
- <tr>
-  <td colspan="1" valign="top"><code>CAST</code> from <code>VARCHAR</code></td>
-  <td colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">select
cast(time_col as time) from dfs.`/tmp/input.json`;</span></code></td>
-</tr></tbody>
-</table>
+The format of INTERVAL data in the data source differs from the query format. 
 
-## Interval
+You can run the dummy query described earlier to check the formatting of the fields. The
input to the following SELECT statements show how to format INTERVAL data in the query. The
output shows how to format the data in the data source.
 
-Drill supports the `interval year` and `interval day` data types.
+    SELECT INTERVAL '1 10:20:30.123' day to second FROM dfs.`/Users/drilluser/apache-drill-0.8.0/dummy.json`;
+    +------------+
+    |   EXPR$0   |
+    +------------+
+    | P1DT37230.123S |
+    +------------+
+    1 row selected (0.054 seconds)
 
-### Interval Year
+    SELECT INTERVAL '1-2' year to month FROM dfs.`/Users/khahn/drill/apache-drill-0.8.0-SNAPSHOT/dummy.json`;
+    +------------+
+    |   EXPR$0   |
+    +------------+
+    | P1Y2M      |
+    +------------+
+    1 row selected (0.927 seconds)
 
-The `interval year` data type stores time duration in years and months. Drill
-supports the `interval` data type in the following format:
+    SELECT INTERVAL '1' year FROM dfs.`/Users/khahn/drill/apache-drill-0.8.0-SNAPSHOT/dummy.json`;
+    +------------+
+    |   EXPR$0   |
+    +------------+
+    | P1Y        |
+    +------------+
+    1 row selected (0.088 seconds)
 
-    P [qty] Y [qty] M
+    SELECT INTERVAL '13' month FROM dfs.`/Users/khahn/drill/apache-drill-0.8.0-SNAPSHOT/dummy.json`;
+    +------------+
+    |   EXPR$0   |
+    +------------+
+    | P1Y1M      |
+    +------------+
+    1 row selected (0.076 seconds)
 
-The following table provides examples for `interval year` data type:
-
-<table ><tbody><tr>
-<th>Use</th>
-<th>Example</th></tr>
-  <tr>
-    <td valign="top">Literals</td>
-    <td valign="top"><code><span style="color: rgb(0,0,0);">select interval
‘1-2’ year to month from dfs.`/tmp/input.json`;<br /></span><span style="color:
rgb(0,0,0);">select interval ‘1’ year from dfs.`/tmp/input.json`;<br /></span><span
style="color: rgb(0,0,0);">select interval '13’ month from dfs.`/tmp/input.json`;</span></code></td></tr><tr>
-    <td colspan="1" valign="top"><code>JSON</code> Input</td>
-    <td colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">{“col”
: “P1Y2M”}<br /></span><span style="color: rgb(0,0,0);">{“col” :
“P-1Y2M”}<br /></span><span style="color: rgb(0,0,0);">{“col” :
“P-1Y-2M”}<br /></span><span style="color: rgb(0,0,0);">{“col”:
“P10M”}<br /></span><span style="color: rgb(0,0,0);">{“col”: “P5Y”}</span></code></td>
-  </tr>
-  <tr>
-    <td colspan="1" valign="top"><code>CAST</code> from <code>VARCHAR</code></td>
-    <td colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">select
cast(col as interval year) from dfs.`/tmp/input.json`;</span></code></td>
-  </tr>
-  </tbody></table> 
-
-### Interval Day
-
-The `interval day` data type stores time duration in days, hours, minutes, and
-seconds. You do not need to specify all fields in a given interval. Drill
-supports the `interval day` data type in the following format:
+To cast INTERVAL data use the following syntax:
 
-    P [qty] D T [qty] H [qty] M [qty] S
+    CAST (column_name AS INTERVAL)
+    CAST (column_name AS INTERVAL DAY)
+    CAST (column_name AS INTERVAL YEAR)
 
-The following table provides examples for `interval day` data type:
+## Interval Example
+A JSON file contains the following objects:
 
-<table ><tbody><tr><th >Use</th><th >Example</th></tr><tr><td
valign="top">Literal</td><td valign="top"><code><span style="color:
rgb(0,0,0);">select interval '1 10:20:30.123' day to second from dfs.`/tmp/input.json`;<br
/></span><span style="color: rgb(0,0,0);">select interval '1 10' day to hour
from dfs.`/tmp/input.json`;<br /></span><span style="color: rgb(0,0,0);">select
interval '10' day  from dfs.`/tmp/input.json`;<br /></span><span style="color:
rgb(0,0,0);">select interval '10' hour  from dfs.`/tmp/input.json`;</span></code><code><span
style="color: rgb(0,0,0);">select interval '10.999' second  from dfs.`/tmp/input.json`;</span></code></td></tr><tr><td
colspan="1" valign="top"><code>JSON</code> Input</td><td colspan="1"
valign="top"><code><span style="color: rgb(0,0,0);">{&quot;col&quot;
: &quot;P1DT10H20M30S&quot;}<br /></span><span style="color: rgb(0,0,0);">{&quot;col&quot;
: &quot;P1DT10H20M30.123S&quot;}<br /></span><span style="color: rgb(0,0,0);">{&quot;col&quot;
: &q
 uot;P1D&quot;}<br /></span><span style="color: rgb(0,0,0);">{&quot;col&quot;
: &quot;PT10H&quot;}<br /></span><span style="color: rgb(0,0,0);">{&quot;col&quot;
: &quot;PT10.10S&quot;}<br /></span><span style="color: rgb(0,0,0);">{&quot;col&quot;
: &quot;PT20S&quot;}<br /></span><span style="color: rgb(0,0,0);">{&quot;col&quot;
: &quot;PT10H10S&quot;}</span></code></td></tr><tr><td
colspan="1" valign="top"><code>CAST</code> from <code>VARCHAR</code></td><td
colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">select cast(col
as interval day) from dfs.`/tmp/input.json`;</span></code></td></tr></tbody></table>

+    { "INTERVALYEAR_col":"P1Y", "INTERVALDAY_col":"P1D", "INTERVAL_col":"P1Y1M1DT1H1M" }
+    { "INTERVALYEAR_col":"P2Y", "INTERVALDAY_col":"P2D", "INTERVAL_col":"P2Y2M2DT2H2M" }
+    { "INTERVALYEAR_col":"P3Y", "INTERVALDAY_col":"P3D", "INTERVAL_col":"P3Y3M3DT3H3M" }
 
-## Literal
+The following CTAS statement shows how to cast text from a JSON file to INTERVAL data types
in a Parquet table:
 
-The following table provides a list of `date/time` literals that Drill
-supports with examples of each:
+    CREATE TABLE dfs.tmp.parquet_intervals AS 
+    (SELECT cast (INTERVAL_col as interval),
+           cast( INTERVALYEAR_col as interval year) INTERVALYEAR_col, 
+           cast( INTERVALDAY_col as interval day) INTERVALDAY_col 
+    FROM `/user/root/intervals.json`);
 
-<table ><tbody><tr><th >Format</th><th colspan="1" >Interpretation</th><th
>Example</th></tr><tr><td colspan="1" valign="top"><code><span
style="color: rgb(0,0,0);">interval '1 10:20:30.123' day to second</span></code></td><td
colspan="1" valign="top"><code>1 day, 10 hours, 20 minutes, 30 seconds, and 123 thousandths
of a second</code></td><td colspan="1" valign="top"><code><span
style="color: rgb(0,0,0);">select interval '1 10:20:30.123' day to second from dfs.`/tmp/input.json`;</span></code></td></tr><tr><td
colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">interval '1
10' day to hour</span></code></td><td colspan="1" valign="top"><code>1
day 10 hours</code></td><td colspan="1" valign="top"><code><span
style="color: rgb(0,0,0);">select interval '1 10' day to hour from dfs.`/tmp/input.json`;</span></code></td></tr><tr><td
colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">interval '10'
day</span></code></td><td colspan="1" valign="top"><code>10
days</code
 ></td><td colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">select
interval '10' day from dfs.`/tmp/input.json`;</span></code></td></tr><tr><td
colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">interval '10'
hour</span></code></td><td colspan="1" valign="top"><code>10
hours</code></td><td colspan="1" valign="top"><code><span style="color:
rgb(0,0,0);">select interval '10' hour from dfs.`/tmp/input.json`;</span></code></td></tr><tr><td
colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">interval '10.999'
second</span></code></td><td colspan="1" valign="top"><code>10.999
seconds</code></td><td colspan="1" valign="top"><code><span style="color:
rgb(0,0,0);">select interval '10.999' second from dfs.`/tmp/input.json`; </span></code></td></tr></tbody></table>
+Output is: 
 
+TBD need to test in a future build.
 
 

http://git-wip-us.apache.org/repos/asf/drill/blob/feaa579e/_docs/sql-ref/data-types/002-disparate-data-types.md
----------------------------------------------------------------------
diff --git a/_docs/sql-ref/data-types/002-disparate-data-types.md b/_docs/sql-ref/data-types/002-disparate-data-types.md
index be9dbc4..52aa71e 100644
--- a/_docs/sql-ref/data-types/002-disparate-data-types.md
+++ b/_docs/sql-ref/data-types/002-disparate-data-types.md
@@ -1,61 +1,132 @@
 ---
 title: "Handling Disparate Data Types"
-parent: "Data Types"
+parent: "Data Type Casting"
 ---
 [Previous](/docs/supported-date-time-data-type-formats)<code>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code>[Back
to Table of Contents](/docs)<code>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code>[Next](/docs/lexical-structure)
 
-The file format of the data and planned queries determine the casting or converting required,
if any. When Drill reads schema-less data into SQL tables for querying, you might need to
cast one data type to another explicitly. In some cases, Drill converts schema-less data to
typed data implicitly. In this case, you do not need to cast. Drill does not implicitly cast
HBase binary data. You use convert_to and convert_from functions to work with HBase data in
Drill.
+The file format of the data and queries you plan to use determine the casting or converting
you need to do, if any. When Drill reads schema-less data into SQL tables for querying, you
might need to cast one data type to another explicitly. In some cases, Drill converts schema-less
data to typed data implicitly. In this case, you do not need to cast. Drill does not implicitly
cast HBase binary data. You use CONVERT_TO and CONVERT_FROM functions to work with HBase data
in Drill.
 
-With respect to data types, Drill treats data from these sources as follows:
+The following list describes how Drill treats data types from various data sources:
 
-* HBase
+* HBase  
   No implicit casting to SQL types. Convert data to appropriate types as shown in ["Querying
HBase."](/docs/querying-hbase/)
-* Hive
+* Hive  
   Implicitly casts Hive types to SQL types as shown in the Hive [type mapping example](/docs/hive-to-drill-data-type-mapping#type-mapping-example)
-* JSON
-  Implicitly casts JSON data to SQL types as shown in the [SQL and JSON type mapping table](/docs/json-data-model#data-type-mapping)
of the JSON Data Model documentation.
-* MapR-DB
-  Implicitly casts MapR-DB data to SQL types when you use the maprdb format for reading MapR-DB
data. The dfs storage plugin defines the format when you install Drill from the mapr-drill
package on a MapR node.
-* Parquet
-  Implicitly casts JSON data to the SQL types shown in [SQL Data Types to Parquet](/docs/parquet-format/sql-data-types-to-parquet).

-* Text: CSV, TSV, and other text
-  Implicitly casts all textual data to varchar. 
+* JSON  
+  Implicitly casts JSON data to its [corresponding SQL types](/docs/json-data-model#data-type-mapping)
or to VARCHAR if Drillis in all text mode. 
+* MapR-DB  
+  Implicitly casts MapR-DB data to SQL types when you use [the maprdb format](/docs/mapr-db-format)
for reading MapR-DB data. The dfs storage plugin defines the format when you install Drill
from the mapr-drill package on a MapR node.
+* Parquet  
+  Implicitly casts Parquet data to the SQL types shown in [SQL Data Types to Parquet](/docs/parquet-format/sql-data-types-to-parquet).

+* Text: CSV, TSV, and other text  
+  Implicitly casts all textual data to VARCHAR.
+
+## Implicit Casting
+
+
+In general, Drill implicitly casts (promotes) one type to another type based in the order
of precedence, high to low, shown in the following table. A type that has a lower precedence
can be implicitly cast to type of higher precedence. For instance, NULL can be promoted to
any other type; SMALLINT can be promoted into INT. INT cannot be promoted to SMALLINT due
to possible precision loss.
+
+Under certain circumstances, such as queries involving functions such as substr and concat,
Drill reverses the order of precedence and allows a cast to VARCHAR from a type of higher
precedence, such as BIGINT. Drill implicitly casts to more data types than currently supported
for explicit casting.
+ 
+<table>
+  <tr>
+    <th>Data Types by Precedence (high to low)</th>
+    <th></th>
+  </tr>
+  <tr>
+    <td>24 INTERVAL</td>
+    <td>12 BIGINT</td>
+  </tr>
+  <tr>
+    <td>23 INTERVALYEAR</td>
+    <td>11UINT4</td>
+  </tr>
+  <tr>
+    <td>22 INTERVLADAY</td>
+    <td>10 INT</td>
+  </tr>
+  <tr>
+    <td>21 TIMESTAMPTZ</td>
+    <td>9 UINT2</td>
+  </tr>
+  <tr>
+    <td>20 TIMETZ</td>
+    <td>8 SMALLINT</td>
+  </tr>
+  <tr>
+    <td>19 TIMESTAMP</td>
+    <td>7 UINT1</td>
+  </tr>
+  <tr>
+    <td>18 DATE</td>
+    <td>6 VAR16CHAR</td>
+  </tr>
+  <tr>
+    <td>17 TIME</td>
+    <td>5 FIXED16CHAR</td>
+  </tr>
+  <tr>
+    <td>16 FLOAT8</td>
+    <td>4 VARCHAR</td>
+  </tr>
+  <tr>
+    <td>15 DECIMAL</td>
+    <td>3 FIXEDCHAR</td>
+  </tr>
+  <tr>
+    <td>14 MONEY</td>
+    <td>2 VARBINARY</td>
+  </tr>
+  <tr>
+    <td>13 UINT8</td>
+    <td>1 FIXEDBINARY</td>
+  </tr>
+  <tr>
+    <td></td>
+    <td>0 NULL</td>
+  </tr>
+</table>
+
+## Explicit Casting
 
 Drill supports a number of functions to cast and convert compatible data types:
 
-* cast
+* CAST  
   Casts textual data from one data type to another.
-* convert_to and convert_from
-  Converts binary data from one data type to another.
-* to_char
-  Converts a timestamp, interval, integer, real/double precision, or decimal to a string.
-* to_date
-  Converts a string to a date.
-* to_number
-  Converts a string to a decimal.
-* to_timestamp
-  Converts a string to a timestamp.
+* CONVERT_TO and CONVERT_FROM  
+  Converts data, including binary data, from one data type to another.
+* TO_CHAR
+  Converts a TIMESTAMP, INTERVAL, INTEGER, DOUBLE, or DECIMAL to a string.
+* TO_DATE
+  Converts a string to DATE.
+* TO_NUMBER
+  Converts a string to a DECIMAL.
+* TO_TIMESTAMP
+  Converts a string to TIMESTAMP.
 
-### Using Cast
 
-Embed a cast function in a query using this syntax:
+### Using CAST
+
+Embed a CAST function in a query using this syntax:
 
     cast <expression> AS <data type> 
 
-* expression
-  A entity that has single data value, such as a column name, of the data type you want to
cast to a different type
-* data type
+* expression  
+  An entity that has single data value, such as a column name, of the data type you want
to cast to a different type
+* data type  
   The target data type, such as INTEGER or DATE
 
-Example: Inspect integer data and cast data to a decimal
+Example: Inspect INTEGER data and cast the data to the DECIMAL type
 
     SELECT c_row, c_int FROM mydata WHERE c_row = 9;
+
     c_row | c_int
     ------+------------
         9 | -2147483648
     (1 row)
 
-   SELECT c_row, cast(c_int as decimal(28,8)) FROM my_data WHERE c_row = 9;
+    SELECT c_row, CAST(c_int AS DECIMAL(28,8)) FROM my_data WHERE c_row = 9;
+
     c_row | c_int
     ------+---------------------
     9     | -2147483648.00000000
@@ -63,15 +134,188 @@ Example: Inspect integer data and cast data to a decimal
 
 If the SELECT statement includes a WHERE clause that compares a column of an unknown data
type, cast both the value of the column and the comparison value in the WHERE clause. For
example:
 
-    SELECT c_row, cast(c_int as decimal(28,8)) FROM mydata WHERE cast(c_int as decimal(28,8))
> -3.0
+    SELECT c_row, CAST(c_int AS DECIMAL(28,8)) FROM mydata WHERE CAST(c_int AS CECIMAL(28,8))
> -3.0
+
+Although you can use CAST to handle binary data, CONVERT_TO and CONVERT_FROM are recommended
for these conversions.
 
-Although you can use cast to handle binary data, convert_to and convert_from are recommended
for use with binary data.
+The following table shows data types that you can cast to from other data types.
 
-### Using convert_to and convert_from
+<table>
+  <tr>
+    <th></th>
+    <th></th>
+    <th></th>
+    <th></th>
+    <th></th>
+    <th></th>
+    <th></th>
+    <th></th>
+    <th></th>
+    <th></th>
+    <th></th>
+  </tr>
+  <tr>
+    <td>From:</td>
+    <td>SMALLINT</td>
+    <td>INT</td>
+    <td>BIGINT/UINT</td>
+    <td>DECIMAL</td>
+    <td>FLOAT4</td>
+    <td>FLOAT8</td>
+    <td>FIXEDCHAR</td>
+    <td>FIXEDBINARY</td>
+    <td>VARCHAR</td>
+    <td>VARBINARY</td>
+  </tr>
 
-To query HBase data in Drill, convert every column of an HBase table to/from byte arrays
from/to an [SQL data type](/docs/data-types/) that Drill supports when writing/reading data.
For examples of how to use these functions, see ["Convert and Cast Functions".](/docs/sql-functions#convert-and-cast-functions).
+  <tr>
+    <td>SMALLINT</td>
+    <td>no</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+  </tr>
+  <tr>
+    <td>INT</td>
+    <td>yes</td>
+    <td>no</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+  </tr>
+  <tr>
+    <td>BIGINT/UINT</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+  </tr>
+  <tr>
+
+  <tr>
+    <td>DECIMAL</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+  </tr>
+
+  <tr>
+    <td>FLOAT8</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>no</td>
+    <td>yes</td>
+    <td>no</td>
+    <td>yes</td>
+    <td>no</td>
+  </tr>
+  <tr>
+    <td>FLOAT4</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>no</td>
+    <td>no</td>
+    <td>yes</td>
+    <td>no</td>
+    <td>yes</td>
+    <td>no</td>
+  </tr>
+  
+  <tr>
+    <td>FIXEDCHAR</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>no</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+  </tr>
+  <tr>
+    <td>FIXEDBINARY</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>no</td>
+    <td>no</td>
+    <td>yes</td>
+    <td>yes</td>
+  </tr>
+  <tr>
+    <td>VARCHAR</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>no</td>
+    <td>yes</td>
+  </tr>
+  <tr>
+    <td>VARBINARY</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>no</td>
+    <td>yes</td>
+    <td>yes</td>
+    <td>no</td>
+  </tr>
+
+</table>
+
+### Using CONVERT_TO and CONVERT_FROM
+
+To query HBase data in Drill, convert every column of an HBase table to/from byte arrays
from/to an [SQL data type](/docs/data-types/) that Drill supports when writing/reading data.
For examples of how to use these functions, see ["Convert and Cast Functions".](/docs/sql-functions#convert-and-cast-functions)
 
 ## Handling Textual Data
-In a textual file, such as CSV, Drill interprets every field as a varchar, as previously
mentioned. In addition to using the cast function, you can also use [to_char](link), [to_date](line),
[to_number](link), and [to_timestamp](link). If the SELECT statement includes a WHERE clause
that compares a column of an unknown data type, cast both the value of the column and the
comparison value in the WHERE clause.
+In a textual file, such as CSV, Drill interprets every field as a VARCHAR, as previously
mentioned. In addition to using the CAST function, you can also use [to_char](link), [to_date](line),
[to_number](link), and [to_timestamp](link). If the SELECT statement includes a WHERE clause
that compares a column of an unknown data type, cast both the value of the column and the
comparison value in the WHERE clause.
+
+## All text mode option
+All text mode is a system option for controlling how Drill implicitly casts JSON data. When
reading numerical values from a JSON file, Drill implicitly casts a number to the DOUBLE or
BIGINT type depending on the presence or absence a decimal point. If some numbers in a JSON
map or array appear with and without a decimal point, such as 0 and 0.0, Drill throws a schema
change error. To prevent Drill from attempting to read such data, [set all_text_mode](/docs/json-data-model#handling-type-differences)
to true. In all text mode, Drill implicitly casts JSON data to VARCHAR, which you can subsequently
cast to desired types.
+
+Drill reads numbers without decimal point as BIGINT values by default. The range of BIGINT
is -9223372036854775808 to 9223372036854775807. A BIGINT result outside this range produces
an error. Use `all_text_mode` to select data as VARCHAR and then cast the data to a numerical
type.
+
 
 


Mime
View raw message