drill-commits mailing list archives

From bridg...@apache.org
Subject [1/7] drill git commit: post content for AD 1.10 release
Date Wed, 15 Mar 2017 02:43:28 GMT
Repository: drill
Updated Branches:
  refs/heads/gh-pages 5c2ee7fa8 -> c75795000


post content for AD 1.10 release


Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/74eabf4d
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/74eabf4d
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/74eabf4d

Branch: refs/heads/gh-pages
Commit: 74eabf4da0c28a94b2356969755963893b896193
Parents: 5c2ee7f
Author: Bridget Bevens <bbevens@maprtech.com>
Authored: Mon Mar 13 15:49:06 2017 -0700
Committer: Bridget Bevens <bbevens@maprtech.com>
Committed: Mon Mar 13 15:49:06 2017 -0700

----------------------------------------------------------------------
 .../040-parquet-format.md                       |  50 +++++++++++++++----
 _docs/developer-information/009-rest-api.md     |   6 +--
 _docs/img/jdbc_connection_tries.png             | Bin 0 -> 7999 bytes
 _docs/img/multiple_drill_versions.jpg           | Bin 0 -> 81613 bytes
 ...ying-multiple-drill-versions-in-a-cluster.md |  26 ++++++++++
 .../015-using-jdbc-driver.md                    |  24 +++++++--
 .../005-querying-a-file-system-introduction.md  |  43 +++++++++++++++-
 7 files changed, 130 insertions(+), 19 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/drill/blob/74eabf4d/_docs/data-sources-and-file-formats/040-parquet-format.md
----------------------------------------------------------------------
diff --git a/_docs/data-sources-and-file-formats/040-parquet-format.md b/_docs/data-sources-and-file-formats/040-parquet-format.md
index 56e48e7..0bf6db0 100644
--- a/_docs/data-sources-and-file-formats/040-parquet-format.md
+++ b/_docs/data-sources-and-file-formats/040-parquet-format.md
@@ -1,6 +1,6 @@
 ---
 title: "Parquet Format"
-date:  
+date: 2017-03-13 22:49:07 UTC
 parent: "Data Sources and File Formats"
 ---
 [Apache Parquet](http://parquet.incubator.apache.org/documentation/latest) has the following
characteristics:
@@ -22,9 +22,27 @@ Apache Drill includes the following support for Parquet:
 When a read of Parquet data occurs, Drill loads only the necessary columns of data, which
reduces I/O. Reading only a small piece of the Parquet data from a data file or table, Drill
can examine and analyze all values for a column across multiple files. You can create a Drill
table from one format and store the data in another format, including Parquet.
 
 ## Writing Parquet Files
-CREATE TABLE AS (CTAS) can use any data source provided by the storage plugin. To write Parquet
data using the CTAS command, set the session store.format option as shown in the next section.
Alternatively, configure the storage plugin to point to the directory containing the Parquet
files.
+CREATE TABLE AS (CTAS) can use any data source provided by the storage plugin. To write Parquet
data using the CTAS command, set the session `store.format` option as shown in [Configuring
the Parquet Storage Format]({{site.baseurl}}/docs/parquet-format/#configuring-the-parquet-storage-format).
Alternatively, configure the storage plugin to point to the directory containing the Parquet
files.
 
-Although the data resides in a single table, Parquet output generally consists of multiple
files that resemble MapReduce output having numbered file names,  such as 0_0_0.parquet in
a directory.
+Although the data resides in a single table, Parquet output generally consists of multiple
files that resemble MapReduce output with numbered file names, such as 0_0_0.parquet in
a directory.  
+
+### Date Value Auto-Correction
+As of Drill 1.10, Drill writes standard Parquet date values. Drill also has an auto-correction
feature that detects and corrects corrupted date values that Drill wrote into Parquet files
prior to Drill 1.10. 
+
+By default, the automatic correction feature is turned on and works for dates up to 5,000
years into the future. In the unlikely event that Drill needs to write dates thousands of
years into the future, turn the auto-correction feature off.  
+
+To disable the auto-correction feature, navigate to the storage plugin configuration and
change the `autoCorrectCorruptDates` option in the Parquet configuration to `false`, as
shown in the example below:  
+
+       "formats": {
+           "parquet": {
+             "type": "parquet",
+             "autoCorrectCorruptDates": false
+           }
+       }
+
+Alternatively, you can set the option to false when you issue a query, as shown in the following
example:  
+
+       SELECT l_shipdate, l_commitdate FROM table(dfs.`/drill/testdata/parquet_date/dates_nodrillversion/drillgen2_lineitem`
+       (type => 'parquet', autoCorrectCorruptDates => false)) LIMIT 1; 
 
 ### Configuring the Parquet Storage Format
 To read or write Parquet data, you need to include the Parquet format in the storage plugin
format definitions. The `dfs` plugin definition includes the Parquet format. 
@@ -130,20 +148,30 @@ The first table in this section maps SQL data types to Parquet data
types, limit
 | INTEGER           | INT32        | 4-byte signed integer                         |
 | VARBINARY(12)*    | INT96        | 12-byte signed int                            |
 
-\* Drill 1.2 and later supports reading the Parquet INT96 type.
+\* Drill 1.10 and later can implicitly interpret the Parquet INT96 type as TIMESTAMP (with
standard 8 byte/millisecond precision) when the `store.parquet.int96_as_timestamp` option
is enabled. In earlier versions of Drill (1.2 through 1.9) or when the `store.parquet.int96_as_timestamp`
option is disabled, you must use the CONVERT_FROM function for Drill to correctly interpret
INT96 values as TIMESTAMP values.
+
+## About INT96 Support  
+As of Drill 1.10, Drill can implicitly interpret the INT96 timestamp data type in Parquet
files when the `store.parquet.int96_as_timestamp` option is enabled. For earlier versions
of Drill, or when the `store.parquet.int96_as_timestamp` option is disabled, you must use
the CONVERT_FROM function to interpret INT96 values as TIMESTAMP values.  
+
+The `store.parquet.int96_as_timestamp` option is disabled by default. Use the [ALTER SYSTEM|SESSION
SET]({{site.baseurl}}/docs/alter-system/) command to enable the option. Enable the option only
when necessary, because the CONVERT_FROM(col, 'TIMESTAMP_IMPALA') function does not work, and
causes queries to fail, when `store.parquet.int96_as_timestamp` is enabled.  
+
+### Using CONVERT_FROM to Interpret INT96
+In earlier versions of Drill (1.2 through 1.9), you must use the CONVERT_FROM function for
Drill to interpret the Parquet INT96 type. For example, to decode a timestamp from Hive or
Impala, which is of type INT96, use the CONVERT_FROM function and the [TIMESTAMP_IMPALA]({{site.baseurl}}/docs/supported-data-types/#data-types-for-convert_to-and-convert_from-functions)
type argument:  
+
+``SELECT CONVERT_FROM(timestamp_field, 'TIMESTAMP_IMPALA') as timestamp_field FROM `dfs.file_with_timestamp.parquet`;``
 
+
+Because INT96 is supported for reads only, you cannot use the TIMESTAMP_IMPALA as a data
type argument with CONVERT_TO. You can convert a SQL TIMESTAMP to VARBINARY using the CAST
function, but the resultant VARBINARY is not the same as INT96. 
 
-## About INT96 Support
-Drill 1.2 and later supports reading the Parquet INT96 type. For example, to decode a timestamp
from Hive or Impala, which is of type INT96, use the CONVERT_FROM function and the [TIMESTAMP_IMPALA]({{site.baseurl}}/docs/supported-data-types/#data-types-for-convert_to-and-convert_from-functions)
type argument:
+For example, create a Drill table after reading INT96 and converting some data to a timestamp.
 
-``SELECT CONVERT_FROM(timestamp_field, 'TIMESTAMP_IMPALA') as timestamp_field FROM `dfs.file_with_timestamp.parquet`;``
 
-Because INT96 is supported for reads only, you cannot use the TIMESTAMP_IMPALA as a data
type argument with CONVERT_TO.
+    CREATE TABLE t2(c1) AS SELECT CONVERT_FROM(created_ts, 'TIMESTAMP_IMPALA') FROM t1 ORDER
BY 1 LIMIT 1;
 
-You can convert a SQL TIMESTAMP to VARBINARY using the CAST function, but the resultant VARBINARY
is not the same as the INT96. For example, create a Drill table after reading an INT96 and
converting some data to a timestamp.
+t1.created_ts is an INT96 (or Hive/Impala timestamp), and t2.created_ts is a SQL timestamp.
These types are not comparable. You cannot use a condition like t1.created_ts = t2.created_ts.
 
-`CREATE TABLE t2(c1) AS SELECT CONVERT_FROM(created_ts, 'TIMESTAMP_IMPALA') FROM t1 ORDER
BY 1 LIMIT 1;`
+### Configuring the Timezone
+By default, INT96 timestamp values represent the local date and time, which is similar to
Hive. To get INT96 timestamp values in UTC, configure Drill for [UTC time]({{site.baseurl}}/docs/data-type-conversion/#time-zone-limitation).
 
 
-t1.created_ts is an INT96 (or Hive/Impala timestamp) , t2.created_ts is a SQL timestamp.
These types are not comparable--you cannot use a condition like t1.created_ts = t2.created_ts.
 
 ### SQL Types to Parquet Logical Types
 Parquet also supports logical types, fully described on the [Apache Parquet site](https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md).
Embedded types, JSON and BSON, annotate a binary primitive type representing a JSON or BSON
document. The logical types and their mapping to SQL types are:
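The storage plugin fragment added above can also be generated mechanically. A minimal Python sketch, assuming only the `formats`/`parquet` structure shown in the diff (the helper name is ours, not part of Drill):

```python
import json

def parquet_format_config(auto_correct_corrupt_dates=False):
    """Build the "parquet" entry of a storage plugin "formats" map, with
    the Drill 1.10 autoCorrectCorruptDates flag set as requested."""
    return {
        "parquet": {
            "type": "parquet",
            "autoCorrectCorruptDates": auto_correct_corrupt_dates,
        }
    }

# Render the JSON fragment to paste into the storage plugin configuration.
print(json.dumps({"formats": parquet_format_config(False)}, indent=2))
```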

http://git-wip-us.apache.org/repos/asf/drill/blob/74eabf4d/_docs/developer-information/009-rest-api.md
----------------------------------------------------------------------
diff --git a/_docs/developer-information/009-rest-api.md b/_docs/developer-information/009-rest-api.md
index ce9e03e..86b37df 100644
--- a/_docs/developer-information/009-rest-api.md
+++ b/_docs/developer-information/009-rest-api.md
@@ -1,6 +1,6 @@
 ---
 title: "REST API"
-date: 2016-11-21 22:14:41 UTC
+date: 2017-03-13 22:49:08 UTC
 parent: "Developer Information"
 ---
 
@@ -306,13 +306,13 @@ Gets metric information.
 
 ----------
 
-### GET /stats.json
+### GET /cluster.json
 
 Get Drillbit information, such as ports numbers.
 
 **Example**
 
-`curl http://localhost:8047/stats.json`
+`curl http://localhost:8047/cluster.json`
 
 **Response Body**
 

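The renamed endpoint returns a JSON document that any HTTP client can consume. A hedged Python sketch; the `drillbits`, `address`, and `userPort` field names in the sample are illustrative assumptions, not taken from this commit:

```python
import json

def drillbit_ports(cluster):
    """Extract (address, userPort) pairs from a parsed /cluster.json
    response.  The key names here are assumed for illustration; check the
    actual response body of your Drill version."""
    return [(d["address"], d["userPort"]) for d in cluster.get("drillbits", [])]

# In practice, fetch the document first, for example with:
#   urllib.request.urlopen("http://localhost:8047/cluster.json")
sample = json.loads('{"drillbits": [{"address": "node1", "userPort": 31010}]}')
print(drillbit_ports(sample))
```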
http://git-wip-us.apache.org/repos/asf/drill/blob/74eabf4d/_docs/img/jdbc_connection_tries.png
----------------------------------------------------------------------
diff --git a/_docs/img/jdbc_connection_tries.png b/_docs/img/jdbc_connection_tries.png
new file mode 100644
index 0000000..5d5fef8
Binary files /dev/null and b/_docs/img/jdbc_connection_tries.png differ

http://git-wip-us.apache.org/repos/asf/drill/blob/74eabf4d/_docs/img/multiple_drill_versions.jpg
----------------------------------------------------------------------
diff --git a/_docs/img/multiple_drill_versions.jpg b/_docs/img/multiple_drill_versions.jpg
new file mode 100644
index 0000000..8db9234
Binary files /dev/null and b/_docs/img/multiple_drill_versions.jpg differ

http://git-wip-us.apache.org/repos/asf/drill/blob/74eabf4d/_docs/install/070-identifying-multiple-drill-versions-in-a-cluster.md
----------------------------------------------------------------------
diff --git a/_docs/install/070-identifying-multiple-drill-versions-in-a-cluster.md b/_docs/install/070-identifying-multiple-drill-versions-in-a-cluster.md
new file mode 100644
index 0000000..e5971bb
--- /dev/null
+++ b/_docs/install/070-identifying-multiple-drill-versions-in-a-cluster.md
@@ -0,0 +1,26 @@
+---
+title: Identifying Multiple Drill Versions in a Cluster
+date:  
+parent: Install Drill
+---
+
+As of Drill 1.10, the Web Console displays the Drill version running on each Drill node in
the cluster, as shown in the following image:  
+
+![](http://i.imgur.com/42otmKQ.jpg)  
+
+You can also retrieve the version information by running the following query:  
+
+       SELECT * FROM sys.drillbits;  
+
+If the version of Drill differs between nodes, a warning message appears. The nodes running
the current version have a green label, while the nodes running another version have a red
label, as shown in the image above.  
+ 
+The Drill node from which you access the Web Console defines the current version. For example,
assume you have two Drill nodes in a cluster with the following IP addresses, versions, and
Web Console access:  
+
+| Drill Node   | Drill Version | Web Console              |
+|--------------|---------------|--------------------------|
+| 10.10.123.88 | 1.9.0         | http://10.10.123.88:8047 |
+| 10.10.136.25 | 1.10.0        | http://10.10.136.25:8047 |  
+
+Accessing the Web Console for Drill node 10.10.123.88 displays Drill version 1.9.0 as the
current version with a green label, while also displaying the Drill version for Drill node
10.10.136.25, but with a red label. Accessing the Web Console for Drill node 10.10.136.25
displays 1.10.0 as the current version with a green label, while also displaying the Drill
version for Drill node 10.10.123.88, but with a red label. In both cases, the Web Console
generates a warning to state that the Drill versions do not match.  
+
+The Web Console sorts the Drill nodes by version, starting with the current Drill node, followed
by Drill nodes with Drill versions that match the current version, followed by Drill nodes
that do not match the current version. Drill nodes marked as having an “undefined” version
may be incorrectly defined or have a pre-1.10.0 version of Drill installed. 

http://git-wip-us.apache.org/repos/asf/drill/blob/74eabf4d/_docs/odbc-jdbc-interfaces/015-using-jdbc-driver.md
----------------------------------------------------------------------
diff --git a/_docs/odbc-jdbc-interfaces/015-using-jdbc-driver.md b/_docs/odbc-jdbc-interfaces/015-using-jdbc-driver.md
index 1ace1cc..e572adc 100644
--- a/_docs/odbc-jdbc-interfaces/015-using-jdbc-driver.md
+++ b/_docs/odbc-jdbc-interfaces/015-using-jdbc-driver.md
@@ -1,6 +1,6 @@
 ---
 title: "Using the JDBC Driver"
-date:  
+date: 2017-03-13 22:49:08 UTC
 parent: "ODBC/JDBC Interfaces"
 ---
 This section explains how to install and use the JDBC driver for Apache Drill. To use the
JDBC driver, you have to:
@@ -79,14 +79,32 @@ drill.exec: {
 
 ## Using the JDBC URL Format for a Direct Drillbit Connection
 
-If you want to connect directly to a Drillbit instead of using ZooKeeper to choose the Drillbit,
replace `zk=<zk name>` with `drillbit=<node>` as shown in the following URL.
+If you want to connect directly to a Drillbit instead of using ZooKeeper to choose the Drillbit,
replace `zk=<zk name>` with `drillbit=<node name>` as shown in the following URL:
 
 `jdbc:drill:drillbit=<node name>[:<port>][,<node name2>[:<port>]...
`  
   `<directory>/<cluster ID>[schema=<storage plugin>]`
 
 where
 
-`drillbit=<node name>` specifies one or more host names or IP addresses of cluster
nodes running Drill. 
+`drillbit=<node name>` specifies one or more host names or IP addresses of cluster
nodes running Drill.  
+
+### `tries` Parameter 
+
+As of Drill 1.10, you can include the optional `tries=<value>` parameter in the connection
string, as shown in the following URL:  
+
+
+    jdbc:drill:drillbit=<node name>[:<port>][,<node name2>[:<port>]...
+    <directory>/<cluster ID>;[schema=<storage plugin>];tries=5  
+
+The `tries` option represents the maximum number of unique drillbits to which the client
can try to establish a successful connection. The default value is 5. This option improves
fault tolerance in the Drill client when it first tries to connect to a drillbit, which
then acts as the Foreman (the node that drives the query).  
+ 
+The client does not necessarily try the drillbits in the order listed in the connection
string. If the first try results in an authentication failure, the client does not attempt
any additional tries. If the number of unique drillbits listed in the `drillbit` parameter
is less than the `tries` value, the client tries to connect to each drillbit one time.   
+
+For example, if there are three unique drillbits listed in the connection string, and the
`tries` value is set to 5, the client can try to connect to each drillbit once, until
a successful connection is made, as shown in the image below: 
+
+![](http://i.imgur.com/MJ9qChJ.png)  
+
+If the client cannot successfully connect to any of the drillbits, Drill returns a failure
message. 
 
 For definitions of other URL components, see [Using the JDBC URL for a Random Drillbit Connection]({{site.baseurl}}/docs/using-the-jdbc-driver/#using-the-jdbc-url-for-a-random-drillbit-connection).
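The direct-connection URL with the `tries` parameter can be assembled from its parts. A small Python sketch of the string shape shown above (the helper is hypothetical; only the URL format comes from the doc):

```python
def direct_drillbit_url(drillbits, schema=None, tries=5):
    """Build a direct-Drillbit JDBC URL of the documented shape:
    jdbc:drill:drillbit=<node>[:<port>][,<node2>[:<port>]]...
    optionally followed by schema=<plugin> and the Drill 1.10 tries value."""
    hosts = ",".join(f"{host}:{port}" if port else host
                     for host, port in drillbits)
    url = f"jdbc:drill:drillbit={hosts}"
    if schema:
        url += f";schema={schema}"
    return url + f";tries={tries}"

# Two drillbits, one with an explicit port, using the default tries value.
print(direct_drillbit_url([("node1", 31010), ("node2", None)], schema="dfs"))
```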
 

http://git-wip-us.apache.org/repos/asf/drill/blob/74eabf4d/_docs/query-data/query-a-file-system/005-querying-a-file-system-introduction.md
----------------------------------------------------------------------
diff --git a/_docs/query-data/query-a-file-system/005-querying-a-file-system-introduction.md
b/_docs/query-data/query-a-file-system/005-querying-a-file-system-introduction.md
index 98e107c..436cc75 100644
--- a/_docs/query-data/query-a-file-system/005-querying-a-file-system-introduction.md
+++ b/_docs/query-data/query-a-file-system/005-querying-a-file-system-introduction.md
@@ -1,6 +1,6 @@
 ---
 title: "Querying a File System Introduction"
-date: 2016-04-12 18:29:30 UTC
+date: 2017-03-13 22:49:09 UTC
 parent: "Querying a File System"
 ---
 Files and directories are like standard SQL tables to Drill. You can specify a
@@ -33,4 +33,43 @@ Drill supports the following file types:
 
 The extensions for these file types must match the configuration settings for
 your registered storage plugins. For example, PSV files may be defined with a
-`.tbl` extension, while CSV files are defined with a `.csv` extension.
+`.tbl` extension, while CSV files are defined with a `.csv` extension.  
+
+## Implicit Columns  
+Drill 1.8 introduces implicit columns. Implicit columns provide file information, such as
the directory path to a file and the file extension. You can query implicit columns in files,
directories, and nested directories. 
+
+The following table lists the implicit columns available and their descriptions:  
+  
+| Implicit Column Name | Description                                                                            |
+|----------------------|----------------------------------------------------------------------------------------|
+| FQN                  | The fully qualified name. Contains the full path to the file, including the file name. |
+| FILEPATH             | The full path to the file, without the file name.                                      |
+| FILENAME             | The file name with the file extension. Does not include the path to the file.          |
+| SUFFIX               | The file suffix without the dot (.) at the beginning.                                  |  
+
+To access implicit columns, you must explicitly include the columns in a query, as shown
in the following example:  
+
+       0: jdbc:drill:zk=local> SELECT fqn, filepath, filename, suffix FROM dfs.`/dev/data/files/test.csvh` LIMIT 1;  
+       
+       +----------------------------+------------------+------------+---------+
+       |            fqn             |     filepath     |  filename  | suffix  |
+       +----------------------------+------------------+------------+---------+
+       | /dev/data/files/test.csvh  | /dev/data/files  | test.csvh  | csvh    |
+       +----------------------------+------------------+------------+---------+  
+
+{% include startnote.html %}If a table has a column with the same name as an implicit column,
such as “suffix,” the implicit column overrides the table column.{% include endnote.html
%} 
+
+If a table column has the same name as an implicit column, you can change the default implicit
column name using the [ALTER SYSTEM|SESSION SET]({{site.baseurl}}/docs/alter-system/) command
with the appropriate parameter, as shown in the following example:  
+
+       ALTER SYSTEM SET `drill.exec.storage.implicit.suffix.column.label` = appendix;  
+
+Use the following configuration options to change the default implicit column names:  
+
+       drill.exec.storage.implicit.fqn.column.label
+       drill.exec.storage.implicit.filepath.column.label
+       drill.exec.storage.implicit.filename.column.label
+       drill.exec.storage.implicit.suffix.column.label
+ 
+
+
+
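The four options above follow one naming pattern, so the ALTER statements can be generated. A purely illustrative Python sketch (the helper name is ours; the option names come from the list above):

```python
IMPLICIT_COLUMNS = ("fqn", "filepath", "filename", "suffix")

def relabel_statement(column, new_label, scope="SYSTEM"):
    """Generate the ALTER SYSTEM|SESSION SET statement that renames one
    implicit column, using the documented option-name pattern."""
    if column not in IMPLICIT_COLUMNS:
        raise ValueError(f"unknown implicit column: {column}")
    option = f"drill.exec.storage.implicit.{column}.column.label"
    return f"ALTER {scope} SET `{option}` = {new_label};"

print(relabel_statement("suffix", "appendix"))
```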

