drill-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From bridg...@apache.org
Subject drill git commit: DRILL-2493
Date Fri, 20 Mar 2015 17:53:58 GMT
Repository: drill
Updated Branches:
  refs/heads/gh-pages 923bad370 -> 2265dd160


DRILL-2493


Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/2265dd16
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/2265dd16
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/2265dd16

Branch: refs/heads/gh-pages
Commit: 2265dd16087f080cfbdbd852e9d8a05a400d7968
Parents: 923bad3
Author: Kristine Hahn <khahn@maprtech.com>
Authored: Thu Mar 19 16:26:52 2015 -0700
Committer: Bridget Bevens <bbevens@maprtech.com>
Committed: Fri Mar 20 10:51:07 2015 -0700

----------------------------------------------------------------------
 _docs/009-datasources.md               |   2 +-
 _docs/data-sources/004-json-ref.md     |   2 +
 _docs/img/datasources-json-bracket.png | Bin 30129 -> 15160 bytes
 _docs/sql-ref/001-data-types.md        |   4 +-
 _docs/sql-ref/002-lexical-structure.md |   8 ++
 _docs/sql-ref/008-sql-extensions.md    | 135 ++++++++++++++++++++++++++++
 6 files changed, 149 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/drill/blob/2265dd16/_docs/009-datasources.md
----------------------------------------------------------------------
diff --git a/_docs/009-datasources.md b/_docs/009-datasources.md
index 5400f22..4b25e9e 100644
--- a/_docs/009-datasources.md
+++ b/_docs/009-datasources.md
@@ -3,7 +3,7 @@ title: "Data Sources and File Formats"
 ---
 Included in the data sources that  Drill supports are these key data sources:
 
-* Hbase
+* HBase
 * Hive
 * MapR-DB
 * File system

http://git-wip-us.apache.org/repos/asf/drill/blob/2265dd16/_docs/data-sources/004-json-ref.md
----------------------------------------------------------------------
diff --git a/_docs/data-sources/004-json-ref.md b/_docs/data-sources/004-json-ref.md
index bf4c5d7..bbb4386 100644
--- a/_docs/data-sources/004-json-ref.md
+++ b/_docs/data-sources/004-json-ref.md
@@ -128,6 +128,8 @@ Generally, you query JSON files using the following syntax, which includes
a tab
     
   The first index position of an array is 0.
 
+* Do not use a map, array or repeated scalar type in GROUP BY, ORDER BY or in a comparison
operator.
+
 Drill returns null when a document does not have the specified map or level.
 
 Using the following techniques, you can query complex, nested JSON:

http://git-wip-us.apache.org/repos/asf/drill/blob/2265dd16/_docs/img/datasources-json-bracket.png
----------------------------------------------------------------------
diff --git a/_docs/img/datasources-json-bracket.png b/_docs/img/datasources-json-bracket.png
index e813568..4dfee92 100644
Binary files a/_docs/img/datasources-json-bracket.png and b/_docs/img/datasources-json-bracket.png
differ

http://git-wip-us.apache.org/repos/asf/drill/blob/2265dd16/_docs/sql-ref/001-data-types.md
----------------------------------------------------------------------
diff --git a/_docs/sql-ref/001-data-types.md b/_docs/sql-ref/001-data-types.md
index 39b1468..7e56231 100644
--- a/_docs/sql-ref/001-data-types.md
+++ b/_docs/sql-ref/001-data-types.md
@@ -467,7 +467,9 @@ Although you can use CAST to handle binary data, CONVERT_TO and CONVERT_FROM
are
 
 ### Using CONVERT_TO and CONVERT_FROM
 
-To query HBase data in Drill, convert every column of an HBase table to/from byte arrays
from/to an [SQL data type](/docs/data-types/) that Drill supports when writing/reading data.
For examples of how to use these functions, see ["Convert and Cast Functions".](/docs/sql-functions#convert-and-cast-functions)
+CONVERT_TO converts an SQL data type to complex types, including Hbase byte arrays, JSON
and Parquet arrays and mapsTo query HBase data in Drill, convert every column of an HBase
table to/from byte arrays from/to an [SQL data type](/docs/data-types/) that Drill supports
when writing/reading data. For examples of how to use these functions, see ["Convert and Cast
Functions".](/docs/sql-functions#convert-and-cast-functions)
+
+CONVERT_FROM converts from complex types, including Hbase byte arrays, JSON and Parquet arrays
and maps to an SQL data type.
 
 ## Handling Textual Data
 In a textual file, such as CSV, Drill interprets every field as a VARCHAR, as previously
mentioned. In addition to using the CAST function, you can also use [to_char](link), [to_date](line),
[to_number](link), and [to_timestamp](link). If the SELECT statement includes a WHERE clause
that compares a column of an unknown data type, cast both the value of the column and the
comparison value in the WHERE clause.

http://git-wip-us.apache.org/repos/asf/drill/blob/2265dd16/_docs/sql-ref/002-lexical-structure.md
----------------------------------------------------------------------
diff --git a/_docs/sql-ref/002-lexical-structure.md b/_docs/sql-ref/002-lexical-structure.md
index 5115ab8..731b2b3 100644
--- a/_docs/sql-ref/002-lexical-structure.md
+++ b/_docs/sql-ref/002-lexical-structure.md
@@ -28,6 +28,14 @@ The upper/lowercase sensitivity of the parts differs.
 
 SQL function and command names are case-insensitive. Storage plugin and workspace names are
case-sensitive. Column and table names are case-insensitive unless enclosed in double quotation
marks. The double-quotation mark character can be used as an escape character for the double
quotation mark.
 
+Although column names are case-insensitive in Drill, the names might be otherwise in the
storage format:
+
+* JSON: insensitive
+* Hive: insensitive
+* Parquet: insensitive
+* MapR-DB: case-sensitive
+* HBase: case-sensitive
+
 Keywords are case-insensitive. For example, the keywords SELECT and select are equivalent.
This document shows keywords in uppercase.
 
 The sys.options table name and values are case-sensitive. The following query works:

http://git-wip-us.apache.org/repos/asf/drill/blob/2265dd16/_docs/sql-ref/008-sql-extensions.md
----------------------------------------------------------------------
diff --git a/_docs/sql-ref/008-sql-extensions.md b/_docs/sql-ref/008-sql-extensions.md
new file mode 100644
index 0000000..1eb5f65
--- /dev/null
+++ b/_docs/sql-ref/008-sql-extensions.md
@@ -0,0 +1,135 @@
+---
+title: "SQL Extensions"
+parent: "SQL Reference"
+---
+Drill extends SQL to work with Hadoop-scale data and to explore smaller-scale data in ways
not possible with SQL. Using intuitive SQL extensions you work with self-describing data and
complex data types. Extensions to SQL include capabilities for exploring self-describing data,
such as files and HBase, directly in the native format.
+
+Drill provides language support for pointing to [storage plugin]() interfaces that Drill
uses to interact with data sources. Use the name of a storage plugin to specify a file system
*database* as a prefix in queries when you refer to objects across databases. Query files,
including compressed .gz files and directories like an SQL table using a single query. 
+
+Drill extends the SELECT statement for reading complex, multi-structured data. The extended
CREATE TABLE AS SELECT, provides the capability to write data of complex/multi-structured
data types. Drill extends the [lexical rules](http://drill.apache.org/docs/lexical-structure)
for working with files and directories, such as using back ticks for including file names,
directory names, and reserved words in queries. Drill syntax supports using the file system
as a persistent store for query profiles and diagnostic information.
+
+## Extensions for Hive- and HBase-related Data Sources
+
+Drill supports Hive and HBase as a plug-and-play data source. You can query Hive tables with
no modifications and creating model in the Hive metastore. Primitives, such as JOIN, support
columnar operation.
+
+## Extensions for JSON-related Data Sources
+For reading all JSON data as text, use the all text mode extension. Drill extends SQL to
provide access to repeating values in arrays and arrays within arrays (array indexes). You
can use these extensions to reach into deeply nested data. Drill extensions use standard JavaScript
notation for referencing data elements in a hierarchy.
+
+## Extensions for Text Data Sources
+Drill handles plain text files and directories like standard SQL tables and can infer knowledge
about the schema of the data. You can query compressed .gz files.
+
+## SQL Commands Extensions
+
+The following table describes key Drill extensions to SQL commands.
+
+<table>
+  <tr>
+    <th>Command</th>
+    <th>SQL</th>
+    <th>Drill</th>
+  </tr>
+  <tr>
+    <td>ALTER (SESSION | SYSTEM)</td>
+    <td>None</td>
+    <td>Changes a system or session option.</td>
+  </tr>
+  <tr>
+    <td>CREATE TABLE AS SELECT</td>
+    <td>Creates a table from selected data in an existing database table.</td>
+    <td>Stores selected data from one or more data sources on the file system.</td>
+  </tr>
+  <tr>
+    <td>CREATE VIEW</td>
+    <td>Creates a virtual table. The fields in a view are fields from one or more real
tables in the database.</td>
+    <td>Creates a virtual structure for and stores the result set. The fields in a
view are fields from files in a file system, Hive, Hbase, MapR-DB tables</td>
+  </tr>
+  <tr>
+    <td>DESCRIBE</td>
+    <td>Obtains information about the &lt;select list&gt; columns</td>
+    <td>Obtains information about views created in a workspace and tables created in
Hive, HBase, and MapR-DB.</td>
+  </tr>
+  <tr>
+    <td>EXPLAIN</td>
+    <td>None</td>
+    <td>Obtains a query execution plan.</td>
+  </tr>
+  <tr>
+    <td>INSERT</td>
+    <td>Loads data into the database for querying.</td>
+    <td>No INSERT function. Performs schema-on-read querying and execution; no need
to load data into Drill for querying.</td>
+  </tr>
+  <tr>
+    <td>SELECT</td>
+    <td>Retrieves rows from a database table or view.</td>
+    <td>Retrieves data from Hbase, Hive, MapR-DB, file system or other storage plugin
data source.</td>
+  </tr>
+  <tr>
+    <td>SHOW (DATABASES | SCHEMAS | FILES | TABLES)</td>
+    <td>None</td>
+    <td>Lists the storage plugin data sources available for querying or the Hive, Hbase,
MapR-DB tables, or views for the data source in use. Supports a FROM clause for listing file
data sources in directories.</td>
+  </tr>
+  <tr>
+    <td>USE</td>
+    <td>Targets a database in SQL schema for querying.</td>
+    <td>Targets Hbase, Hive, MapR-DB, file system or other storage plugin data source,
which can be schema-less for querying.</td>
+  </tr>
+</table>
+
+## SQL Function Extensions
+The following table describes key Drill functions for analyzing nested data.
+
+<table>
+  <tr>
+    <th>Function</th>
+    <th>SQL</th>
+    <th>Drill</th>
+  </tr>
+  <tr>
+    <td>CAST</td>
+    <td>Casts database data from one type to another.</td>
+    <td>Casts database data from one type to another and also casts data having no
metadata into a readable type. Allows liberal casting of schema-less data.</td>
+  </tr>
+  <tr>
+    <td>CONVERT_TO</td>
+    <td>Converts an expression from one type to another using the CONVERT command.</td>
+    <td>Converts an SQL data type to complex types, including Hbase byte arrays, JSON
and Parquet arrays and maps.</td>
+  </tr>
+  <tr>
+    <td>CONVERT_FROM</td>
+    <td>Same as above</td>
+    <td>Converts from complex types, including Hbase byte arrays, JSON and Parquet
arrays and maps to an SQL data type.</td>
+  </tr>
+  <tr>
+    <td>FLATTEN</td>
+    <td>None</td>
+    <td>Separates the elements in nested data from a repeated field into individual
records.</td>
+  </tr>
+  <tr>
+    <td>KVGEN</td>
+    <td>None</td>
+    <td>Returns a repeated map, generating key-value pairs to simplify querying of
complex data having unknown column names. You can then aggregate or filter on the key or value.</td>
+  </tr>
+  <tr>
+    <td>REPEATED_COUNT</td>
+    <td>None</td>
+    <td>Counts the values in a JSON array.</td>
+  </tr>
+</table>
+
+## Other Extensions
+
+[`sys` database system tables]() provide port, version, and option information. Drill Connects
to a random node, know where you’re connected:
+
+select host from sys.drillbits where `current` = true;
++------------+
+|    host    |
++------------+
+| 10.1.1.109 |
++------------+
+
+select commit_id from sys.version;
++------------+
+| commit_id  |
++------------+
+| e3ab2c1760ad34bda80141e2c3108f7eda7c9104 |
+


Mime
View raw message