drill-commits mailing list archives

From tshi...@apache.org
Subject [19/30] drill git commit: Update 050-json-data-model.md
Date Mon, 23 Nov 2015 21:54:02 GMT
Update 050-json-data-model.md

Add documentation for heterogeneous types (Union type).

Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/fe75ca91
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/fe75ca91
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/fe75ca91

Branch: refs/heads/gh-pages
Commit: fe75ca9127619dea114f1c2e34db7fa3edf1b8df
Parents: 85f3a1b
Author: Steven Phillips <smp@apache.org>
Authored: Sun Nov 22 12:00:44 2015 -0800
Committer: Tomer Shiran <tshiran@gmail.com>
Committed: Mon Nov 23 10:09:38 2015 -0800

 .../050-json-data-model.md                      | 21 ++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/_docs/data-sources-and-file-formats/050-json-data-model.md b/_docs/data-sources-and-file-formats/050-json-data-model.md
index 4cd5281..48ea264 100644
--- a/_docs/data-sources-and-file-formats/050-json-data-model.md
+++ b/_docs/data-sources-and-file-formats/050-json-data-model.md
@@ -60,6 +60,21 @@ When you set this option, Drill reads all numbers from the JSON files as
 Drill uses these types internally for reading complex and nested data structures from data
sources such as JSON.
+### Experimental Feature: Heterogeneous types
+The Union type allows a single field to store values of different types. This feature is still
considered experimental and must be explicitly enabled by setting the `exec.enable_union_type`
option to true.
+    ALTER SESSION SET `exec.enable_union_type` = true;
+With this feature enabled, JSON data with changing types, which previously could not be
queried by Drill, can now be queried.
+A field with a Union type can be used inside functions. Drill automatically evaluates
the function appropriately for each type. If the data requires special handling
for the different types, you can use a CASE statement with the new type functions, such as `is_list`:
+    SELECT 1 + CASE WHEN is_list(a) THEN a[0] ELSE a END FROM table;
+In this example, column `a` contains both scalar and list values; when a value is
a list, the query uses the first element of the array.
 ## Reading JSON
 To read JSON data using Drill, use a [file system storage plugin]({{ site.baseurl }}/docs/file-system-storage-plugin/)
that defines the JSON format. You can use the `dfs` storage plugin, which includes the definition.
@@ -464,6 +479,8 @@ Workaround: None, per se, but if you avoid querying the multi-polygon
lines (120
     | -122.4379846573301  | 37.75844260679518  | 0.0     |
     1 row selected (6.64 seconds)
+Another option is to use the experimental union type.
 ### Varying types
 Any attempt to query a list that has
@@ -472,7 +489,7 @@ coordinates for shapes of type Polygon.  For shapes of MultiPolygon, this
file h
 of coordinates. Even a query that tries to filter away the
 MultiPolygons will fail.
-Workaround: None.
+Workaround: Use the union type (experimental).
 ### Misusing Dot Notation
 Drill accesses an object when you use dot notation in the SELECT statement only when the
dot is *not* the first dot in the expression. Drill attempts to access the table that appears
after the first dot. For example,  records in `some-file` have a geometry field that Drill
successfully accesses given this query:
@@ -516,4 +533,4 @@ Workaround: Set the `store.json.read_numbers_as_double` property, described
 ### Selecting all in a JSON directory query
 Drill currently returns only fields common to all the files in a [directory query]({{ site.baseurl
}}/docs/querying-directories) that selects all (SELECT *) JSON files.
-Workaround: Query each file individually.
+Workaround: Query each file individually. Another option is to use the union type (experimental).
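
For readers unfamiliar with the CASE expression documented in this patch, here is a minimal Python sketch of the same per-record logic. It is an analogy only and is not part of the commit: `scalar_or_first`, `add_one`, and the sample rows are hypothetical names and data. It makes no claim about how Drill evaluates Union-typed columns internally.

```python
# Illustration only: plain Python, not Drill. It mimics what the CASE
# expression in the patch does for a column `a` holding scalars and lists.

def scalar_or_first(a):
    """Return a[0] when `a` is a list, otherwise `a` itself (the CASE branches)."""
    return a[0] if isinstance(a, list) else a


def add_one(rows):
    """Apply the equivalent of `1 + CASE ... END` to the `a` field of each row."""
    return [1 + scalar_or_first(row["a"]) for row in rows]


# Hypothetical rows: the type of field `a` changes from record to record.
rows = [{"a": 5}, {"a": [10, 20]}, {"a": 2.5}]
result = add_one(rows)
print(result)
```

The sketch branches on the value's type at each record, which is the role the patch assigns to `is_list(a)` in the CASE statement.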
