drill-issues mailing list archives

From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (DRILL-6035) Specify Drill's JSON behavior
Date Tue, 19 Dec 2017 05:26:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16293316#comment-16293316 ]

Paul Rogers edited comment on DRILL-6035 at 12/19/17 5:25 AM:
--------------------------------------------------------------

h4. JSON Arrays

Drill supports simple arrays in JSON using the following rules:

* Arrays must contain homogeneous elements: every element is the same scalar type (any of
those described above), or every element is a JSON object.

(See a later comment for nested arrays.)

For example, the following are scalar arrays:
{code}
[10, 20]
[10.30, 10.45]
["foo", "bar"]
[true, false]
{code}

h4. Schema Change in Arrays

The following will trigger errors:

{code}
{a: [10, "foo"]}  // Mixed types
{a: [10]} {a: ["foo"]} // Schema change
{a: [10, 12.5]} // Conflicting types: integer and float
{code}
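
As a rough illustration of the rules above, here is a minimal Python sketch that rejects the three error cases. It is not Drill's implementation; the function names ({{scalar_kind}}, {{array_kind}}, {{check_column}}) and the coarse type names are hypothetical, and nulls are deliberately left out of scope.

```python
def scalar_kind(value):
    """Return a coarse type name for one JSON array element (names are illustrative)."""
    # bool must be tested before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "object"
    # Nulls and nested arrays are not covered by this sketch.
    raise ValueError(f"unsupported array element: {value!r}")


def array_kind(values):
    """Return the single element kind of one array, or raise on mixed kinds."""
    kinds = {scalar_kind(v) for v in values}
    if len(kinds) > 1:
        raise ValueError(f"mixed types in array: {sorted(kinds)}")
    # An empty array has no elements, so its kind is not yet known.
    return kinds.pop() if kinds else None


def check_column(arrays):
    """Check successive values of one array column for a schema change."""
    seen = None
    for values in arrays:
        kind = array_kind(values)
        if kind is None:
            continue
        if seen is not None and kind != seen:
            raise ValueError(f"schema change: {seen} then {kind}")
        seen = kind
    return seen
```

Under these assumptions, {{array_kind([10, "foo"])}} and {{array_kind([10, 12.5])}} both raise, as does {{check_column([[10], ["foo"]])}}, while each of the scalar arrays listed earlier passes.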

h4. Nulls in Arrays

Drill handles nulls in arrays using the {{LIST}} type, described in a separate note below.


was (Author: paul.rogers):
h4. JSON Arrays

Drill supports simple arrays in JSON using the following rules:

* Arrays must contain homogeneous elements: every element is the same scalar type (any of
those described above), or every element is a JSON object.

(See a later comment for nested arrays.)

For example, the following are scalar arrays:
{code}
[10, 20]
[10.30, 10.45]
["foo", "bar"]
[true, false]
{code}

h4. Schema Change in Arrays

The following will trigger errors:

{code}
{a: [10, "foo"]}  // Mixed types
{a: [10]} {a: ["foo"]} // Schema change
{a: [10, 12.5]} // Conflicting types: integer and float
{code}

h4. Nulls with Arrays

Rules for nulls are:

* Arrays may not contain nulls. (Drill does not support nulls as array elements.)
* A null (or missing) array field is treated the same as an empty array.

The following is invalid:
{code}
[10, null, 20]
{code}

The following are all valid:

{code}
{id: 1}
{id: 2, a: null}
{id: 3, a: []}
{id: 4, a: [10, 20, 30]}
{code}

As described, Drill will defer picking an array type if it sees null values. In the above
example, for id=2, Drill sees column `a` but does not pick a type. For id=3, Drill identifies
that `a` is an array, but does not know the type. Finally, for id=4, Drill identifies the
array as {{Repeated BIGINT}}. (This is the behavior for Drill 1.13; earlier versions may differ
and require investigation.)
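
The deferral described above can be pictured as a small state machine over the records. The following Python sketch is only a model of that description, not Drill's reader; the state names and the {{next_state}} and {{trace}} helpers are hypothetical, and only integer arrays are handled.

```python
NOT_SEEN = "column not seen"
TYPE_DEFERRED = "column seen, type deferred"
ARRAY_DEFERRED = "array, element type deferred"
REPEATED_BIGINT = "Repeated BIGINT"


def next_state(state, record, column="a"):
    """Advance the inferred state of one column after reading one record."""
    if column not in record:
        # Missing field: nothing new is learned.
        return state
    value = record[column]
    if value is None:
        # Null: the column exists, but no type is picked.
        return TYPE_DEFERRED if state == NOT_SEEN else state
    if not isinstance(value, list):
        raise NotImplementedError("this sketch only models array columns")
    if not value:
        # Empty array: known to be an array, element type still open.
        return ARRAY_DEFERRED if state in (NOT_SEEN, TYPE_DEFERRED) else state
    if all(isinstance(v, int) and not isinstance(v, bool) for v in value):
        return REPEATED_BIGINT
    raise NotImplementedError("only integer arrays are covered in this sketch")


def trace(records, column="a"):
    """Return the state after each record, starting from NOT_SEEN."""
    state = NOT_SEEN
    states = []
    for record in records:
        state = next_state(state, record, column)
        states.append(state)
    return states
```

Feeding the four records from the example through {{trace}} walks the column from "not seen" through the two deferred states to {{Repeated BIGINT}} at id=4.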

As usual, if the first file or batch contains only nulls, Drill will guess {{Nullable VARCHAR}},
which will cause a schema change error if later records reveal the type to be an array (of
any type).

If the first batch contains only nulls and/or empty arrays, Drill guesses that the type is
{{Repeated VARCHAR}}. (Again, this is specific to Drill 1.13.) For example:

{code}
{id: 1}
{id: 2, a: null}
{id: 3, a: []}
{code}
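
One way to read the two guesses is sketched below in Python. This is an interpretation rather than Drill's code: it assumes that seeing at least one empty array is what separates the {{Repeated VARCHAR}} guess from the {{Nullable VARCHAR}} guess, and the {{guess_at_batch_end}} helper is hypothetical.

```python
def guess_at_batch_end(values):
    """Guess a column type for a batch that held no typed elements.

    `values` holds the column's value per record, with None standing for
    both null and missing. The rule used here is an assumption, not a
    confirmed description of Drill's behavior.
    """
    if any(isinstance(v, list) and len(v) > 0 for v in values):
        raise ValueError("batch contains typed data; no guess is needed")
    if any(isinstance(v, list) for v in values):
        # At least one empty array was seen, so the column is an array.
        return "Repeated VARCHAR"
    # Only nulls or missing values were seen.
    return "Nullable VARCHAR"
```

For the three records above the column values are null, null and an empty array, so this sketch returns {{Repeated VARCHAR}}; a batch of nulls alone yields {{Nullable VARCHAR}}.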

> Specify Drill's JSON behavior
> -----------------------------
>
>                 Key: DRILL-6035
>                 URL: https://issues.apache.org/jira/browse/DRILL-6035
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.13.0
>            Reporter: Paul Rogers
>            Assignee: Pritesh Maker
>
> Drill supports JSON as its native data format. However, experience suggests that Drill
may have limitations in the JSON that Drill supports. This ticket asks to clarify Drill's
expected behavior on various kinds of JSON.
> Topics to be addressed:
> * Relational vs. non-relational structures
> * JSON structures used in practice and how they map to Drill
> * Support for varying data types
> * Support for missing values, especially across files
> These topics are complex, hence the request to provide a detailed specification that
clarifies what Drill does and does not support (or what it should and should not support).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
