drill-dev mailing list archives

From Arina Yelchiyeva <arina.yelchiy...@gmail.com>
Subject Re: JSON reader enhancement
Date Sat, 18 Nov 2017 15:15:57 GMT
In general sounds good.
If a user applies kvgen / flatten to such 2-D array columns read as
strings, will they be able to normalize the data into the format they want?
Or do we need to come up with a new function?

Kind regards
Arina

On Fri, Nov 17, 2017 at 10:39 PM, Paul Rogers <progers@mapr.com> wrote:

> Hi All,
>
> I’d like to propose a minor enhancement to the JSON reader to better
> handle non-relational JSON structures. (See DRILL-5974 [1].)
>
> As background, Drill handles simple tuples:
>
> {a: 10, b: “fred”}
>
> Drill also handles arrays:
>
> {name: “fred”, hobbies: [“bowling”, “golf”]}
>
> Drill even handles arrays of tuples:
>
> {name: “fred”, orders: [
>   {id: 1001, amount: 12.34},
>   {id: 1002, amount: 56.78}]}
>
> The above are termed "relational" because there is a straightforward
> mapping between tables and the above JSON structures.
>
> Things get interesting with non-relational types, such as 2-D arrays:
>
> {id: 4, shape: “square”, points: [[0, 0], [0, 5], [5, 0], [5, 5]]}
>
> Drill has two solutions:
>
> * Turn on the experimental list and union support.
> * Enable all-text mode to read all fields as JSON text.
>
> Here, I’d like to propose a middle ground:
>
> * Read fields with relational types into vectors.
> * Read non-relational fields using text mode.
>
> Thus, the first three examples would all result in the JSON data parsed
> into Drill vectors. But, the fourth, non-relational example would produce a
> row that looks like this:
>
> id, shape, points
> 4, “square”, “[[0, 0], [0, 5], [5, 0], [5, 5]]”
>
> Although Drill can’t parse the 2-D array, Drill will pass the array along
> to the client, which can use its favorite JSON parser to parse the array
> and do something useful (like drawing the square in this case).
>
> In particular, the proposal:
>
> * Apply this change only to the revised “batch size aware” JSON reader.
> * Use the above parsing model by default.
> * Use the experimental list-and-union support if the existing
> `exec.enable_union_type` system/session option is set.
>
> Existing queries should “just work.” In fact, now JSON with non-relational
> types will work “out-of-the-box” without all-text mode or the experimental
> types.
>
> Thoughts?
>
> - Paul
>
> [1] https://issues.apache.org/jira/browse/DRILL-5974
>
>
>
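
For readers following the thread, the "middle ground" in the proposal can be sketched in a few lines of Python. This is only an illustration, not Drill's reader code: the function names (is_scalar, is_relational, read_row) are hypothetical, and the sketch assumes that the only non-relational shape is an array nested inside another array. Mixed-type arrays and other cases the real reader would have to consider are ignored here.

```python
import json


def is_scalar(value):
    """True for JSON scalars: null, booleans, numbers, strings."""
    return value is None or isinstance(value, (bool, int, float, str))


def is_relational(value):
    """Assumed rule: scalars, tuples (objects), 1-D arrays of scalars,
    and 1-D arrays of tuples are relational; nested arrays are not."""
    if is_scalar(value):
        return True
    if isinstance(value, dict):
        return all(is_relational(v) for v in value.values())
    if isinstance(value, list):
        return all(
            is_scalar(v) or (isinstance(v, dict) and is_relational(v))
            for v in value
        )
    return False


def read_row(record):
    """Keep relational fields as parsed values; fall back to JSON text
    for any field that is not relational."""
    return {
        key: value if is_relational(value) else json.dumps(value)
        for key, value in record.items()
    }
```

With this sketch, the first three examples in the quoted message come back unchanged, while the `points` field of the fourth example comes back as a string holding the original JSON text.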

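On the client side, the quoted message suggests handing the text column to any JSON parser. A minimal sketch of that step, assuming the client receives the row as a plain dictionary (the row layout and the helper names parse_points and bounding_box are hypothetical, chosen only for illustration):

```python
import json


def parse_points(text):
    """Parse the JSON text of a 2-D array into a list of (x, y) tuples."""
    return [(x, y) for x, y in json.loads(text)]


def bounding_box(points):
    """Return (min_x, min_y, max_x, max_y) for a list of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


# Row as a client might see it under the proposal: points arrives as text.
row = {"id": 4, "shape": "square", "points": "[[0, 0], [0, 5], [5, 0], [5, 5]]"}
points = parse_points(row["points"])
box = bounding_box(points)
```

The parsing cost moves to the client, but the query itself needs neither all-text mode nor the experimental union type.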