drill-dev mailing list archives

From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-4710) Document Drill's JSON processing rules
Date Mon, 06 Jun 2016 18:02:21 GMT
Paul Rogers created DRILL-4710:

             Summary: Document Drill's JSON processing rules
                 Key: DRILL-4710
                 URL: https://issues.apache.org/jira/browse/DRILL-4710
             Project: Apache Drill
          Issue Type: Improvement
          Components: Documentation
            Reporter: Paul Rogers
            Priority: Minor

One of Drill's key benefits is the ability to query JSON-formatted data, and much good work
has gone into it. However, unless you happen to be a Drill developer, the details of exactly
how Drill handles various JSON formats are hard to find.

We should document how Drill handles various JSON scenarios, covering both query forms:

* SELECT * (schema inferred)
* SELECT a, b, c (schema implied by query)

And various JSON structures:

* Top-level structure (list of maps. Can we handle an array of maps? A list of scalars?)
* Changes of the top-level map structure across rows.
** New field appears later in the file. (Was {a: 1, b: "s"}, now is {a: 1, b: "s", c: 10})
** Fields disappear later in the file
** Fields change type
** Start of file has many nulls for a field, later in file has non-null values.
* How Drill handles array fields
** Array field is null: { a: [10, 20]}, { a: null }
** Array contains nulls: { a: [10, null, 20] }
** Array contains single scalar type (number or string)
** Array contains multiple scalar types (number and string)
** Array contains structured types (array, map)
* How Drill handles nested maps
** Explicit select: a, b.c, b.d: {a: 1, b: { c: "s", d: 10 }}
** Implicit select: *
** How data is delivered to Drill client
** How data is delivered to JDBC/ODBC clients
* Size issues
** Very large records (what is max size?)
** Very large strings
** Very large arrays

Along with any other detailed information not covered by the above list.
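
As a starting point for the documentation work, the structural cases listed above could be captured as small sample inputs. The following is only a sketch: the scenario names, the `to_json_lines` helper, and the one-object-per-line layout are assumptions made for illustration, and the records use quoted keys (strict JSON) rather than the shorthand used in the list. It generates input text only and says nothing about how Drill actually behaves on it.

```python
import json

# Hypothetical scenario names mapped to records taken from, or modeled on,
# the cases in the list above. None of these names come from Drill itself.
SCENARIOS = {
    "field_appears": [{"a": 1, "b": "s"}, {"a": 1, "b": "s", "c": 10}],
    "field_disappears": [{"a": 1, "b": "s", "c": 10}, {"a": 1, "b": "s"}],
    "field_changes_type": [{"a": 1}, {"a": "one"}],
    "leading_nulls": [{"a": None}, {"a": None}, {"a": 10}],
    "null_array": [{"a": [10, 20]}, {"a": None}],
    "array_with_nulls": [{"a": [10, None, 20]}],
    "mixed_scalar_array": [{"a": [10, "s"]}],
    "nested_map": [{"a": 1, "b": {"c": "s", "d": 10}}],
}


def to_json_lines(records):
    """Serialize records as one JSON object per line (an assumed layout)."""
    return "\n".join(json.dumps(record) for record in records)


def render_all(scenarios=SCENARIOS):
    """Return a dict of scenario name -> sample file contents."""
    return {name: to_json_lines(records) for name, records in scenarios.items()}


if __name__ == "__main__":
    for name, text in render_all().items():
        print(f"--- {name} ---")
        print(text)
```

Each rendered sample could then be saved to a file and queried with both SELECT * and an explicit column list, with the observed result recorded next to the corresponding bullet.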

This message was sent by Atlassian JIRA
