drill-dev mailing list archives

From "Hari Sekhon (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-3526) Drill proper DESCRIBE support for JSON
Date Tue, 21 Jul 2015 11:49:04 GMT
Hari Sekhon created DRILL-3526:

             Summary: Drill proper DESCRIBE support for JSON
                 Key: DRILL-3526
                 URL: https://issues.apache.org/jira/browse/DRILL-3526
             Project: Apache Drill
          Issue Type: Bug
          Components: Metadata, Storage - JSON
    Affects Versions: 1.1.0
            Reporter: Hari Sekhon
            Assignee: Steven Phillips

Request to add full DESCRIBE support for JSON files.

Currently, the DESCRIBE command prints a blank table instead of the schema, which is
unhelpful, so I run a SELECT * ... LIMIT 1 instead.

Since describing a large volume of JSON data could be inefficient, I propose the following solution:

Read JSON records until a threshold of a few thousand records or a few tens of thousands
of fields has been read without discovering any new fields, and then assume that the fields
seen so far make up the schema.
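
The early-stopping idea above can be illustrated with a minimal sketch. It is not Drill code: the function name infer_schema, the parameters max_quiet_records and max_quiet_fields, and their default values are assumptions made for illustration, and the input is assumed to be one JSON object per line. Only top-level fields are tracked.

```python
import io
import json


def infer_schema(lines, max_quiet_records=2000, max_quiet_fields=20000):
    """Infer a merged top-level schema from newline-delimited JSON.

    Reading stops once max_quiet_records records or max_quiet_fields
    field values have been read without discovering a new field name.
    Both thresholds are hypothetical defaults, not Drill settings.
    """
    schema = {}        # field name -> set of Python type names seen
    quiet_records = 0  # records read since the last new field
    quiet_fields = 0   # field values read since the last new field
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if not isinstance(record, dict):
            continue  # this sketch only handles JSON objects
        found_new = False
        for name, value in record.items():
            if name not in schema:
                schema[name] = set()
                found_new = True
            schema[name].add(type(value).__name__)
        if found_new:
            quiet_records = 0
            quiet_fields = 0
        else:
            quiet_records += 1
            quiet_fields += len(record)
            if (quiet_records >= max_quiet_records
                    or quiet_fields >= max_quiet_fields):
                break  # assume the schema seen so far is complete
    return schema


sample = io.StringIO(
    '{"a": 1}\n{"a": 2, "b": "x"}\n{"a": 3}\n{"a": 4}\n{"c": true}\n'
)
print(infer_schema(sample, max_quiet_records=2))
```

With the threshold set to 2, the scan stops after two consecutive records that add nothing new, so the late field "c" is never discovered. That is the trade-off of this heuristic: cheaper scans in exchange for a schema that may miss rare fields.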

Extend the DESCRIBE command with a user-configurable number of records / fields to read
(or rather, the number of records / fields to read without discovering any new fields)
in order to present a merged schema for the data source, as well as an ALL keyword to scan
all JSON files and records to create a true global schema.
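
As a rough sketch of how the configurable limit and the ALL option might interact across several files, the following merges field names from multiple inputs. The function describe and the parameters quiet_records and scan_all are hypothetical names chosen here; they do not correspond to any existing Drill syntax or API, and each input is assumed to yield one JSON object per line.

```python
import json


def describe(sources, quiet_records=2000, scan_all=False):
    """Merge top-level field names across several iterables of JSON lines.

    quiet_records is a hypothetical user-set limit on consecutive records
    read without a new field. It is ignored when scan_all is True, which
    models the proposed ALL keyword by reading every record.
    """
    fields = []  # field names in first-seen order
    quiet = 0    # consecutive records that added no new field
    for lines in sources:
        for line in lines:
            if not line.strip():
                continue
            record = json.loads(line)
            if not isinstance(record, dict):
                continue  # this sketch only handles JSON objects
            new = [name for name in record if name not in fields]
            fields.extend(new)
            quiet = 0 if new else quiet + 1
            if not scan_all and quiet >= quiet_records:
                return fields  # sampled, possibly incomplete schema
    return fields  # every record was read


file_a = ['{"id": 1}', '{"id": 2}', '{"id": 3}']
file_b = ['{"id": 4, "tag": "x"}']

print(describe([file_a, file_b], quiet_records=2))
print(describe([file_a, file_b], scan_all=True))
```

The first call stops inside the first file and reports only "id", while the second reads everything and also finds "tag", which only appears in the second file.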

This message was sent by Atlassian JIRA