drill-issues mailing list archives

From "Hari Sekhon (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-3524) Drill proper DESCRIBE support for MongoDB
Date Tue, 21 Jul 2015 11:16:04 GMT
Hari Sekhon created DRILL-3524:
----------------------------------

             Summary: Drill proper DESCRIBE support for MongoDB
                 Key: DRILL-3524
                 URL: https://issues.apache.org/jira/browse/DRILL-3524
             Project: Apache Drill
          Issue Type: Bug
          Components: Metadata
    Affects Versions: 1.1.0
            Reporter: Hari Sekhon
            Assignee: Steven Phillips


Request to add full DESCRIBE support for MongoDB collections.

I understand this may be difficult or sub-optimal given the flexible schema of MongoDB
docs. However, Drill already tabulates results when reading directly from MongoDB, which
means it has read the field names. It should therefore also be possible to extract all field
names and present them for the DESCRIBE command, even if that requires an inefficient scan.

Currently DESCRIBE returns placeholder metadata that is inaccurate and unhelpful:
{code}+--------------+------------+--------------+
| COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
+--------------+------------+--------------+
| *            | ANY        | YES          |
+--------------+------------+--------------+{code}

Perhaps DESCRIBE could scan the first few dozen docs by default and build a merged schema
from them. An optional argument to DESCRIBE could then set a user-specified number of docs
to scan, and an ALL keyword could scan every doc in a collection to get the complete global
schema for the collection.
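
To illustrate the request, here is a minimal sketch of how a merged schema might be built
from a sample of docs. It is not Drill code and makes several assumptions: the function name
infer_merged_schema and the sample_size parameter are hypothetical, docs are plain Python
dicts standing in for MongoDB docs, only top-level fields are considered, and Python type
names stand in for Drill data types. Passing sample_size=None plays the role of the proposed
ALL keyword.

```python
from itertools import islice


def infer_merged_schema(docs, sample_size=36):
    """Merge the top-level fields of sampled docs into DESCRIBE-style rows.

    Hypothetical helper for illustration only. sample_size=None scans
    every doc, which corresponds to the proposed ALL keyword.
    """
    columns = {}  # field name -> observed types, occurrence count, null flag
    seen = 0
    sample = docs if sample_size is None else islice(docs, sample_size)
    for doc in sample:
        seen += 1
        for name, value in doc.items():
            col = columns.setdefault(
                name, {"types": set(), "count": 0, "has_null": False}
            )
            col["count"] += 1
            if value is None:
                col["has_null"] = True
            else:
                col["types"].add(type(value).__name__)

    rows = []
    for name, col in columns.items():
        # Fall back to ANY when the sampled values disagree on type
        # or when only nulls were observed.
        types = col["types"]
        data_type = next(iter(types)) if len(types) == 1 else "ANY"
        # A field counts as nullable if it was null or absent in any sampled doc.
        nullable = col["has_null"] or col["count"] < seen
        rows.append((name, data_type, "YES" if nullable else "NO"))
    return rows


if __name__ == "__main__":
    sample_docs = [
        {"_id": 1, "name": "a"},
        {"_id": 2, "name": "b", "age": 30},
        {"_id": 3, "name": None},
    ]
    for row in infer_merged_schema(sample_docs, sample_size=None):
        print(row)
```

A small sample can miss fields that only appear in later docs, which is why a
user-specified limit and a full scan are both worth offering.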



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
