Date: Tue, 21 Jul 2015 13:12:04 +0000 (UTC)
From: "Hari Sekhon (JIRA)"
To: issues@drill.apache.org
Reply-To: dev@drill.apache.org
Subject: [jira] [Updated] (DRILL-3524) Drill proper DESCRIBE support for MongoDB

     [ https://issues.apache.org/jira/browse/DRILL-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hari Sekhon updated DRILL-3524:
-------------------------------
    Description:
Request to add full DESCRIBE support for MongoDB collections.

I understand this may be difficult / sub-optimal due to the flexible schema nature of Mongo docs but if you can tabulate results when reading directly from MongoDB for which you have read the field names, then it's also possible to extract all field names to present for the describe command, albeit an inefficient scan to do so.
Currently describe returns a pseudo / inaccurate / unhelpful metadata:

{code}
+--------------+------------+--------------+
| COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
+--------------+------------+--------------+
| *            | ANY        | YES          |
+--------------+------------+--------------+
{code}

Perhaps you could extend DESCRIBE to scan the first few dozen docs by default to create a merged schema as well as adding an optional argument to the describe command to allow for scanning a user-specified number of docs from which to describe the schema, or an ALL argument keyword to describe to scan all docs in a collection to get the complete global schema for the collection?

In case of schema evolution it might be an interesting option to additionally read the first and last records by ID etc.

  was:
Request to add full DESCRIBE support for MongoDB collections.

I understand this may be difficult / sub-optimal due to the flexible schema nature of Mongo docs but if you can tabulate results when reading directly from MongoDB for which you have read the field names, then it's also possible to extract all field names to present for the describe command, albeit an inefficient scan to do so.

Currently describe returns a pseudo / inaccurate / unhelpful metadata:

{code}
+--------------+------------+--------------+
| COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
+--------------+------------+--------------+
| *            | ANY        | YES          |
+--------------+------------+--------------+
{code}

Perhaps you could extend DESCRIBE to scan the first few dozen docs by default to create a merged schema as well as adding an optional argument to the describe command to allow for scanning a user-specified number of docs from which to describe the schema, or an ALL argument keyword to describe to scan all docs in a collection to get the complete global schema for the collection?
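
To make the merged-schema proposal above concrete, here is a minimal Python sketch. It is not Drill code and does not reflect how Drill's MongoDB storage plugin works; the function name `merged_schema`, the `limit` parameter, and the default of 24 documents are hypothetical. It assumes documents arrive as plain dicts (for example from a MongoDB cursor), reports a field as nullable when it is missing or null in any sampled document, and falls back to ANY when a field is seen with more than one type.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Optional, Set


def merged_schema(
    docs: Iterable[Dict[str, Any]],
    limit: Optional[int] = 24,  # hypothetical default for "a few dozen"; None scans all docs
) -> Dict[str, Dict[str, Any]]:
    """Merge top-level field names and observed types across sampled documents."""
    sample = docs if limit is None else islice(docs, limit)

    types: Dict[str, Set[str]] = {}   # field name -> type names observed
    present: Dict[str, int] = {}      # field name -> number of docs containing it
    has_null: Set[str] = set()        # fields seen with an explicit null value
    total = 0

    for doc in sample:
        total += 1
        for name, value in doc.items():
            present[name] = present.get(name, 0) + 1
            observed = types.setdefault(name, set())
            if value is None:
                has_null.add(name)
            else:
                observed.add(type(value).__name__)

    schema: Dict[str, Dict[str, Any]] = {}
    for name, observed in types.items():
        # A single observed type is reported as-is; mixed or unknown types become ANY.
        data_type = next(iter(observed)) if len(observed) == 1 else "ANY"
        schema[name] = {
            "data_type": data_type,
            "is_nullable": name in has_null or present[name] < total,
        }
    return schema
```

Passing `limit=None` corresponds to the proposed ALL keyword, and an integer corresponds to the user-specified document count. The sketch only looks at top-level fields; nested documents and arrays would need their own handling.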
> Drill proper DESCRIBE support for MongoDB
> -----------------------------------------
>
>                 Key: DRILL-3524
>                 URL: https://issues.apache.org/jira/browse/DRILL-3524
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Metadata, Storage - MongoDB
>    Affects Versions: 1.1.0
>            Reporter: Hari Sekhon
>            Assignee: Steven Phillips
>
> Request to add full DESCRIBE support for MongoDB collections.
> I understand this may be difficult / sub-optimal due to the flexible schema nature of Mongo docs but if you can tabulate results when reading directly from MongoDB for which you have read the field names, then it's also possible to extract all field names to present for the describe command, albeit an inefficient scan to do so.
> Currently describe returns a pseudo / inaccurate / unhelpful metadata:
> {code}
> +--------------+------------+--------------+
> | COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
> +--------------+------------+--------------+
> | *            | ANY        | YES          |
> +--------------+------------+--------------+
> {code}
> Perhaps you could extend DESCRIBE to scan the first few dozen docs by default to create a merged schema as well as adding an optional argument to the describe command to allow for scanning a user-specified number of docs from which to describe the schema, or an ALL argument keyword to describe to scan all docs in a collection to get the complete global schema for the collection?
> In case of schema evolution it might be an interesting option to additionally read the first and last records by ID etc.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)