Date: Sun, 18 Jun 2017 22:23:02 +0000 (UTC)
From: "Paul Rogers (JIRA)"
To: issues@drill.apache.org
Reply-To: dev@drill.apache.org
Subject: [jira] [Commented] (DRILL-4824) Null maps / lists and non-provided state support for JSON fields. Numeric types promotion.

    [ https://issues.apache.org/jira/browse/DRILL-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053326#comment-16053326 ]

Paul Rogers commented on DRILL-4824:
------------------------------------

Thanks for the very detailed, informative proposal! I've gone through it and added detailed comments. The main themes are:

* Must coordinate with the work done in DRILL-5211 to avoid fragmentation. That work has reworked the "vector writers", among other changes.
* Must handle null vectors in a generic way, not in JSON-specific code.
* Need for type promotion in both assignment (assign a smaller value to a larger vector) and in vector promotion (replace a smaller vector with a larger one when presented with a larger value).
* Backward compatibility with older JDBC and ODBC clients that do not understand the new vector layouts.

Also, we probably should check with the Arrow project to see if they have solved this problem or have plans to do so.
It is a stated (but questioned) goal of Drill to move to Arrow. So, changing the vectors in a way that Arrow does not support will prevent us from switching to Arrow -- unless we can make the same changes in Arrow.

> Null maps / lists and non-provided state support for JSON fields. Numeric types promotion.
> ------------------------------------------------------------------------------------------
>
>                 Key: DRILL-4824
>                 URL: https://issues.apache.org/jira/browse/DRILL-4824
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - JSON
>    Affects Versions: 1.0.0
>            Reporter: Roman
>            Assignee: Volodymyr Vysotskyi
>
> The output is incorrect for a JSON file with complex nested data.
> _JSON:_
> {code:none|title=example.json|borderStyle=solid}
> {
>   "Field1" : {
>   }
> }
> {
>   "Field1" : {
>     "InnerField1": {"key1":"value1"},
>     "InnerField2": {"key2":"value2"}
>   }
> }
> {
>   "Field1" : {
>     "InnerField3" : ["value3", "value4"],
>     "InnerField4" : ["value5", "value6"]
>   }
> }
> {code}
> _Query:_
> {code:sql}
> select Field1 from dfs.`/tmp/example.json`
> {code}
> _Incorrect result:_
> {code:none}
> +---------------------------+
> | Field1                    |
> +---------------------------+
> {"InnerField1":{},"InnerField2":{},"InnerField3":[],"InnerField4":[]}
> {"InnerField1":{"key1":"value1"},"InnerField2":{"key2":"value2"},"InnerField3":[],"InnerField4":[]}
> {"InnerField1":{},"InnerField2":{},"InnerField3":["value3","value4"],"InnerField4":["value5","value6"]}
> +---------------------------+
> {code}
> There is no need to output missing fields. With a deeply nested structure, the result becomes unreadable for the user.
> _Correct result:_
> {code:none}
> +--------------------------+
> | Field1                   |
> +--------------------------+
> {}
> {"InnerField1":{"key1":"value1"},"InnerField2":{"key2":"value2"}}
> {"InnerField3":["value3","value4"],"InnerField4":["value5","value6"]}
> +--------------------------+
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
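
For illustration only, the "non-provided" state described in the issue can be sketched in a few lines of Python. This is not Drill code and does not reflect how Drill's vectors or JSON reader are implemented; the `NOT_PROVIDED` marker and the helper names `union_schema`, `to_batch`, and `render` are hypothetical. The sketch assumes each record is stored against the union of all inner field names seen in the batch, and that a field absent from a record is marked as not provided rather than filled with an empty map or list.

```python
import json

# Hypothetical marker for "field not provided in this record"; not a Drill API.
NOT_PROVIDED = object()


def union_schema(records, column):
    """Collect every inner field name seen under `column`, in first-seen order."""
    names = []
    for rec in records:
        for name in rec.get(column, {}):
            if name not in names:
                names.append(name)
    return names


def to_batch(records, column):
    """Store each record against the union schema, marking absent fields."""
    names = union_schema(records, column)
    return [
        {name: rec.get(column, {}).get(name, NOT_PROVIDED) for name in names}
        for rec in records
    ]


def render(row):
    """Emit only the fields that were actually provided in the record."""
    provided = {k: v for k, v in row.items() if v is not NOT_PROVIDED}
    return json.dumps(provided, separators=(",", ":"))


# The three records from example.json in the issue description.
records = [
    {"Field1": {}},
    {"Field1": {"InnerField1": {"key1": "value1"},
                "InnerField2": {"key2": "value2"}}},
    {"Field1": {"InnerField3": ["value3", "value4"],
                "InnerField4": ["value5", "value6"]}},
]

for row in to_batch(records, "Field1"):
    print(render(row))
```

Under these assumptions the loop prints the three rows listed under "Correct result" in the issue. Filling absent fields with `{}` or `[]` instead of the marker would reproduce the rows listed under "Incorrect result".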