Date: Fri, 11 Mar 2016 18:21:39 +0000 (UTC)
From: "Jiang Wu (JIRA)"
To: issues@drill.apache.org
Reply-To: dev@drill.apache.org
Subject: [jira] [Created] (DRILL-4498) Projecting a map key within an array produces incorrect results

Jiang Wu created DRILL-4498:
-------------------------------

             Summary: Projecting a map key within an array produces incorrect results
                 Key: DRILL-4498
                 URL: https://issues.apache.org/jira/browse/DRILL-4498
             Project: Apache Drill
          Issue Type: Bug
          Components: Execution - Data Types
    Affects Versions: 1.4.0
            Reporter: Jiang Wu

To reproduce:

1) Place the following 3 JSON objects in a file:

{noformat}
{"r":1,"c1":[{"c2":1,"c3":"a"},{"c2":2,"c3":"b"},{"c2":3,"c3":"c"}]}
{"r":2,"c1":[{"c2":4,"c3":"d"}]}
{"r":3,"c1":[{"c2":5,"c3":"e"},{"c2":6,"c3":"f"},{"c2":7,"c3":"g"}]}
{noformat}

2) Run the query:

{noformat}
select t.r, t.c1.c2 from dfs.`c:\tmp\data.json` t;
+----+---------+
| r  | EXPR$1  |
+----+---------+
| 1  | 1       |
| 2  | 2       | <-- not OK
| 3  | 3       | <-- not OK
+----+---------+
{noformat}

3) The above results are incorrect. After the first row, the values returned for "c1.c2" are not correlated with the values of r. The expected result should show, for example, that r = 1 has 3 values for c1.c2: 1, 2, and 3.

For example, the same conceptual query in MongoDB returns the proper information:

{noformat}
> db.t.find({}, {"r":1, "c1.c2":1}):
{"r":1,"c1":[{"c2":1},{"c2":2},{"c2":3}]}
{"r":2,"c1":[{"c2":4}]}
{"r":3,"c1":[{"c2":5},{"c2":6},{"c2":7}]}
{noformat}

Drill could return the same information, even if it is formatted differently, in a more relational style. For example:

{noformat}
select t.r, t.c1.c2 from dfs.`c:\tmp\data.json` t;
+----+-----------+
| r  | EXPR$1    |
+----+-----------+
| 1  | [1, 2, 3] |
| 2  | [4]       |
| 3  | [5, 6, 7] |
+----+-----------+
{noformat}

Some other output format could be chosen instead. Returning an array of values can be an important use case: it supports operations such as forming a single string of comma-separated values "1, 2, 3" without going through flatten and then re-aggregating, or predicates such as "where ... xyz in c1.c2 ..."
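
The expected semantics can be sketched outside Drill. The following is a minimal Python illustration, not Drill code: it assumes the three records from step 1 are available as newline-delimited JSON, and the helper name project_nested is hypothetical, introduced only for this example. It shows one way to keep each row's c1.c2 values correlated with its r, plus the two use cases mentioned in the report.

```python
import json

# The three records from the reproduction step, one JSON object per line.
DATA = """\
{"r":1,"c1":[{"c2":1,"c3":"a"},{"c2":2,"c3":"b"},{"c2":3,"c3":"c"}]}
{"r":2,"c1":[{"c2":4,"c3":"d"}]}
{"r":3,"c1":[{"c2":5,"c3":"e"},{"c2":6,"c3":"f"},{"c2":7,"c3":"g"}]}
"""


def project_nested(record, array_key, map_key):
    """Collect map_key from every map in record[array_key].

    Hypothetical helper for illustration; not part of Drill or MongoDB.
    """
    return [item[map_key] for item in record.get(array_key, []) if map_key in item]


records = [json.loads(line) for line in DATA.splitlines() if line.strip()]

# One output row per input record: (r, list of the c2 values inside c1).
rows = [(rec["r"], project_nested(rec, "c1", "c2")) for rec in records]

for r, c2_values in rows:
    print(r, c2_values)

# Use case 1: a comma-separated string per row, without flatten + re-aggregate.
joined = {r: ", ".join(str(v) for v in vals) for r, vals in rows}

# Use case 2: an "xyz in c1.c2" style predicate, here with xyz = 6.
matching = [r for r, vals in rows if 6 in vals]
```

Each row carries its own list, so the values for r = 2 and r = 3 come from their own c1 arrays rather than from positions in the first record's array.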