drill-issues mailing list archives

From "Jiang Wu (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DRILL-4498) Projecting a map key within an array produces incorrect results
Date Fri, 11 Mar 2016 18:35:39 GMT

     [ https://issues.apache.org/jira/browse/DRILL-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiang Wu updated DRILL-4498:
----------------------------
    Description: 
To reproduce:

1) Place the following three JSON objects in a file:
{noformat}
{"r":1,"c1":[{"c2":1,"c3":"a"},{"c2":2,"c3":"b"},{"c2":3,"c3":"c"}]}
{"r":2,"c1":[{"c2":4,"c3":"d"}]}
{"r":3,"c1":[{"c2":5,"c3":"e"},{"c2":6,"c3":"f"},{"c2":7,"c3":"g"}]}
{noformat}
2) Run the query:
{noformat}
select t.r, t.c1.c2 from dfs.`c:\tmp\data.json` t;
+----+---------+
| r  | EXPR$1  |
+----+---------+
| 1  | 1       |
| 2  | 2       | <-- not OK
| 3  | 3       | <-- not OK
+----+---------+
{noformat}

3) The above results are incorrect.  After the first row, the values returned for "c1.c2" do
not correspond to the values of r.  The expected result should show, for example, that r = 1
has three values for c1.c2: 1, 2, and 3.

For example, the same conceptual query in MongoDB returns the proper information:

{noformat}
> db.t.find({}, {"r":1, "c1.c2":1})
{"r":1,"c1":[{"c2":1},{"c2":2},{"c2":3}]}
{"r":2,"c1":[{"c2":4}]}
{"r":3,"c1":[{"c2":5},{"c2":6},{"c2":7}]}
{noformat}

I believe that when the Mongo storage plugin is used, the projection "select t.r, t.c1.c2"
translates to the Mongo query above.  The Mongo plugin then produces the JSON documents shown
above and sends them to Drill.  The bug would then be in how Drill converts these correct JSON
documents into its results.

Drill can return the same information, even if it is formatted differently, in a more
relational style.  For example:

{noformat}
select t.r, t.c1.c2 from dfs.`c:\tmp\data.json` t;
+----+-----------+
| r  | EXPR$1    |
+----+-----------+
| 1  | [1, 2, 3] |
| 2  | [4]       | 
| 3  | [5, 6, 7] |
+----+-----------+
{noformat}

Some other output format could also be chosen.
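
The expected projection can be illustrated outside Drill with a small Python sketch.  This
only models the desired output for the three records from step 1; it is not Drill's
implementation, and the helper name project_nested is hypothetical.

```python
import json

# The three records from step 1, one JSON object per line.
DATA = """\
{"r":1,"c1":[{"c2":1,"c3":"a"},{"c2":2,"c3":"b"},{"c2":3,"c3":"c"}]}
{"r":2,"c1":[{"c2":4,"c3":"d"}]}
{"r":3,"c1":[{"c2":5,"c3":"e"},{"c2":6,"c3":"f"},{"c2":7,"c3":"g"}]}
"""


def project_nested(record, array_key, map_key):
    """Hypothetical helper: collect map_key from every map in record[array_key]."""
    return [item[map_key] for item in record.get(array_key, []) if map_key in item]


# One output row per input record: (r, list of all c1[*].c2 values).
rows = [
    (rec["r"], project_nested(rec, "c1", "c2"))
    for rec in map(json.loads, DATA.splitlines())
]

for r, c2_values in rows:
    print(r, c2_values)
# 1 [1, 2, 3]
# 2 [4]
# 3 [5, 6, 7]
```

Each input record yields exactly one output row, so the c2 values stay correlated with r.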

Returning an array of values is an important use case: it supports operations such as forming
a single comma-separated string "1, 2, 3" without flattening and then re-aggregating, as well
as predicates such as "where ... xyz in c1.c2 ..."
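
As a rough illustration of these two use cases, the Python sketch below starts from the
array-valued rows proposed above.  It assumes the projection already returns one list per
row; join_values is a hypothetical helper, and none of this is Drill syntax.

```python
# Rows in the proposed shape: (r, all c1.c2 values for that r).
rows = [(1, [1, 2, 3]), (2, [4]), (3, [5, 6, 7])]


def join_values(values, sep=", "):
    """Hypothetical helper: render a list of values as one delimited string."""
    return sep.join(str(v) for v in values)


# Use case 1: a single comma-separated string per row, with no flatten/re-aggregate step.
joined = {r: join_values(vals) for r, vals in rows}
print(joined[1])  # 1, 2, 3

# Use case 2: a membership predicate, analogous to "where ... 6 in c1.c2 ...".
matching = [r for r, vals in rows if 6 in vals]
print(matching)  # [3]
```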




> Projecting a map key within an array produces incorrect results
> ---------------------------------------------------------------
>
>                 Key: DRILL-4498
>                 URL: https://issues.apache.org/jira/browse/DRILL-4498
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Data Types
>    Affects Versions: 1.4.0
>            Reporter: Jiang Wu



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
