drill-issues mailing list archives

From "Jacques Nadeau (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (DRILL-1671) Incorrect results reported by drill when we have more than 10 flattens (2048 records)
Date Wed, 12 Nov 2014 18:46:34 GMT

    [ https://issues.apache.org/jira/browse/DRILL-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208437#comment-14208437 ]

Jacques Nadeau edited comment on DRILL-1671 at 11/12/14 6:46 PM:
-----------------------------------------------------------------

I'm fine with that for now.  A 50-way cartesian join seems like an edge case.


was (Author: jnadeau):
I'm fine with that for now.  A 50 cartesian join seems like an edge case.
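For scale: the report below says each array in the test record holds 2 elements, so every flatten doubles the row count. Assuming a single record with 50 such arrays (the attached file's exact contents are not reproduced here), flattening all of them would yield 2^50 rows from that one record. The helper name below is hypothetical and only does the arithmetic:

```python
def rows_after_flattens(num_flattens: int, elements_per_array: int = 2) -> int:
    """Rows produced from one record when flattening `num_flattens` arrays,
    each holding `elements_per_array` elements (size of the cartesian product)."""
    return elements_per_array ** num_flattens


print(rows_after_flattens(50))  # 1125899906842624
```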

> Incorrect results reported by drill when we have more than 10 flattens (2048 records)
> --------------------------------------------------------------------------------------
>
>                 Key: DRILL-1671
>                 URL: https://issues.apache.org/jira/browse/DRILL-1671
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill, Storage - JSON
>            Reporter: Rahul Challapalli
>         Attachments: many-arrays-50.json
>
>
> git.commit.id.abbrev=60aa446
> I ran the test below against Jason's private branch, which has some patches for flatten-related bugs that are not yet merged into master.
> The data is structured so that each array within the record contains only 2 elements, so each flatten added to the query should double the number of rows.
> The query below works as expected:
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDir> select count(*) from (select id, flatten(evnts1),
> flatten(evnts2), flatten(evnts3), flatten(evnts4), flatten(evnts5), flatten(evnts6), flatten(evnts7),
> flatten(evnts8), flatten(evnts9), flatten(evnts10) from `json_kvgenflatten/many-arrays-50.json`);
> +------------+
> |   EXPR$0   |
> +------------+
> | 1024       |
> +------------+
> {code}
> However, the query below reports incorrect results. The correct output is 2048.
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDir> select count(*) from (select id, flatten(evnts1),
> flatten(evnts2), flatten(evnts3), flatten(evnts4), flatten(evnts5), flatten(evnts6), flatten(evnts7),
> flatten(evnts8), flatten(evnts9), flatten(evnts10), flatten(evnts11) from `json_kvgenflatten/many-arrays-50.json`);
> +------------+
> |   EXPR$0   |
> +------------+
> | 2047       |
> +------------+
> {code}
> From here on, no matter how many flattens we add to the query, the output stays the same (2047). However, the query duration seems to increase with each new flatten added.
> I attached the data file. Let me know if you have any questions.
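The doubling argument in the report can be checked outside Drill with a small simulation. This is a minimal sketch, not Drill's implementation: it models `select id, flatten(evnts1), ..., flatten(evntsN)` on one record as the cartesian product of the flattened arrays. The record layout (an `id` plus arrays `evnts1` through `evnts50`, 2 elements each) and the helper names `make_record` and `flatten_count` are hypothetical stand-ins for the attached `many-arrays-50.json`, whose contents are not reproduced here.

```python
from itertools import product


def make_record(num_arrays: int) -> dict:
    # Hypothetical record: an id plus arrays evnts1..evntsN, each holding 2 elements.
    record = {"id": 1}
    for i in range(1, num_arrays + 1):
        record[f"evnts{i}"] = [f"e{i}a", f"e{i}b"]
    return record


def flatten_count(record: dict, num_flattens: int) -> int:
    # Model flattening evnts1..evntsN in one select as a cartesian product:
    # one output row per combination of elements across the flattened arrays.
    arrays = [record[f"evnts{i}"] for i in range(1, num_flattens + 1)]
    return sum(1 for _ in product(*arrays))


record = make_record(50)
for n in (10, 11, 12):
    print(n, flatten_count(record, n))
# 10 1024
# 11 2048
# 12 4096
```

Under this model each additional flatten doubles the count (11 flattens give 2048, 12 give 4096), so a result that stays at 2047 from 11 flattens onward does not match the expected cartesian product size.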



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
