drill-issues mailing list archives

From "Rahul Challapalli (JIRA)" <j...@apache.org>
Subject [jira] [Closed] (DRILL-1671) Incorrect results reported by drill when we have more than 10 flattens (2048 records)
Date Wed, 06 May 2015 01:19:00 GMT

     [ https://issues.apache.org/jira/browse/DRILL-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rahul Challapalli closed DRILL-1671.
------------------------------------

Verified and added the following test case:

Functional/Passing/json_kvgenflatten/flatten/flatten_DRILL-1671.q

> Incorrect results reported by drill when we have more than 10 flattens (2048 records)
> --------------------------------------------------------------------------------------
>
>                 Key: DRILL-1671
>                 URL: https://issues.apache.org/jira/browse/DRILL-1671
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill, Storage - JSON
>            Reporter: Rahul Challapalli
>             Fix For: 0.7.0
>
>         Attachments: many-arrays-50.json
>
>
> git.commit.id.abbrev=60aa446
> I ran the following test against Jason's private branch, which has some patches for flatten-related bugs that are not yet merged into master.
> The data is structured so that each array within a record contains exactly 2 elements, so each flatten added to the query should double the number of rows.
> The following query works as expected:
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDir>select count(*) from (select id, flatten(evnts1),
> flatten(evnts2), flatten(evnts3), flatten(evnts4), flatten(evnts5), flatten(evnts6), flatten(evnts7),
> flatten(evnts8), flatten(evnts9), flatten(evnts10) from `json_kvgenflatten/many-arrays-50.json`)
> ;
> +------------+
> |   EXPR$0   |
> +------------+
> | 1024       |
> +------------+
> {code}
> However, the following query reports an incorrect result; the correct output is 2048.
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDir> select count(*) from (select id, flatten(evnts1),
> flatten(evnts2), flatten(evnts3), flatten(evnts4), flatten(evnts5), flatten(evnts6), flatten(evnts7),
> flatten(evnts8), flatten(evnts9), flatten(evnts10), flatten(evnts11) from `json_kvgenflatten/many-arrays-50.json`)
> ;
> +------------+
> |   EXPR$0   |
> +------------+
> | 2047       |
> +------------+
> {code}
> From here on, no matter how many flattens we add to the query, the output stays the same (2047). However, the query duration seems to increase with each flatten added.
> I attached the data file. Let me know if you have any questions.
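As a rough cross-check of the expected counts, here is a minimal Python sketch that models several flatten() calls in one select list as a cartesian product over the flattened arrays, which matches the doubling described in the report. It assumes a single record whose `evnts1`..`evntsN` arrays each hold exactly 2 elements; `flatten_all`, `make_record` and `expected_count` are hypothetical helpers for illustration, not Drill code.

```python
from itertools import product


def flatten_all(record, keys):
    """Hypothetical model of chained flatten() calls: one output row per
    combination of elements drawn from the named arrays (a cartesian product).
    Illustration only; this is not how Drill implements flatten."""
    arrays = [record[k] for k in keys]
    return [dict(zip(keys, combo), id=record["id"]) for combo in product(*arrays)]


def make_record(n_arrays):
    """Build one record whose evnts1..evntsN arrays each hold exactly
    2 elements (assumed data shape, per the report)."""
    rec = {"id": 1}
    for i in range(1, n_arrays + 1):
        rec[f"evnts{i}"] = [f"e{i}a", f"e{i}b"]
    return rec


def expected_count(n_flattens):
    """Row count that count(*) should return for n flattens over one record."""
    keys = [f"evnts{i}" for i in range(1, n_flattens + 1)]
    return len(flatten_all(make_record(n_flattens), keys))


for n in (10, 11, 12):
    print(n, expected_count(n))  # 10 -> 1024, 11 -> 2048, 12 -> 4096
```

Under these assumptions the count is 2^n, so 10 flattens give 1024 and 11 give 2048, while the buggy run reported 2047.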



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
