drill-issues mailing list archives

From "Venkata krishnan Sowrirajan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-1664) Drill gives wrong count on a parquet file which is created as a table by drill
Date Fri, 07 Nov 2014 21:55:33 GMT

    [ https://issues.apache.org/jira/browse/DRILL-1664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202761#comment-14202761 ]

Venkata krishnan Sowrirajan commented on DRILL-1664:
----------------------------------------------------

If I run "create table `t2-csv` as select * from `t2.csv`;", the Parquet file that gets
created looks like this:

columns = 9711942
columns = HX362083

columns = 9707867
columns = HX357851

Here every value is written under the same name, `columns`, which is why the count shows up as 4.

If I run "create table `t2-parq` as select columns[0] as a, columns[1] as b from `t2.csv`;",
the Parquet file that gets created looks like this:

a = 9711942
b = HX362083

a = 9707867
b = HX357851

If I create the table this way, the count is correctly reported as 2.
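
The arithmetic behind the two counts can be sketched in a few lines of Python. This is only an illustrative model of the observation above, not Drill's actual Parquet reader or writer: the names `count_values`, `count_records`, `star_table` and `aliased_table` are hypothetical, and the assumption is that the wrong result comes from counting each stored `columns` value instead of each record.

```python
# Illustrative model only; this is not Drill's actual Parquet reader or writer.
# The two records are the ones shown in the t2.csv output of this issue.
records = [["9711942", "HX362083"], ["9707867", "HX357851"]]


def count_values(rows):
    """Count every value stored under the single repeated `columns` field."""
    return sum(len(row) for row in rows)


def count_records(rows):
    """Count one per record, which is what count(*) is expected to return."""
    return len(rows)


# CTAS with "select *": each record keeps one repeated field named `columns`.
star_table = [{"columns": row} for row in records]

# CTAS with aliases: each record has two distinct scalar fields, a and b.
aliased_table = [{"a": row[0], "b": row[1]} for row in records]

value_count = count_values([r["columns"] for r in star_table])
record_count = count_records(aliased_table)

print(value_count, record_count)
```

Under that assumption, 2 records with 2 values each give 4 when values are counted, and 2 when records are counted, which matches the two results reported here.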

> Drill gives wrong count on a parquet file which is created as a table by drill
> ------------------------------------------------------------------------------
>
>                 Key: DRILL-1664
>                 URL: https://issues.apache.org/jira/browse/DRILL-1664
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Venkata krishnan Sowrirajan
>
> Steps carried out:
> 0: jdbc:drill:> select * from `t2.csv`;
> +------------+
> |  columns   |
> +------------+
> | ["9711942","HX362083"] |
> | ["9707867","HX357851"] |
> +------------+
> 2 rows selected (0.123 seconds)
> 0: jdbc:drill:> create table `t2-csv` as select * from `t2.csv`;
> +------------+---------------------------+
> |  Fragment  | Number of records written |
> +------------+---------------------------+
> | 0_0        | 2                         |
> +------------+---------------------------+
> 1 row selected (0.252 seconds)
> 0: jdbc:drill:> select * from `t2-csv`;
> +------------+
> |  columns   |
> +------------+
> | ["9711942","HX362083"] |
> | ["9707867","HX357851"] |
> +------------+
> 2 rows selected (0.116 seconds)
> 0: jdbc:drill:> select count(*) from `t2-csv`
> . . . . . . . > ;
> +------------+
> |   EXPR$0   |
> +------------+
> | 4          |
> +------------+
> 1 row selected (0.128 seconds)
> Has a similar bug already been filed? I couldn't find one; if it exists, please mark
> this one as a duplicate.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
