drill-issues mailing list archives

From "Rahul Challapalli (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4996) Parquet Date auto-correction is not working in auto-partitioned parquet files generated by drill-1.6
Date Mon, 07 Nov 2016 21:27:58 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15645514#comment-15645514 ]

Rahul Challapalli commented on DRILL-4996:
------------------------------------------

I just tried this on Drill 1.8 and I see the same wrong results.
{code}
[root@qa-node190 drillAutomation]# /opt/drill/bin/sqlline -u jdbc:drill:zk=10.10.100.190:5181
apache drill 1.8.0 
"a drill is a terrible thing to waste"
0: jdbc:drill:zk=10.10.100.190:5181> select * from sys.version;
+----------+-------------------------------------------+-----------------------------------------------------+----------------------------+-----------------------------+----------------------------+
| version  |                 commit_id                 |                   commit_message                    |        commit_time         |         build_email         |         build_time         |
+----------+-------------------------------------------+-----------------------------------------------------+----------------------------+-----------------------------+----------------------------+
| 1.8.0    | 80c4d0290b3f6aafbbd70777d6f29be9a0e767e3  | [maven-release-plugin] prepare release drill-1.8.0  | 24.08.2016 @ 22:20:12 PDT  | challapallirahul@gmail.com  | 07.11.2016 @ 12:57:19 PST  |
+----------+-------------------------------------------+-----------------------------------------------------+----------------------------+-----------------------------+----------------------------+
1 row selected (0.602 seconds)
0: jdbc:drill:zk=10.10.100.190:5181> select i_rec_start_date, i_size from dfs.`/drill/testdata/parquet_date/auto_partition/item_multipart_autorefresh` group by i_rec_start_date, i_size;
+-------------------+--------------+
| i_rec_start_date  |    i_size    |
+-------------------+--------------+
| null              | large        |
| 366-11-08        | extra large  |
| 366-11-08        | medium       |
| null              | medium       |
| 366-11-08        | petite       |
| 364-11-07        | medium       |
| null              | petite       |
| 365-11-07        | medium       |
| 368-11-07        | economy      |
| 365-11-07        | large        |
| 365-11-07        | small        |
| 366-11-08        | small        |
| 365-11-07        | extra large  |
| 364-11-07        | N/A          |
| 366-11-08        | economy      |
| 366-11-08        | large        |
| 364-11-07        | small        |
| null              | small        |
| 364-11-07        | large        |
| 364-11-07        | extra large  |
| 368-11-07        | N/A          |
| 368-11-07        | extra large  |
| 368-11-07        | large        |
| 365-11-07        | petite       |
| null              | N/A          |
| 365-11-07        | economy      |
| 364-11-07        | economy      |
| 364-11-07        | petite       |
| 365-11-07        | N/A          |
| 368-11-07        | medium       |
| null              | extra large  |
| 368-11-07        | small        |
| 368-11-07        | petite       |
| 366-11-08        | N/A          |
+-------------------+--------------+
{code}

> Parquet Date auto-correction is not working in auto-partitioned parquet files generated by drill-1.6
> ----------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-4996
>                 URL: https://issues.apache.org/jira/browse/DRILL-4996
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>            Reporter: Rahul Challapalli
>            Priority: Critical
>         Attachments: item.tgz
>
>
> git.commit.id.abbrev=4ee1d4c
> Below are the steps I followed to generate the data:
> {code}
> 1. Generate a parquet file with date column using hive1.2
> 2. Use drill 1.6 to create auto-partitioned parquet files partitioned on the date column
> {code}
> Now the query below returns wrong results:
> {code}
> select i_rec_start_date, i_size from dfs.`/drill/testdata/parquet_date/auto_partition/item_multipart_autorefresh` group by i_rec_start_date, i_size;
> +-------------------+--------------+
> | i_rec_start_date  |    i_size    |
> +-------------------+--------------+
> | null              | large        |
> | 366-11-08        | extra large  |
> | 366-11-08        | medium       |
> | null              | medium       |
> | 366-11-08        | petite       |
> | 364-11-07        | medium       |
> | null              | petite       |
> | 365-11-07        | medium       |
> | 368-11-07        | economy      |
> | 365-11-07        | large        |
> | 365-11-07        | small        |
> | 366-11-08        | small        |
> | 365-11-07        | extra large  |
> | 364-11-07        | N/A          |
> | 366-11-08        | economy      |
> | 366-11-08        | large        |
> | 364-11-07        | small        |
> | null              | small        |
> | 364-11-07        | large        |
> | 364-11-07        | extra large  |
> | 368-11-07        | N/A          |
> | 368-11-07        | extra large  |
> | 368-11-07        | large        |
> | 365-11-07        | petite       |
> | null              | N/A          |
> | 365-11-07        | economy      |
> | 364-11-07        | economy      |
> | 364-11-07        | petite       |
> | 365-11-07        | N/A          |
> | 368-11-07        | medium       |
> | null              | extra large  |
> | 368-11-07        | small        |
> | 368-11-07        | petite       |
> | 366-11-08        | N/A          |
> +-------------------+--------------+
> 34 rows selected (0.691 seconds)
> {code}
> However, when I generated the auto-partitioned parquet files using Drill 1.2, the above query returned the right results.
> I attached the required data sets.
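
For reference, step 2 of the quoted description can be sketched roughly as follows. This assumes the auto-partitioned files were written with Drill's CTAS and a PARTITION BY clause; the source path and the target table name item_autopart are hypothetical placeholders and do not come from the report. Only the column name i_rec_start_date is taken from the query above.
{code}
-- Sketch only: paths and table names below are hypothetical.
-- Write Parquet output (this is Drill's default store.format).
alter session set `store.format` = 'parquet';

-- Rewrite the Hive-generated Parquet file as an auto-partitioned table,
-- partitioned on the date column used in the failing query.
create table dfs.tmp.`item_autopart`
partition by (i_rec_start_date)
as select * from dfs.`/path/to/hive_generated/item`;
{code}
Running the group-by query from the report against the resulting directory would then show whether the date values come back as expected for a given Drill version.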



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
