drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Vitalii Diravka (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4203) Parquet File : Date is stored wrongly
Date Wed, 28 Sep 2016 14:29:21 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15529814#comment-15529814
] 

Vitalii Diravka commented on DRILL-4203:
----------------------------------------

No, it will not. This property in parquet metadata. For example: 
{code}
creator:     parquet-mr version 1.8.1-drill-r0 (build 6b605a4ea05b66e1a6bf843353abcb4834a4ced8)

extra:       drill.version = 1.9.0-SNAPSHOT 
extra:       is.date.correct = true
{code}
Old drill versions just can’t read this meta info but can read new parquet files.
Although by reason of new parquet files will consist correct DATE values, old drill versions,
surely, will read these DATE values incorrectly from new parquet files.

> Parquet File : Date is stored wrongly
> -------------------------------------
>
>                 Key: DRILL-4203
>                 URL: https://issues.apache.org/jira/browse/DRILL-4203
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.4.0
>            Reporter: Stéphane Trou
>            Assignee: Vitalii Diravka
>            Priority: Critical
>
> Hello,
> I have some problems when i try to read parquet files produce by drill with  Spark, 
all dates are corrupted.
> I think the problem come from drill :)
> {code}
> cat /tmp/date_parquet.csv 
> Epoch,1970-01-01
> {code}
> {code}
> 0: jdbc:drill:zk=local> select columns[0] as name, cast(columns[1] as date) as epoch_date
from dfs.tmp.`date_parquet.csv`;
> +--------+-------------+
> |  name  | epoch_date  |
> +--------+-------------+
> | Epoch  | 1970-01-01  |
> +--------+-------------+
> {code}
> {code}
> 0: jdbc:drill:zk=local> create table dfs.tmp.`buggy_parquet`as select columns[0] as
name, cast(columns[1] as date) as epoch_date from dfs.tmp.`date_parquet.csv`;
> +-----------+----------------------------+
> | Fragment  | Number of records written  |
> +-----------+----------------------------+
> | 0_0       | 1                          |
> +-----------+----------------------------+
> {code}
> When I read the file with parquet tools, i found  
> {code}
> java -jar parquet-tools-1.8.1.jar head /tmp/buggy_parquet/
> name = Epoch
> epoch_date = 4881176
> {code}
> According to [https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md#date],
epoch_date should be equals to 0.
> Meta : 
> {code}
> java -jar parquet-tools-1.8.1.jar meta /tmp/buggy_parquet/
> file:        file:/tmp/buggy_parquet/0_0_0.parquet 
> creator:     parquet-mr version 1.8.1-drill-r0 (build 6b605a4ea05b66e1a6bf843353abcb4834a4ced8)

> extra:       drill.version = 1.4.0 
> file schema: root 
> --------------------------------------------------------------------------------
> name:        OPTIONAL BINARY O:UTF8 R:0 D:1
> epoch_date:  OPTIONAL INT32 O:DATE R:0 D:1
> row group 1: RC:1 TS:93 OFFSET:4 
> --------------------------------------------------------------------------------
> name:         BINARY SNAPPY DO:0 FPO:4 SZ:52/50/0,96 VC:1 ENC:RLE,BIT_PACKED,PLAIN
> epoch_date:   INT32 SNAPPY DO:0 FPO:56 SZ:45/43/0,96 VC:1 ENC:RLE,BIT_PACKED,PLAIN
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message