drill-issues mailing list archives

From "Vitalii Diravka (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4203) Parquet File : Date is stored wrongly
Date Wed, 05 Oct 2016 18:26:22 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15549577#comment-15549577 ]

Vitalii Diravka commented on DRILL-4203:
----------------------------------------

[~rkins] It is correct that Drill auto-corrects it. And yes, you are using the wrong option: the behaviour of both readers is the same. If you want to disable auto-correction, set it in the parquet format config of the storage plugin, for example:
{code}
  "formats": {
    "parquet": {
      "type": "parquet",
      "autoCorrectCorruptDates": false
    }
  }
{code}
Alternatively, you can pass the option through a table function in the query:
{code}
select l_shipdate, l_commitdate
from table(dfs.`/drill/testdata/parquet_date/dates_nodrillversion/drillgen2_lineitem`(type => 'parquet', autoCorrectCorruptDates => false))
limit 1;
{code}
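The following is a minimal Java sketch of what "auto-correction" means here. It is not Drill's implementation and rests on two assumptions. First, affected files shift every DATE value by 4881176 days. That offset appears in the reporter's parquet-tools output quoted below, and it equals 2 × 2440588, where 2440588 is the Julian day number of 1970-01-01. Second, the sketch uses a hypothetical cutoff of 5000-01-01 to decide that a stored value is shifted. The `autoCorrect` parameter stands in for the `autoCorrectCorruptDates` option, and all class and method names are illustrative.

```java
import java.time.LocalDate;

// Hypothetical sketch of corrupt-date auto-correction; not Drill's actual code.
class CorruptDateSketch {
  // Julian day number of 1970-01-01.
  static final int JULIAN_DAY_OF_UNIX_EPOCH = 2440588;

  // Assumed shift applied by the affected writer: 2 * 2440588 = 4881176 days.
  static final int CORRUPT_DATE_SHIFT = 2 * JULIAN_DAY_OF_UNIX_EPOCH;

  // Hypothetical cutoff: stored day counts at or after 5000-01-01 are treated as shifted.
  static final long CORRUPTION_THRESHOLD = LocalDate.of(5000, 1, 1).toEpochDay();

  // Decodes a stored INT32 DATE (days since 1970-01-01), optionally undoing the shift.
  static LocalDate read(int storedDays, boolean autoCorrect) {
    long days = storedDays;
    if (autoCorrect && days >= CORRUPTION_THRESHOLD) {
      days -= CORRUPT_DATE_SHIFT;
    }
    return LocalDate.ofEpochDay(days);
  }

  public static void main(String[] args) {
    System.out.println(read(4881176, true));  // 1970-01-01
    System.out.println(read(4881176, false)); // raw value: a date far beyond year 9999
  }
}
```

A single cutoff like this cannot tell a shifted value from a genuine far-future date. That is presumably one reason an explicit switch to turn the correction off is useful.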

It would also be good to investigate further whether Drill can store dates beyond the year 9999, because from the Drill shell I can't get such values:
{code}
0: jdbc:drill:zk=local> select TO_DATE(262784904600000) from (VALUES(1));
+-------------+
|   EXPR$0    |
+-------------+
| 297-04-27  |
+-------------+
{code}
But from a Drill unit test it works:
{code}
  @Test
  public void myTest() throws Exception {
    String query = "select TO_DATE(262784904600000) from (VALUES(1))";
    setColumnWidths(new int[] {35});
    List<QueryDataBatch> sqlWithResults = testSqlWithResults(query);
    printResult(sqlWithResults);
  }
{code}
Output:
{code}
1 row(s):
--------------------------------------
| EXPR$0<DATE(REQUIRED)>             |
--------------------------------------
| 10297-04-27T22:50:00.000Z          |
--------------------------------------
{code}
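A quick java.time check helps show which of the two results reflects the underlying value. This sketch assumes `TO_DATE` treats its numeric argument as milliseconds since 1970-01-01 UTC, which is consistent with the unit-test output. The class and method names are illustrative. The check shows that 262784904600000 ms falls on 10297-04-27 at 22:50 UTC. So the shell's `297-04-27` appears to have lost the leading digits of the five-digit year rather than computed a different date; the cause in the shell's rendering is not established here.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

// Illustrative check of the millisecond value used in the TO_DATE queries above.
class ToDateMillisSketch {
  // Interprets epochMillis as milliseconds since 1970-01-01T00:00Z and returns the UTC date.
  static LocalDate utcDate(long epochMillis) {
    return Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC).toLocalDate();
  }

  public static void main(String[] args) {
    long millis = 262784904600000L;
    LocalDate date = utcDate(millis);
    // Print fields separately; LocalDate.toString() prefixes years above 9999 with '+'.
    System.out.println(date.getYear() + " " + date.getMonthValue() + " " + date.getDayOfMonth()); // 10297 4 27
    // ISO-8601 form of the same instant.
    System.out.println(Instant.ofEpochMilli(millis));
  }
}
```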

> Parquet File : Date is stored wrongly
> -------------------------------------
>
>                 Key: DRILL-4203
>                 URL: https://issues.apache.org/jira/browse/DRILL-4203
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.4.0
>            Reporter: Stéphane Trou
>            Assignee: Vitalii Diravka
>            Priority: Critical
>             Fix For: 1.9.0
>
>
> Hello,
> I have some problems when I try to read Parquet files produced by Drill with Spark: all dates are corrupted.
> I think the problem comes from Drill :)
> {code}
> cat /tmp/date_parquet.csv 
> Epoch,1970-01-01
> {code}
> {code}
> 0: jdbc:drill:zk=local> select columns[0] as name, cast(columns[1] as date) as epoch_date from dfs.tmp.`date_parquet.csv`;
> +--------+-------------+
> |  name  | epoch_date  |
> +--------+-------------+
> | Epoch  | 1970-01-01  |
> +--------+-------------+
> {code}
> {code}
> 0: jdbc:drill:zk=local> create table dfs.tmp.`buggy_parquet` as select columns[0] as name, cast(columns[1] as date) as epoch_date from dfs.tmp.`date_parquet.csv`;
> +-----------+----------------------------+
> | Fragment  | Number of records written  |
> +-----------+----------------------------+
> | 0_0       | 1                          |
> +-----------+----------------------------+
> {code}
> When I read the file with parquet-tools, I found:
> {code}
> java -jar parquet-tools-1.8.1.jar head /tmp/buggy_parquet/
> name = Epoch
> epoch_date = 4881176
> {code}
> According to [https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md#date], epoch_date should be equal to 0.
> Meta:
> {code}
> java -jar parquet-tools-1.8.1.jar meta /tmp/buggy_parquet/
> file:        file:/tmp/buggy_parquet/0_0_0.parquet 
> creator:     parquet-mr version 1.8.1-drill-r0 (build 6b605a4ea05b66e1a6bf843353abcb4834a4ced8)
> extra:       drill.version = 1.4.0 
> file schema: root 
> --------------------------------------------------------------------------------
> name:        OPTIONAL BINARY O:UTF8 R:0 D:1
> epoch_date:  OPTIONAL INT32 O:DATE R:0 D:1
> row group 1: RC:1 TS:93 OFFSET:4 
> --------------------------------------------------------------------------------
> name:         BINARY SNAPPY DO:0 FPO:4 SZ:52/50/0,96 VC:1 ENC:RLE,BIT_PACKED,PLAIN
> epoch_date:   INT32 SNAPPY DO:0 FPO:56 SZ:45/43/0,96 VC:1 ENC:RLE,BIT_PACKED,PLAIN
> {code}
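The LogicalTypes.md page linked in the report defines DATE as an INT32 holding the number of days since 1970-01-01. The minimal Java sketch below applies that mapping; the class and method names are illustrative. It shows how a reader that follows the spec, as Spark is expected to, interprets the values in question. The date 1970-01-01 encodes to 0. The stored 4881176 decodes to a date more than thirteen thousand years later.

```java
import java.time.LocalDate;

// Illustrative mapping for the Parquet DATE logical type: INT32 days since 1970-01-01.
class ParquetDateSketch {
  // Value a spec-compliant writer stores for a given date.
  static int encode(LocalDate date) {
    return Math.toIntExact(date.toEpochDay());
  }

  // Date a spec-compliant reader produces for a stored INT32.
  static LocalDate decode(int storedDays) {
    return LocalDate.ofEpochDay(storedDays);
  }

  public static void main(String[] args) {
    System.out.println(encode(LocalDate.of(1970, 1, 1))); // 0
    System.out.println(decode(0));                         // 1970-01-01
    // Year a spec-compliant reader assigns to the value parquet-tools reported.
    System.out.println(decode(4881176).getYear());
  }
}
```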



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
