drill-dev mailing list archives

From "Deneche A. Hakim (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-2296) parquet reads CHAR field as VARCHAR
Date Tue, 24 Feb 2015 23:03:06 GMT
Deneche A. Hakim created DRILL-2296:
---------------------------------------

             Summary: parquet reads CHAR field as VARCHAR
                 Key: DRILL-2296
                 URL: https://issues.apache.org/jira/browse/DRILL-2296
             Project: Apache Drill
          Issue Type: Bug
          Components: Storage - Parquet
    Affects Versions: 0.7.0
            Reporter: Deneche A. Hakim
            Assignee: Steven Phillips
             Fix For: Future


I have the following {{test_char.json}} file:
{code}
{ "a": "aaa" }
{ "a": "bbb" }
{ "a": "ccc" }
{code}

I created a Parquet file from Drill using the following query:
{noformat}
create table dfs.tmp.`test_char` as
select cast(a as char(10)) char_col, cast(a as varchar(10)) varchar_col
from dfs.data.`test_char.json`;
{noformat}

Both CHAR and VARCHAR values are saved as BINARY with the converted type UTF8, as the output
of {{parquet-tools}} shows:
{noformat}
creator:     parquet-mr 

file schema: root 
-----------------------------------------------------------------------------------------------------
char_col:    OPTIONAL BINARY O:UTF8 R:0 D:1
varchar_col: OPTIONAL BINARY O:UTF8 R:0 D:1

row group 1: RC:3 TS:116 
-----------------------------------------------------------------------------------------------------
char_col:     BINARY SNAPPY DO:0 FPO:4 SZ:60/58/0.97 VC:3 ENC:BIT_PACKED,PLAIN,RLE
varchar_col:  BINARY SNAPPY DO:0 FPO:64 SZ:60/58/0.97 VC:3 ENC:BIT_PACKED,PLAIN,RLE
{noformat}

When querying the file, both fields are read as VARCHAR, because nothing in the file schema
distinguishes the CHAR column from the VARCHAR column.
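
To make the comparison explicit, here is a minimal Python sketch that parses the two schema lines from the parquet-tools output and groups the columns by their type information. The helper names ({{parse_schema_line}}, {{columns_by_type}}) and the regular expression are my own assumptions for illustration; they are not part of Drill or parquet-tools, and the pattern only covers the simple flat layout shown in this report.

```python
import re

# Schema lines copied verbatim from the parquet-tools output in this report.
SCHEMA_LINES = [
    "char_col:    OPTIONAL BINARY O:UTF8 R:0 D:1",
    "varchar_col: OPTIONAL BINARY O:UTF8 R:0 D:1",
]

# Assumed layout: "<name>: <repetition> <physical type> [O:<original type>] ..."
SCHEMA_RE = re.compile(
    r"^(?P<name>\w+):\s+(?P<repetition>\w+)\s+(?P<physical>\w+)"
    r"(?:\s+O:(?P<original>\w+))?"
)


def parse_schema_line(line):
    """Return (column name, (repetition, physical type, original type or None))."""
    match = SCHEMA_RE.match(line)
    if match is None:
        raise ValueError("unrecognized schema line: %r" % line)
    type_info = (
        match.group("repetition"),
        match.group("physical"),
        match.group("original"),
    )
    return match.group("name"), type_info


def columns_by_type(lines):
    """Group column names by their parsed type information."""
    groups = {}
    for line in lines:
        name, type_info = parse_schema_line(line)
        groups.setdefault(type_info, []).append(name)
    return groups


groups = columns_by_type(SCHEMA_LINES)
for type_info, names in groups.items():
    print(type_info, names)
```

Both columns end up in a single group keyed by (OPTIONAL, BINARY, UTF8), so a reader that relies only on these schema fields has no way to tell which column was declared as CHAR.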



