drill-issues mailing list archives

From "Deneche A. Hakim (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-2267) Parquet writer with dictionary encoding results in corrupted varchar columns
Date Fri, 27 Feb 2015 21:16:04 GMT

    [ https://issues.apache.org/jira/browse/DRILL-2267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14340825#comment-14340825 ]

Deneche A. Hakim commented on DRILL-2267:
-----------------------------------------

All unit tests pass, along with the functional, customer, and tpch100 test suites.

> Parquet writer with dictionary encoding results in corrupted varchar columns
> ----------------------------------------------------------------------------
>
>                 Key: DRILL-2267
>                 URL: https://issues.apache.org/jira/browse/DRILL-2267
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>            Reporter: Ramana Inukonda Nagaraj
>            Assignee: Deneche A. Hakim
>             Fix For: 0.8.0
>
>         Attachments: 0_0_0.parquet, DRILL-2267.1.patch.txt
>
>
> Used CTAS to create a Parquet file through Drill with varchar columns.
> The created Parquet file looks like this in parquet-tools:
> VARCHAR_col:         OPTIONAL BINARY O:UTF8 R:0 D:1
> VAR16CHAR_col:       OPTIONAL BINARY O:UTF8 R:0 D:1
> VARCHAR_col:          BINARY SNAPPY DO:0 FPO:894307 SZ:16344/231716/14.18 VC:378624 ENC:RLE,PLAIN_DICTIONARY,BIT_PACKED
> VAR16CHAR_col:        BINARY SNAPPY DO:0 FPO:910651 SZ:25830/381493/14.77 VC:378624 ENC:RLE,PLAIN_DICTIONARY,BIT_PACKED
> When the file is queried, several records show corrupted data in these fields:
> +---------------+
> | VAR16CHAR_col |
> +---------------+
> | ������������  |
> |               |
> | ��������      |
> | �����         |
> | ��   |
> |          |
> |           |
> | ��        |
> | ������������  |
> |               |
> | ��������      |
> | �����         |
> | ��   |
> |          |
> |           |
> | ��        |
> | ������������  |
> |               |
> | ��������      |
> | �����         |
> | ��   |
> |          |
> |           |
> | ��        |
> | ������������  |
> |               |
> | ��������      |
> | �����         |
> | ��   |
> |          |
> |           |
> | ��        |
> | ������������  |
> |               |
> | ��������      |
> | �����         |
> | ��   |
> If dictionary encoding is turned off, the resulting file can be read without these issues.
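For context on what "dictionary encoding" means here: with PLAIN_DICTIONARY, each distinct value in a column chunk is stored once in a dictionary page, and the data pages hold only an index into that dictionary for each row. The Python sketch below is a simplified, in-memory model of that idea for an OPTIONAL varchar column (D:1 in the schema above). It is not Drill's or parquet-mr's code: the function names are hypothetical, and the real format additionally packs indices and levels compactly (roughly what the RLE and BIT_PACKED entries in ENC refer to). It shows the invariant the reported bug violated: decoding must return exactly the strings that were written.

```python
from typing import Dict, List, Optional, Tuple


def dictionary_encode(
    values: List[Optional[str]],
) -> Tuple[List[bytes], List[int], List[int]]:
    """Hypothetical model of dictionary-encoding an OPTIONAL UTF8 column.

    Returns (dictionary, indices, definition_levels). Nulls get definition
    level 0 and no index; non-null values get level 1 plus an index.
    """
    dictionary: List[bytes] = []
    positions: Dict[bytes, int] = {}
    indices: List[int] = []
    def_levels: List[int] = []
    for value in values:
        if value is None:
            def_levels.append(0)  # null: nothing stored in the data page
            continue
        raw = value.encode("utf-8")
        if raw not in positions:
            # First occurrence: append the bytes to the dictionary page.
            positions[raw] = len(dictionary)
            dictionary.append(raw)
        indices.append(positions[raw])  # data page stores only the index
        def_levels.append(1)
    return dictionary, indices, def_levels


def dictionary_decode(
    dictionary: List[bytes], indices: List[int], def_levels: List[int]
) -> List[Optional[str]]:
    """Rebuild the column by looking each index up in the dictionary."""
    out: List[Optional[str]] = []
    index_iter = iter(indices)
    for level in def_levels:
        if level == 0:
            out.append(None)
        else:
            out.append(dictionary[next(index_iter)].decode("utf-8"))
    return out


if __name__ == "__main__":
    column = ["abc", None, "héllo", "abc", "", None, "héllo"]
    dictionary, indices, levels = dictionary_encode(column)
    # A correct writer/reader pair must reproduce the input exactly.
    assert dictionary_decode(dictionary, indices, levels) == column
```

A file exhibiting the symptoms above fails this kind of round-trip check for some rows, while the same data written without a dictionary page reads back correctly.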
