drill-issues mailing list archives

From "Jason Altekruse (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (DRILL-815) Parquet files created in impala using data from hive tables resulted in incorrect string representation
Date Mon, 09 Jun 2014 21:41:02 GMT

    [ https://issues.apache.org/jira/browse/DRILL-815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025680#comment-14025680 ]

Jason Altekruse edited comment on DRILL-815 at 6/9/14 9:40 PM:
---------------------------------------------------------------

Impala does not currently mark columns with the standard Parquet metadata that indicates how
the data should be interpreted (for string columns, the UTF8 annotation on the BINARY type).
Instead, it persists this information in the Hive metastore. This runs against Drill's model,
which avoids a metastore and lets users point at any file and read it. For now, this data must
be cast to VARCHAR if you want it displayed as strings. We should ask the Impala developers to
write this metadata into the files alongside the metastore, as this is an issue for every Hadoop
project that wants to read Impala-produced Parquet files.


was (Author: jaltekruse):
Impala does not currently mark columns with the standard parquet meta data to indicate how
data should be read. Instead they are using the hive meta-store to persist this information.
This is against the model of Drill where we are avoiding a meta-store and just allowing users
to point at any file and read it. This means that for now this data must be cast  to varchar
if you want it to be shown as strings. We should talk to Impala about supporting this meta-data
alongside the metastore, as this an issue for all the hadoop project that want to read parquet
produced files.

> Parquet files created in impala using data from hive tables resulted in incorrect string representation
> -------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-815
>                 URL: https://issues.apache.org/jira/browse/DRILL-815
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>            Reporter: Norris Lee
>            Assignee: Jason Altekruse
>
> The Parquet file was created by first loading a CSV file into a Hive table. A Parquet table
> was then created in Impala, and the data from the Hive table was loaded into it. The file was
> copied from HDFS to the local filesystem and placed in Drill's dfs.
> The keycolumn column in Hive is of type string.
> {code}
> 0: jdbc:drill:schema=hivestg> select * from `dfs`.`/opt/drill/integer.parquet`;
> +------------+------------+
> | keycolumn  |  column1   |
> +------------+------------+
> | [B@7385c043 | 0          |
> | [B@5211a9f5 | 1          |
> | [B@5ad3deb | -1         |
> | [B@30bc1236 | 2          |
> | [B@b4fb039 | 127        |
> | [B@1cba73fc | -128       |
> | [B@1514b420 | 255        |
> | [B@23dabb0 | 128        |
> | [B@1ed2b0f6 | -129       |
> | [B@1a5ff649 | 256        |
> | [B@12224026 | 32767      |
> | [B@6a18817 | -32768     |
> | [B@56eda167 | 65535      |
> | [B@aff9dc7 | -32769     |
> | [B@13cf7975 | 32768      |
> | [B@1a2efa7c | 65536      |
> | [B@23ef052 | 2147483647 |
> | [B@721398a4 | -2147483648 |
> +------------+------------+
> {code}
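The `[B@7385c043`-style values in the output above are what Java's default `Object.toString()` produces for a `byte[]`: the `[B` type descriptor, an `@`, and the identity hash code in hex. The sketch below illustrates the difference, assuming the reader hands back an un-annotated BINARY value as a raw `byte[]`. The class name, method names, and sample value are hypothetical and are not Drill's code. Decoding the bytes as UTF-8 is roughly what the cast to VARCHAR accomplishes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: why an un-annotated Parquet BINARY value prints as "[B@<hash>".
public class BinaryVsUtf8 {

    // Default rendering: arrays do not override Object.toString(), so Java prints
    // "[B" (byte array type descriptor), '@', and the identity hash code in hex.
    static String renderAsObject(byte[] value) {
        return value.toString();
    }

    // Rendering once the bytes are known to be UTF-8 text, which is the information
    // a UTF8 annotation (or an explicit cast to VARCHAR) supplies.
    static String renderAsUtf8(byte[] value) {
        return new String(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical keycolumn value stored as raw bytes in a BINARY column.
        byte[] raw = "key-0".getBytes(StandardCharsets.UTF_8);
        System.out.println(renderAsObject(raw)); // something like [B@7385c043; the hash varies per run
        System.out.println(renderAsUtf8(raw));   // key-0
    }
}
```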



--
This message was sent by Atlassian JIRA
(v6.2#6252)
