hive-issues mailing list archives

From "Dong Chen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-10254) Parquet PPD support DECIMAL
Date Tue, 12 May 2015 07:26:01 GMT

    [ https://issues.apache.org/jira/browse/HIVE-10254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539434#comment-14539434 ]

Dong Chen commented on HIVE-10254:
----------------------------------

After investigating this, I found that we might need some changes on the Parquet side.

*Problem:*
Decimal in Hive is mapped to {{Binary}} in Parquet. When a predicate and the column statistics
are used to filter values, comparing {{Binary}} values in Parquet does not reflect the ordering
of the corresponding Decimal values in Hive. This type mapping causes 2 problems:
1. When writing a Decimal column, {{Binary.compareTo()}} is used to determine and set the column
statistics (min, max). The generated statistics are not correct from a Decimal perspective.
2. When reading with a Predicate (also Filter), the expected Decimal value is converted to
{{Binary}}, and {{Binary.compareTo()}} is used to compare it with the column statistics.
Both sides are compared as Binary values, so the result is also wrong.
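
To illustrate the mismatch, here is a minimal sketch using only the JDK. It assumes the Decimal's unscaled value is stored as big-endian two's-complement bytes, and it uses a hypothetical {{compareBytes}} helper (unsigned lexicographic order) as a stand-in for a byte-wise comparison. It is not a copy of Parquet's {{Binary.compareTo()}}, whose exact behavior may differ.

```java
import java.math.BigDecimal;

public class DecimalBytesOrdering {
    // Assumed storage layout: unscaled value as big-endian two's-complement bytes.
    public static byte[] encode(BigDecimal d) {
        return d.unscaledValue().toByteArray();
    }

    // Hypothetical stand-in for a byte-wise comparison: unsigned lexicographic order,
    // with the shorter array ordered first when one is a prefix of the other.
    public static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xFF;
            int y = b[i] & 0xFF;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        BigDecimal neg = new BigDecimal("-0.01");  // unscaled -1  -> bytes [0xFF]
        BigDecimal pos = new BigDecimal("0.01");   // unscaled 1   -> bytes [0x01]
        BigDecimal small = new BigDecimal("0.02"); // unscaled 2   -> bytes [0x02]
        BigDecimal big = new BigDecimal("2.56");   // unscaled 256 -> bytes [0x01, 0x00]

        // Decimal order: -0.01 < 0.01, but the byte order says the opposite.
        System.out.println(neg.compareTo(pos) < 0);                        // true
        System.out.println(compareBytes(encode(neg), encode(pos)) < 0);    // false

        // Decimal order: 0.02 < 2.56, but the byte order says the opposite.
        System.out.println(small.compareTo(big) < 0);                      // true
        System.out.println(compareBytes(encode(small), encode(big)) < 0);  // false
    }
}
```

With an ordering like this, a min/max statistic computed over the encoded bytes could exclude row groups that actually contain matching Decimal values.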

*An idea:*
I was thinking whether we could add a customized comparator as an attribute of the {{Binary}}
class, with a high-level user such as Hive providing the comparator, since Hive knows how to
decode the binary into a Decimal and compare it. {{Binary.compareTo()}} could then switch
between the customized and the original comparison method.
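
A rough sketch of what this could look like, under stated assumptions: {{Bytes}} below is a hypothetical, simplified stand-in for Parquet's {{Binary}} (not the real class or its API), {{decimalComparator}} is a hypothetical comparator that a caller such as Hive might supply, and the bytes are assumed to hold the unscaled value in big-endian two's-complement form.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Comparator;

public class PluggableBinaryCompare {
    // Hypothetical simplified stand-in for Parquet's Binary; not the real class.
    public static final class Bytes implements Comparable<Bytes> {
        final byte[] value;
        final Comparator<byte[]> custom; // null means "use the original byte-wise order"

        public Bytes(byte[] value, Comparator<byte[]> custom) {
            this.value = value;
            this.custom = custom;
        }

        @Override
        public int compareTo(Bytes other) {
            // Switch between the customized and the original comparison.
            if (custom != null) {
                return custom.compare(value, other.value);
            }
            return defaultCompare(value, other.value);
        }
    }

    // Assumed original behavior: unsigned lexicographic byte comparison.
    public static int defaultCompare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xFF;
            int y = b[i] & 0xFF;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return Integer.compare(a.length, b.length);
    }

    // Hypothetical comparator a caller such as Hive could provide: decode both
    // sides to BigDecimal with the column's scale, then compare numerically.
    public static Comparator<byte[]> decimalComparator(int scale) {
        return (a, b) -> new BigDecimal(new BigInteger(a), scale)
                .compareTo(new BigDecimal(new BigInteger(b), scale));
    }

    public static void main(String[] args) {
        byte[] minusOne = BigInteger.valueOf(-1).toByteArray(); // -0.01 at scale 2
        byte[] plusOne = BigInteger.valueOf(1).toByteArray();   //  0.01 at scale 2
        Comparator<byte[]> cmp = decimalComparator(2);

        // Byte-wise order puts -0.01 after 0.01; the decimal comparator does not.
        System.out.println(new Bytes(minusOne, null).compareTo(new Bytes(plusOne, null)) < 0); // false
        System.out.println(new Bytes(minusOne, cmp).compareTo(new Bytes(plusOne, cmp)) < 0);   // true
    }
}
```

In this sketch the comparator travels with each value, so both the statistics written at write time and the predicate evaluation at read time would need to be given the same comparator to stay consistent.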

I am not sure this solution is OK, since it requires changing the Parquet API.

Any thoughts? Other ideas?



> Parquet PPD support DECIMAL
> ---------------------------
>
>                 Key: HIVE-10254
>                 URL: https://issues.apache.org/jira/browse/HIVE-10254
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Dong Chen
>            Assignee: Dong Chen
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
