hive-issues mailing list archives

From "Dong Chen (JIRA)" <>
Subject [jira] [Commented] (HIVE-10254) Parquet PPD support DECIMAL
Date Tue, 12 May 2015 07:26:01 GMT


Dong Chen commented on HIVE-10254:

After investigating this, I found we might need some changes on the Parquet side.

Decimal in Hive is mapped to {{Binary}} in Parquet. When predicates and statistics are used to
filter values, comparing Binary values in Parquet does not reflect the ordering
of the corresponding Decimal values in Hive. This type mapping causes 2 problems:
1. When writing a Decimal column, {{Binary.compareTo()}} is used to determine and set the column
statistics (min, max). The generated statistics are not correct from a Decimal perspective.
2. When reading with a Predicate (also a Filter), the expected Decimal value is converted
to Binary type, and {{Binary.compareTo()}} is used to compare the expected value with the column
statistics. Both sides are compared from a Binary perspective, so the result is also wrong.
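
To illustrate the mismatch, here is a minimal, self-contained sketch. It does not use the Parquet classes: {{compareBytes}} is a hypothetical stand-in that assumes an unsigned, bytewise lexicographic comparison, and {{encode}} assumes the Decimal is stored as the big-endian two's-complement bytes of its unscaled value with variable length. The real {{Binary.compareTo()}} and the Hive writer may differ in detail.

```java
import java.math.BigDecimal;

class DecimalBinaryOrder {
    // Assumption: the decimal is stored as the big-endian two's-complement
    // bytes of its unscaled value (the scale lives in the schema).
    static byte[] encode(BigDecimal d) {
        return d.unscaledValue().toByteArray();
    }

    // Hypothetical stand-in for a byte-oriented comparison:
    // unsigned lexicographic order, shorter array first on a common prefix.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        BigDecimal neg = new BigDecimal("-1.00");
        BigDecimal pos = new BigDecimal("1.00");
        // Decimal order: -1.00 is smaller than 1.00 (negative result).
        System.out.println(neg.compareTo(pos));
        // Byte order: 0x9C sorts after 0x64 (positive result).
        System.out.println(compareBytes(encode(neg), encode(pos)));

        BigDecimal big = new BigDecimal("2.56");
        BigDecimal small = new BigDecimal("0.02");
        // Decimal order: 2.56 is larger than 0.02 (positive result).
        System.out.println(big.compareTo(small));
        // Byte order: {0x01, 0x00} sorts before {0x02} (negative result).
        System.out.println(compareBytes(encode(big), encode(small)));
    }
}
```

Under these assumptions, min/max statistics computed with the byte order would put -1.00 above 1.00, which is the kind of wrong statistic described in problem 1.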

*An idea:*
I was wondering whether we could add a customized comparator as an attribute of the {{Binary}}
class, with a high-level user such as Hive providing the comparator, since Hive knows how to decode
the binary to a Decimal and compare. Then {{Binary.compareTo()}} could be changed to switch
between the customized and the original comparison method.
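
A rough sketch of what this could look like follows. {{PluggableBinary}}, {{decimalComparator}} and {{ofDecimal}} are hypothetical names invented for illustration and are not part of the Parquet API. The default comparison is assumed to be unsigned bytewise, and the decimal encoding is assumed to be the big-endian two's-complement bytes of the unscaled value.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Comparator;

// Hypothetical wrapper, not the real Parquet Binary class.
class PluggableBinary implements Comparable<PluggableBinary> {
    private final byte[] bytes;
    // Optional comparator supplied by the caller (for example Hive);
    // null means "use the original byte-oriented comparison".
    private final Comparator<byte[]> custom;

    PluggableBinary(byte[] bytes, Comparator<byte[]> custom) {
        this.bytes = bytes.clone();
        this.custom = custom;
    }

    // Assumed original behavior: unsigned lexicographic byte comparison.
    static int defaultCompare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    // Switch between the customized and the original comparison.
    @Override
    public int compareTo(PluggableBinary other) {
        return custom != null
                ? custom.compare(bytes, other.bytes)
                : defaultCompare(bytes, other.bytes);
    }

    // Comparator a caller could provide: decode both sides to
    // BigDecimal with the column's scale, then compare numerically.
    static Comparator<byte[]> decimalComparator(int scale) {
        return (a, b) -> new BigDecimal(new BigInteger(a), scale)
                .compareTo(new BigDecimal(new BigInteger(b), scale));
    }

    // Helper for the sketch: wrap a decimal's unscaled bytes.
    static PluggableBinary ofDecimal(BigDecimal d, Comparator<byte[]> cmp) {
        return new PluggableBinary(d.unscaledValue().toByteArray(), cmp);
    }
}
```

With {{decimalComparator(2)}} attached, -1.00 compares as smaller than 1.00. With a null comparator the same two values fall back to the byte order and compare the other way around.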

I am not sure whether this solution is OK, since it requires changing the Parquet API.

Any thoughts? Other ideas?

> Parquet PPD support DECIMAL
> ---------------------------
>                 Key: HIVE-10254
>                 URL:
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Dong Chen
>            Assignee: Dong Chen

This message was sent by Atlassian JIRA
