hive-issues mailing list archives

From "Jason Dere (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-6590) Hive does not work properly with boolean partition columns (wrong results and inserts to incorrect HDFS path)
Date Wed, 15 Feb 2017 19:10:41 GMT

    [ https://issues.apache.org/jira/browse/HIVE-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868385#comment-15868385 ]

Jason Dere commented on HIVE-6590:
----------------------------------

Wow, I had no idea PrimitiveObjectInspectorUtils.getBoolean() treated all non-empty strings
as TRUE.
The problem with changing this behavior is that it has ramifications beyond the partition
columns. In fact, UDFToBoolean behaves the same way.
cc [~ashutoshc] [~alangates] to see if they have any opinions here.
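The following standalone Java sketch makes the surprising case concrete by applying the rule described above, where any non-empty string means TRUE. It does not call Hive. The class and method names (LenientBooleanDemo, lenientToBoolean) are hypothetical and only mimic the reported behavior of PrimitiveObjectInspectorUtils.getBoolean() for string input.

```java
// Hypothetical re-creation of the conversion rule described in the comment:
// any non-empty string converts to TRUE. This is NOT Hive's source code.
public class LenientBooleanDemo {

    // Mimics the reported behavior: only the empty string (or null, in this
    // sketch) converts to false; every other string converts to true.
    static boolean lenientToBoolean(String s) {
        return s != null && !s.isEmpty();
    }

    public static void main(String[] args) {
        String[] partitionValues = {"true", "false", "FALSE", "0", ""};
        for (String v : partitionValues) {
            // "'false' -> true" is the surprising line in the output.
            System.out.println("'" + v + "' -> " + lenientToBoolean(v));
        }
    }
}
```

Under this rule, the partition value string "false" converts to true. That is why special-casing booleans during partitioning is worth considering.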

Where does the partitioning code actually call PrimitiveObjectInspectorUtils.getBoolean()
to convert the string value to a boolean? I'm wondering whether we could special-case booleans
so that partitioning does not go through PrimitiveObjectInspectorUtils.getBoolean().
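Here is one possible shape for such a special case. It assumes that partition values arrive as strings, accepts only "true" or "false" (ignoring case), and rejects everything else. The helper name parsePartitionBoolean is hypothetical, not an existing Hive API. Note that java.lang.Boolean.parseBoolean() would not be a good substitute, because it silently maps every string other than "true" (case-insensitively) to false.

```java
import java.util.Locale;

// Hypothetical helper: a strict, case-insensitive conversion that a
// partitioning code path could use instead of the lenient string rule.
public class StrictPartitionBoolean {

    static boolean parsePartitionBoolean(String value) {
        if (value == null) {
            throw new IllegalArgumentException("boolean partition value is null");
        }
        switch (value.toLowerCase(Locale.ROOT)) {
            case "true":
                return true;
            case "false":
                return false;
            default:
                // Reject anything else rather than silently treating it as TRUE.
                throw new IllegalArgumentException(
                        "not a boolean partition value: '" + value + "'");
        }
    }

    public static void main(String[] args) {
        System.out.println(parsePartitionBoolean("FALSE")); // false
        System.out.println(parsePartitionBoolean("True"));  // true
        try {
            parsePartitionBoolean("yes");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing loudly on unexpected input is a design choice. The alternative of defaulting to false would just move the silent misinterpretation to the other value.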

> Hive does not work properly with boolean partition columns (wrong results and inserts to incorrect HDFS path)
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-6590
>                 URL: https://issues.apache.org/jira/browse/HIVE-6590
>             Project: Hive
>          Issue Type: Bug
>          Components: Database/Schema, Metastore
>    Affects Versions: 0.10.0
>            Reporter: Lenni Kuff
>            Assignee: Zoltan Haindrich
>         Attachments: HIVE-6590.1.patch, HIVE-6590.2.patch, HIVE-6590.3.patch
>
>
> Hive does not work properly with boolean partition columns. Queries return wrong results and also insert into incorrect HDFS paths.
> {code}
> CREATE TABLE bool_table (int_col INT) PARTITIONED BY (bool_col BOOLEAN);
> -- This works, creating 3 unique partitions!
> ALTER TABLE bool_table ADD PARTITION (bool_col=FALSE);
> ALTER TABLE bool_table ADD PARTITION (bool_col=false);
> ALTER TABLE bool_table ADD PARTITION (bool_col=False);
> {code}
> The first problem is that Hive cannot filter on a boolean partition key column. "select * from bool_table" returns the correct results, but if you apply a filter on the boolean partition key column, Hive returns no results.
> The second problem is that Hive seems to just call "toString()" on the boolean literal value. This means you can end up with multiple partitions (FALSE, false, FaLSE, etc.) that all represent the same logical value false. For example, you can add three partitions in Hive for the same logical value "false" by running:
> ALTER TABLE bool_table ADD PARTITION (bool_col=FALSE) -> /test-warehouse/bool_table/bool_col=FALSE/
> ALTER TABLE bool_table ADD PARTITION (bool_col=false) -> /test-warehouse/bool_table/bool_col=false/
> ALTER TABLE bool_table ADD PARTITION (bool_col=False) -> /test-warehouse/bool_table/bool_col=False/
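The following minimal sketch shows a normalization that would avoid the duplicate directories. It lowercases the boolean literal before building the "col=value" directory name, so FALSE, false and False all resolve to the same path. The class and method names (BooleanPartitionPath, canonicalBooleanLiteral, partitionPath) are hypothetical. The sketch also ignores Hive's real path escaping and metastore handling.

```java
import java.util.Locale;

// Hypothetical sketch: normalize a boolean partition literal before building
// the partition directory name, so FALSE/false/False share one path.
public class BooleanPartitionPath {

    // Returns "true" or "false" in lowercase; rejects non-boolean literals.
    static String canonicalBooleanLiteral(String literal) {
        String lower = literal.toLowerCase(Locale.ROOT);
        if (!lower.equals("true") && !lower.equals("false")) {
            throw new IllegalArgumentException("not a boolean literal: " + literal);
        }
        return lower;
    }

    // Builds "<tableDir>/<column>=<canonical value>/" (no escaping in this sketch).
    static String partitionPath(String tableDir, String column, String literal) {
        return tableDir + "/" + column + "=" + canonicalBooleanLiteral(literal) + "/";
    }

    public static void main(String[] args) {
        String tableDir = "/test-warehouse/bool_table";
        for (String literal : new String[] {"FALSE", "false", "False"}) {
            System.out.println(partitionPath(tableDir, "bool_col", literal));
        }
    }
}
```

Each of the three literals from the example above yields /test-warehouse/bool_table/bool_col=false/.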



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
