hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Pala M Muthaia (JIRA)" <>
Subject [jira] [Commented] (HIVE-6028) Partition predicate literals are not interpreted correctly.
Date Fri, 13 Dec 2013 21:36:07 GMT


Pala M Muthaia commented on HIVE-6028:

Xuefu, you are right that using string literal is the correct way. 

I suppose previously implicit conversion was actually supported in some sense, because the
literal 01 was actually treated as string. However, in 0.12, even though hour partition column
is STRING, when i specify hour=01, the literal 01 is not converted implicitly into string,
but instead is treated as int.

i digged into this a bit more and suspect that this change in behavior is related to commit
for HIVE-2702, which adds support for integral partition columns. Previously, all partition
filter value literals were always treated as string. Now, both integral and string types are

I think the best fix would be to support implicit conversion behavior again. At least, a type
check and subsequent error should be thrown, so that when user specifies hour=01, query fails
and user can fix his/her query, though this will be more disruptive change for end users.

> Partition predicate literals are not interpreted correctly.
> -----------------------------------------------------------
>                 Key: HIVE-6028
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.12.0
>            Reporter: Pala M Muthaia
>         Attachments: Hive-6028-explain-plan.txt
> When parsing/analyzing query, hive treats partition predicate value as int instead of
string. This breaks down and leads to incorrect result when the partition predicate value
starts with int 0, e.g: hour=00, hour=05 etc.
> The following repro illustrates the bug:
> -- create test table and partition, populate with some data
> create table test_partition_pred(col1 int) partitioned by (hour STRING);
> insert into table test_partition_pred partition (hour=00) select 21 FROM  some_table
limit 1;
> -- this query returns incorrect results, i.e. just empty set.
> select * from test_partition_pred where hour=00;
> OK
> -- this query returns correct result. Note predicate value is string literal
> select * from test_partition_pred where hour='00';
> OK
> 21	00
> explain plan illustrates how the query was interpreted. Particularly the partition predicate
is pushed down as regular filter clause, with hour=0 as predicate. See attached explain plan
> Note:
> 1. The type of the partition column is defined as string, not int.
> 2. This is a regression in Hive 0.12. This used to work in Hive 0.11
> 3. Not an issue when the partition value starts with integer other than 0, e.g hour=10,
hour=11 etc.
> 4. As seen above, workaround is to use string literal hour='00' etc.
> This should not be too bad if in the failing case hive complains that partition hour=0
is not found, or complains literal type doesn't match column type. Instead hive silently pushes
it down as filter clause, and query succeeds with empty set as result.
> We found this out in our production tables partitioned by hour, only a few days after
it started occurring, when there were empty data sets for partitions hour=00 to hour=09.

This message was sent by Atlassian JIRA

View raw message