hive-dev mailing list archives

From "Micah Gutman (JIRA)" <>
Subject [jira] [Commented] (HIVE-5083) Group by ignored when group by column is a partition column
Date Wed, 14 Aug 2013 17:50:48 GMT


Micah Gutman commented on HIVE-5083:

Finally found the bug by using "show extended <table> <partition spec>" to figure
out that all partitions were pointing to a single file. My selects only looked like they were
working; they were just reading the same data over and over.
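
For anyone wanting to run the same check across many partitions, here is a minimal sketch in Python. It assumes the location of each partition has already been collected into a dict (by hand or by a script) from the per-partition metadata output. The function name, the partition specs, and the paths are all hypothetical.

```python
from collections import defaultdict


def find_shared_locations(partition_locations):
    """Map each location used by more than one partition to its partition specs.

    partition_locations: dict of partition spec -> storage location.
    """
    by_location = defaultdict(list)
    for spec, location in partition_locations.items():
        by_location[location].append(spec)
    # Keep only the locations that more than one partition points at.
    return {
        location: sorted(specs)
        for location, specs in by_location.items()
        if len(specs) > 1
    }


# Hypothetical metadata: every partition points at the first partition's data.
example = {
    "date=20130101": "/data/X/20130101",
    "date=20130102": "/data/X/20130101",
    "date=20130103": "/data/X/20130101",
}

for location, specs in find_shared_locations(example).items():
    print(location, "is shared by", ", ".join(specs))
```

An empty result means every partition has its own location.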

Specifically, I created my partitions with "alter table" using multiple partition specs in
the same command. Interestingly, the wiki help page says:

Note that it is proper syntax to have multiple partition_spec in a single ALTER TABLE, but
if you do this in version 0.7, your partitioning scheme will fail. That is, every query specifying
a partition will always use only the first partition.

I am using 0.11, not 0.7. Apparently, 0.11 (and perhaps everything after 0.7?) has this problem.
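
A possible workaround, assuming the problem really is tied to putting several partition specs in one command, is to issue one "alter table" per partition. The sketch below generates those statements in Python. The function name, the table and column names, and the paths are illustrative only, and values are not escaped, so it is only suitable for simple values such as YYYYMMDD strings.

```python
def add_partition_statements(table, column, locations):
    """Build one ALTER TABLE ... ADD PARTITION statement per partition.

    locations: dict of partition value -> storage location.
    No statement carries more than one partition spec.
    """
    statements = []
    for value, location in sorted(locations.items()):
        statements.append(
            "ALTER TABLE {0} ADD PARTITION ({1}='{2}') LOCATION '{3}';".format(
                table, column, value, location
            )
        )
    return statements


# Hypothetical partitions for the table X described in the issue.
locations = {
    "20130101": "/data/X/20130101",
    "20130102": "/data/X/20130102",
}

for statement in add_partition_statements("X", "date", locations):
    print(statement)
```

After running the generated statements, each partition's location can be checked again to confirm that they no longer all point to the same place.
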
> Group by ignored when group by column is a partition column
> -----------------------------------------------------------
>                 Key: HIVE-5083
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 0.11.0
>         Environment: linux
>            Reporter: Micah Gutman
> I have an external table X with partition date (a string YYYYMMDD):
> select, count(*) from X group by
> Rather than get a count breakdown by date, I get a single row returned with the count
> for the entire table. The "date" column returned in my single row appears to be the last partition
> in the table.
> Note results appear as expected if I select an arbitrary "real" column from my table:
> select, count(*) from X group by 
> correctly gives me a single row per value of
> Also, my query works fine when I use the date column in the "where" clause, so the partition
> does seem to be working.
> select, count(*) from X where = "20130101"
> correctly gives me a single row with the count for the date 20130101.
