hive-dev mailing list archives

From "Micah Gutman (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-5083) Group by ignored when group by column is a partition column
Date Wed, 14 Aug 2013 17:50:48 GMT

    [ https://issues.apache.org/jira/browse/HIVE-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739951#comment-13739951 ]

Micah Gutman commented on HIVE-5083:
------------------------------------

Finally found the bug by using "show table extended like <table> partition (<partition spec>)" to
figure out that all partitions were pointing to a single file. My selects only looked like they were
working; they were just reading the same data over and over.

Specifically, I created my partitions with a single "alter table ... add partition" command that
contained multiple partition specs. Interestingly, the wiki help page says:

Note that it is proper syntax to have multiple partition_spec in a single ALTER TABLE, but
if you do this in version 0.7, your partitioning scheme will fail. That is, every query specifying
a partition will always use only the first partition.

I am using 0.11, not 0.7. Apparently, 0.11 (and perhaps everything after 0.7?) has this problem.
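A sketch of the diagnosis and workaround described above, in HiveQL. It assumes the external table X partitioned by the string column `date` from the original report; the partition values and the HDFS paths under /data/x are hypothetical placeholders. The idea is to check each partition's location, then re-create the partitions with one partition spec per ALTER TABLE statement instead of several specs in a single statement.

```sql
-- 1. Inspect one partition's metadata; the output includes its "location".
--    If every partition reports the same location, they all read the same data.
SHOW TABLE EXTENDED LIKE X PARTITION (`date`='20130101');

-- 2. Drop the misconfigured partition. X is EXTERNAL, so this removes only
--    the partition metadata and leaves the files at its location in place.
ALTER TABLE X DROP PARTITION (`date`='20130101');

-- 3. Re-add partitions one spec per statement, each with its own
--    (hypothetical) location.
ALTER TABLE X ADD PARTITION (`date`='20130101') LOCATION '/data/x/20130101';
ALTER TABLE X ADD PARTITION (`date`='20130102') LOCATION '/data/x/20130102';

-- 4. With distinct locations, the group-by from the report should then
--    return one row per date.
SELECT X.`date`, count(*) FROM X GROUP BY X.`date`;
```

Backticks around `date` are used because later Hive releases treat DATE as a keyword; plain date, as in the original report, worked for the reporter on 0.11.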
                
> Group by ignored when group by column is a partition column
> -----------------------------------------------------------
>
>                 Key: HIVE-5083
>                 URL: https://issues.apache.org/jira/browse/HIVE-5083
>             Project: Hive
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 0.11.0
>         Environment: linux
>            Reporter: Micah Gutman
>
> I have an external table X with partition date (a string YYYYMMDD):
> select X.date, count(*) from X group by X.date
> Rather than a count breakdown by date, I get a single row with the count
> for the entire table. The "date" value returned in that row appears to be the last partition
> in the table.
> Note that results appear as expected if I select an arbitrary "real" (non-partition) column from my table:
> select X.foo, count(*) from X group by X.foo 
> correctly gives me a single row per value of X.foo.
> Also, my query works fine when I use the date column in the "where" clause, so the partition
> does seem to be working.
> select X.date, count(*) from X where X.date = "20130101"
> correctly gives me a single row with the count for the date 20130101.

