hive-dev mailing list archives

From "Edward Capriolo (JIRA)" <>
Subject [jira] [Commented] (HIVE-4891) Distinct includes duplicate records
Date Fri, 19 Jul 2013 14:50:48 GMT


Edward Capriolo commented on HIVE-4891:

I suspect there are some funny issues when partitions do not all share the same file format.
This is a newish feature, and we may not have as much test coverage as we need.
> Distinct includes duplicate records
> -----------------------------------
>                 Key: HIVE-4891
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>          Components: File Formats, HiveServer2, Query Processor
>    Affects Versions: 0.10.0
>            Reporter: Fengdong Yu
> I have two partitions: one is a sequence file, the other is an RCFile, but they contain the same data (only the file format differs).
> I have the following SQL:
> {code}
> select distinct uid from test where (dt ='20130718' or dt ='20130718_1') and cur_url like '';
> {code}
> dt ='20130718' is a sequence file (the default input format, which was specified when the table was created).
> dt ='20130718_1' is RCFile.
> {code}
> ALTER TABLE test ADD IF NOT EXISTS PARTITION (dt='20130718_1') LOCATION '/user/test/test-data'
> {code}
> but there are duplicate records in the result.
> If the two partitions have the same input format, then there are no duplicate records.
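
One way to confirm that the two partitions really are registered with different formats is to inspect their metastore entries. The following is a minimal sketch, assuming the table and partition names from the report above; it only inspects metadata and is not a fix for the duplicate rows:

{code}
-- List the partitions registered for the table.
SHOW PARTITIONS test;

-- Show storage details for each partition; compare the InputFormat
-- and SerDe Library lines between the two outputs.
DESCRIBE FORMATTED test PARTITION (dt='20130718');
DESCRIBE FORMATTED test PARTITION (dt='20130718_1');
{code}

If both partitions report the same InputFormat even though the underlying files differ, the partition metadata may be worth checking separately from the distinct behavior described here.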

