hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Swarnim Kulkarni (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-7248) UNION ALL in hive returns incorrect results on Hbase backed table
Date Tue, 17 Jun 2014 23:07:16 GMT

     [ https://issues.apache.org/jira/browse/HIVE-7248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Swarnim Kulkarni updated HIVE-7248:
-----------------------------------

    Component/s: HBase Handler

> UNION ALL in hive returns incorrect results on Hbase backed table
> -----------------------------------------------------------------
>
>                 Key: HIVE-7248
>                 URL: https://issues.apache.org/jira/browse/HIVE-7248
>             Project: Hive
>          Issue Type: Bug
>          Components: HBase Handler
>    Affects Versions: 0.12.0, 0.13.0, 0.13.1
>            Reporter: Mala Chikka Kempanna
>
> The issue can be recreated with following steps
> 1) In hbase 
> create 'TABLE_EMP','default' 
> 2) On hive 
> sudo -u hive hive 
> CREATE EXTERNAL TABLE TABLE_EMP(FIRST_NAME string,LAST_NAME string,CDS_UPDATED_DATE string,CDS_PK
string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES("hbase.columns.mapping"
= "default:FIRST_NAME,default:LAST_NAME,default:CDS_UPDATED_DATE,:key", "hbase.scan.cache"
= "500", "hbase.scan.cacheblocks" = "false" ) TBLPROPERTIES("hbase.table.name" = "TABLE_EMP",'serialization.null.format'='');

> 3) On hbase insert the following data 
> put 'TABLE_EMP', '1', 'default:FIRST_NAME', 'Srini' 
> put 'TABLE_EMP', '1', 'default:LAST_NAME', 'P' 
> put 'TABLE_EMP', '1', 'default:CDS_UPDATED_DATE', '2014-06-16 00:00:00' 
> put 'TABLE_EMP', '2', 'default:FIRST_NAME', 'Aravind' 
> put 'TABLE_EMP', '2', 'default:LAST_NAME', 'K' 
> put 'TABLE_EMP', '2', 'default:CDS_UPDATED_DATE', '2014-06-16 00:00:00' 
> 4) On hive execute the following query 
> hive 
> SELECT * 
> FROM ( 
> SELECT CDS_PK 
> FROM TABLE_EMP 
> WHERE 
> CDS_PK >= '0' 
> AND CDS_PK <= '9' 
> AND CDS_UPDATED_DATE IS NOT NULL 
> UNION ALL SELECT CDS_PK 
> FROM TABLE_EMP 
> WHERE 
> CDS_PK >= 'a' 
> AND CDS_PK <= 'z' 
> AND CDS_UPDATED_DATE IS NOT NULL 
> )t ; 
> 5) Output of the query 
> 1 
> 1 
> 2 
> 2 
> 6) Output of just 
> SELECT CDS_PK 
> FROM TABLE_EMP 
> WHERE 
> CDS_PK >= '0' 
> AND CDS_PK <= '9' 
> AND CDS_UPDATED_DATE IS NOT NULL 
> is 
> 1 
> 2 
> 7) Output of just 
> SELECT CDS_PK 
> FROM TABLE_EMP 
> WHERE 
> CDS_PK >= 'a' 
> AND CDS_PK <= 'z' 
> AND CDS_UPDATED_DATE IS NOT NULL 
> Empty 
> 8) UNION is used to combine the result from multiple SELECT statements into a single
result set. Hive currently only supports UNION ALL (bag union), in which duplicates are not
eliminated 
> Accordingly above query should return output 
> 1 
> 2 
> instead it is giving wrong output 
> 1 
> 1 
> 2 
> 2



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Mime
View raw message