hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Naveen Gangam (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-7248) UNION ALL in hive returns incorrect results on Hbase backed table
Date Mon, 06 Aug 2018 05:41:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-7248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569758#comment-16569758
] 

Naveen Gangam commented on HIVE-7248:
-------------------------------------

[~navis] This still seems to be an issue (https://issues.apache.org/jira/browse/HIVE-20206)
. Was there a reason this was never committed to the source? if there isnt, should we revive
this issue and rebase the patch? Thanks

> UNION ALL in hive returns incorrect results on Hbase backed table
> -----------------------------------------------------------------
>
>                 Key: HIVE-7248
>                 URL: https://issues.apache.org/jira/browse/HIVE-7248
>             Project: Hive
>          Issue Type: Bug
>          Components: HBase Handler
>    Affects Versions: 0.12.0, 0.13.0, 0.13.1
>            Reporter: Mala Chikka Kempanna
>            Assignee: Navis
>            Priority: Major
>         Attachments: HIVE-7248.1.patch.txt, HIVE-7248.2.patch.txt, HIVE-7248.3.patch.txt
>
>
> The issue can be recreated with following steps
> 1) In hbase 
> create 'TABLE_EMP','default' 
> 2) On hive 
> sudo -u hive hive 
> CREATE EXTERNAL TABLE TABLE_EMP(FIRST_NAME string,LAST_NAME string,CDS_UPDATED_DATE string,CDS_PK
string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES("hbase.columns.mapping"
= "default:FIRST_NAME,default:LAST_NAME,default:CDS_UPDATED_DATE,:key", "hbase.scan.cache"
= "500", "hbase.scan.cacheblocks" = "false" ) TBLPROPERTIES("hbase.table.name" = "TABLE_EMP",'serialization.null.format'='');

> 3) On hbase insert the following data 
> put 'TABLE_EMP', '1', 'default:FIRST_NAME', 'Srini' 
> put 'TABLE_EMP', '1', 'default:LAST_NAME', 'P' 
> put 'TABLE_EMP', '1', 'default:CDS_UPDATED_DATE', '2014-06-16 00:00:00' 
> put 'TABLE_EMP', '2', 'default:FIRST_NAME', 'Aravind' 
> put 'TABLE_EMP', '2', 'default:LAST_NAME', 'K' 
> put 'TABLE_EMP', '2', 'default:CDS_UPDATED_DATE', '2014-06-16 00:00:00' 
> 4) On hive execute the following query 
> hive 
> SELECT * 
> FROM ( 
> SELECT CDS_PK 
> FROM TABLE_EMP 
> WHERE 
> CDS_PK >= '0' 
> AND CDS_PK <= '9' 
> AND CDS_UPDATED_DATE IS NOT NULL 
> UNION ALL SELECT CDS_PK 
> FROM TABLE_EMP 
> WHERE 
> CDS_PK >= 'a' 
> AND CDS_PK <= 'z' 
> AND CDS_UPDATED_DATE IS NOT NULL 
> )t ; 
> 5) Output of the query 
> 1 
> 1 
> 2 
> 2 
> 6) Output of just 
> SELECT CDS_PK 
> FROM TABLE_EMP 
> WHERE 
> CDS_PK >= '0' 
> AND CDS_PK <= '9' 
> AND CDS_UPDATED_DATE IS NOT NULL 
> is 
> 1 
> 2 
> 7) Output of just 
> SELECT CDS_PK 
> FROM TABLE_EMP 
> WHERE 
> CDS_PK >= 'a' 
> AND CDS_PK <= 'z' 
> AND CDS_UPDATED_DATE IS NOT NULL 
> Empty 
> 8) UNION is used to combine the result from multiple SELECT statements into a single
result set. Hive currently only supports UNION ALL (bag union), in which duplicates are not
eliminated 
> Accordingly above query should return output 
> 1 
> 2 
> instead it is giving wrong output 
> 1 
> 1 
> 2 
> 2



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message