hive-user mailing list archives

From Rupinder Singh <rsi...@care.com>
Subject Select queries on hbase table with composite key return wrong number of rows
Date Tue, 19 Mar 2013 07:51:51 GMT
I have an hbase table created as follows:
create 'event', {NAME => 'm', VERSIONS => 1}, {NAME => 'e', VERSIONS => 1}

I have a hive table mapped to the hbase table defined as follows:
CREATE EXTERNAL TABLE h_event(
  key struct<name:string,dateCreated:string,userId:string>,
  dummy string,
  dummy2 string)
ROW FORMAT DELIMITED
COLLECTION ITEMS TERMINATED BY '~'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,m:dum,e:dum2")
TBLPROPERTIES ("hbase.table.name" = "event");

(You may notice that this follows the workaround suggested by Swarnim Kulkarni in
https://issues.apache.org/jira/browse/HIVE-2599)
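
To make the intended key layout concrete, here is a minimal Python sketch (not Hive or HBase code) of how the '~'-delimited row key is expected to map onto the three struct fields declared above. The EventKey type, the helper names and the sample values are hypothetical, for illustration only; this does not reproduce how the storage handler actually deserializes keys.

```python
from typing import NamedTuple

# Delimiter taken from COLLECTION ITEMS TERMINATED BY '~' in the DDL above.
DELIM = "~"


class EventKey(NamedTuple):
    """Hypothetical mirror of struct<name, dateCreated, userId>."""
    name: str
    dateCreated: str
    userId: str


def build_row_key(key: EventKey) -> str:
    """Join the struct fields into a single row key string."""
    return DELIM.join(key)


def parse_row_key(row_key: str) -> EventKey:
    """Split a row key back into its three fields.

    Raising on a wrong field count is a choice made for this sketch,
    not a statement about what Hive does with malformed keys.
    """
    parts = row_key.split(DELIM)
    if len(parts) != len(EventKey._fields):
        raise ValueError(f"expected 3 fields, got {len(parts)}: {row_key!r}")
    return EventKey(*parts)


if __name__ == "__main__":
    # Sample values are made up for illustration.
    key = EventKey(name="abc", dateCreated="2013-03-19", userId="user1")
    row_key = build_row_key(key)
    print(row_key)
    print(parse_row_key(row_key))
```

Under this layout, a filter such as key.name='abc' should match every row whose key begins with 'abc~'.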

I load data into the table from a text file with an 'INSERT OVERWRITE TABLE h_event
...' statement.
The data loads without errors, and running count 'event' in the hbase shell returns
the expected number of records.

Environment:
I am on hive 0.8.1 and hbase 0.92.0, both running on Debian/Squeeze.

Problem:
"select count(*) from h_event where key.name='abc'" returns an incorrect count, always
lower than the actual number of records in the table having key.name='abc'.
"select count(*), key.name from h_event group by key.name" also returns the wrong counts
for various values of key.name, and the subtotals it returns do not add up to the
total number of records in the table.

"select * from h_event where key.name='abc' " returns the expected number of rows with the
right keys and data.
"select count(*) from h_event" returns the correct number of rows in the table.
"select count(key) from h_event" returns the wrong number, fewer than the actual number of rows.

"select dummy from h_event where key.name='abc'" returns the correct rows with the right data.
"select key.userId from h_event where key.name='abc'" returns the wrong (smaller) number of rows.

It seems to me that composite keys do not work as expected: if any of the key fields
appears in the select clause, the returned results are incomplete. Am I missing something
obvious here, or is the only solution to go back to duplicating the key columns in the table?

Any help is greatly appreciated.

Thanks
Rupinder


