hive-user mailing list archives

From Sanjay Subramanian <Sanjay.Subraman...@wizecommerce.com>
Subject Re: Very poor read performance with composite keys in hbase
Date Tue, 30 Apr 2013 18:04:55 GMT
In my experience, Hive + HBase has been about 8x slower on average, so I went with the
Hive-only option.

On Apr 30, 2013, at 11:19 PM, "Rupinder Singh" <rsingh@care.com> wrote:

Hi,

I have an HBase cluster with a table that uses a composite key. I map this table to a Hive
external table, which I use to insert data into and select data from it:
CREATE EXTERNAL TABLE event(key struct<name:string,dateCreated:string,uid:string>, {more columns here})
ROW FORMAT DELIMITED
COLLECTION ITEMS TERMINATED BY '~'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key, other columns ")
TBLPROPERTIES ("hbase.table.name" = "event");

The table has about 10 million rows. When I run a select * that specifies all 3 components of
the key, which should match exactly 1 row, the response time is almost 700 sec, which seems
very slow.
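
For concreteness, a lookup of this shape might be written as in the sketch below. The original
query text was not posted, so this is an assumption: the literal values are placeholders, and
filtering on the individual struct fields is only one way the three key components could have
been supplied. Running the same statement under EXPLAIN prints the query plan without executing
it, which may help show whether the key predicate is applied only after rows have been read.

```sql
-- Hypothetical single-row lookup on the composite-key table.
-- The literal values are placeholders, not taken from the real data.
SELECT *
FROM event
WHERE key.name = 'some-name'
  AND key.dateCreated = 'some-date'
  AND key.uid = 'some-uid';

-- Same statement under EXPLAIN: prints the plan instead of running the query.
EXPLAIN
SELECT *
FROM event
WHERE key.name = 'some-name'
  AND key.dateCreated = 'some-date'
  AND key.uid = 'some-uid';
```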

For comparison, I created another table with a simple string key and otherwise identical
columns. The key is a string UUID. The table has the same number of column families and the
same number of rows.
CREATE EXTERNAL TABLE test_event(key string, blah blah…..
TBLPROPERTIES ("hbase.table.name" = "test_event");

When I select a single row from this table by doing select * where key='something', the
response time is 35 sec.

This seems to indicate that with composite keys a full table scan is happening, which seems
odd.

What am I missing here? Is there something special I need to do to get good read performance
with composite keys?
Insert performance in both cases is comparable and as expected.

Any help is appreciated.
Here is the env spec:

Amazon EMR
HBase cluster: 3 core nodes, each with 7.5 GB RAM and 2 CPUs at 2.2 GHz. Master: 7.5 GB RAM,
2 CPUs at 2.2 GHz.
Hive cluster: 3 core nodes, each with 3.75 GB RAM and 1 CPU at 1.8 GHz. Master: 3.75 GB RAM,
1 CPU at 1.8 GHz.

Thanks
Rupinder


