hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From James Taylor <jtay...@salesforce.com>
Subject Re: Very poor read performance with composite keys in hbase
Date Tue, 30 Apr 2013 21:15:38 GMT
Have you had a look at Phoenix (https://github.com/forcedotcom/phoenix)? It'll use all of the
parts of your row key and depending on how much data you're returning back to the client,
will query over 10 million row in seconds.

James
@JamesPlusPlus
http://phoenix-hbase.blogspot.com

On Apr 30, 2013, at 1:59 PM, "kulkarni.swarnim@gmail.com" <kulkarni.swarnim@gmail.com>
wrote:

> That depends on how dynamic your data is. If it is pretty static, you can
> also consider using something like Create Table As Select (CTAS) to create
> a snapshot of your data to HDFS and then run queries on top of that data.
> 
> So your query might become something like:
> 
> create table my_table as select * from event where key.name=’Signup’ and
> key.dateCreated=’2013-03-06 16:39:55.353’ and key.uid=’7af4c330-5988-4255-
> 9250-924ce5864e3bf’;
> 
> Since your data is now in HDFS, this should give you a considerable
> performance boost.
> 
> 
> On Tue, Apr 30, 2013 at 3:00 PM, Rupinder Singh <rsingh@care.com> wrote:
> 
>> Swarnim,****
>> 
>> ** **
>> 
>> Thanks. So this means custom map reduce is the viable option when working
>> with hbase tables having composite keys, since it allows to set the start
>> and stop keys. Hive+Hbase combination is out.****
>> 
>> ** **
>> 
>> Regards****
>> 
>> Rupinder****
>> 
>> ** **
>> 
>> *From:* kulkarni.swarnim@gmail.com [mailto:kulkarni.swarnim@gmail.com]
>> *Sent:* Wednesday, May 01, 2013 12:17 AM
>> 
>> *To:* user@hive.apache.org
>> *Cc:* user@hbase.apache.org
>> *Subject:* Re: Very poor read performance with composite keys in hbase****
>> 
>> ** **
>> 
>> Rupinder,****
>> 
>> ** **
>> 
>> Hive supports a filter pushdown[1] which means that the predicates in the
>> where clause are pushed down to the storage handler level where either they
>> get handled by the storage handler or delegated to hive if they cannot
>> handle them. As of now, the HBaseStorageHandler only supports primitive
>> types. So when you use strings as keys, behind the scenes they get
>> converted to start and stop keys and restrict the hbase scan. This does not
>> happen for structs. Hence you see a full table scan causing bad performance.
>> ****
>> 
>> ** **
>> 
>> [1] https://cwiki.apache.org/Hive/filterpushdowndev.html****
>> 
>> ** **
>> 
>> On Tue, Apr 30, 2013 at 1:04 PM, Sanjay Subramanian <
>> Sanjay.Subramanian@wizecommerce.com> wrote:****
>> 
>> My experience with hive + hbase has been about 8x slower on an average. So
>> I went ahead with hive only option.
>> 
>> Sent from my iPhone****
>> 
>> 
>> On Apr 30, 2013, at 11:19 PM, "Rupinder Singh" <rsingh@care.com> wrote:***
>> *
>> 
>> Hi,****
>> 
>> ****
>> 
>> I have an hbase cluster where I have a table with a composite key. I map
>> this table to a Hive external table using which I insert/select data
>> into/from this table:****
>> 
>> CREATE EXTERNAL TABLE event(key
>> struct<name:string,dateCreated:string,uid:string>, {more columns here})***
>> *
>> 
>> ROW FORMAT DELIMITED****
>> 
>> COLLECTION ITEMS TERMINATED BY '~'****
>> 
>> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'****
>> 
>> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key, other columns ")***
>> *
>> 
>> TBLPROPERTIES ("hbase.table.name" = "event");****
>> 
>> ****
>> 
>> The table has about 10 million rows. When I do a select * using all 3
>> components of the key, essentially selecting just 1 row, the response time
>> is almost 700 sec, which seems pretty bad.****
>> 
>> ****
>> 
>> For comparison purpose, I created another table with a simple string key,
>> and the rest of the columns etc same. The key is a string UUID. Table has
>> same number of column families and same number of rows.****
>> 
>> CREATE EXTERNAL TABLE test_event(key string, blah blah…..****
>> 
>> TBLPROPERTIES ("hbase.table.name" = "test_event");****
>> 
>> ****
>> 
>> When I select a single row from this table by doing select * where
>> key=’something’, the response time is 35 sec.****
>> 
>> ****
>> 
>> This seems to indicate that in case of composite keys, there is a full
>> table scan happening.  This seems weird.****
>> 
>> ****
>> 
>> What am I missing here? Is there something special I need to do to get
>> good read performance if I am using composite keys ?****
>> 
>> Insert performance in both cases is comparable and is as per expectation.*
>> ***
>> 
>> ****
>> 
>> Any help is appreciated.****
>> 
>> Here is the env spec:****
>> 
>> ****
>> 
>> Amazon EMR****
>> 
>> Hbase Cluster- 3 core nodes with 7.5 GB RAM each, 2 CPUs of 2.2 GHz each.
>> Master 7.5 GB RAM, 2 CPUs of 2.2 GHz each****
>> 
>> Hive Cluster – 3 core nodes 3.75 GB RAM each, 1 CPU of 1.8 GHz. Master
>> 3.75 GB RAM, 1 CPU of 1.8 GHz****
>> 
>> ****
>> 
>> Thanks****
>> 
>> Rupinder****
>> 
>> ** **
>> 
>> ** **
>> 
>> This email is intended for the person(s) to whom it is addressed and may
>> contain information that is PRIVILEGED or CONFIDENTIAL. Any unauthorized
>> use, distribution, copying, or disclosure by any person other than the
>> addressee(s) is strictly prohibited. If you have received this email in
>> error, please notify the sender immediately by return email and delete the
>> message and any attachments from your system.****
>> 
>> ** **
>> 
>> ** **
>> 
>> CONFIDENTIALITY NOTICE
>> ======================
>> This email message and any attachments are for the exclusive use of the
>> intended recipient(s) and may contain confidential and privileged
>> information. Any unauthorized review, use, disclosure or distribution is
>> prohibited. If you are not the intended recipient, please contact the
>> sender by reply email and destroy all copies of the original message along
>> with any attachments, from your computer system. If you are the intended
>> recipient, please be advised that the content of this message is subject to
>> access, review and disclosure by the sender's Email System Administrator.*
>> ***
>> 
>> 
>> 
>> ****
>> 
>> ** **
>> 
>> --
>> Swarnim ****
>> 
> 
> 
> 
> -- 
> Swarnim
Mime
View raw message