hive-user mailing list archives

From "kulkarni.swarnim@gmail.com" <kulkarni.swar...@gmail.com>
Subject Re: Very poor read performance with composite keys in hbase
Date Tue, 30 Apr 2013 20:56:18 GMT
That depends on how dynamic your data is. If it is pretty static, you can
also consider using something like Create Table As Select (CTAS) to create
a snapshot of your data to HDFS and then run queries on top of that data.

So your query might become something like:

create table my_table as select * from event where key.name='Signup' and
key.dateCreated='2013-03-06 16:39:55.353' and
key.uid='7af4c330-5988-4255-9250-924ce5864e3bf';

Since your data is now in HDFS, queries against my_table no longer go
through the HBase storage handler, which should give you a considerable
performance boost.
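The filter pushdown explanation quoted further down can be illustrated with a small Python model. This is a sketch under stated assumptions, not Hive or HBase code: `SortedTable`, `range_scan`, `full_scan`, and `equality_to_range` are hypothetical names, the "table" is an in-memory set of sorted string keys rather than HBase byte arrays, and the `key + "\x00"` stop key is just one way to turn an equality predicate into a one-row range. It shows why a predicate the storage handler can translate into start and stop keys reads one row, while one it cannot translate forces a read of the whole table.

```python
from bisect import bisect_left


class SortedTable:
    """Hypothetical in-memory stand-in for an HBase table.

    HBase keeps rows sorted by row key, so a scan can seek straight to a
    start row instead of reading from the beginning.
    """

    def __init__(self, rows):
        self.rows = dict(rows)
        self.keys = sorted(self.rows)

    def range_scan(self, start, stop):
        """Return (results, rows_read) for start <= key < stop.

        As with an HBase scan, the start row is inclusive and the stop row
        is exclusive; only rows inside the range are read.
        """
        results = []
        i = bisect_left(self.keys, start)  # seek to the first key >= start
        while i < len(self.keys) and self.keys[i] < stop:
            results.append(self.rows[self.keys[i]])
            i += 1
        return results, len(results)

    def full_scan(self, predicate):
        """Return (results, rows_read): every row is read, then filtered."""
        results = [self.rows[k] for k in self.keys if predicate(k)]
        return results, len(self.keys)


def equality_to_range(key):
    # key + "\x00" is the smallest string that sorts after `key`, so the
    # range [key, key + "\x00") contains exactly that one key.
    return key, key + "\x00"


# 10,000 hypothetical rows keyed by strings.
table = SortedTable((f"user-{n:05d}", {"n": n}) for n in range(10_000))

# A pushed-down predicate: key = 'user-04242' becomes a one-row range scan.
hits, rows_read = table.range_scan(*equality_to_range("user-04242"))
print(hits, rows_read)  # [{'n': 4242}] 1

# A predicate that cannot be pushed down: every row is read, then filtered.
hits_full, rows_read_full = table.full_scan(lambda k: k == "user-04242")
print(hits_full, rows_read_full)  # [{'n': 4242}] 10000
```

Setting the start and stop keys explicitly, as a custom MapReduce job can, relies on the same sorted-key property.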


On Tue, Apr 30, 2013 at 3:00 PM, Rupinder Singh <rsingh@care.com> wrote:

> Swarnim,
>
> Thanks. So this means custom MapReduce is the viable option when working
> with HBase tables that have composite keys, since it allows setting the
> start and stop keys. The Hive+HBase combination is out.
>
> Regards
>
> Rupinder
>
> From: kulkarni.swarnim@gmail.com [mailto:kulkarni.swarnim@gmail.com]
> Sent: Wednesday, May 01, 2013 12:17 AM
> To: user@hive.apache.org
> Cc: user@hbase.apache.org
> Subject: Re: Very poor read performance with composite keys in hbase
>
> Rupinder,
>
> Hive supports filter pushdown[1], which means that the predicates in the
> WHERE clause are pushed down to the storage handler, which either handles
> them itself or, if it cannot, leaves them for Hive to evaluate. As of now,
> the HBaseStorageHandler only handles predicates on keys of primitive types.
> So when you use strings as keys, the predicates on the key are converted
> behind the scenes into start and stop keys that restrict the HBase scan.
> This does not happen for structs, so you get a full table scan, which is
> what causes the bad performance.
>
> [1] https://cwiki.apache.org/Hive/filterpushdowndev.html
>
> On Tue, Apr 30, 2013 at 1:04 PM, Sanjay Subramanian <
> Sanjay.Subramanian@wizecommerce.com> wrote:****
>
> In my experience, Hive + HBase has been about 8x slower on average, so I
> went ahead with the Hive-only option.
>
>
>
> On Apr 30, 2013, at 11:19 PM, "Rupinder Singh" <rsingh@care.com> wrote:
>
> Hi,
>
> I have an HBase cluster with a table that has a composite key. I map this
> table to a Hive external table, which I use to insert data into and select
> data from this table:
>
> CREATE EXTERNAL TABLE event(key struct<name:string,dateCreated:string,uid:string>, {more columns here})
> ROW FORMAT DELIMITED
> COLLECTION ITEMS TERMINATED BY '~'
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key, other columns ")
> TBLPROPERTIES ("hbase.table.name" = "event");
>
> The table has about 10 million rows. When I do a select * using all 3
> components of the key, essentially selecting just 1 row, the response time
> is almost 700 sec, which seems pretty bad.
>
> For comparison, I created another table with a simple string key and
> otherwise the same columns. The key is a string UUID. The table has the
> same number of column families and the same number of rows.
>
> CREATE EXTERNAL TABLE test_event(key string, blah blah…..
> TBLPROPERTIES ("hbase.table.name" = "test_event");
>
> When I select a single row from this table with select * where
> key='something', the response time is 35 sec.
>
> This seems to indicate that with composite keys a full table scan is
> happening, which seems weird.
>
> What am I missing here? Is there something special I need to do to get
> good read performance when using composite keys?
>
> Insert performance in both cases is comparable and as expected.
>
> Any help is appreciated.
>
> Here is the env spec:
>
> Amazon EMR
>
> HBase cluster: 3 core nodes with 7.5 GB RAM and 2 CPUs of 2.2 GHz each.
> Master: 7.5 GB RAM, 2 CPUs of 2.2 GHz each.
>
> Hive cluster: 3 core nodes with 3.75 GB RAM and 1 CPU of 1.8 GHz each.
> Master: 3.75 GB RAM, 1 CPU of 1.8 GHz.
>
> Thanks
>
> Rupinder
>
>
> --
> Swarnim
>
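Why a custom MapReduce job with explicit start and stop keys can avoid the full scan can be sketched in Python as well. Assuming the struct key in the `event` table is serialized by joining its fields in declaration order with the `~` collection delimiter, each HBase row key is one string of the form `name~dateCreated~uid`, and such keys still sort like ordinary strings. The helpers `compose_key`, `prefix_stop`, and `range_scan` are hypothetical, and the sample rows are invented except for the uid quoted in the thread. The sketch shows that knowing all three fields, or just the leading one, identifies a contiguous key range; it does not claim the HBaseStorageHandler does this for struct keys.

```python
from bisect import bisect_left

# Assumed struct-field separator, matching COLLECTION ITEMS TERMINATED BY '~'.
SEP = "~"


def compose_key(*fields):
    """Join struct fields, in declaration order, into one row-key string."""
    return SEP.join(fields)


def prefix_stop(prefix):
    """Exclusive stop key covering exactly the strings that start with `prefix`.

    Bumps the last character by one; assumes it is not the maximum code point.
    """
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)


def range_scan(sorted_keys, start, stop):
    """Keys k with start <= k < stop, found by binary search on sorted keys."""
    return sorted_keys[bisect_left(sorted_keys, start):bisect_left(sorted_keys, stop)]


# Hypothetical sample rows; only the uid in `target` comes from the thread.
target = compose_key("Signup", "2013-03-06 16:39:55.353",
                     "7af4c330-5988-4255-9250-924ce5864e3bf")
keys = sorted([
    compose_key("Login", "2013-03-06 10:00:00.000", "uid-1"),
    target,
    compose_key("Signup", "2013-03-07 09:12:01.000", "uid-3"),
    compose_key("Signups", "2013-03-06 11:00:00.000", "uid-4"),
])

# All three fields known: the full key maps to a one-row range.
one_row = range_scan(keys, target, target + "\x00")
print(one_row)

# Only the leading field known: every "Signup" row, and nothing from "Signups".
signups = range_scan(keys, "Signup" + SEP, prefix_stop("Signup" + SEP))
print(signups)
```

The separator keeps the prefix match exact: `Signups~...` sorts before `Signup~...` because `s` precedes `~`, so it falls outside the range.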



-- 
Swarnim
