hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Navis류승우 <navis....@nexr.com>
Subject Re: Very poor read performance with composite keys in hbase
Date Thu, 02 May 2013 01:39:47 GMT
Currently, hive storage handler reads rows one by one.

https://issues.apache.org/jira/browse/HIVE-3603 is for setting cache
size, which is not yet fixed.

2013/5/1 kulkarni.swarnim@gmail.com <kulkarni.swarnim@gmail.com>:
> That depends on how dynamic your data is. If it is pretty static, you can
> also consider using something like Create Table As Select (CTAS) to create a
> snapshot of your data to HDFS and then run queries on top of that data.
>
> So your query might become something like:
>
> create table my_table as select * from event where key.name=’Signup’ and
> key.dateCreated=’2013-03-06 16:39:55.353’ and
> key.uid=’7af4c330-5988-4255-9250-924ce5864e3bf’;
>
> Since your data is now in HDFS, this should give you a considerable
> performance boost.
>
>
> On Tue, Apr 30, 2013 at 3:00 PM, Rupinder Singh <rsingh@care.com> wrote:
>>
>> Swarnim,
>>
>>
>>
>> Thanks. So this means custom map reduce is the viable option when working
>> with hbase tables having composite keys, since it allows to set the start
>> and stop keys. Hive+Hbase combination is out.
>>
>>
>>
>> Regards
>>
>> Rupinder
>>
>>
>>
>> From: kulkarni.swarnim@gmail.com [mailto:kulkarni.swarnim@gmail.com]
>> Sent: Wednesday, May 01, 2013 12:17 AM
>>
>>
>> To: user@hive.apache.org
>> Cc: user@hbase.apache.org
>> Subject: Re: Very poor read performance with composite keys in hbase
>>
>>
>>
>> Rupinder,
>>
>>
>>
>> Hive supports a filter pushdown[1] which means that the predicates in the
>> where clause are pushed down to the storage handler level where either they
>> get handled by the storage handler or delegated to hive if they cannot
>> handle them. As of now, the HBaseStorageHandler only supports primitive
>> types. So when you use strings as keys, behind the scenes they get converted
>> to start and stop keys and restrict the hbase scan. This does not happen for
>> structs. Hence you see a full table scan causing bad performance.
>>
>>
>>
>> [1] https://cwiki.apache.org/Hive/filterpushdowndev.html
>>
>>
>>
>> On Tue, Apr 30, 2013 at 1:04 PM, Sanjay Subramanian
>> <Sanjay.Subramanian@wizecommerce.com> wrote:
>>
>> My experience with hive + hbase has been about 8x slower on an average. So
>> I went ahead with hive only option.
>>
>> Sent from my iPhone
>>
>>
>> On Apr 30, 2013, at 11:19 PM, "Rupinder Singh" <rsingh@care.com> wrote:
>>
>> Hi,
>>
>>
>>
>> I have an hbase cluster where I have a table with a composite key. I map
>> this table to a Hive external table using which I insert/select data
>> into/from this table:
>>
>> CREATE EXTERNAL TABLE event(key
>> struct<name:string,dateCreated:string,uid:string>, {more columns here})
>>
>> ROW FORMAT DELIMITED
>>
>> COLLECTION ITEMS TERMINATED BY '~'
>>
>> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>>
>> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key, other columns ")
>>
>> TBLPROPERTIES ("hbase.table.name" = "event");
>>
>>
>>
>> The table has about 10 million rows. When I do a select * using all 3
>> components of the key, essentially selecting just 1 row, the response time
>> is almost 700 sec, which seems pretty bad.
>>
>>
>>
>> For comparison purpose, I created another table with a simple string key,
>> and the rest of the columns etc same. The key is a string UUID. Table has
>> same number of column families and same number of rows.
>>
>> CREATE EXTERNAL TABLE test_event(key string, blah blah…..
>>
>> TBLPROPERTIES ("hbase.table.name" = "test_event");
>>
>>
>>
>> When I select a single row from this table by doing select * where
>> key=’something’, the response time is 35 sec.
>>
>>
>>
>> This seems to indicate that in case of composite keys, there is a full
>> table scan happening.  This seems weird.
>>
>>
>>
>> What am I missing here? Is there something special I need to do to get
>> good read performance if I am using composite keys ?
>>
>> Insert performance in both cases is comparable and is as per expectation.
>>
>>
>>
>> Any help is appreciated.
>>
>> Here is the env spec:
>>
>>
>>
>> Amazon EMR
>>
>> Hbase Cluster- 3 core nodes with 7.5 GB RAM each, 2 CPUs of 2.2 GHz each.
>> Master 7.5 GB RAM, 2 CPUs of 2.2 GHz each
>>
>> Hive Cluster – 3 core nodes 3.75 GB RAM each, 1 CPU of 1.8 GHz. Master
>> 3.75 GB RAM, 1 CPU of 1.8 GHz
>>
>>
>>
>> Thanks
>>
>> Rupinder
>>
>>
>>
>>
>>
>> This email is intended for the person(s) to whom it is addressed and may
>> contain information that is PRIVILEGED or CONFIDENTIAL. Any unauthorized
>> use, distribution, copying, or disclosure by any person other than the
>> addressee(s) is strictly prohibited. If you have received this email in
>> error, please notify the sender immediately by return email and delete the
>> message and any attachments from your system.
>>
>>
>>
>>
>>
>> CONFIDENTIALITY NOTICE
>> ======================
>> This email message and any attachments are for the exclusive use of the
>> intended recipient(s) and may contain confidential and privileged
>> information. Any unauthorized review, use, disclosure or distribution is
>> prohibited. If you are not the intended recipient, please contact the sender
>> by reply email and destroy all copies of the original message along with any
>> attachments, from your computer system. If you are the intended recipient,
>> please be advised that the content of this message is subject to access,
>> review and disclosure by the sender's Email System Administrator.
>>
>>
>>
>>
>>
>> --
>> Swarnim
>
>
>
>
> --
> Swarnim

Mime
View raw message