hive-user mailing list archives

From Alan Gates <ga...@hortonworks.com>
Subject Re: Performance: hive+hbase integration query against the row_key
Date Wed, 12 Sep 2012 01:20:34 GMT
 
On Sep 11, 2012, at 7:00 AM, bharath vissapragada wrote:

> Hey,
> 
> Hive does all kinds of parsing, metadata lookups, query tree building and so on before executing the query. I'm not sure whether all of this was included in those 36 seconds!
> 
> Also, what Hive does is build a scan object (and mappers too) with ranges based on the predicates on the key column, rather than issue a direct "get" call as the HBase shell does. This might incur some overhead too!

Since Hive does this in a MapReduce job, it definitely incurs overhead. It does not run directly
against HBase as you might wish it did here.
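
On the question of query plan analysis: a minimal sketch, assuming the hive_hbase_test table from the quoted message below, is to use Hive's EXPLAIN statement, which prints the plan for a query without running it:

```sql
-- Print the plan Hive would use for the row-key lookup, without executing it.
-- The table name and the key value 'blabla' are taken from the quoted message.
EXPLAIN
SELECT * FROM hive_hbase_test WHERE key = 'blabla';
```

If the output lists a Map Reduce stage, the query is launched as a job, and the job startup cost applies however selective the key predicate is. EXPLAIN EXTENDED prints the same plan with more detail.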

Alan.

> 
> On Tue, Sep 11, 2012 at 7:10 PM, Shengjie Min <kelvin.msj@gmail.com> wrote:
> Hi,
> 
> I am trying to get Hive working on top of my HBase table, following the guide below:
> https://cwiki.apache.org/Hive/hbaseintegration.html
> 
> CREATE EXTERNAL TABLE hive_hbase_test (key string, a string, b string, c string)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping"=":key,cf:a,cf:b,cf:c")
> TBLPROPERTIES ("hbase.table.name"="test");
> 
> This Hive table definition makes my mapping look roughly like this:
> 
> hive_hbase_test  -  test
> Hive key         -  HBase row_key
> Hive column a    -  HBase cf:a
> Hive column b    -  HBase cf:b
> Hive column c    -  HBase cf:c
> 
> From my understanding of how HBaseStorageHandler works, it's supposed to take advantage of the HBase row_key index as much as possible. So I would expect:
> 
> 1. If you run a Hive query against the row key, like "select * from hive_hbase_test where key='blabla'", it would use the HBase row_key index, which gives you a very quick, nearly real-time response, just as HBase does.
> 
> 2. Of course, if you run a Hive query against a column, like "select * from hive_hbase_test where a='blabla'", it filters on a specific column, so it probably uses MapReduce because there is nothing on the HBase side that can be used.
> 
> In my test, query 1 doesn't seem fast at all and still takes ages:
> select * from hive_hbase_test where key='blabla'   36 secs
> vs
> get 'test', 'blabla'      less than 1 sec
> That is still a huge difference.
> 
> Has anybody tried this before? Is there any way I can do some sort of query plan analysis on a Hive query? Or am I not mapping the Hive table to the HBase table correctly?
> 
> -- 
> All the best,
> Shengjie Min
> 
> 
> 
> 
> -- 
> Regards,
> Bharath .V
> w:http://researchweb.iiit.ac.in/~bharath.v

