hbase-user mailing list archives

From lars hofhansl <la...@apache.org>
Subject Re: Why HBase integation with Hive makes Hive slow
Date Thu, 01 Aug 2013 19:00:26 GMT
You need to set scanner caching; otherwise each call to next() will be a network round trip (RTT).
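
A minimal sketch of one way to try this from the Hive session. It assumes the standard HBase client property hbase.client.scanner.caching is passed from the Hive job configuration through to the HBase scanner; that pass-through is an assumption worth verifying on your versions. The value 1000 is only an illustrative starting point, not a figure from this thread. Larger values mean fewer round trips but more memory per fetch on the client and the region server.

```sql
-- Assumption: Hive forwards this HBase client property to the scanner it opens.
-- 1000 is an illustrative value; tune it against row size and available memory.
SET hbase.client.scanner.caching=1000;

-- Re-run the slow query from the original message to compare timings.
SELECT count(*) FROM hbase_table;
```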

 From: Hao Ren <h.ren@claravista.fr>
To: user@hbase.apache.org 
Sent: Thursday, August 1, 2013 7:45 AM
Subject: Why HBase integation with Hive makes Hive slow


I have a cluster (1 master + 3 slaves) on which there Hive, Hbase, and 

To run a daily row-level update routine, we need to integrate 
HBase with Hive, but the performance is poor.

For example, there are two tables in Hive:
     hbase_table:  an HBase table created via Hive
     hive_table: a native Hive table
  Both hold the same data set.

When running:
     select count(*) from hbase_table; ===> takes 500 s
     select count(*) from hive_table; ===> takes 6 s

I have tried many queries on the two tables, but hbase_table is 
always very slow.

To be clear, I created hbase_table as follows:

CREATE TABLE hbase_table (
idvisite string,
client_list Array<string>,
nb_client int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = 
TBLPROPERTIES("hbase.table.name" = "table_test")

My HBase is running in pseudo-distributed mode.

My guess is that at the start of query execution Hive loads the data 
from HBase, and the SerDe step takes a long time.

Could someone tell me how to improve this poor performance?
Is it caused by a misconfigured integration?
Is fully-distributed mode needed here?

Thank you in advance for your time.


Hao Ren