hbase-user mailing list archives

From Bruce Bian <weidong....@gmail.com>
Subject About performance issue of Hive/HBase vs Hive/HDFS
Date Wed, 21 Dec 2011 07:14:57 GMT
Hi there,
After reading these two posts on the mailing list:
http://search-hadoop.com/m/nVaw59rFlY1/Performance+between+Hive+queries+vs.+Hive+over+HBase+queries&subj=Performance+between+Hive+queries+vs+Hive+over+HBase+queries
http://search-hadoop.com/m/X1rzQ1QDSaf2/Hive%252BHBase+performance+is+much+poorer+than+Hive%252BHDFS&subj=Hive+HBase+performance+is+much+poorer+than+Hive+HDFS
it seems that a 4-5X performance degradation of Hive/HBase compared with
Hive/HDFS is expected, because HBase adds another layer on top of HDFS. If
this is the issue, is it possible to bypass the HBase layer and read the
HFiles stored on HDFS directly?
Another possible cause is that, for the same table, the storage is much
larger in HBase (around 5X in my test case, both uncompressed) than in
Hive, because HBase stores a separate KV pair for each column, so the row
key is repeated several times. But after I compressed the HBase table with
LZO (it is now nearly the same size as the uncompressed Hive table), there
was no performance gain for queries like select count(*) from xtable;
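To make the key-repetition point concrete, here is a rough back-of-the-envelope sketch. It assumes the classic HBase KeyValue layout (4-byte key length, 4-byte value length, 2-byte row length, 1-byte family length, 8-byte timestamp and 1-byte key type, plus the row, family, qualifier and value bytes) and a plain delimited text row on the Hive side. The row key, family name and column values are hypothetical, and HFile block indexes, Bloom filters and compression are ignored.

```python
# Rough size estimate: HBase KeyValue cells vs. one delimited Hive text row.
# Assumption: classic HBase KeyValue layout, where every cell carries
#   key length (4 B) + value length (4 B) + row length (2 B)
#   + family length (1 B) + timestamp (8 B) + key type (1 B)
# plus the row key, family, qualifier and value bytes themselves.
# HFile block indexes, Bloom filters and compression are ignored.
KV_FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1  # 20 bytes per cell


def hbase_row_bytes(row_key: bytes, family: bytes, columns: dict) -> int:
    """Estimated bytes for one row stored as one KeyValue per column."""
    total = 0
    for qualifier, value in columns.items():
        # The row key and family name are repeated in every single cell.
        total += (KV_FIXED_OVERHEAD + len(row_key) + len(family)
                  + len(qualifier) + len(value))
    return total


def hive_text_row_bytes(row_key: bytes, columns: dict) -> int:
    """Estimated bytes for the same row as one delimited text line."""
    fields = [row_key] + list(columns.values())
    # (n - 1) field separators plus one trailing newline = n extra bytes.
    return sum(len(f) for f in fields) + len(fields)


# Hypothetical example row: 11-byte key, family "cf", 10 columns of 8-byte values.
row_key = b"row-0000001"
family = b"cf"
columns = {f"col{i}".encode(): b"12345678" for i in range(10)}

hbase = hbase_row_bytes(row_key, family, columns)
hive = hive_text_row_bytes(row_key, columns)
print(hbase, hive, f"{hbase / hive:.2f}")  # 450 102 4.41
```

With these made-up values the estimate is 450 bytes per row in HBase versus 102 bytes for the text row, roughly 4.4X. Shorter family names and qualifiers narrow the gap; longer row keys widen it, since the key is stored once per cell rather than once per row.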
Is anyone working on this? I'm not sure whether I should post this to
Hive's mailing list instead, but there seems to be no progress on issues like
https://issues.apache.org/jira/browse/HIVE-1231

Regards,
Bruce
