hbase-user mailing list archives

From Bruce Bian <weidong....@gmail.com>
Subject Re: About performance issue of Hive/HBase vs Hive/HDFS
Date Wed, 21 Dec 2011 13:32:03 GMT
Hi Michel,
Maybe I missed something, but that's what was said in those two posts, and
it also matches the results I've got so far from my own tests.
So as for tuning HBase: after ensuring data locality, using scanner caching,
and turning off block caching, what other configs should I pay
attention to? Any tips?
Yeah, I'm happy to give Snappy a shot.
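
To put a rough number on the scanner caching point: the sketch below estimates how many client-to-region-server round trips a full scan needs for a given caching value (the setting behind `hbase.client.scanner.caching` / `Scan.setCaching`). The function name and the row count are made up for illustration, and it ignores per-region scanner open/close and everything else a real scan pays for, so treat it as back-of-the-envelope arithmetic only.

```python
import math


def scanner_round_trips(total_rows: int, caching: int) -> int:
    """Approximate number of scanner next() RPCs needed to read total_rows.

    Assumes each RPC returns up to `caching` rows and nothing else limits
    the batch (a simplification).
    """
    if caching < 1:
        raise ValueError("caching must be >= 1")
    return math.ceil(total_rows / caching)


if __name__ == "__main__":
    rows = 10_000_000  # hypothetical table size, not from the thread
    for caching in (1, 100, 1000, 10000):
        trips = scanner_round_trips(rows, caching)
        print(f"caching={caching:>5}: ~{trips:,} round trips")
```

With a caching value of 1, every row costs a round trip, which is why a count(*) style full scan is so sensitive to this setting.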


On Wed, Dec 21, 2011 at 8:52 PM, Michel Segel <michael_segel@hotmail.com> wrote:

> Ok... Just my random thoughts...
> There definitely is overhead in HBase that doesn't exist when you are
> doing direct access against a Hive table. 4 to 5 times slower? I'd question
> how you tuned your HBase.
> Having said that, I would imagine that there are still some potential
> improvements that could be made in Hive to work better with HBase.
> Also why LZO and not Snappy?
> Sent from a remote device. Please excuse any typos...
> Mike Segel
> On Dec 21, 2011, at 1:14 AM, Bruce Bian <weidong.ban@gmail.com> wrote:
> > Hi there,
> > After I read these two posts on the mailing list
> >
> http://search-hadoop.com/m/nVaw59rFlY1/Performance+between+Hive+queries+vs.+Hive+over+HBase+queries&subj=Performance+between+Hive+queries+vs+Hive+over+HBase+queries
> >
> http://search-hadoop.com/m/X1rzQ1QDSaf2/Hive%252BHBase+performance+is+much+poorer+than+Hive%252BHDFS&subj=Hive+HBase+performance+is+much+poorer+than+Hive+HDFS
> > It seems that a 4~5X performance degradation of Hive/HBase vs Hive/HDFS is
> > expected, because HBase adds another layer on top of HDFS. If that is the
> > issue here, is it possible to bypass the HBase layer and read the HFiles
> > stored on HDFS directly?
> > Another possibility may be that, for the same table, the storage is much
> > larger in HBase (around 5X in my test case, both uncompressed) than in
> > Hive, as HBase stores one KV pair per column, which causes the key to
> > be repeated several times. But after I tried compressing the HBase table
> > using LZO (it is now nearly the same size as the uncompressed Hive table),
> > there's no performance gain for queries like select count(*) from xtable;
> > Is anyone working on this? I'm not sure whether I should post this to
> > Hive's mailing list, but there seems to be no progress on issues like
> > https://issues.apache.org/jira/browse/HIVE-1231
> >
> > Regards,
> > Bruce
