hbase-user mailing list archives

From lars hofhansl <la...@apache.org>
Subject Re: Hive + Hbase scanning performance
Date Mon, 10 Feb 2014 23:03:57 GMT
I do not know much about Hive. Sorry.

It all depends on where Hive creates the ClientScanner object. Normally you would call HTable.getScanner(Scan)
in order to get a scanner.
ClientScanner checks whether the scannerCaching on the passed Scan object is > 0. If so,
it uses that value; otherwise it looks in the environment Configuration for hbase.client.scanner.caching
and defaults to 1 if that is not set.

So it all depends on what Configuration Hive sees.
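
The precedence described above can be sketched in a few lines. This is only an illustration of the lookup order, not the actual HBase source: the class ScannerCachingSketch and the helper resolveCaching are hypothetical names, and a plain Map stands in for the HBase Configuration object. Only the property name hbase.client.scanner.caching and the default of 1 come from the message.

```java
import java.util.HashMap;
import java.util.Map;

public class ScannerCachingSketch {
    // Property name as given in the message above.
    static final String CACHING_KEY = "hbase.client.scanner.caching";

    // Hypothetical helper mirroring the described precedence.
    // scanCaching stands in for the value carried by the Scan object;
    // conf stands in for the client-side Configuration (assumption: a Map).
    static int resolveCaching(int scanCaching, Map<String, String> conf) {
        if (scanCaching > 0) {
            // A positive value set on the Scan wins.
            return scanCaching;
        }
        // Otherwise fall back to the configuration, then to the default of 1.
        String configured = conf.get(CACHING_KEY);
        return configured == null ? 1 : Integer.parseInt(configured);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Scan left at -1 and nothing configured: falls back to 1.
        System.out.println(resolveCaching(-1, conf));
        // Scan left at -1 but the property is visible in the configuration.
        conf.put(CACHING_KEY, "500");
        System.out.println(resolveCaching(-1, conf));
        // A value set on the Scan overrides the configuration.
        System.out.println(resolveCaching(100, conf));
    }
}
```

If this sketch matches the real behaviour, a Scan created with the default caching value of -1 would still pick up the configured value, provided the property is present in the Configuration that the client-side scanner is built from.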

-- Lars

 From: java8964 <java8964@hotmail.com>
To: "user@hbase.apache.org" <user@hbase.apache.org> 
Sent: Monday, February 10, 2014 2:33 PM
Subject: RE: Hive + Hbase scanning performance

Hi, Lars:
Is there any logging I can enable to verify this?
I am not questioning your knowledge, but in my performance testing I really didn't see
any effect.
I read org.apache.hadoop.hbase.client.Scan in HBase 0.94.3 and didn't see any logging
I could use to check whether the caching value is being set, or to what value.
From the Hive code, org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat creates a
Scan object with the default caching value (-1) and sets this scan on its base class, which is

I believe this Scan object will then be serialized to the server, and I didn't find any place
where its caching value is reset based on the Configuration. Of course, I may have missed it, since
I have only just started reading the HBase codebase and don't know much about it.
Is there any log on the server side that can show the caching value if I change a log level? If so, how?
Also, can you comment on Hive JIRA https://issues.apache.org/jira/browse/HIVE-3603?

In fact, I have the same question as the 2nd to last comment in the Jira ticket, but no one
ever answered it. 
Swarnim Kulkarni added a comment - 26/Aug/13 19:28
Edward Capriolo Thanks! Also how is setting
this property different than directly setting the "hbase.client.scanner.caching" property
in hive-site.xml without this enhancement? Wouldn't they have the same effect?


> Date: Mon, 10 Feb 2014 12:37:07 -0800
> From: larsh@apache.org
> Subject: Re: Hive + Hbase scanning performance
> To: user@hbase.apache.org
> The block caching won't buy you much in terms of performance.
> You *must* set the scanner caching.
> Note that hbase.client.scanner.caching is a global config option (see HTable.getScanner(...)),
> so as long as that option is set on the Configuration seen by the HTable that Hive uses
> to create the scanner, it should work.
> -- Lars
> ________________________________
>  From: java8964 <java8964@hotmail.com>
> To: "user@hbase.apache.org" <user@hbase.apache.org> 
> Sent: Monday, February 10, 2014 12:19 PM
> Subject: Re: Hive + Hbase scanning performance
> Hi, Ted:
> Our environment uses a distribution from a vendor, so it is not easy to just patch
> it myself.
> But I can ask whether the vendor is willing to patch it in the next release.
> Before I do that, I just want to make sure patching the code is the ONLY solution.
> I read the source code of HiveHBaseTableInputFormat in Hive 0.9.0. I didn't see any place
> where it invokes scan.setCaching(), so I don't think "set hbase.client.scanner.caching" in the hive
> session will work, but that is just my guess. There are quite a lot of messages on the internet
> saying that it will work in this case, which confused me.
> What I want to confirm is that "set hbase.client.scanner.caching" in fact doesn't take effect
> in Hive the way scan.setCaching() would. Is that true?
> Thanks
> Yong
> Date: Mon, 13 Jan 2014 19:31:38 -0800
> Subject: Re: Hive + Hbase scanning performance
> From: yuzhihong@gmail.com
> To: user@hbase.apache.org
> You can patch HIVE-3603 into your deployment so that you can make use of
> scan.setCacheBlocks(false).
> Cheers                          