hbase-user mailing list archives

From Yu Li <car...@gmail.com>
Subject Re: Default value of caching in Scanner
Date Tue, 01 Nov 2016 09:15:37 GMT
In brief: yes. By default the caching size is now Integer.MAX_VALUE, which
is a big difference from 0.98. This was changed by HBASE-11544, and you
can find the statement below at http://hbase.apache.org/book.html:

hbase.client.scanner.caching
Description

Number of rows that we try to fetch when calling next on a scanner if it is
not served from (local, client) memory. This configuration works together
with hbase.client.scanner.max.result.size to try and use the network
efficiently. The default value is Integer.MAX_VALUE by default so that the
network will fill the chunk size defined by
hbase.client.scanner.max.result.size rather than be limited by a particular
number of rows since the size of rows varies table to table. If you know
ahead of time that you will not require more than a certain number of rows
from a scan, this configuration should be set to that row limit via
Scan#setCaching. Higher caching values will enable faster scanners but will
eat up more memory and some calls of next may take longer and longer times
when the cache is empty. Do not set this value such that the time between
invocations is greater than the scanner timeout; i.e.
hbase.client.scanner.timeout.period
Default

2147483647
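
To make the interaction between the two settings concrete, here is a rough sketch in plain Java. It is a simplified model, not HBase code: the class ScanBatchModel and the method rowsPerRpc are hypothetical names, and it assumes every row has the same size and that a batch ends as soon as either the row limit or the byte limit is reached. The real server-side accounting differs in detail.

```java
// Simplified model, NOT HBase code: estimates how many rows one scanner RPC
// returns when both a row limit (caching) and a byte limit apply.
final class ScanBatchModel {

    /**
     * Rows returned by one RPC, assuming every row has the same size.
     * The batch ends once `caching` rows are collected or the accumulated
     * size reaches `maxResultSizeBytes`, whichever comes first.
     */
    static long rowsPerRpc(long caching, long maxResultSizeBytes, long rowSizeBytes) {
        if (caching <= 0 || maxResultSizeBytes <= 0 || rowSizeBytes <= 0) {
            throw new IllegalArgumentException("all arguments must be positive");
        }
        // Ceiling division: rows needed for the accumulated size to reach the
        // byte limit. At least one row is always returned in this model.
        long rowsToFillChunk = (maxResultSizeBytes - 1) / rowSizeBytes + 1;
        return Math.min(caching, rowsToFillChunk);
    }

    public static void main(String[] args) {
        long maxResultSize = 2L * 1024 * 1024; // hypothetical byte limit (2 MiB)
        long rowSize = 1024;                   // hypothetical uniform row size

        // Caching left at Integer.MAX_VALUE: the byte limit decides.
        System.out.println(rowsPerRpc(Integer.MAX_VALUE, maxResultSize, rowSize));
        // Caching set to a small row limit: the row limit decides.
        System.out.println(rowsPerRpc(100, maxResultSize, rowSize));
    }
}
```

In this model, leaving caching at Integer.MAX_VALUE means the byte limit alone decides the batch size; an explicit Scan#setCaching value only matters when it is smaller than the number of rows that fit in one chunk.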

Users will also be able to control the time limit of each call from the client
configuration after HBASE-15593, but only once 1.3.0 is released. (Sorry, but
for all existing releases we can only control this through server-side
configuration, say half of hbase.client.scanner.timeout.period.)

We have been discussing this recently in
https://issues.apache.org/jira/browse/HBASE-16973; you can find
more details there.

Small world, isn't it? (Smile)

Best Regards,
Yu

On 1 November 2016 at 13:10, Sachin Jain <sachinjain024@gmail.com> wrote:

> Hi,
>
> I am using HBase v1.1.2. I have a few questions regarding full table scans:
>
> 1. When we instantiate a Scanner and do not set any caching on it, what
> value does it pick by default?
> - Looking at the code, I found the following:
>
> From the documentation at the top of the Scan.java class:
>
> * To modify scanner caching for just this scan, use {@link
> #setCaching(int) setCaching}.
> * If caching is NOT set, we will use the caching value of the hosting
> {@link Table}.
>
> And
>
> /**
>  * Set the number of rows for caching that will be passed to scanners.
>  * If not set, the Configuration setting {@link
> HConstants#HBASE_CLIENT_SCANNER_CACHING} will
>  * apply.
>  * Higher caching values will enable faster scanners but will use more
> memory.
>  * @param caching the number of rows for caching
>  */
> public Scan setCaching(int caching) {
>   this.caching = caching;
>   return this;
> }
>
> And the default value in the HConstants file is:
>
> public static final String HBASE_CLIENT_SCANNER_CACHING =
> "hbase.client.scanner.caching";
> public static final int DEFAULT_HBASE_CLIENT_SCANNER_CACHING = 2147483647;
>
>
> Does that mean the default value, viz. the number of records read per scan,
> is 2147483647?
> Can someone please clarify this?
>
> 2. Another question: I assume we have to set the caching value higher so
> that we can reduce the number of RPC calls between the client and the region
> server.
> So if we increase the caching value, should we also increase the RPC
> timeout and scannerTimeout values? Otherwise we may hit those thresholds
> with the new cache value.
>
> Thanks
> -Sachin
>
