hbase-user mailing list archives

From Sachin Jain <sachinjain...@gmail.com>
Subject Re: Default value of caching in Scanner
Date Wed, 02 Nov 2016 06:31:49 GMT
Thanks Yu!! This is very helpful.

On Tue, Nov 1, 2016 at 2:45 PM, Yu Li <carp84@gmail.com> wrote:

> A brief answer: yes. By default the caching size is now Integer.MAX_VALUE,
> which is a big difference from 0.98. This was changed by HBASE-11544, and you
> can find the statement below at http://hbase.apache.org/book.html:
>
> hbase.client.scanner.caching
> Description
>
> Number of rows that we try to fetch when calling next on a scanner if it is
> not served from (local, client) memory. This configuration works together
> with hbase.client.scanner.max.result.size to try and use the network
> efficiently. The default value is Integer.MAX_VALUE so that the
> network will fill the chunk size defined by
> hbase.client.scanner.max.result.size rather than be limited by a
> particular number of rows, since the size of rows varies table to table.
> If you know
> ahead of time that you will not require more than a certain number of rows
> from a scan, this configuration should be set to that row limit via
> Scan#setCaching. Higher caching values will enable faster scanners but will
> eat up more memory and some calls of next may take longer and longer times
> when the cache is empty. Do not set this value such that the time between
> invocations is greater than the scanner timeout; i.e.
> hbase.client.scanner.timeout.period
> Default
>
> 2147483647
>
> Users will be able to control the time limit of each call from client
> configuration after HBASE-15593, but only once 1.3.0 is released. (Sorry,
> but for all existing releases we can only control this through server-side
> configuration, say, half of hbase.client.scanner.timeout.period.)
>
> We have been discussing this recently in
> https://issues.apache.org/jira/browse/HBASE-16973; you can find
> more details there.
>
> Small world, isn't it? (Smile)
>
> Best Regards,
> Yu
>
> On 1 November 2016 at 13:10, Sachin Jain <sachinjain024@gmail.com> wrote:
>
> > Hi,
> >
> > I am using HBase v1.1.2. I have a few questions regarding full table scans:
> >
> > 1. When we instantiate a Scanner and do not set any caching on it, what
> > value does it pick by default?
> > - By looking at the code, I have found the following:
> >
> > From the documentation at the top of the Scan.java class:
> >
> > * To modify scanner caching for just this scan, use {@link
> > #setCaching(int) setCaching}.
> > * If caching is NOT set, we will use the caching value of the hosting
> > {@link Table}.
> >
> > And
> >
> > /**
> >  * Set the number of rows for caching that will be passed to scanners.
> >  * If not set, the Configuration setting
> >  * {@link HConstants#HBASE_CLIENT_SCANNER_CACHING} will apply.
> >  * Higher caching values will enable faster scanners but will use more
> >  * memory.
> >  * @param caching the number of rows for caching
> >  */
> > public Scan setCaching(int caching) {
> >   this.caching = caching;
> >   return this;
> > }
> >
> > And the default value in the HConstants file is:
> >
> > public static final String HBASE_CLIENT_SCANNER_CACHING =
> >     "hbase.client.scanner.caching";
> > public static final int DEFAULT_HBASE_CLIENT_SCANNER_CACHING = 2147483647;
> >
> >
> > Does that mean the default value, i.e. the number of records read per scan,
> > is 2147483647?
> > Can someone please clarify this?
> >
> > 2. Another question: I assume we have to set the caching value higher so
> > that we can reduce the number of RPC calls between the client and the
> > region server.
> > So if we increase the caching value, should we also increase the RPC
> > timeout and scannerTimeout values? Otherwise we may reach those thresholds
> > with the new caching value.
> >
> > Thanks
> > -Sachin
> >
>
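
For reference, the lookup order described in question 1 (Scan-level caching first, then the configuration setting, then the built-in default) can be sketched in plain Java. This is a simplified model and not HBase source code: the class `ScannerCachingSketch`, the method `effectiveCaching`, and the `Map` standing in for a Hadoop `Configuration` are hypothetical, and it assumes that an unset Scan-level caching is represented by a non-positive value. Only the property name and the default value are taken from the thread.

```java
import java.util.HashMap;
import java.util.Map;

public class ScannerCachingSketch {
    // Property name and default as quoted from HConstants in the thread.
    static final String HBASE_CLIENT_SCANNER_CACHING = "hbase.client.scanner.caching";
    static final int DEFAULT_HBASE_CLIENT_SCANNER_CACHING = Integer.MAX_VALUE;

    // Hypothetical helper: resolves the caching value a scan would end up with.
    // Assumption: scanCaching <= 0 means "not set on the Scan".
    static int effectiveCaching(int scanCaching, Map<String, String> conf) {
        if (scanCaching > 0) {
            return scanCaching; // explicit Scan#setCaching wins
        }
        String configured = conf.get(HBASE_CLIENT_SCANNER_CACHING);
        if (configured != null) {
            return Integer.parseInt(configured); // client configuration is next
        }
        return DEFAULT_HBASE_CLIENT_SCANNER_CACHING; // otherwise the default applies
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(effectiveCaching(-1, conf));  // prints 2147483647
        conf.put(HBASE_CLIENT_SCANNER_CACHING, "500");
        System.out.println(effectiveCaching(-1, conf));  // prints 500
        System.out.println(effectiveCaching(100, conf)); // prints 100
    }
}
```

With nothing set anywhere, the model returns 2147483647, which matches the answer given above: the row count is effectively unbounded and another limit has to cap each fetch.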

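On question 2, the quoted documentation says the caching value works together with hbase.client.scanner.max.result.size, so the rows returned per call are bounded by whichever limit is hit first. The arithmetic can be sketched as below. This is a rough model, not HBase code: the class `ScanRpcSketch`, both method names, the 2 MB size limit, and the 1 KB average row size are assumptions for illustration, and the server-side time limit mentioned above is left out.

```java
public class ScanRpcSketch {
    // Rows returned by one fetch in this model: capped by the row-count limit
    // (caching) and by how many average-sized rows fit in the size limit.
    // Assumption: avgRowBytes > 0, and at least one row is always returned.
    static long rowsPerRpc(int caching, long maxResultSizeBytes, long avgRowBytes) {
        long rowsBySize = Math.max(1, (maxResultSizeBytes + avgRowBytes - 1) / avgRowBytes);
        return Math.min(caching, rowsBySize);
    }

    // Number of fetches needed to read totalRows rows (ceiling division).
    static long rpcCount(long totalRows, int caching, long maxResultSizeBytes, long avgRowBytes) {
        long perRpc = rowsPerRpc(caching, maxResultSizeBytes, avgRowBytes);
        return (totalRows + perRpc - 1) / perRpc;
    }

    public static void main(String[] args) {
        long maxResultSize = 2L * 1024 * 1024; // assumed 2 MB size limit
        long avgRow = 1024;                    // assumed 1 KB average row
        long totalRows = 1_000_000;

        // Caching left at Integer.MAX_VALUE: the size limit decides.
        System.out.println(rowsPerRpc(Integer.MAX_VALUE, maxResultSize, avgRow));          // prints 2048
        System.out.println(rpcCount(totalRows, Integer.MAX_VALUE, maxResultSize, avgRow)); // prints 489

        // Caching set to 100: the row limit decides, so many more calls are needed.
        System.out.println(rpcCount(totalRows, 100, maxResultSize, avgRow));               // prints 10000
    }
}
```

Under these assumptions, raising caching beyond what the size limit allows does not make an individual call any larger, which is why the time per call is governed by the size and time limits rather than by the caching value alone.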