hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: Scan performance in version 0.20.3
Date Mon, 06 Dec 2010 18:10:24 GMT
On Mon, Dec 6, 2010 at 5:44 AM, Lior Schachter <liors@infolinks.com> wrote:
> Hi all,
> I would like to speed up my scans and noticed these two methods on
> org.apache.hadoop.hbase.client.Scan:
> 1. setCacheBlocks

This controls whether we add blocks to the server-side block cache as
we scan (follow it in the code and you'll see how this flag makes it
all the way down into the reader we use pulling from our files in

> 2. setCaching

I presume you mean HTable#setScannerCaching?  If so, it's as it says
in the javadoc: it's how many rows to fetch per RPC.

> Can you please specify how these parameters should be configured and how
> they relate to each other.

Leave the former alone.  Play with the latter.  The larger you set
it, the more improvement you will see (because, IIRC, the default is to
do an RPC for each row), but don't set it so high that you pull too much
per invocation and put pressure on server-side or even client-side
heaps.  You have an idea of the size of your rows, so you should have a
notion of what to set it to.  Start with a low value.  Even a small
change should make a difference.
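
To make the sizing concrete, here is a rough sketch of how you might
turn a row-size estimate into a caching value.  The class and method
names, the per-RPC byte budget and the upper cap are all made up for
illustration; nothing here is an HBase API, and the right numbers
depend on your own heaps and rows.

```java
// Hypothetical helper, not part of HBase: picks a rows-per-RPC value
// from an average row size and a per-RPC byte budget chosen by you.
final class ScanCachingSketch {

    /**
     * Returns how many rows fit in budgetBytes, never less than 1
     * (one row per RPC) and never more than maxRows.
     */
    static int rowsPerRpc(long avgRowBytes, long budgetBytes, int maxRows) {
        if (avgRowBytes <= 0 || budgetBytes <= 0 || maxRows < 1) {
            throw new IllegalArgumentException("sizes and cap must be positive");
        }
        long fit = budgetBytes / avgRowBytes;          // rows that fit in the budget
        long capped = Math.min((long) maxRows, fit);   // do not exceed the cap
        return (int) Math.max(1L, capped);             // always fetch at least one row
    }

    public static void main(String[] args) {
        // Assumed figures: ~2 KB rows, 2 MB per RPC, cap of 500 rows.
        int caching = rowsPerRpc(2048L, 2L * 1024 * 1024, 500);
        System.out.println("rows per RPC: " + caching);
        // The result would then be passed to HTable#setScannerCaching
        // (or Scan#setCaching) before opening the scanner.
    }
}
```

Start from a small budget, measure, and only then raise it.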


> Thanks,
> Lior
