hbase-user mailing list archives

From Sam Seigal <selek...@yahoo.com>
Subject block caching
Date Thu, 17 Nov 2011 21:44:08 GMT
I have a table that I only use for generating indexes. It will rarely
see random reads, but it will constantly have M/R jobs running against
it to generate indexes. Even on the index table, random reads will be
rare; it will mostly be scanned in blocks.

According to HBase The Definitive Guide

"As HBase reads entire blocks of data for efficient IO usage it
retains these blocks in an in-memory cache, so that subsequent reads
do not need any disk operation. The default of true enables the block
cache for every read operation. But if your use-case only ever has
sequential reads on a particular column family it is advisable to
disable it from polluting the block cache by setting the block cache
enabled flag to false."

"There are other options you can use to influence how the block cache
is used, for example during a scan operation. This is useful during
full table scans so that you do not cause a major churn on the cache.
See the section called “Configuration” for more information about this."

"Scan instances can be set to use the block cache in the region server
via the setCacheBlocks() method. For scans used with MapReduce jobs,
this should be false. For frequently accessed rows, it is advisable to
use the block cache."
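For reference, the book's advice for scans feeding a MapReduce job can
be sketched like this (a minimal sketch using the standard HBase client
Scan API; the column family name "cf" is hypothetical):

```java
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class IndexScanExample {
    public static Scan buildMapReduceScan() {
        Scan scan = new Scan();
        // Skip the block cache: a sequential full-table scan touches each
        // block exactly once, so caching them only evicts blocks that
        // random readers might still need.
        scan.setCacheBlocks(false);
        // Batch rows per RPC to reduce round trips; this is independent
        // of the block-cache setting.
        scan.setCaching(500);
        // "cf" is a hypothetical column family for this table.
        scan.addFamily(Bytes.toBytes("cf"));
        return scan;
    }
}
```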

What is the reasoning behind the above? Why is using the block cache
a bad idea for M/R jobs that are doing full table scans?
