hbase-user mailing list archives

From Vidhyashankar Venkataraman <vidhy...@yahoo-inc.com>
Subject Re: HBase minimum block size for sequential access
Date Tue, 27 Jul 2010 17:34:15 GMT
>> e.g. if your cell is 5KB and your block size is 1MB, that's how much you need to get
>> on the network in order to read it.

Is the network traffic you mention between the client and the regionserver?
The regionserver can only fetch HDFS-defined blocks, can't it?

I thought the block size refers to the granularity at which the index is defined in each
HFile. So, if the block size is 1 MB, you have an index entry for every 1 MB of rows in the
HFile. A larger block size would then mean a smaller index. (I haven't explored the
intricacies of HFile enough, so don't cringe if this is completely off :) )
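
If that index-granularity reading is right, the effect is easy to ballpark. A quick
back-of-envelope sketch (the 1 GB file size is an assumed number, not anything from HBase):

    public class IndexBallpark {
      public static void main(String[] args) {
        long fileSize = 1L << 30;  // assume a 1 GB HFile (made-up number)
        long small = 64L << 10;    // 64 KB block size
        long large = 1L << 20;     // 1 MB block size
        System.out.println(fileSize / small);  // 16384 index entries
        System.out.println(fileSize / large);  // 1024 index entries
      }
    }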

Vidhya

On 7/27/10 9:30 AM, "Jean-Daniel Cryans" <jdcryans@apache.org> wrote:

Ryan (who wrote HFile) did a lot of testing around block size and
didn't really see any difference when changing it. So I would
recommend that you benchmark different values with your own data and
usage pattern and see whether you get better or worse performance.
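
For example, something like the following would set up that comparison (a sketch
against the 0.20-era Java client; the table/family names and the candidate sizes
are just placeholders):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BlockSizeBench {
      public static void main(String[] args) throws Exception {
        HBaseAdmin admin = new HBaseAdmin(new HBaseConfiguration());

        // One table per candidate block size; run the same workload
        // against each and compare. The sizes are just sample points.
        int[] blockSizes = {8 * 1024, 64 * 1024, 1024 * 1024};
        for (int bs : blockSizes) {
          HTableDescriptor desc = new HTableDescriptor("bench_" + bs);
          HColumnDescriptor family = new HColumnDescriptor(Bytes.toBytes("d"));
          family.setBlocksize(bs);  // HFile block size for this family
          desc.addFamily(family);
          admin.createTable(desc);
        }
      }
    }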

The tradeoff for larger values is that in order to retrieve a single
cell, you would have to fetch a lot more data than required, e.g. if
your cell is 5KB and your block size is 1MB, that's how much you need
to get on the network in order to read it. Obviously if you are
scanning, then you probably want all that data anyway, so larger
values *theoretically* give you better performance.
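
To make the two access patterns concrete, a rough sketch (same 0.20-era client
API; the table name and row keys are invented):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class AccessPatterns {
      public static void main(String[] args) throws Exception {
        HTable table = new HTable(new HBaseConfiguration(), "bench_65536");

        // Random access: the client asks for one cell, but the whole
        // enclosing HFile block has to be read to serve it.
        Result single = table.get(new Get(Bytes.toBytes("row-42")));
        System.out.println(single);

        // Sequential access: a scan wants the neighboring cells anyway,
        // so larger blocks don't fetch wasted data here.
        ResultScanner scanner =
            table.getScanner(new Scan(Bytes.toBytes("row-000"), Bytes.toBytes("row-999")));
        for (Result r : scanner) {
          // process each row...
        }
        scanner.close();
      }
    }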

J-D

On Mon, Jul 26, 2010 at 10:41 PM, Andrew Nguyen
<andrew-lists-hbase@ucsfcti.org> wrote:
> I found the following snippet in the HFile javadocs and had some questions seeking
> clarification.  The recommendation is a minimum block size between 8KB and 1MB, with
> larger sizes for sequential access.  Our data are time series data (high resolution,
> sampled at 125Hz).  The primary/typical access pattern is subsets of the data, anywhere
> from 37k points to millions of points.
>
> Should I be setting this to 1MB?  Would even larger values be a good idea (i.e. greater
> than 1MB)?  What are the tradeoffs for larger values?
>
>
> From the HFile javadocs:
>
> Minimum block size. We recommend a setting of minimum block size between 8KB to 1MB for
> general usage. Larger block size is preferred if files are primarily for sequential access.
> However, it would lead to inefficient random access (because there are more data to decompress).
> Smaller blocks are good for random access, but require more memory to hold the block index,
> and may be slower to create (because we must flush the compressor stream at the conclusion
> of each data block, which leads to an FS I/O flush). Further, due to the internal caching
> in Compression codec, the smallest possible block size would be around 20KB-30KB.
>
> Thanks!
>
> --Andrew

