hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: the size of a value and the block size.
Date Wed, 01 Feb 2012 17:33:36 GMT
On Tue, Jan 31, 2012 at 7:30 PM, Zheng Da <zhengda1936@gmail.com> wrote:
> It mentions "block size" and also the figure shows data is split into
> blocks and each block starts with a magic header, which shows whether data
> in the block is compressed or not. Also blocks in HBase is indexed.

These 'blocks' are not hdfs blocks.  The hbase hfile that we write
to hdfs is written in chunks/blocks of 64k by default (these are the
same as the read-time blocks I talked of in my earlier message).  The
hfile blocking is done on top of the hdfs blocking.
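
To put a rough number on the two layers, here is a back-of-envelope
sketch.  The 64k hfile block size is the default mentioned above; the
64MB hdfs block size is only an assumption for illustration (check your
own hdfs configuration), and real hfile blocks are only approximately
64k, so treat the result as an estimate.

```python
# Rough arithmetic only, not HBase code.
HFILE_BLOCK_SIZE = 64 * 1024          # bytes, hbase default hfile block size
HDFS_BLOCK_SIZE = 64 * 1024 * 1024    # bytes, ASSUMED hdfs block size


def hfile_blocks_per_hdfs_block(hdfs_block=HDFS_BLOCK_SIZE,
                                hfile_block=HFILE_BLOCK_SIZE):
    """Approximate count of hfile blocks that fit inside one hdfs block."""
    return hdfs_block // hfile_block


if __name__ == "__main__":
    print(hfile_blocks_per_hdfs_block())
```

With these assumed sizes, one hdfs block holds on the order of a
thousand hfile blocks, which is why the two should not be confused.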

> "Minimum block size. We recommend a setting of minimum block size between
> 8KB to 1MB for general usage. Larger block size is preferred if files are
> primarily for sequential access. However, it would lead to inefficient
> random access (because there are more data to decompress). Smaller blocks
> are good for random access, but require more memory to hold the block
> index, and may be slower to create (because we must flush the compressor
> stream at the conclusion of each data block, which leads to an FS I/O
> flush). Further, due to the internal caching in Compression codec, the
> smallest possible block size would be around 20KB-30KB."
> So each block with its prefixed "magic" header contains either plain or
> compressed data. How that looks like we will have a look at in the next
> section.
> If data isn't split into blocks, how do these things work?

The above prescription sounds about right (though you should be
referring to the reference guide rather than to Lars' blog; see
http://hbase.apache.org/book.html#hfilev2  It builds on Lars' blog to
explain how hfile works in more recent versions of hbase).  It
pertains to hfile blocks.
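
The memory side of that trade-off can be estimated roughly, since the
block index carries one entry per hfile block.  The sketch below is
illustrative only: the average key size and the per-entry overhead are
assumed numbers, not values taken from hbase.

```python
# Rough estimate of block index size for one hfile. Not HBase code.

def block_index_entries(file_size, block_size):
    """One index entry per block; round up for a final partial block."""
    return -(-file_size // block_size)  # integer ceiling division


def block_index_bytes(file_size, block_size,
                      avg_key_bytes=50,        # ASSUMED average key length
                      per_entry_overhead=12):  # ASSUMED offset + size fields
    """Approximate bytes needed to hold the block index in memory."""
    entries = block_index_entries(file_size, block_size)
    return entries * (avg_key_bytes + per_entry_overhead)


if __name__ == "__main__":
    one_gb = 1024 ** 3
    for block_size in (8 * 1024, 64 * 1024, 1024 * 1024):
        print(block_size, block_index_entries(one_gb, block_size),
              block_index_bytes(one_gb, block_size))
```

Under these assumptions, going from 64k to 8k blocks multiplies the
number of index entries (and so the index memory) by eight.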

I don't understand your question 'If data isn't split into blocks, how
do these things work?'

Data is split into hfile blocks.   Splits happen on hfile block
boundaries usually.
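
As a toy model of how data ends up in hfile blocks: the sketch below
assumes the writer checks the accumulated size at cell (key/value)
boundaries and starts a new block once the target size has been
reached, so a single cell is never cut in half.  The function name
cut_blocks is made up for this example; this is not the hbase writer.

```python
# Toy model only: groups cell sizes into blocks at cell boundaries.

def cut_blocks(cell_sizes, block_size=64 * 1024):
    """Return a list of blocks, each a list of the cell sizes it holds."""
    blocks, current, current_bytes = [], [], 0
    for size in cell_sizes:
        # Check before appending: close the block once it reached the target.
        if current and current_bytes >= block_size:
            blocks.append(current)
            current, current_bytes = [], 0
        current.append(size)      # a cell always goes into one block, whole
        current_bytes += size
    if current:
        blocks.append(current)    # flush the final, possibly small, block
    return blocks


if __name__ == "__main__":
    print(cut_blocks([30000, 30000, 30000, 200000, 10]))
```

In this model the block size is a target rather than a hard limit: a
value bigger than the block size simply lands in a block of its own
that is bigger than 64k.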

Please ask more questions so I can help you understand what's going on.

