hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: the size of a value and the block size.
Date Tue, 31 Jan 2012 20:45:45 GMT
On Mon, Jan 30, 2012 at 5:27 PM, Zheng Da <zhengda1936@gmail.com> wrote:
> Hello,
> I'm thinking of using HBase to store a matrix, so each subblock of a matrix
> is stored as a value in HBase, and the key of the value is the location of
> the subblock in the matrix. At beginning, I wanted the subblock to be as
> large as 8MB. But when I read
> http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html, I
> found HBase splits keyvalue pairs into blocks and the block size is usually
> much smaller than 8MB. So what happens if I store data of 8MB as a value in
> HBase? I tried, and it seems to work fine. But how about the performance?

Please point to where in that blog post you read that we split keyvalues.
 We do not.

When writing, we persist files that by default use HDFS blocks of 64MB.
When reading, we read in 64KB chunks by default (HBase read blocks).
Each 64KB chunk contains whole keyvalues, which means we rarely read
exactly 64KB.  If a keyvalue is 8MB, then even though we're configured
to read in 64KB blocks, we'll read the whole 8MB keyvalue as a single
coherent block.
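If you know your cells will be large, you can also raise the read block
size per column family when creating the table, so small cells aren't the
sizing assumption either.  A sketch in the hbase shell — the table name
'matrix' and family name 'b' are hypothetical, and 65536 bytes is the
default BLOCKSIZE:

```
hbase> create 'matrix', {NAME => 'b', BLOCKSIZE => '1048576'}
```

This only changes the target size of HFile data blocks; a keyvalue larger
than the configured block size still comes out as one oversized block.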

Performance-wise, it's best you try it out.  Be aware that unless you
configure things otherwise, this 8MB block coming up out of the
filesystem will probably traverse the read-side block cache and evict
a bunch of smaller entries.  These are the kinds of things you'll
need to consider.  Check out the performance section in the
hbase reference guide: http://hbase.apache.org/book.html#performance
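For example, one way to keep big values from churning the block cache is
to turn caching off for that family at table-creation time (a sketch in
the hbase shell; 'matrix' and 'b' are hypothetical names):

```
hbase> create 'matrix', {NAME => 'b', BLOCKCACHE => 'false'}
```

From the Java client you can make the same call per read with
Scan.setCacheBlocks(false), which leaves caching on for other access
patterns against the same table.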

