hbase-user mailing list archives

From Chris Tarnas <...@email.com>
Subject Re: Blob storage
Date Tue, 08 Mar 2011 19:04:47 GMT
Just as a point of reference, in one of our systems we have 500+ million rows, each with a cell
in its own column family that is usually about 100 bytes, but in about 10,000 of those rows
the cell can get to 300 MB (the average is probably about 30 MB for the larger data). The jumbo
data gets loaded separately from the smaller data, although it all goes through the
same pipeline. We are using cdh3b45 (0.90.1) with GZ compression, a region size of 1 GB, and a
max value size of 500 MB. So far we have had no problems with the larger values.
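
For a rough sense of scale, the figures above can be turned into a back-of-the-envelope
estimate. This is only a sketch: it takes the approximate numbers from this message at face
value (500 million rows, 100-byte small cells, 10,000 large cells averaging 30 MB), assumes
decimal units, and ignores compression, key overhead, and replication. The constant and
function names are made up for illustration.

```python
# Approximate figures taken from the message; all are rough, not measured.
TOTAL_ROWS = 500_000_000            # "500+ million" rows, lower bound assumed
LARGE_ROWS = 10_000                 # rows whose cell is a large blob
SMALL_CELL_BYTES = 100              # typical small cell
LARGE_CELL_AVG_BYTES = 30 * 10**6   # ~30 MB average for the large cells


def estimate_raw_bytes():
    """Return (small_total, large_total) in raw, uncompressed bytes."""
    small_total = (TOTAL_ROWS - LARGE_ROWS) * SMALL_CELL_BYTES
    large_total = LARGE_ROWS * LARGE_CELL_AVG_BYTES
    return small_total, large_total


if __name__ == "__main__":
    small_total, large_total = estimate_raw_bytes()
    share = large_total / (small_total + large_total)
    print(f"small cells: {small_total / 1e9:.1f} GB")
    print(f"large cells: {large_total / 1e9:.1f} GB")
    print(f"large cells' share of raw bytes: {share:.0%}")
```

Under these assumptions the 10,000 large cells account for roughly 300 GB against roughly
50 GB for all the small cells combined, so a tiny fraction of rows carries most of the bytes.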

Our biggest problem was performance: inserting into several column families during the
small-value loads, and pauses when flushing the memstores. 0.90.1 helped quite a
bit with that.

-chris



On Mar 8, 2011, at 10:54 AM, Jean-Daniel Cryans wrote:

>> The blobs vary in size from smallish (10K) to largish (20MB).
> 
> 20MB is quite large, but could be harmless if most of the rows are under 1MB
> 
>> They are too small to put into individual files in HDFS, but if I have too many largish
>> rows in a region, I think I would suffer.
> 
> Yeah, need more info about the size distribution.
> 
>> 
>> Would it be possible to put the blobs in their own column family that has a significantly
>> different block size (10x). I hesitate to do this mostly because I already have too many
>> column families, but since I don't expect the blobs to be touched very often, a separate column
>> family would make them mostly harmless.
> 
> The block size is dynamic, if you store a single cell of 20MB then
> that will be 1 block of the same size. Instead of creating a new
> family, you could also create a new table.
> 
> J-D
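
J-D's remark that the block size is dynamic can be illustrated with a small toy model. This
is a simplified sketch, not HBase's actual writer code: it assumes a block is closed only
once it has already reached the configured size, and that a cell is never split across
blocks. The function name `pack_blocks` is hypothetical, and the 64 KB default is used as
the configured block size.

```python
def pack_blocks(cell_sizes, block_size=64 * 1024):
    """Group cell sizes (bytes) into blocks under a simplified model.

    A new block is started only when the current one has already reached
    block_size, and a cell is never split, so one oversized cell yields
    one oversized block.
    """
    blocks = []
    current = 0
    for size in cell_sizes:
        # Close the current block before adding the next cell if it is full.
        if current >= block_size:
            blocks.append(current)
            current = 0
        current += size
    if current > 0:
        blocks.append(current)
    return blocks


if __name__ == "__main__":
    # A single 20 MB cell ends up as a single 20 MB block.
    print(pack_blocks([20 * 1024 * 1024]))
    # Many 100-byte cells are grouped into blocks of roughly block_size.
    print(pack_blocks([100] * 1000))
```

In this model the configured block size acts as a threshold rather than a cap, which is why
raising it for a family of large blobs would change little.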

