hbase-user mailing list archives

From Jonathan Gray <jg...@facebook.com>
Subject Re: Optimal block size for large columns
Date Wed, 19 May 2010 15:05:36 GMT
Currently, every block requires another HDFS fetch.  There are open
JIRAs about prefetching all required blocks, in which case there would
be no difference.

Your best bet is to test and benchmark with varied block and row
sizes.  If you show big perf hits for multiple blocks, that would be a
good argument for getting prefetching implemented (at an already
largish size of 64k, it's not clear how beneficial it will be).
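
For a back-of-the-envelope sense of the multi-block cost, here is a
sketch (not measured data) using the row sizes from the original post
and the 64k block size mentioned above; a single-row get must touch
every block the row spans:

```python
import math

# Blocks touched by one single-row get, assuming the value sizes
# from this thread and a 64 KB block size (illustrative only).
block_size = 64 * 1024

for row_size in (300 * 1024, 2 * 1024 * 1024, 10 * 1024 * 1024):
    blocks = math.ceil(row_size / block_size)
    print(f"{row_size // 1024} KB row -> up to {blocks} block fetches")
    # 300 KB -> 5, 2048 KB -> 32, 10240 KB -> 160
```

So even the average-case row already spans several blocks at the
default size, which is why prefetching (or a larger block size) matters
for this workload.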

Please share your findings if you do any more experimentation.
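
The other side of the trade-off (smaller blocks mean a larger block
index, as noted further down in this thread) can be sketched the same
way; the store-file size and bytes-per-index-entry figures here are
assumptions for illustration, not HBase constants:

```python
# Sketch of block size vs. block-index size.  Assumed numbers:
# a 1 GB store file and ~50 bytes per index entry (varies with key
# length in practice).
store_file_bytes = 1 * 1024**3
index_entry_bytes = 50  # assumption, not an HBase constant

for block_size in (16 * 1024, 64 * 1024, 1024 * 1024):
    n_blocks = store_file_bytes // block_size
    index_kb = n_blocks * index_entry_bytes // 1024
    print(f"{block_size // 1024} KB blocks -> "
          f"{n_blocks} index entries (~{index_kb} KB index)")
```

Note that block size is tunable per column family (e.g. via the
`BLOCKSIZE` attribute in the HBase shell or
`HColumnDescriptor.setBlocksize` in the Java API), so it can be set to
match this table's access pattern without affecting others.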

On May 18, 2010, at 6:43 PM, "Jason Strutz" <jason@cumuluscode.com> wrote:

> Thanks for your response, Jonathan.  We'll be doing largely
> single-row random lookups.  In this scenario, would it be best to
> try to make the block size encompass a single row?  How significant
> is the performance hit if HBase has to dig up multiple blocks to
> serve a single row?
> On May 18, 2010, at 3:12 PM, Jonathan Gray wrote:
>> It would depend on your read patterns.
>> Is everything going to be single row gets, or will you also scan?
>> Single row lookups will be faster with smaller block sizes, at the  
>> expense of a larger index size (and potentially slower scans as you  
>> have to deal with more block fetches).
>>> -----Original Message-----
>>> From: Jason Strutz [mailto:jason@cumuluscode.com]
>>> Sent: Tuesday, May 18, 2010 9:33 AM
>>> To: hbase-user@hadoop.apache.org
>>> Subject: Optimal block size for large columns
>>> I am working with a small cluster, trying to nail down appropriate
>>> settings for block size.  We will have a single table with a single
>>> column of data averaging 300k in size, sometimes upwards of 2mb,  
>>> never
>>> more than 10mb.
>>> Is there any rule-of-thumb or other sage advice for block sizes for
>>> large columns?
>>> Thanks!
