hbase-user mailing list archives

From Ben Kim <benkimkim...@gmail.com>
Subject Data locality in HBase
Date Fri, 15 Jun 2012 04:56:47 GMT

I've been posting questions to the mailing list quite often lately, and
here goes another one, about data locality.
I read the excellent blog post about data locality that Lars George wrote
at http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html

I understand data locality in HBase as locating a region on the region server
where most of its data blocks reside.
That way, fast data access is guaranteed when running an MR job, because each
map/reduce task is run for each region on the TaskTracker where the region
resides.

But what if the data blocks of the region are evenly spread over multiple
region servers?
Does an MR task have to remotely access the data blocks from other
region servers?
How good is HBase at locating data blocks where a region resides?
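
To make the question concrete, locality can be thought of as the fraction of a
region's HDFS blocks that have a replica on the host serving the region. Below
is a minimal sketch of that idea, not HBase's own code: the host names, the
block placement, and the function name locality_index are all hypothetical,
and every block is assumed to be the same size.

```python
def locality_index(block_replicas, host):
    """Return the fraction of blocks that have at least one replica on `host`.

    block_replicas: list of sets, one set of host names per HDFS block
    (hypothetical input; equal-sized blocks are assumed).
    """
    if not block_replicas:
        return 0.0
    local = sum(1 for replicas in block_replicas if host in replicas)
    return local / len(block_replicas)


# Hypothetical placement: 4 blocks, replication factor 3,
# replicas spread evenly over 4 hosts.
blocks = [
    {"rs1", "rs2", "rs3"},
    {"rs2", "rs3", "rs4"},
    {"rs3", "rs4", "rs1"},
    {"rs4", "rs1", "rs2"},
]

for host in ("rs1", "rs2", "rs3", "rs4"):
    print(host, locality_index(blocks, host))
```

In this made-up layout no host holds every block, so whichever host serves the
region, some of the reads would have to go over the network.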

Also, is it correct to say that if I set a smaller data block size, data
locality gets worse, and if the data block size gets bigger, data locality gets
better?

Best regards,

*Benjamin Kim*
*benkimkimben at gmail*
