hbase-dev mailing list archives

From "Ma, Ming" <min...@ebay.com>
Subject region assignment and HFile HDFS block locality
Date Thu, 23 Jun 2011 22:05:50 GMT
Normally, when HBase and HDFS run in the same cluster (e.g., the region server runs on the
datanode), we get reasonably good data locality, as explained<http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html>
by Lars. Work<https://issues.apache.org/jira/browse/HBASE-2896> has also been done by
Jonathan to address the startup situation.

There are scenarios where a region can be served from a different machine than the ones that hold
its underlying HFile blocks, at least for some period of time. During that time, whole-table
scans and MapReduce jobs will see a performance impact.


1.       After the load balancer moves a region, and before a compaction on that region (which
generates a new HFile on the new region server), the region's HDFS blocks can be remote.

2.       When a machine is added or removed, HBase's region assignment policy is different
from HDFS's block reassignment policy.

3.       Even if there is not much HBase activity, HDFS can rebalance HFile blocks as other
non-HBase applications push data to HDFS.

A lot has been or will be done in the load balancer, as summarized<http://zhihongyu.blogspot.com/2011/04/load-balancer-in-hbase-090.html>
by Ted. I am curious whether HFile HDFS block locality should be used as another factor here.
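
To make the question a bit more concrete, here is a rough sketch of what such a factor might
look like. It is not HBase or HDFS code: the function names, the input format (a list of block
sizes with the hosts holding a replica), and the host names are all made up for illustration.
The idea is simply to compute, per host, the fraction of a region's HFile bytes that have a
local replica, and to prefer the candidate server with the highest fraction.

```python
from collections import defaultdict


def locality_by_host(hfile_blocks):
    """Return {host: fraction of the region's HFile bytes with a replica on host}.

    hfile_blocks is a hypothetical input: a list of
    (block_size_bytes, [hosts holding a replica of that block]).
    """
    total = sum(size for size, _ in hfile_blocks)
    if total == 0:
        return {}
    local_bytes = defaultdict(int)
    for size, hosts in hfile_blocks:
        # set() guards against a host being listed twice for one block
        for host in set(hosts):
            local_bytes[host] += size
    return {host: nbytes / total for host, nbytes in local_bytes.items()}


def best_host(hfile_blocks, candidates):
    """Pick the candidate region server with the highest locality fraction."""
    locality = locality_by_host(hfile_blocks)
    return max(candidates, key=lambda h: locality.get(h, 0.0))


# Made-up example: three blocks of one region's HFiles, replication factor 3.
blocks = [
    (64, ["dn1", "dn2", "dn3"]),
    (64, ["dn1", "dn3", "dn4"]),
    (32, ["dn2", "dn3", "dn4"]),
]
locality = locality_by_host(blocks)
chosen = best_host(blocks, ["dn1", "dn2", "dn4"])
```

A real balancer would presumably have to weigh this against region count and load per server
rather than use it alone; the sketch only shows how the locality number itself could be derived.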

Thanks.

Ming
