hbase-issues mailing list archives

From "Jan Lukavsky (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-57) [hbase] Master should allocate regions to regionservers based upon data locality and rack awareness
Date Tue, 20 Apr 2010 13:28:51 GMT

    [ https://issues.apache.org/jira/browse/HBASE-57?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12858876#action_12858876 ]

Jan Lukavsky commented on HBASE-57:

I suspect this issue is causing us trouble when running Map/Reduce jobs with HBase as the data source.
TableInputFormat tells the JobTracker that each region is data-local to the RegionServer that
serves it. IMO this causes a serious load imbalance on small clusters (ours has about 10 nodes),
because the RegionServer may (and probably will) read from a DataNode on a different machine. Thus,
in the extreme case, a single DataNode may at some point be handling reads from all the Mappers.

If each region were assigned to the RegionServer that holds the most of that region's blocks, I suppose
this imbalance would be minimized. Stack's proposed solution seems fairly appropriate to me.
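To make the "most blocks" idea concrete, here is a minimal Java sketch of a locality-aware choice of
RegionServer. It is an illustration under stated assumptions, not HBase code: the Block record,
pickServer(), and the sample host names are hypothetical, and the inputs (block sizes and the hosts
holding each replica) are assumed to be gathered elsewhere, e.g. from what Hadoop's
FileSystem.getFileBlockLocations reports. Ties fall back to the server carrying fewer regions,
standing in for the existing load-based criterion.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch: pick a RegionServer for a region by counting how many bytes of
 * the region's HDFS blocks have a replica on each live server's host.
 * Not an HBase or HDFS API; all types and inputs here are illustrative.
 */
public class LocalityAssigner {

    /** One HDFS block of a region's store files: its size and the hosts holding a replica. */
    public record Block(long sizeBytes, List<String> replicaHosts) {}

    /**
     * Returns the live server with the most locally stored bytes for the region.
     * Ties are broken by fewer hosted regions, then by host name for determinism.
     * Returns null if there are no live servers.
     */
    public static String pickServer(List<Block> regionBlocks,
                                    Set<String> liveServers,
                                    Map<String, Integer> regionsPerServer) {
        // Start every live server at zero local bytes.
        Map<String, Long> localBytes = new HashMap<>();
        for (String server : liveServers) {
            localBytes.put(server, 0L);
        }
        // Credit each block's size to every live server that holds a replica of it.
        for (Block block : regionBlocks) {
            for (String host : block.replicaHosts()) {
                if (localBytes.containsKey(host)) {
                    localBytes.merge(host, block.sizeBytes(), Long::sum);
                }
            }
        }

        String best = null;
        for (String server : liveServers) {
            if (best == null) {
                best = server;
                continue;
            }
            long bytes = localBytes.get(server);
            long bestBytes = localBytes.get(best);
            int load = regionsPerServer.getOrDefault(server, 0);
            int bestLoad = regionsPerServer.getOrDefault(best, 0);
            boolean better = bytes > bestBytes
                    || (bytes == bestBytes && load < bestLoad)
                    || (bytes == bestBytes && load == bestLoad && server.compareTo(best) < 0);
            if (better) {
                best = server;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Sample data: three blocks, each replicated on three hosts.
        List<Block> blocks = List.of(
                new Block(64L << 20, List.of("node1", "node2", "node3")),
                new Block(64L << 20, List.of("node2", "node4", "node5")),
                new Block(32L << 20, List.of("node2", "node3", "node6")));
        Set<String> live = Set.of("node1", "node2", "node3", "node4");
        Map<String, Integer> load = Map.of("node1", 5, "node2", 7, "node3", 2, "node4", 3);
        System.out.println(pickServer(blocks, live, load)); // prints "node2"
    }
}
```

With the sample data, node2 holds a replica of all three blocks (160 MB local) and is chosen even
though it already hosts the most regions. Letting locality always win is a simplification; a real
assignment policy would probably weigh locality against load so that it does not recreate the
imbalance described above on the RegionServer side.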

> [hbase] Master should allocate regions to regionservers based upon data locality and rack awareness
> ---------------------------------------------------------------------------------------------------
>                 Key: HBASE-57
>                 URL: https://issues.apache.org/jira/browse/HBASE-57
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.2.0
>            Reporter: stack
> Currently, regions are assigned to regionservers based on a basic loading attribute.  A
factor to include in the assignment calculation is the location of the region in hdfs, i.e. the
servers hosting replicas of the region's blocks.  If the cluster is such that regionservers are run
on the same nodes as those running hdfs, then ideally the regionserver for a particular region
should be running on a server that hosts a replica of that region's data.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
