hbase-dev mailing list archives

From "Samuel Guo (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-57) [hbase] Master should allocate regions to regionservers based upon data locality and rack awareness
Date Mon, 30 Mar 2009 13:54:51 GMT

    [ https://issues.apache.org/jira/browse/HBASE-57?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12693770#action_12693770 ]

Samuel Guo commented on HBASE-57:

Thanks for your comments, Jim.

> Solid performance data evaluating the cost of:
> 1) network access to a block in a different rack
> 2) network access to a block in the same rack but on a different server
> 3) network access to a block on the same server
> 4) direct disk access to a block on the same server
> would be highly useful. If there is little difference between 1, 2, 3 (access to a block
> through a datanode) then locality may not be useful. On the other hand, if there is a
> significant difference between 1, 2, 3 then we should try to exploit locality if we can.

> There is a lot of performance evaluation that needs to be done before we actually take
> the step of using locality-based region assignment. If doing that performance evaluation
> sounds interesting to you, I think that would be a great GSOC project.

Yes, I agree with you. We need a detailed analysis of how HDFS and HBase behave
before we try locality-based assignment. That analysis will be the main part of my
GSOC project.

> Suppose there was one 'hot' datanode that hosted blocks from many regions. Using locality
> might end up in overloading the region server on that node, resulting in poorer performance.

Yes, locality should be applied carefully so that it does not overload the region server
or the datanode. An ideal region assignment places each region close to its data to reduce
network traffic, while balancing load across region servers and datanodes and avoiding
disk contention on the same datanode. As you suggested, we need to understand the following
clearly before implementing it:
1) What is the difference in cost when we access data from different locations (local,
local bypass, remote, remote rack)?
2) Over a region's lifetime, how are its data blocks distributed? How many bytes does the
region read from the local node, how many from remote nodes, and how many from a remote rack?
3) After HDFS runs a balance operation, how does 2) change?
4) After some region servers fail, how does 2) change?
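
To make question 2) concrete, here is a minimal sketch of how the byte counts could be tallied. It is only an illustration: the host and rack names, the block sizes, and the helper names classify and locality_breakdown are made up, and it assumes every read goes to the closest replica, which may not hold in a real cluster.

```python
from collections import Counter


def classify(reader_host, reader_rack, replicas):
    """Return the closest locality class among a block's replicas.

    replicas is a list of (host, rack) pairs; all names are hypothetical.
    """
    if any(host == reader_host for host, _ in replicas):
        return "local"
    if any(rack == reader_rack for _, rack in replicas):
        return "same_rack"
    return "remote_rack"


def locality_breakdown(reader_host, reader_rack, blocks):
    """Sum the bytes a region server would read per locality class.

    blocks is a list of (size_in_bytes, replicas) pairs.
    Assumption: each block is read from its closest replica.
    """
    totals = Counter()
    for size, replicas in blocks:
        totals[classify(reader_host, reader_rack, replicas)] += size
    return dict(totals)


# Made-up block layout for one region served from host "rs1" on "rackA".
blocks = [
    (64, [("rs1", "rackA"), ("dn4", "rackB")]),
    (64, [("dn2", "rackA"), ("dn5", "rackB")]),
    (32, [("dn4", "rackB"), ("dn5", "rackB")]),
]
print(locality_breakdown("rs1", "rackA", blocks))
```

Running the same tally before and after an HDFS balance operation, or after a region server failure, would give a first answer to questions 3) and 4).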

I am not yet clear on how to analyze all of this, but I think I can take the questions
one by one to make things clear.
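
As a starting point for discussion, the trade-off between locality and load described above could be sketched as a simple score per region server. This is not how the HBase master assigns regions today; the function pick_server, the load_weight parameter, and the sample numbers are all assumptions for illustration.

```python
def pick_server(local_bytes, region_counts, region_size, load_weight=0.5):
    """Pick a region server for one region by trading locality against load.

    local_bytes:   server -> bytes of the region's data stored on that server
    region_counts: server -> number of regions the server already hosts
    region_size:   total bytes in the region (assumed to be non-zero)
    load_weight:   made-up tuning knob, not an HBase setting
    """
    max_count = max(region_counts.values()) or 1

    def score(server):
        locality = local_bytes.get(server, 0) / region_size  # 0.0 to 1.0
        load = region_counts[server] / max_count              # 0.0 to 1.0
        return locality - load_weight * load

    # Iterate in sorted order so that ties resolve deterministically.
    return max(sorted(region_counts), key=score)


# Hypothetical cluster state: rs1 holds all of the data but is heavily loaded.
local_bytes = {"rs1": 100, "rs2": 50}
region_counts = {"rs1": 10, "rs2": 2, "rs3": 1}

print(pick_server(local_bytes, region_counts, region_size=100, load_weight=0.5))
print(pick_server(local_bytes, region_counts, region_size=100, load_weight=1.0))
```

With a small load_weight the fully local but busy server wins; with a larger one the choice moves to a less loaded server that still holds part of the data. Choosing the weight is exactly what the performance evaluation would have to inform.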

> [hbase] Master should allocate regions to regionservers based upon data locality and rack awareness
> ---------------------------------------------------------------------------------------------------
>                 Key: HBASE-57
>                 URL: https://issues.apache.org/jira/browse/HBASE-57
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.2.0
>            Reporter: stack
>             Fix For: 0.20.0
> Currently, regions are assigned regionservers based off a basic loading attribute.  A
> factor to include in the assignment calculation is the location of the region in hdfs; i.e.
> servers hosting region replicas.  If the cluster is such that regionservers are being run
> on the same nodes as those running hdfs, then ideally the regionserver for a particular region
> should be running on the same server as hosts a region replica.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
