hbase-dev mailing list archives

From "Samuel Guo (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-57) [hbase] Master should allocate regions to regionservers based upon data locality and rack awareness
Date Thu, 26 Mar 2009 13:50:05 GMT

    [ https://issues.apache.org/jira/browse/HBASE-57?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689484#action_12689484 ]

Samuel Guo commented on HBASE-57:

Hi hbasers,
I'd like to work on this issue as my GSOC project "Exploit locality when assigning regions
in HBase".

After discussing this issue with Stack over email, I have some initial thoughts. I'd like
to share them with you, and comments are welcome.

Before designing a mechanism that exploits region locality, we need to understand how blocks
are allocated in an HBase cluster and how a given region's data blocks are distributed over
its lifetime. Only then can we determine how region locality affects performance.
Capturing all of this information in a real cluster is difficult. An alternative way to
study the locality phenomenon is to simulate, on a single machine, both the HDFS block-placement
procedure (local node, local rack, and remote rack) and the region-assignment mechanism of an
HBase cluster. An approximate but detailed report from the simulation could then be used for
analysis and development.
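To make the simulation idea concrete, here is a minimal Java toy (not HDFS code) that places
each block's three replicas in the order named above: local node, local rack, remote rack. The
class `BlockPlacementSim`, the rack map, the seeded random choice, and the fixed replication
factor of 3 are all illustrative assumptions; the real HDFS placement policy differs in detail
and has changed across versions. It also assumes at least two hosts per rack and at least two racks.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical toy simulator of replica placement, following the simplified order
// named in the proposal: local node, local rack, remote rack. Not the HDFS policy code.
final class BlockPlacementSim {
    private final Map<String, List<String>> hostsByRack;            // rack -> hosts
    private final Map<String, String> rackOfHost = new HashMap<>(); // host -> rack
    private final Random rnd;

    BlockPlacementSim(Map<String, List<String>> hostsByRack, long seed) {
        this.hostsByRack = hostsByRack;
        this.rnd = new Random(seed);
        hostsByRack.forEach((rack, hosts) -> hosts.forEach(h -> rackOfHost.put(h, rack)));
    }

    /** Picks three replica hosts for one block written from {@code writer}. */
    List<String> place(String writer) {
        String localRack = rackOfHost.get(writer);
        List<String> replicas = new ArrayList<>();
        replicas.add(writer); // 1st replica: the writer's own node

        // 2nd replica: a different node on the writer's rack (assumes >= 2 hosts per rack)
        List<String> sameRack = new ArrayList<>(hostsByRack.get(localRack));
        sameRack.remove(writer);
        replicas.add(sameRack.get(rnd.nextInt(sameRack.size())));

        // 3rd replica: any node on some other rack (assumes >= 2 racks)
        List<String> remote = new ArrayList<>();
        hostsByRack.forEach((rack, hosts) -> {
            if (!rack.equals(localRack)) remote.addAll(hosts);
        });
        replicas.add(remote.get(rnd.nextInt(remote.size())));
        return replicas;
    }

    String rackOf(String host) {
        return rackOfHost.get(host);
    }

    public static void main(String[] args) {
        Map<String, List<String>> cluster = new HashMap<>();
        cluster.put("rack1", List.of("h1", "h2", "h3"));
        cluster.put("rack2", List.of("h4", "h5", "h6"));
        BlockPlacementSim sim = new BlockPlacementSim(cluster, 42L);
        // Simulate a regionserver on h1 flushing five blocks of one region.
        for (int b = 0; b < 5; b++) {
            System.out.println("block " + b + " -> " + sim.place("h1"));
        }
    }
}
```

Collecting the replica lists produced for each flushed block gives a per-region block layout,
which is the kind of data the simulation report would analyse.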

Although I don't yet have detailed information about the locality phenomenon, I'll give an
initial proposal first. The proposal is to assign each region to the datanode (regionserver)
that holds the most data blocks of that region. The main challenge is making a region's
data-block layout known to the master (the layout can be obtained by querying the HDFS namenode).
One initial approach is to record each region's layout in .META. Background threads on the
master could then scan the .META. table and pick candidate nodes for region assignment (for
example, sorted by the number of the region's blocks each node holds). The detailed assignment
mechanism for each case is discussed below.
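The ranking step could look roughly like the following Java sketch. It assumes the region's
layout has already been obtained (in Hadoop, per-block hosts are available through
`FileSystem.getFileBlockLocations`, which returns `BlockLocation` objects with `getHosts()`) and
flattened into one list of hosts per block. `LocalityRanker` and `rankCandidates` are
hypothetical names, and breaking ties by host name is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: rank candidate regionservers for a region by how many of the
// region's HDFS blocks each host stores a replica of. Input: one host list per block.
final class LocalityRanker {

    /** Returns hosts ordered by descending block count; ties broken by host name. */
    static List<String> rankCandidates(List<List<String>> blockHosts) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> hosts : blockHosts) {
            for (String host : hosts) {
                counts.merge(host, 1, Integer::sum); // this host holds one more of the region's blocks
            }
        }
        List<String> ranked = new ArrayList<>(counts.keySet());
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return ranked;
    }

    public static void main(String[] args) {
        List<List<String>> layout = List.of(
                List.of("h1", "h2", "h4"),
                List.of("h1", "h3", "h5"),
                List.of("h2", "h1", "h6"));
        System.out.println(rankCandidates(layout)); // prints [h1, h2, h3, h4, h5, h6]
    }
}
```

The master would then try the head of this list first and move down it if that server is
overloaded or unavailable.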
(1) A blank region is created when a table is first created. Since it holds no data yet, we
can simply assign it according to the current load of the cluster. After the region grows and
is flushed to HDFS, we obtain its blocks' location information and record it in the .META.
table for the next assignment.
(2) A region is created by splitting its parent region. We can use the parent region's block
location information to make the assignment decision. After the split completes, we can simply
copy the parent region's block location information into each daughter region's entry in the
.META. table.
(3) A region is reassigned after a regionserver crash. The blocks of the log files will also be
taken into account during assignment, so that recovery of the failed region may be faster.
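The three cases above could be sketched in Java as follows, under loose assumptions: a layout
is one host list per block, load is measured as the number of regions per server, and in case
(3) a log-file block counts the same as a store-file block. `RegionAssigner` and its methods are
hypothetical names for illustration, not the HBase master's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of assignment cases (1)-(3). A "layout" is one list of replica
// hosts per HDFS block; load is regions per server. All names are illustrative only.
final class RegionAssigner {

    /** Case (1): a new, empty region goes to the least-loaded server (ties by name). */
    static String assignBlank(Map<String, Integer> regionsPerServer) {
        String best = null;
        for (Map.Entry<String, Integer> e : regionsPerServer.entrySet()) {
            if (best == null
                    || e.getValue() < regionsPerServer.get(best)
                    || (e.getValue().equals(regionsPerServer.get(best)) && e.getKey().compareTo(best) < 0)) {
                best = e.getKey();
            }
        }
        return best; // null only if there are no servers
    }

    /** Case (2): a daughter region starts with a copy of its parent's recorded layout. */
    static List<List<String>> inheritLayout(List<List<String>> parentLayout) {
        List<List<String>> copy = new ArrayList<>();
        for (List<String> hosts : parentLayout) copy.add(new ArrayList<>(hosts));
        return copy; // called once per daughter after the split
    }

    /**
     * Case (3): after a crash, count both the region's store-file blocks and the log-file
     * blocks, but only on servers that are still alive. Falls back to case (1) when no
     * live server holds any of the blocks.
     */
    static String assignAfterCrash(List<List<String>> storeBlocks,
                                   List<List<String>> logBlocks,
                                   Map<String, Integer> liveRegionsPerServer) {
        Map<String, Integer> score = new HashMap<>();
        List<List<String>> all = new ArrayList<>(storeBlocks);
        all.addAll(logBlocks);
        for (List<String> hosts : all) {
            for (String h : hosts) {
                if (liveRegionsPerServer.containsKey(h)) score.merge(h, 1, Integer::sum);
            }
        }
        String best = null;
        for (Map.Entry<String, Integer> e : score.entrySet()) {
            if (best == null || e.getValue() > score.get(best)
                    || (e.getValue().equals(score.get(best)) && e.getKey().compareTo(best) < 0)) {
                best = e.getKey();
            }
        }
        return best != null ? best : assignBlank(liveRegionsPerServer);
    }

    public static void main(String[] args) {
        Map<String, Integer> live = new HashMap<>(Map.of("h1", 4, "h2", 2, "h3", 2));
        System.out.println(assignBlank(live)); // prints h2
        // h4 crashed, so it is absent from `live`.
        List<List<String>> store = List.of(List.of("h3", "h4", "h1"), List.of("h4", "h2", "h1"));
        List<List<String>> logs = List.of(List.of("h4", "h3", "h5"), List.of("h4", "h3", "h6"));
        System.out.println(assignAfterCrash(store, logs, live)); // prints h3
    }
}
```

In this example the store files alone would favour h1, but counting the log blocks shifts the
choice to h3, which holds most of the data needed for recovery.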

> [hbase] Master should allocate regions to regionservers based upon data locality and rack awareness
> ---------------------------------------------------------------------------------------------------
>                 Key: HBASE-57
>                 URL: https://issues.apache.org/jira/browse/HBASE-57
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.2.0
>            Reporter: stack
>             Fix For: 0.20.0
> Currently, regions are assigned to regionservers based on a basic loading attribute.  A
> factor to include in the assignment calculation is the location of the region in HDFS; i.e.
> the servers hosting region replicas.  If the cluster is such that regionservers are being run
> on the same nodes as those running HDFS, then ideally the regionserver for a particular region
> should be running on the same server as hosts a region replica.

