hadoop-common-dev mailing list archives

From "Jim Kellerman (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1678) [hbase] On region split, master should designate which host should serve daughter splits
Date Fri, 03 Aug 2007 23:54:52 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12517665 ]

Jim Kellerman commented on HADOOP-1678:
---------------------------------------

The region server that performed the split should serve the lower half of the split region.

It should update the meta with information about the two new regions and change the region
info for the old parent region to indicate it is being shared by the two children.

When it reports the split to the master, the master will assign the upper half region to
the most lightly loaded server.

In order to determine load, HServerInfo should be augmented to include the number of regions
being served by the region server, along with other statistics such as Runtime.freeMemory(),
Runtime.totalMemory(), Runtime.maxMemory(), Runtime.availableProcessors(), request rate, etc.
These statistics can be used to determine a region server's "load factor". (Actually, the load
factor is probably what needs to go into the HServerInfo object: the region server can compute
its load factor before sending its regular heartbeat message.)
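
As a rough illustration of the statistics listed above, a region server could take a snapshot like the one below before each heartbeat. This is only a sketch: the class ServerStats, its fields, and freeHeapFraction() are hypothetical names invented here, not part of HServerInfo or any HBase API. Only the java.lang.Runtime calls are real.

```java
// Hypothetical sketch: ServerStats is an invented name, not an HBase class.
final class ServerStats {
    final int regionCount;            // regions currently served (supplied by the caller)
    final long freeMemory;            // Runtime.freeMemory(), in bytes
    final long totalMemory;           // Runtime.totalMemory(), in bytes
    final long maxMemory;             // Runtime.maxMemory(), in bytes
    final int availableProcessors;    // Runtime.availableProcessors()
    final double requestsPerSecond;   // request rate, measured elsewhere (assumption)

    ServerStats(int regionCount, long freeMemory, long totalMemory,
                long maxMemory, int availableProcessors, double requestsPerSecond) {
        this.regionCount = regionCount;
        this.freeMemory = freeMemory;
        this.totalMemory = totalMemory;
        this.maxMemory = maxMemory;
        this.availableProcessors = availableProcessors;
        this.requestsPerSecond = requestsPerSecond;
    }

    // Take a snapshot of the JVM statistics for the current process.
    static ServerStats sample(int regionCount, double requestsPerSecond) {
        Runtime rt = Runtime.getRuntime();
        return new ServerStats(regionCount, rt.freeMemory(), rt.totalMemory(),
                rt.maxMemory(), rt.availableProcessors(), requestsPerSecond);
    }

    // Fraction of the maximum heap that is not currently in use.
    double freeHeapFraction() {
        long used = totalMemory - freeMemory;
        return (double) (maxMemory - used) / maxMemory;
    }
}
```

A server would call something like ServerStats.sample(regionCount, requestRate) and either ship the snapshot or reduce it to a single load factor first, as the parenthetical above suggests.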

Should the master miss the split message, it will assign the upper child region during the
next meta scan (since the region server updated the meta before reporting the split to the
master).

The master will need to track the load factor of each server so that it can assign new regions
to the server with the smallest load factor.
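
The bookkeeping on the master side might look something like the following sketch. LoadTracker and its method names are hypothetical, and the sketch assumes each heartbeat carries a single numeric load factor where a smaller value means a less loaded server. It is written in present-day Java and is not taken from the HBase master.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: LoadTracker is an invented name, not an HBase class.
final class LoadTracker {
    // Most recently reported load factor per server name.
    private final Map<String, Double> loadByServer = new ConcurrentHashMap<>();

    // Record the load factor carried by a server's heartbeat.
    void onHeartbeat(String serverName, double loadFactor) {
        loadByServer.put(serverName, loadFactor);
    }

    // Forget a server that has stopped reporting.
    void onServerLost(String serverName) {
        loadByServer.remove(serverName);
    }

    // Server with the smallest load factor, or empty if none are known.
    Optional<String> leastLoadedServer() {
        return loadByServer.entrySet().stream()
                .min(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }
}
```

When a split is reported, the master would ask the tracker for leastLoadedServer() and assign the upper half region to it.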

Periodically, the master should run a thread that attempts to re-balance the load on the cluster.
Without detailed statistics such as the request rate per region, however, it would be hard
for the master to determine which regions should be moved to a different server
in order to most effectively balance the load. For example, a server could be serving 1000
regions which are receiving little traffic and still be assigned another region without greatly
affecting its performance. Another server could be serving two heavy traffic regions yet be
so heavily loaded that it should be relieved of one of the regions to more effectively balance
load.

In the near term, computing a load factor from the percentage of free memory and request rate
is probably the best metric for determining which server should be assigned a new region.
As we gain more experience with HBase performance we can include some of the other factors
mentioned above.
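
One possible way to combine the two near-term inputs is sketched below. The equal weights, the normalization of request rate against an expected maximum, and the names LoadFactor and compute are all assumptions made for illustration; the comment above does not specify a formula.

```java
// Hypothetical sketch: the formula and weights are assumptions, not from HBase.
final class LoadFactor {
    static final double MEMORY_WEIGHT = 0.5;   // assumed weight
    static final double REQUEST_WEIGHT = 0.5;  // assumed weight

    // Returns a value in [0, 1]; larger means more heavily loaded.
    // freeMemoryFraction is the share of heap still free (0 to 1).
    // maxExpectedRequestsPerSecond is an assumed tuning parameter.
    static double compute(double freeMemoryFraction,
                          double requestsPerSecond,
                          double maxExpectedRequestsPerSecond) {
        if (maxExpectedRequestsPerSecond <= 0) {
            throw new IllegalArgumentException("maxExpectedRequestsPerSecond must be positive");
        }
        double memoryPressure = 1.0 - clamp(freeMemoryFraction);
        double requestPressure = clamp(requestsPerSecond / maxExpectedRequestsPerSecond);
        return MEMORY_WEIGHT * memoryPressure + REQUEST_WEIGHT * requestPressure;
    }

    // Restrict a value to the range [0, 1].
    private static double clamp(double v) {
        return Math.max(0.0, Math.min(1.0, v));
    }
}
```

Because region count does not enter this formula, a server holding 1000 quiet regions can score lower than one holding two heavily used regions, which matches the example given earlier.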

> [hbase] On region split, master should designate which host should serve daughter splits
> ----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1678
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1678
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/hbase
>            Reporter: stack
>
> On region split, the daughter regions are deployed on the same host as served the split
> parent.  This makes it so currently (unless the cluster is restarted), as a table grows, all
> its regions remain on the one server.
> Instead, jurisdiction over who serves daughter splits should be passed to the master.
> If possible, before making a determination, the master should take into consideration current
> cluster loadings and region distribution.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

