hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Xiang Li (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-20186) Improve RSGroupBasedLoadBalancer#balanceCluster() to be more efficient when calculating cluster state for each rsgroup
Date Wed, 14 Mar 2018 03:22:00 GMT

     [ https://issues.apache.org/jira/browse/HBASE-20186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Xiang Li updated HBASE-20186:
-----------------------------
    Status: Patch Available  (was: Open)

> Improve RSGroupBasedLoadBalancer#balanceCluster() to be more efficient when calculating
cluster state for each rsgroup
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-20186
>                 URL: https://issues.apache.org/jira/browse/HBASE-20186
>             Project: HBase
>          Issue Type: Improvement
>          Components: rsgroup
>            Reporter: Xiang Li
>            Assignee: Xiang Li
>            Priority: Minor
>         Attachments: HBASE-20186.master.000.patch, HBASE-20186.master.001.patch
>
>
> In RSGroupBasedLoadBalancer
> {code}
> public List<RegionPlan> balanceCluster(Map<ServerName, List<RegionInfo>>
clusterState)
> {code}
> The second half of the function is to calculate region move plan for regions which have
been already placed according to the rsgroup assignment, and it is calculated one rsgroup
after another.
> The following logic to check if a server belongs to the rsgroup is not quite efficient,
as it does not make good use of the fact that servers in RSGroupInfo is a TreeSet.
> {code}
> for (Address sName : info.getServers()) {
>   for(ServerName curr: clusterState.keySet()) {
>     if(curr.getAddress().equals(sName)) {
>       groupClusterState.put(curr, correctedState.get(curr));
>     }
>   }
> }
> {code}
> Given there are m region servers in the cluster and n region servers for each rsgroup
in average, the code above has time complexity as O(m * n), while using TreeSet's contains(),
the time complexity could be reduced to O (m * logn).
> Another improvement is we do not need to scan every server for each rsgroup. If the processed
server could be recorded,  we could skip those.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message