hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "chenxu (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HBASE-18215) some advises about refactoring of rsgroup
Date Tue, 20 Jun 2017 14:35:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-18215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16055201#comment-16055201
] 

chenxu edited comment on HBASE-18215 at 6/20/17 2:34 PM:
---------------------------------------------------------

bq. #5 Can you provide a specific scenario? When the rsgroup patch was written if I remember
correctly the reverse was true. AM cannot handle null results when calling randomAssignment().

there is a scenario

{code:title=RSGroupBasedLoadBalancer.java|borderStyle=solid}
public List<RegionPlan> balanceCluster(Map<ServerName, List<HRegionInfo>>
clusterState)
    throws HBaseIOException {
    ...
    List<HRegionInfo> misplacedRegions = correctedState.get(LoadBalancer.BOGUS_SERVER_NAME);
    for (HRegionInfo regionInfo : misplacedRegions) {
      regionPlans.add(new RegionPlan(regionInfo, null, null));
    }
   ...
{code}
if region misplaced, RegionPlan’s dest is null, and the following will happen
{code:title=AssignmentManager.java|borderStyle=solid}
private RegionPlan getRegionPlan(final HRegionInfo region,
    final boolean forceNewPlan) throws HBaseIOException {
    ...
   if (forceNewPlan
          || existingPlan == null
          || existingPlan.getDestination() == null
          || !destServers.contains(existingPlan.getDestination())) {
        newPlan = true;
        try {
          randomPlan = new RegionPlan(region, null,
              balancer.randomAssignment(region, destServers));
        } catch (IOException ex) {
          LOG.warn("Failed to create new plan.",ex);
          return null;
        }
        this.regionPlans.put(encodedName, randomPlan);
      }
    }
    if (newPlan) {
      if (randomPlan.getDestination() == null) {
        LOG.warn("Can't find a destination for " + encodedName);
        return null;
      }
      ...
      return randomPlan;
    }
    ...
}
{code}
if balancer.randomAssignment return null, getRegionPlan will be null, and AM will do the following
thing
{code:title=AssignmentManager.java|borderStyle=solid}
private void assign(RegionState state, boolean forceNewPlan) {
    ...
    if (plan == null) {
      LOG.warn("Unable to determine a plan to assign " + region);
      // For meta region, we have to keep retrying until succeeding
      if (region.isMetaRegion()) {
        ....
      }
      regionStates.updateRegionState(region, State.FAILED_OPEN);
      return;
    }
    ....
}
{code}
the target region will be transition to FAILED_OPEN status.
but if balancer.randomAssignment return BOGUS_SERVER_NAME, AM can’t handle this.
an OPEN RPC will be sent to the BOGUS_SERVER_NAME, this should not happen


was (Author: javaman_chen):
bq. #5 Can you provide a specific scenario? When the rsgroup patch was written if I remember
correctly the reverse was true. AM cannot handle null results when calling randomAssignment().

there is a scenario

{code:title=RSGroupBasedLoadBalancer.java|borderStyle=solid}
public List<RegionPlan> balanceCluster(Map<ServerName, List<HRegionInfo>>
clusterState)
    throws HBaseIOException {
    ...
    List<HRegionInfo> misplacedRegions = correctedState.get(LoadBalancer.BOGUS_SERVER_NAME);
    for (HRegionInfo regionInfo : misplacedRegions) {
      regionPlans.add(new RegionPlan(regionInfo, null, null));
    }
   ...
{code}
if region misplaced, RegionPlan’s dest is null, and the following will happen
{code:title=AssignmentManager.java|borderStyle=solid}
private RegionPlan getRegionPlan(final HRegionInfo region,
    final boolean forceNewPlan) throws HBaseIOException {
    ...
   if (forceNewPlan
          || existingPlan == null
          || existingPlan.getDestination() == null
          || !destServers.contains(existingPlan.getDestination())) {
        newPlan = true;
        try {
          randomPlan = new RegionPlan(region, null,
              balancer.randomAssignment(region, destServers));
        } catch (IOException ex) {
          LOG.warn("Failed to create new plan.",ex);
          return null;
        }
        this.regionPlans.put(encodedName, randomPlan);
      }
    }
    if (newPlan) {
      if (randomPlan.getDestination() == null) {
        LOG.warn("Can't find a destination for " + encodedName);
        return null;
      }
      ...
      return randomPlan;
    }
    ...
}
{code}
if balancer.randomAssignment return null, getRegionPlan will null, AM will handle like this
{code:title=AssignmentManager.java|borderStyle=solid}
private void assign(RegionState state, boolean forceNewPlan) {
    ...
    if (plan == null) {
      LOG.warn("Unable to determine a plan to assign " + region);
      // For meta region, we have to keep retrying until succeeding
      if (region.isMetaRegion()) {
        ....
      }
      regionStates.updateRegionState(region, State.FAILED_OPEN);
      return;
    }
    ....
}
{code}
the target region will be transition to FAILED_OPEN status.
but if balancer.randomAssignment return BOGUS_SERVER_NAME, AM can’t handle this.
an OPEN RPC will be sent to the BOGUS_SERVER_NAME, this should not happen

> some advises about refactoring of rsgroup
> -----------------------------------------
>
>                 Key: HBASE-18215
>                 URL: https://issues.apache.org/jira/browse/HBASE-18215
>             Project: HBase
>          Issue Type: Improvement
>          Components: Balancer
>            Reporter: chenxu
>         Attachments: HBASE-18215-1.2.4-v1.patch
>
>
> recently we have Integrated rsgroup into our cluster,  after Integrated, found some refactoring
points. maybe the points were not right, but i think there is a need to share with you guys.
> # when hbase.balancer.tablesOnMaster configured, RSGroupBasedLoadBalancer should consider
masterServer assignment first in balanceCluster, roundRobinAssignment, retainAssignment and
randomAssignment
>   do the same thing as BaseLoadBalancer
> # why not use a local file as the persistence layer instead of rsgroup table. 
> in our implementation, we first modify the local rsgroup file, then load the group info
into memory, after that execute the balancer command, everything is OK.
> when loading do some sanity check:
> (1) one server can not be owned by multi group
> (2) one table can not be owned by multi group
> (3) if group has table, it must also has servers
> (4) default group must has servers in it
> if sanity check can’t pass, give up the following process.work as this, it can greatly
reduce the complexity of rsgroup implementation, there is no need to wait for the rsgroup
table to be online, and methods like moveServers, moveTables, addRSGroup, removeRSGroup, moveServersAndTables
can be removed from RSGroupAdminService.only a refresh method is need(modify persistence layer
first and refresh the memory)
> # we should add some group informations on master web UI
> to do this, RSGroupBasedLoadBalancer should move to hbase-server module, because MasterStatusTmpl.jamon
depends on it
> # there may be some issues about RSGroupBasedLoadBalancer.roundRobinAssignment
> if two groups both include BOGUS_SERVER_NAME, assignments.putAll will overwrite the previous
data
> # there may be some issues about RSGroupBasedLoadBalancer.randomAssignment
> when the return value is BOGUS_SERVER_NAME, AM can not handle this case. we should return
null value instead of BOGUS_SERVER_NAME.
> # when RSGroupBasedLoadBalancer.balanceCluster execute, groups are balanced one by one,
if there are two many groups, we can do this in parallel.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message