hbase-issues mailing list archives

From "ramkrishna.s.vasudevan (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5615) the master never does balance because of balancing the parent region
Date Fri, 06 Apr 2012 06:20:46 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13248119#comment-13248119 ]

ramkrishna.s.vasudevan commented on HBASE-5615:
-----------------------------------------------

bq.RS_SPLIT above refers to RS_ZK_REGION_SPLIT, right ?
Yes.
Let me explain the problem once again, this time with the code.
The patch makes the following change in rebuildUserRegions() on master startup:
{code}
      if (regionInfo.isOffline() && regionInfo.isSplit()) continue;
{code}

Take the case where the RS was in the middle of splitting a region. In SplitTransaction:
{code}
try {
        this.znodeVersion = transitionNodeSplit(server.getZooKeeper(),
          parent.getRegionInfo(), a.getRegionInfo(), b.getRegionInfo(),
          server.getServerName(), this.znodeVersion);

        int spins = 0;
{code}

After the above step, the RS waits for the master to respond to this change in the znode:
{code}
if (spins % 10 == 0) {
            LOG.debug("Still waiting on the master to process the split for " +
                this.parent.getRegionInfo().getEncodedName());
          }
          Thread.sleep(100);
          // When this returns -1 it means the znode doesn't exist
          this.znodeVersion = tickleNodeSplit(server.getZooKeeper(),
            parent.getRegionInfo(), a.getRegionInfo(), b.getRegionInfo(),
            server.getServerName(), this.znodeVersion);
{code}
But the master had gone down, so the RS keeps waiting here.
On the master side, when the master comes back up, it rebuilds all the existing regions
and their corresponding servers in AM.rebuildUserRegions():
{code}
        if (false == checkIfRegionBelongsToDisabled(regionInfo)
            && false == checkIfRegionsBelongsToEnabling(regionInfo)) {
          regions.put(regionInfo, regionLocation);
          addToServers(regionLocation, regionInfo);
        }
{code}
Because the region was already split, the fix in this patch skips it before it reaches
the step above, so it is never passed to addToServers.
Meanwhile, the RS keeps tickling the node in the SPLIT state, so in AM.handleRegion:
{code}
case RS_ZK_REGION_SPLIT:
          // RegionState must be null, or SPLITTING or PENDING_CLOSE.
          if (!isInStateForSplitting(regionState)) break;
          // If null, add SPLITTING state before going to SPLIT
          if (regionState == null) {
            regionState = addSplittingToRIT(sn, encodedName);
{code}
The current regionState is null because there is no entry in RIT for the split
region. Since regionState is null, we first try to get RIT populated, which needs this lookup:
{code}
 private HRegionInfo findHRegionInfo(final ServerName sn,
      final String encodedName) {
    if (!this.serverManager.isServerOnline(sn)) return null;
    Set<HRegionInfo> hris = this.servers.get(sn);
    HRegionInfo foundHri = null;
    for (HRegionInfo hri: hris) {
      if (hri.getEncodedName().equals(encodedName)) {
        foundHri = hri;
        break;
      }
    }
    return foundHri;
  }
{code}
But the servers map does not contain this region, so the lookup always returns null and the
master never processes the SPLIT.
I reverted the patch and was able to overcome the problem. We need to make the fix for the
0.92+ branches with these scenarios in mind.
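
The failure mode described above can be reproduced in a small standalone model. This is only a sketch, not HBase code: `Region`, `rebuild`, `find` and `parentFound` are hypothetical stand-ins for HRegionInfo, rebuildUserRegions(), findHRegionInfo() and a test helper, and it assumes the parent region is already marked offline and split in META when the master restarts.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class SplitLookupSketch {
    /** Hypothetical stand-in for HRegionInfo; only the fields this sketch needs. */
    static final class Region {
        final String encodedName;
        final boolean offline;
        final boolean split;

        Region(String encodedName, boolean offline, boolean split) {
            this.encodedName = encodedName;
            this.offline = offline;
            this.split = split;
        }
    }

    /** Simplified model of building the servers map on master startup. */
    static Map<String, Set<Region>> rebuild(Map<Region, String> meta,
                                            boolean skipSplitParents) {
        Map<String, Set<Region>> servers = new HashMap<>();
        for (Map.Entry<Region, String> e : meta.entrySet()) {
            Region r = e.getKey();
            // The filter added by the patch: split parents never reach the map.
            if (skipSplitParents && r.offline && r.split) continue;
            servers.computeIfAbsent(e.getValue(), k -> new HashSet<>()).add(r);
        }
        return servers;
    }

    /** Simplified model of the lookup by server name and encoded region name. */
    static Region find(Map<String, Set<Region>> servers, String sn, String encodedName) {
        Set<Region> hris = servers.get(sn);
        if (hris == null) return null;
        for (Region hri : hris) {
            if (hri.encodedName.equals(encodedName)) return hri;
        }
        return null;
    }

    /** Returns whether the split parent can still be found after a rebuild. */
    static boolean parentFound(boolean skipSplitParents) {
        Map<Region, String> meta = new HashMap<>();
        meta.put(new Region("parent", true, true), "rs1");   // split parent
        meta.put(new Region("other", false, false), "rs1");  // ordinary region
        return find(rebuild(meta, skipSplitParents), "rs1", "parent") != null;
    }

    public static void main(String[] args) {
        System.out.println("with filter: found=" + parentFound(true));     // found=false
        System.out.println("without filter: found=" + parentFound(false)); // found=true
    }
}
```

With the filter enabled the lookup comes back empty, which corresponds to the null regionState that prevents the master from processing the SPLIT event.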




                
> the master never does balance because of balancing the parent region
> --------------------------------------------------------------------
>
>                 Key: HBASE-5615
>                 URL: https://issues.apache.org/jira/browse/HBASE-5615
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.7
>            Reporter: xufeng
>            Assignee: xufeng
>            Priority: Critical
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: 5615-trunk.txt, HBASE-5615-90.patch, HBASE-5615.patch, NoPatched-surefire-report-5615-90.html, Patched_surefire-report-5615-90.html
>
>
> the master never do balance because when master do rebuildUserRegions(), it will add the parent region into AssignmentManager#servers,
> if balancer let the parent region to move, the parent will in RIT forever. thus balance will never be executed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       
