hbase-issues mailing list archives

From "Maryann Xue (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-5829) Inconsistency between the "regions" map and the "servers" map in AssignmentManager
Date Wed, 25 Apr 2012 06:12:14 GMT

     [ https://issues.apache.org/jira/browse/HBASE-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maryann Xue updated HBASE-5829:
-------------------------------

    Status: Patch Available  (was: Open)
    
> Inconsistency between the "regions" map and the "servers" map in AssignmentManager
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-5829
>                 URL: https://issues.apache.org/jira/browse/HBASE-5829
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.92.1, 0.90.6
>            Reporter: Maryann Xue
>         Attachments: HBASE-5829-0.90.patch, HBASE-5829-trunk.patch
>
>
> There are places in AssignmentManager (AM) where this.servers is not kept consistent with this.regions.
> This can cause the balancer to try to offline a region from an RS that already returned NotServingRegionException
> at a previous offline attempt.
> In AssignmentManager.unassign(HRegionInfo, boolean):
>     try {
>       // TODO: We should consider making this look more like it does for the
>       // region open where we catch all throwables and never abort
>       if (serverManager.sendRegionClose(server, state.getRegion(),
>         versionOfClosingNode)) {
>         LOG.debug("Sent CLOSE to " + server + " for region " +
>           region.getRegionNameAsString());
>         return;
>       }
>       // This never happens. Currently regionserver close always return true.
>       LOG.warn("Server " + server + " region CLOSE RPC returned false for " +
>         region.getRegionNameAsString());
>     } catch (NotServingRegionException nsre) {
>       LOG.info("Server " + server + " returned " + nsre + " for " +
>         region.getRegionNameAsString());
>       // Presume that master has stale data.  Presume remote side just split.
>       // Presume that the split message when it comes in will fix up the master's
>       // in memory cluster state.
>     } catch (Throwable t) {
>       if (t instanceof RemoteException) {
>         t = ((RemoteException)t).unwrapRemoteException();
>         if (t instanceof NotServingRegionException) {
>           if (checkIfRegionBelongsToDisabling(region)) {
>             // Remove from the regionsinTransition map
>             LOG.info("While trying to recover the table "
>                 + region.getTableNameAsString()
>                 + " to DISABLED state the region " + region
>                 + " was offlined but the table was in DISABLING state");
>             synchronized (this.regionsInTransition) {
>               this.regionsInTransition.remove(region.getEncodedName());
>             }
>             // Remove from the regionsMap
>             synchronized (this.regions) {
>               this.regions.remove(region);
>             }
>             deleteClosingOrClosedNode(region);
>           }
>         }
>         // RS is already processing this region, only need to update the timestamp
>         if (t instanceof RegionAlreadyInTransitionException) {
>           LOG.debug("update " + state + " the timestamp.");
>           state.update(state.getState());
>         }
>       }
> In AssignmentManager.assign(HRegionInfo, RegionState, boolean, boolean, boolean):
>           synchronized (this.regions) {
>             this.regions.put(plan.getRegionInfo(), plan.getDestination());
>           }
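
For illustration only: the following is a minimal sketch of the invariant the report describes, not the attached patch and not the actual AssignmentManager code. The class RegionBookkeeping, its methods, and the use of plain strings for regions and servers are all hypothetical. The idea it shows is that every update to the region-to-server map goes through a helper that also updates the server-to-regions map, so neither map can be changed alone.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical, simplified stand-in for the "regions" and "servers" maps.
// Strings replace HRegionInfo and server names to keep the sketch self-contained.
public class RegionBookkeeping {
    // region -> hosting server (plays the role of this.regions)
    private final Map<String, String> regions = new TreeMap<>();
    // server -> regions it hosts (plays the role of this.servers)
    private final Map<String, Set<String>> servers = new TreeMap<>();

    // Record an assignment in both maps, dropping any stale entry
    // left under the previous server.
    public synchronized void assign(String region, String server) {
        String previous = regions.put(region, server);
        if (previous != null && !previous.equals(server)) {
            removeFromServer(previous, region);
        }
        servers.computeIfAbsent(server, s -> new TreeSet<>()).add(region);
    }

    // Take a region offline in both maps, e.g. after the server
    // reported that it is not serving the region.
    public synchronized void offline(String region) {
        String server = regions.remove(region);
        if (server != null) {
            removeFromServer(server, region);
        }
    }

    private void removeFromServer(String server, String region) {
        Set<String> hosted = servers.get(server);
        if (hosted != null) {
            hosted.remove(region);
        }
    }

    public synchronized String serverOf(String region) {
        return regions.get(region);
    }

    // Returns a copy so callers cannot mutate the internal set.
    public synchronized Set<String> regionsOn(String server) {
        Set<String> hosted = servers.get(server);
        return hosted == null ? Collections.<String>emptySet() : new TreeSet<>(hosted);
    }
}
```

With bookkeeping of this shape, a component that picks regions from the per-server view would no longer see a region that has already been removed from the per-region view.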

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
