hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-4306) Race between CatalogJanitor and LoadBalancer
Date Tue, 13 Sep 2011 18:31:11 GMT

     [ https://issues.apache.org/jira/browse/HBASE-4306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-4306:

         Priority: Minor  (was: Blocker)
    Fix Version/s:     (was: 0.90.5)
                       (was: 0.92.0)

> Race between CatalogJanitor and LoadBalancer
> --------------------------------------------
>                 Key: HBASE-4306
>                 URL: https://issues.apache.org/jira/browse/HBASE-4306
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Priority: Minor
> It is possible for the LoadBalancer to try to assign an offline/split region while it is waiting to be CatalogJanitor'ed. It goes like this:
> {quote}
> 2011-08-25 00:32:07,137 INFO org.apache.hadoop.hbase.master.ServerManager: Received REGION_SPLIT: parent: Daughters; d1, d2 from sv4r22s16,60020,1314211225331
> ...
> (cleaning never happens or whatever)
> ...
> 2011-08-29 13:45:14,561 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=parent, src=sv4r22s16,60020,1314211225331, dest=sv4r19s17,60020,1314218170402
> 2011-08-29 13:45:14,561 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region parent (offlining)
> 2011-08-29 13:45:14,588 INFO org.apache.hadoop.hbase.master.AssignmentManager: Server serverName=sv4r22s16,60020,1314211225331, load=(requests=0, regions=0, usedHeap=0, maxHeap=0) returned org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Received close for parent but we are not serving it for parent
> {quote}
> Here it took 4 days of balancing before the balancer finally tried to balance the parent (which was never deleted because of HBASE-4238), but this can also happen if the balancer decides to balance the parent just before it is cleaned. The end effect is that the balancer stays disabled _forever_, until that's fixed.
> The culprit here is that the master keeps the region "online" until AssignmentManager.regionOffline is called by the CatalogJanitor (CJ), which means the parent is still treated like any other region even though it is offline.
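
The window described above can be illustrated with a toy model. This is a minimal sketch in Python, not HBase code: `ToyMaster`, `Region`, `handle_split`, `balance_candidates` and the `skip_split_parents` flag are all hypothetical names invented for illustration. It assumes only what the report states, namely that a split parent stays in the master's "online" view until the janitor takes it offline. The filter shown is one possible mitigation, not necessarily the fix adopted for this issue.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    offline: bool = False  # set once the region has been split
    split: bool = False


class ToyMaster:
    """Hypothetical stand-in for the master's in-memory region map."""

    def __init__(self):
        self.online = {}  # region name -> (Region, hosting server)

    def region_online(self, region, server):
        self.online[region.name] = (region, server)

    def handle_split(self, parent_name, daughter_names, server):
        # Flag the parent as offline/split but leave it in the online map,
        # which is the behavior the report identifies as the culprit.
        parent, _ = self.online[parent_name]
        parent.offline = True
        parent.split = True
        for name in daughter_names:
            self.region_online(Region(name), server)

    def region_offline(self, name):
        # Models the cleanup step that the janitor eventually triggers.
        self.online.pop(name, None)

    def balance_candidates(self, skip_split_parents=False):
        names = []
        for region, _ in self.online.values():
            if skip_split_parents and (region.offline or region.split):
                continue
            names.append(region.name)
        return sorted(names)


master = ToyMaster()
master.region_online(Region("parent"), "sv4r22s16")
master.handle_split("parent", ["d1", "d2"], "sv4r22s16")

# Before cleanup, the split parent is still a balancing candidate.
print(master.balance_candidates())
# Filtering on the offline/split flags closes the window.
print(master.balance_candidates(skip_split_parents=True))
# After cleanup the parent is gone either way.
master.region_offline("parent")
print(master.balance_candidates())
```

In this model the first call still lists `parent` next to `d1` and `d2`, which corresponds to the balancer picking a region that no server is actually serving.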

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

