hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-4306) Race between CatalogJanitor and LoadBalancer
Date Sat, 10 Sep 2011 20:33:08 GMT

     [ https://issues.apache.org/jira/browse/HBASE-4306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-4306:
-------------------------

    Fix Version/s: 0.92.0

> Race between CatalogJanitor and LoadBalancer
> --------------------------------------------
>
>                 Key: HBASE-4306
>                 URL: https://issues.apache.org/jira/browse/HBASE-4306
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Priority: Blocker
>             Fix For: 0.92.0, 0.90.5
>
>
> It is possible for the LoadBalancer to try to assign an offline/split region while it is waiting to be CatalogJanitor'ed. It goes like this:
> {quote}
> 2011-08-25 00:32:07,137 INFO org.apache.hadoop.hbase.master.ServerManager: Received REGION_SPLIT: parent: Daughters; d1, d2 from sv4r22s16,60020,1314211225331
> ...
> (cleaning never happens or whatever)
> ...
> 2011-08-29 13:45:14,561 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=parent, src=sv4r22s16,60020,1314211225331, dest=sv4r19s17,60020,1314218170402
> 2011-08-29 13:45:14,561 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region parent (offlining)
> 2011-08-29 13:45:14,588 INFO org.apache.hadoop.hbase.master.AssignmentManager: Server serverName=sv4r22s16,60020,1314211225331, load=(requests=0, regions=0, usedHeap=0, maxHeap=0) returned org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Received close for parent but we are not serving it for parent
> {quote}
> Here it took 4 days of balancing before the balancer finally tried to move the parent (which was never deleted because of HBASE-4238), but it can also happen if the balancer decides to balance the parent just before it's cleaned. The end effect is that the balancer will be disabled _forever_ until that's fixed.
> The culprit here is that the master keeps the region "online" until AssignmentManager.regionOffline is called by the CatalogJanitor (CJ), which means the region is still treated like any other region although it's offline.
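
As an illustration of the condition described above, here is a minimal sketch of a guard that skips offline or split parents when picking regions to move. It is not the patch attached to this issue and does not use HBase's real classes: RegionSketch, BalancerCandidateFilter and movable() are hypothetical names invented for this example, and the offline and split flags only stand in for the state that a split parent carries while it waits for the CatalogJanitor.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a region's state as the master sees it.
// This is NOT HBase's HRegionInfo; it only carries the two flags of interest.
final class RegionSketch {
    final String name;
    final boolean offline; // assumed to be set on a parent once it has split
    final boolean split;   // assumed to be set on a parent once it has split

    RegionSketch(String name, boolean offline, boolean split) {
        this.name = name;
        this.offline = offline;
        this.split = split;
    }
}

// Hypothetical helper: filters the regions the master still lists as "online"
// down to the ones that a balancer could actually move.
final class BalancerCandidateFilter {
    static List<RegionSketch> movable(List<RegionSketch> onlineInMaster) {
        List<RegionSketch> candidates = new ArrayList<>();
        for (RegionSketch region : onlineInMaster) {
            // A split parent is no longer served by any region server, so a
            // close request for it would fail; skip it instead of planning a move.
            if (region.offline || region.split) {
                continue;
            }
            candidates.add(region);
        }
        return candidates;
    }

    public static void main(String[] args) {
        List<RegionSketch> online = new ArrayList<>();
        online.add(new RegionSketch("parent", true, true)); // awaiting cleanup
        online.add(new RegionSketch("d1", false, false));
        online.add(new RegionSketch("d2", false, false));
        for (RegionSketch region : movable(online)) {
            System.out.println("candidate: " + region.name);
        }
    }
}
```

With the three regions from the log excerpt, only the daughters d1 and d2 would remain as candidates. Filtering at plan time is just one possible place for such a check; removing the parent from the master's online set as soon as the split is reported would be another.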

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
