hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HBASE-3039) Stuck in regionsInTransition because rebalance came in at same time as a split
Date Sat, 25 Sep 2010 22:00:32 GMT

     [ https://issues.apache.org/jira/browse/HBASE-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3039:
-------------------------

    Attachment: 3039.txt

Here is the fix: remove the region from regionsInTransition on receipt of the split message.
This will do for now, but I think there are likely other holes in state transitions, probably
around split, since this is the one action the master does not control.  Plugging the holes is
easier in the new master.  We just have to find them.

Here is what the patch does.  I'm testing it now.

{code}

M src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
  Add region name to warning log message (w/o it the message is no good).
M src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
  Add the source of the split message; otherwise we need to deduce where
  it came from by looking elsewhere.
M src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
  Updated log messages to include the region and, where appropriate, the
  source server name; debugging is hard w/o them.
  Changed regionOnline and regionOffline to check for unexpected
  states and log warnings rather than proceed regardless.
  Added the fix for concurrent balance+split; the split message now
  updates regionsInTransition where previously it did not.
  Removed the checkRegion method.  It's a reimplementation of
  what regionOnline and regionOffline do, only less comprehensive
  as regards what gets updated (this.regions + this.servers rather
  than this.regions, this.servers and regionsInTransition).
  That it was less comprehensive is the root of this bug.
M src/main/java/org/apache/hadoop/hbase/master/HMaster.java
  Make the message about why we are not running the balancer richer
  (print out how many regions are in transition and more of the
  regionsInTransition list).
M src/main/java/org/apache/hadoop/hbase/executor/RegionTransitionData.java
  Javadoc and minor formatting.
{code}
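
To make the core of the fix concrete, here is a minimal sketch of the idea, not the code in 3039.txt. It assumes a toy master that keeps two maps, and every name in it (SplitFixSketch, handleSplitReport, startUnassign, and so on) is made up for illustration. The point is only that the split report goes through the same regionOffline/regionOnline paths as everything else, so the parent is dropped from regionsInTransition as well as from the region map.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the master's bookkeeping; NOT the real AssignmentManager. */
class SplitFixSketch {
    enum State { PENDING_CLOSE }

    // Region name -> state of the in-flight transition (hypothetical shape).
    private final Map<String, State> regionsInTransition = new HashMap<>();
    // Region name -> server currently hosting it.
    private final Map<String, String> regions = new HashMap<>();

    /** The balancer asked for this region to be closed and moved. */
    void startUnassign(String region) {
        regionsInTransition.put(region, State.PENDING_CLOSE);
    }

    /** Region is now served by the given server; clears any transition. */
    void regionOnline(String region, String server) {
        regionsInTransition.remove(region);
        regions.put(region, server);
    }

    /** Region is no longer served anywhere; clears any transition. */
    void regionOffline(String region) {
        State stale = regionsInTransition.remove(region);
        if (stale != null) {
            System.err.println("WARN: " + region + " went offline while " + stale);
        }
        regions.remove(region);
    }

    /** Split report from a regionserver: parent goes away, daughters come up. */
    void handleSplitReport(String server, String parent,
                           String daughterA, String daughterB) {
        regionOffline(parent); // also removes parent from regionsInTransition
        regionOnline(daughterA, server);
        regionOnline(daughterB, server);
    }

    boolean isInTransition(String region) {
        return regionsInTransition.containsKey(region);
    }

    String serverOf(String region) {
        return regions.get(region);
    }
}
```

With this shape a balance that races a split leaves nothing behind: once the split report arrives, the parent's PENDING_CLOSE entry is gone.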

> Stuck in regionsInTransition because rebalance came in at same time as a split
> ------------------------------------------------------------------------------
>
>                 Key: HBASE-3039
>                 URL: https://issues.apache.org/jira/browse/HBASE-3039
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>            Reporter: stack
>             Fix For: 0.90.0
>
>         Attachments: 3039.txt
>
>
> Saw this doing cluster tests:
> {code}
> 2010-09-25 21:31:48,212 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because regions in transition: {73781e505e452221c9cd0e03585eb5d1=usertable,user800184056,128...
> {code}
> Here's the problem:
> {code}
> 2010-09-25 08:16:48,186 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=usertable,user800184056,1285397376525.73781e505e452221c9cd0e03585eb5d1., src=su184,60020,1285371621579, dest=sv2borg189,60020,1285371621577
> 2010-09-25 08:16:48,186 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region usertable,user800184056,1285397376525.73781e505e452221c9cd0e03585eb5d1. (offlining)
> 2010-09-25 08:16:52,656 INFO org.apache.hadoop.hbase.master.ServerManager: Received REGION_SPLIT: usertable,user800184056,1285397376525.73781e505e452221c9cd0e03585eb5d1.: Daughters; usertable,user800184056,1285402609029.c05825561e7ea3cc6507c70bfb21541a., usertable,user804024623,1285402609029.28f64903a7875bdafc1e7ee344b225b0.
> 2010-09-25 08:17:11,414 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  usertable,user800184056,1285397376525.73781e505e452221c9cd0e03585eb5d1. state=PENDING_CLOSE, ts=1285402608186
> {code}
> ...just as we were doing a balance, the region split.
> Over on the RS, I see the split starting up and then in comes the balance 'close' message. By the time the close handler runs on the regionserver, the split is well underway and the close handler actually doesn't find an online region to close.
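
The sequence in the logs above can be replayed with a small sketch. This is a toy model under stated assumptions, not HBase code: the class and method names are hypothetical, and the 20 second timeout is a made-up value chosen only so that the 08:17:11,414 check trips. It contrasts a split handler that leaves regionsInTransition alone (the behaviour described in this issue) with one that clears the parent's entry.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy replay of the balance+split race; NOT the real master code. */
class StuckTransitionSketch {
    // Region name -> time (ms) at which it entered PENDING_CLOSE.
    private final Map<String, Long> pendingCloseSince = new HashMap<>();

    /** Balancer starts closing the region so it can be moved. */
    void startUnassign(String region, long nowMs) {
        pendingCloseSince.put(region, nowMs);
    }

    /** Behaviour described in the issue: the split report never touches the in-transition map. */
    void handleSplitReportOld(String parent) {
        // intentionally empty: the PENDING_CLOSE entry is left behind
    }

    /** Behaviour after the fix: the split report clears the parent's entry. */
    void handleSplitReportFixed(String parent) {
        pendingCloseSince.remove(parent);
    }

    /** Regions that have sat in PENDING_CLOSE for at least timeoutMs. */
    List<String> timedOut(long nowMs, long timeoutMs) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Long> e : pendingCloseSince.entrySet()) {
            if (nowMs - e.getValue() >= timeoutMs) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    /** The balancer is skipped while anything is in transition. */
    boolean balancerMayRun() {
        return pendingCloseSince.isEmpty();
    }
}
```

Feeding it the timestamps from the log (unassign at ts=1285402608186, timeout check about 23 seconds later) shows the old path reporting the parent as timed out and the balancer blocked, while the fixed path reports nothing and lets the balancer run.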

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

