hbase-issues mailing list archives

From "HBase Review Board (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-3362) If .META. offline between OPENING and OPENED, then wrong server location in .META. is possible
Date Fri, 17 Dec 2010 01:15:03 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972331#action_12972331 ]

HBase Review Board commented on HBASE-3362:
-------------------------------------------

Message from: "Jonathan Gray" <jgray@apache.org>

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1298/#review2103
-----------------------------------------------------------

Ship it!


it's getting pretty crazy but this looks good.

it's unfortunate we have all these extra node transitioning methods inside this class.  this
pattern of doing node transitions and tracking expected version is very common and we'll probably
have more of it so we should look at doing some kind of generic abstraction for that pattern
soon.
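
A minimal sketch of what such a generic abstraction might look like, in plain Java. Everything here is hypothetical: VersionedNode is an in-memory stand-in for a versioned ZooKeeper node, and TransitionTracker, RegionState and advance() are illustrative names, not existing HBase classes or methods. The idea is only that the helper remembers the expected version so callers do not have to carry it around.

```java
/** Hypothetical region states; only the ones this thread talks about. */
enum RegionState { OFFLINE, OPENING, OPENED }

/** In-memory stand-in for a versioned ZooKeeper node (assumption, not HBase code). */
final class VersionedNode {
    private RegionState state;
    private int version = 0;

    VersionedNode(RegionState initial) {
        this.state = initial;
    }

    synchronized RegionState state() {
        return state;
    }

    synchronized int version() {
        return version;
    }

    /**
     * Compare-and-set style transition: succeeds only if the node is in
     * state 'from' at exactly 'expectedVersion'. Returns the new version,
     * or -1 if either expectation does not hold.
     */
    synchronized int transition(RegionState from, RegionState to, int expectedVersion) {
        if (state != from || version != expectedVersion) {
            return -1;
        }
        state = to;
        version++;
        return version;
    }
}

/** Tracks the expected version across a chain of transitions made by one owner. */
final class TransitionTracker {
    private final VersionedNode node;
    private int expectedVersion;

    TransitionTracker(VersionedNode node) {
        this.node = node;
        this.expectedVersion = node.version();
    }

    /** Attempts from -> to; on success remembers the new version for the next call. */
    boolean advance(RegionState from, RegionState to) {
        int newVersion = node.transition(from, to, expectedVersion);
        if (newVersion < 0) {
            return false; // someone else changed the node; we lost ownership
        }
        expectedVersion = newVersion;
        return true;
    }
}
```

With something like this, a handler would call advance(OFFLINE, OPENING), re-assert with advance(OPENING, OPENING), and finish with advance(OPENING, OPENED), treating any false return as "lost the region".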

+1 for commit, thanks for the changes


trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
<http://review.cloudera.org/r/1298/#comment6561>

    typo 'initalizes' but good comment



trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
<http://review.cloudera.org/r/1298/#comment6562>

    interesting thing is... we only use this progressable if we do a log replay.  in that case, a region open is not really idempotent as we treat it here.
    
    outside scope of this jira but something to think about.


- Jonathan





> If .META. offline between OPENING and OPENED, then wrong server location in .META. is possible
> ----------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3362
>                 URL: https://issues.apache.org/jira/browse/HBASE-3362
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.90.0
>
>
> This is a good one.  It happened to me while testing OOME in split logging.
> * Balancer moves region to new location, regionserver X.
> * New location regionserver X successfully opens the region and then goes to update .META.
> * At this point, the server carrying .META. crashes.
> * Regionserver X is stuck waiting on .META. to come back online.  It takes so long that the master times out the region-in-transition.
> * Master assigns the region elsewhere, to regionserver Y.
> * It opens successfully on regionserver Y, which then also parks waiting on .META. to come online.
> * .META. comes online.
> * The two servers X and Y race to update .META.
> I saw a case where server X's edit went in after server Y's edit, which means that lookups in .META. get the wrong server.  HBCK can detect this situation.
> RegionServer X, when it wakes up, correctly notices that it has lost control of the region, but the damage is done -- where the damage is the .META. edit.
> Chatting with Jon, he suggested that regionserver X should 'rollback' the .META. edit -- do an explicit delete of what it added.  This would work, I think, but after chatting more, I'll make a fix that keeps updating the zookeeper OPENING state while the edit goes on in a separate thread.  Our continuous setting of OPENING will make it so the region-in-transition does not time out.
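
The fix proposed above (run the .META. edit in a separate thread while the opener keeps re-asserting OPENING) could be sketched roughly as follows. This is an illustration under stated assumptions, not the committed patch: OpeningNode, tickleOpening() and MetaUpdateWithKeepAlive are hypothetical names, and the ZooKeeper node is abstracted behind an interface so the example stays self-contained.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for the region-in-transition node in ZooKeeper. */
interface OpeningNode {
    /** Re-asserts OPENING; returns false if this server no longer owns the node. */
    boolean tickleOpening();
}

final class MetaUpdateWithKeepAlive {
    /**
     * Runs metaEdit on a separate thread and re-asserts OPENING every
     * periodMs until the edit finishes. Returns true only if the edit
     * completed without error and no tickle failed along the way.
     */
    static boolean run(Runnable metaEdit, OpeningNode node, long periodMs) {
        CountDownLatch done = new CountDownLatch(1);
        AtomicBoolean editOk = new AtomicBoolean(false);

        Thread editor = new Thread(() -> {
            try {
                metaEdit.run();
                editOk.set(true);
            } catch (RuntimeException e) {
                // Edit failed; editOk stays false.
            } finally {
                done.countDown();
            }
        }, "meta-editor");
        editor.setDaemon(true);
        editor.start();

        try {
            // Each time the wait times out, the edit is still pending:
            // refresh OPENING so the master does not time the region out.
            while (!done.await(periodMs, TimeUnit.MILLISECONDS)) {
                if (!node.tickleOpening()) {
                    editor.interrupt(); // lost the region; ask the editor to stop
                    return false;
                }
            }
        } catch (InterruptedException e) {
            editor.interrupt();
            Thread.currentThread().interrupt();
            return false;
        }
        return editOk.get();
    }
}
```

Note that interrupting the editor thread is only a request; in this sketch an edit already in flight could still land, so a caller that gets false back cannot assume .META. is untouched.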

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

