hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Created: (HBASE-3362) If .META. offline between OPENING and OPENED, then wrong server location in .META. is possible
Date Wed, 15 Dec 2010 20:52:01 GMT
If .META. offline between OPENING and OPENED, then wrong server location in .META. is possible
----------------------------------------------------------------------------------------------

                 Key: HBASE-3362
                 URL: https://issues.apache.org/jira/browse/HBASE-3362
             Project: HBase
          Issue Type: Bug
            Reporter: stack
            Assignee: stack
            Priority: Critical
             Fix For: 0.90.0


This is a good one.  It happened to me testing OOME in split logging.

* Balancer moves region to new location, regionserver X.
* New location regionserver X successfully opens the region and then goes to update .META.
* At this point, the server carrying .META. crashes.
* Regionserver X is stuck waiting on .META. to come back online.  It takes so long that the
master times out the region-in-transition
* Master assigns the region elsewhere to regionserver Y
* It opens successfully on regionserver Y and then it also parks waiting on .META. coming
online
* .META. comes online
* The two servers X and Y race to update .META.

I saw a case where server X's edit went in after server Y's edit, which means that lookups
in .META. return the wrong server.  HBCK can detect this situation.
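
The sequence above can be sketched as a last-writer-wins race.  This is only an
illustration, not HBase code: `record_open`, `find_inconsistencies`, and the dict standing
in for .META. are hypothetical names, and the check is a stand-in for the kind of mismatch
HBCK reports, not its actual logic.

```python
def record_open(meta, region, server):
    """Unconditional put: whichever edit lands last wins (hypothetical helper)."""
    meta[region] = server


def find_inconsistencies(meta, assignments):
    """Return regions whose recorded location differs from the master's assignment.

    Maps region -> (location in meta, server the master assigned).
    """
    return {
        region: (meta.get(region), server)
        for region, server in assignments.items()
        if meta.get(region) != server
    }


meta = {}                          # stand-in for the .META. table
assignments = {"region-1": "Y"}    # master timed out X and reassigned to Y

# .META. comes back online and both parked servers flush their edits.
record_open(meta, "region-1", "Y")   # Y's edit lands first
record_open(meta, "region-1", "X")   # X's stale edit lands last and overwrites it

print(meta["region-1"])                          # X
print(find_inconsistencies(meta, assignments))   # {'region-1': ('X', 'Y')}
```

Because the put is unconditional, nothing stops the stale edit from X replacing the
correct one from Y.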

When it wakes up, regionserver X correctly notices that it has lost control of the region,
but the damage is done -- where the damage is the .META. edit.

Chatting with Jon, he suggested that regionserver X should 'rollback' the .META. edit -- do
an explicit delete of what it added.  This would work, I think, but after chatting more, I'll
make a fix that keeps updating the zookeeper OPENING state while the edit goes on in a
separate thread.  Our continuous setting of OPENING will make it so the region-in-transition
does not time out.
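
A minimal sketch of that idea, assuming the master's timeout is driven by the time of the
last OPENING update.  Nothing here is the actual patch or an HBase/ZooKeeper API:
`RegionInTransition`, `refresh_opening`, `timed_out`, and `open_region` are hypothetical
names, and a `threading.Event` stands in for .META. being offline.

```python
import threading
import time


class RegionInTransition:
    """Stand-in for the OPENING znode: tracks only when it was last updated."""

    def __init__(self):
        self._lock = threading.Lock()
        self.last_update = time.monotonic()
        self.refreshes = 0

    def refresh_opening(self):
        # Re-assert OPENING so the master sees the open is still in progress.
        with self._lock:
            self.last_update = time.monotonic()
            self.refreshes += 1

    def timed_out(self, timeout_s):
        # Master-side check: has OPENING gone stale for longer than timeout_s?
        with self._lock:
            return time.monotonic() - self.last_update > timeout_s


def open_region(rit, meta_online, meta, region, server, refresh_every_s):
    """Do the .META. edit in a separate thread; refresh OPENING until it finishes."""

    def edit_meta():
        meta_online.wait()       # blocks while .META. is offline
        meta[region] = server

    editor = threading.Thread(target=edit_meta)
    editor.start()
    while editor.is_alive():
        rit.refresh_opening()
        editor.join(timeout=refresh_every_s)
```

With this shape the region-in-transition stays fresh for as long as the edit is blocked, so
the master has no reason to assign the region to a second server.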

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

