hbase-issues mailing list archives

From "gaojinchao (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4511) There is data loss when master failovers
Date Mon, 17 Oct 2011 01:20:11 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13128554#comment-13128554
] 

gaojinchao commented on HBASE-4511:
-----------------------------------

This cannot be reproduced in a real cluster, so I am downgrading its priority.
                
> There is data loss when master failovers
> ----------------------------------------
>
>                 Key: HBASE-4511
>                 URL: https://issues.apache.org/jira/browse/HBASE-4511
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.92.0
>            Reporter: gaojinchao
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: org.apache.hadoop.hbase.master.TestMasterFailover-output.rar
>
>
> It goes like this:
> The master crashes. At the same time, the RS hosting meta is crashing, but the RS process has not exited yet.
> The master starts up again and finds all living RSs.
> The master fails to verify meta, because this RS is crashing.
> The master reassigns meta, but it doesn't split the HLog.
> So some meta data is lost.
> Below are the logs from a failing failover test case.
> // The test says that it wants to kill an RS.
> 2011-09-28 19:54:45,694 INFO  [Thread-988] regionserver.HRegionServer(1443): STOPPED: Killing for unit test
> 2011-09-28 19:54:45,694 INFO  [Thread-988] master.TestMasterFailover(1007): 
> RS 192.168.2.102,54385,1317264874629 killed 
> // The RS had not actually exited; the master still found it up in ZK.
> 2011-09-28 19:54:51,763 INFO  [Master:0;192.168.2.102,54557,1317264885720] master.HMaster(458): Registering server found up in zk: 192.168.2.102,54385,1317264874629
> 2011-09-28 19:54:51,763 INFO  [Master:0;192.168.2.102,54557,1317264885720] master.ServerManager(232): Registering server=192.168.2.102,54385,1317264874629
> 2011-09-28 19:54:51,770 DEBUG [Master:0;192.168.2.102,54557,1317264885720] zookeeper.ZKUtil(491): master:54557-0x132b31adbb30005 Unable to get data of znode /hbase/unassigned/1028785192 because node does not exist (not an error)
> 2011-09-28 19:54:51,771 DEBUG [Master:0;192.168.2.102,54557,1317264885720] zookeeper.ZKUtil(1003): master:54557-0x132b31adbb30005 Retrieved 33 byte(s) of data from znode /hbase/root-region-server and set watcher; 192.168.2.102,54383,131726487...
> // Meta verification failed and the master reassigned meta, so all the regions in meta are lost.
> 2011-09-28 19:54:51,773 INFO  [Master:0;192.168.2.102,54557,1317264885720] catalog.CatalogTracker(476): Failed verification of .META.,,1 at address=192.168.2.102,54385,1317264874629; org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: Server 192.168.2.102,54385,1317264874629 not running, aborting
> 2011-09-28 19:54:51,773 DEBUG [Master:0;192.168.2.102,54557,1317264885720] catalog.CatalogTracker(316): new .META. server: 192.168.2.102,54385,1317264874629 isn't valid. Cached .META. server: null
> 2011-09-28 19:54:52,274 DEBUG [Master:0;192.168.2.102,54557,1317264885720] zookeeper.ZKUtil(1003): master:54557-0x132b31adbb30005 Retrieved 33 byte(s) of data from znode /hbase/root-region-server and set watcher; 192.168.2.102,54383,131726487...
> 2011-09-28 19:54:52,277 INFO  [Master:0;192.168.2.102,54557,1317264885720] catalog.CatalogTracker(476): Failed verification of .META.,,1 at address=192.168.2.102,54385,1317264874629; org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: Server 192.168.2.102,54385,1317264874629 not running, aborting
> 2011-09-28 19:54:52,277 DEBUG [Master:0;192.168.2.102,54557,1317264885720] catalog.CatalogTracker(316): new .META. server: 192.168.2.102,54385,1317264874629 isn't valid. Cached .META. server: null
> 2011-09-28 19:54:52,778 DEBUG [Master:0;192.168.2.102,54557,1317264885720] zookeeper.ZKUtil(1003): master:54557-0x132b31adbb30005 Retrieved 33 byte(s) of data from znode /hbase/root-region-server and set watcher; 192.168.2.102,54383,131726487...
> 2011-09-28 19:54:52,782 INFO  [Master:0;192.168.2.102,54557,1317264885720] catalog.CatalogTracker(476): Failed verification of .META.,,1 at address=192.168.2.102,54385,1317264874629; org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: Server 192.168.2.102,54385,1317264874629 not running, aborting
> 2011-09-28 19:54:52,782 DEBUG [Master:0;192.168.2.102,54557,1317264885720] catalog.CatalogTracker(316): new .META. server: 192.168.2.102,54385,1317264874629 isn't valid. Cached .META. server: null
> 2011-09-28 19:54:52,782 DEBUG [Master:0;192.168.2.102,54557,1317264885720] zookeeper.ZKAssign(264): master:54557-0x132b31adbb30005 Creating (or updating) unassigned node for 1028785192 with OFFLINE state
> 2011-09-28 19:54:52,825 DEBUG [Thread-988-EventThread] zookeeper.ZooKeeperWatcher(233): master:54557-0x132b31adbb30005 Received ZooKeeper Event, type=NodeCreated, state=SyncConnected, path=/hbase/unassigned/1028785192
> // The master then treats this as a clean cluster startup.
> 2011-09-28 19:54:52,889 INFO  [Master:0;192.168.2.102,54557,1317264885720] master.AssignmentManager(383): Clean cluster startup. Assigning userregions
> 2011-09-28 19:54:52,889 DEBUG [Master:0;192.168.2.102,54557,1317264885720] zookeeper.ZKAssign(494): master:54557-0x132b31adbb30005 Deleting any existing unassigned nodes
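
The sequence described in the issue (a stopping RS is still registered, meta verification fails, meta is reassigned without a log split) can be modelled with a small, self-contained sketch. Everything in it is hypothetical: the class, enum, and method names are made up for illustration and are not HBase APIs, and the "split first" branch is only one plausible ordering, not necessarily the fix applied to this issue.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the recovery decision; none of these names are HBase APIs. */
public class MetaRecoverySketch {

    static final String KEEP = "keep .META. on current server";
    static final String SPLIT = "split HLog of old .META. server";
    static final String REASSIGN = "reassign .META.";

    /** What the restarted master observes about the server that last hosted .META. */
    public enum ServerState { ONLINE, STOPPING, DEAD }

    /**
     * Returns the ordered recovery steps for .META. (assumed model).
     * splitLogBeforeReassign=false mirrors the behaviour reported in the issue.
     */
    public static List<String> planMetaRecovery(ServerState state, boolean splitLogBeforeReassign) {
        List<String> steps = new ArrayList<>();
        if (state == ServerState.ONLINE) {
            // Verification succeeds, nothing to recover.
            steps.add(KEEP);
            return steps;
        }
        // STOPPING: still registered in ZooKeeper, but verification fails.
        // DEAD: no longer registered. In both cases .META. must move.
        if (splitLogBeforeReassign) {
            steps.add(SPLIT);
        }
        steps.add(REASSIGN);
        return steps;
    }

    /** Edits are at risk when .META. is reassigned without a preceding log split. */
    public static boolean editsAtRisk(List<String> steps) {
        int reassign = steps.indexOf(REASSIGN);
        int split = steps.indexOf(SPLIT);
        return reassign >= 0 && (split < 0 || split > reassign);
    }

    public static void main(String[] args) {
        List<String> reported = planMetaRecovery(ServerState.STOPPING, false);
        List<String> splitFirst = planMetaRecovery(ServerState.STOPPING, true);
        System.out.println(reported + " atRisk=" + editsAtRisk(reported));
        System.out.println(splitFirst + " atRisk=" + editsAtRisk(splitFirst));
    }
}
```

In this model the only difference between the two plans is whether the old server's log is split before the reassignment, which is the step the description says was skipped.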

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
