hbase-issues mailing list archives

From "Elliott Clark (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-6461) Killing the HRegionServer and DataNode hosting ROOT can result in a malformed root table.
Date Fri, 27 Jul 2012 01:02:34 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423601#comment-13423601 ]

Elliott Clark commented on HBASE-6461:

Small update:
I tried it again (well, 30 times actually) with more logging enabled, and I noticed this in the NameNode log:

eclark@sv4r11s38:/export1/eclark$ grep "recovery started" /export1/eclark/logs/hadoop-eclark-namenode-sv4r11s38.log

2012-07-27 00:39:45,094 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* blk_3380368109770176913_1021 recovery started, primary=

The primary listed is the server that was just killed, and the block ID is the ID of the
RegionServer's HLog. According to the comments around that log message, the primary is
supposed to be a live DataNode. I'm wondering if this is an HDFS bug. Thoughts?
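To spot this across repeated runs, a small filter over the NameNode log can flag any block recovery whose primary is a DataNode known to have been killed. This is a minimal sketch, assuming the `primary=` field prints as a DataNode `host:port` string (the value is missing from the archived line above) and that the set of killed DataNodes is supplied by hand; `suspicious_recoveries` is a hypothetical helper, not part of Hadoop.

```python
import re

# Assumed shape of the NameNode StateChange line quoted above:
#   ... BLOCK* blk_<id>_<genstamp> recovery started, primary=<datanode>
# The primary is assumed to print as "host:port"; the archived line lost its value.
RECOVERY_RE = re.compile(
    r"BLOCK\* (?P<block>blk_-?\d+_\d+) recovery started, primary=(?P<primary>\S*)"
)


def suspicious_recoveries(log_lines, dead_datanodes):
    """Return (block, primary) pairs whose recovery primary is a known-dead DataNode."""
    hits = []
    for line in log_lines:
        match = RECOVERY_RE.search(line)
        if match and match.group("primary") in dead_datanodes:
            hits.append((match.group("block"), match.group("primary")))
    return hits


if __name__ == "__main__":
    # Hypothetical address standing in for the DataNode on the killed host.
    killed = {"10.0.0.5:50010"}
    sample = [
        "2012-07-27 00:39:45,094 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* "
        "blk_3380368109770176913_1021 recovery started, primary=10.0.0.5:50010",
    ]
    print(suspicious_recoveries(sample, killed))
    # prints [('blk_3380368109770176913_1021', '10.0.0.5:50010')]
```

Feeding it the output of the `grep` above, together with the addresses of the DataNodes taken down by `killall -9 java`, lists each HLog block whose recovery was assigned to a dead node.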
> Killing the HRegionServer and DataNode hosting ROOT can result in a malformed root table.
> -----------------------------------------------------------------------------------------
>                 Key: HBASE-6461
>                 URL: https://issues.apache.org/jira/browse/HBASE-6461
>             Project: HBase
>          Issue Type: Bug
>         Environment: hadoop-0.20.2-cdh3u3
> HBase 0.94.1 RC1
>            Reporter: Elliott Clark
>            Priority: Critical
>             Fix For: 0.94.2
> Spun up a new DFS on hadoop-0.20.2-cdh3u3.
> Started HBase.
> Started running the load test tool.
> Killed the RS and DN holding ROOT with killall -9 java on server sv4r27s44 at about 2012-07-25
> After things stabilized, ROOT was in a bad state. Ran hbck and got:
> Exception in thread "main" org.apache.hadoop.hbase.client.NoServerForRegionException: No server address listed in -ROOT- for region .META.,,1.1028785192 containing row 
> at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1016)
> at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:841)
> at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:810)
> at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:232)
> at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:172)
> at org.apache.hadoop.hbase.util.HBaseFsck.connect(HBaseFsck.java:241)
> at org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:3236)
> hbase(main):001:0> scan '-ROOT-'
> ROW                                           COLUMN+CELL                           
> 12/07/25 22:43:18 INFO security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
>  .META.,,1                                    column=info:regioninfo, timestamp=1343255838525, value={NAME => '.META.,,1', STARTKEY => '', ENDKEY => '', ENCODED => 1028785192,}
>  .META.,,1                                    column=info:v, timestamp=1343255838525,

> 1 row(s) in 0.5930 seconds
> Here's the master log: https://gist.github.com/3179194
> I tried the same thing with 0.92.1 and was able to get into a similar situation, so I don't think this is anything new.
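The hbck failure and the -ROOT- scan in the description agree: the .META.,,1 row carries info:regioninfo and info:v but no info:server cell, and the client's NoServerForRegionException says exactly that no server address is listed. A minimal sketch that checks raw `hbase shell` scan output for this condition follows; treating info:regioninfo and info:server as the required columns is an assumption based on the 0.94 catalog layout, it assumes row keys contain no whitespace, and `missing_location_columns` is a hypothetical helper.

```python
# Columns assumed necessary for a client to locate the region a catalog row
# describes (based on the HBase 0.94 catalog layout; info:v is only a version marker).
REQUIRED_COLUMNS = ("info:regioninfo", "info:server")


def missing_location_columns(scan_lines):
    """Map each row key in raw hbase shell scan output to the required columns it lacks."""
    columns_by_row = {}
    for line in scan_lines:
        parts = line.split()
        # Cell lines look like: <row> column=<family:qualifier>, timestamp=..., value=...
        if len(parts) < 2 or not parts[1].startswith("column="):
            continue  # header, log noise, or the "N row(s)" summary
        column = parts[1][len("column="):].rstrip(",")
        columns_by_row.setdefault(parts[0], set()).add(column)
    return {
        row: [col for col in REQUIRED_COLUMNS if col not in cols]
        for row, cols in columns_by_row.items()
    }


if __name__ == "__main__":
    scan = [
        "ROW                COLUMN+CELL",
        " .META.,,1         column=info:regioninfo, timestamp=1343255838525, value={NAME => '.META.,,1'}",
        " .META.,,1         column=info:v, timestamp=1343255838525,",
        "1 row(s) in 0.5930 seconds",
    ]
    print(missing_location_columns(scan))
    # prints {'.META.,,1': ['info:server']}
```

Running it over a scan of -ROOT- taken after each kill would show whether the malformed state reappears, without waiting for hbck to fail on connect.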
