hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8422) Master won't go down. Stuck waiting on .META. to come on line.
Date Thu, 25 Apr 2013 23:04:17 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642348#comment-13642348 ]

stack commented on HBASE-8422:
------------------------------

[~lhofhansl] This fixes the issue where I shut down a master that was waiting on regionservers and it would not go down.

With this patch in place, the master should still stay up and wait forever for regionservers, as it used to.

Regarding the worrisome bit of code, I think your worries will be eased if you check where it is located: it is run as part of our shutdown, on our way down.
                
> Master won't go down.  Stuck waiting on .META. to come on line.
> ---------------------------------------------------------------
>
>                 Key: HBASE-8422
>                 URL: https://issues.apache.org/jira/browse/HBASE-8422
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.95.0
>            Reporter: stack
>            Assignee: rajeshbabu
>             Fix For: 0.98.0, 0.94.8, 0.95.1
>
>         Attachments: HBASE-8422_2.patch, HBASE-8422_3.patch, HBASE-8422_94.patch, HBASE-8422.patch
>
>
> Master came up w/ no regionservers.  I then tried to shut it down.  You can see below that it started to go down....
> {code}
> 2013-04-24 14:28:49,770 INFO  [IPC Server handler 7 on 60000] org.apache.hadoop.hbase.master.HMaster: Cluster shutdown requested
> 2013-04-24 14:28:49,815 INFO  [master-stack-1.ent.cloudera.com,60000,1366838923135] org.apache.hadoop.hbase.master.ServerManager: Finished waiting for region servers count to settle; checked in 0, slept for 2818 ms, expecting minimum of 1, maximum of 2147483647, master is stopped.
> 2013-04-24 14:28:49,815 WARN  [master-stack-1.ent.cloudera.com,60000,1366838923135] org.apache.hadoop.hbase.master.MasterFileSystem: Master stopped while splitting logs
> 2013-04-24 14:28:50,104 INFO  [stack-1.ent.cloudera.com,60000,1366838923135.splitLogManagerTimeoutMonitor] org.apache.hadoop.hbase.master.SplitLogManager$TimeoutMonitor: stack-1.ent.cloudera.com,60000,1366838923135.splitLogManagerTimeoutMonitor exiting
> 2013-04-24 14:28:50,850 INFO  [master-stack-1.ent.cloudera.com,60000,1366838923135] org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker: Unsetting META region location in ZooKeeper
> 2013-04-24 14:28:50,884 WARN  [master-stack-1.ent.cloudera.com,60000,1366838923135] org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node /hbase/meta-region-server already deleted, retry=false
> 2013-04-24 14:28:50,884 INFO  [master-stack-1.ent.cloudera.com,60000,1366838923135] org.apache.hadoop.hbase.master.AssignmentManager: Cluster shutdown is set; skipping assign of .META.,,1.1028785192
> 2013-04-24 14:28:50,884 INFO  [master-stack-1.ent.cloudera.com,60000,1366838923135] org.apache.hadoop.hbase.master.ServerManager: AssignmentManager hasn't finished failover cleanup
> 2013-04-24 14:29:46,188 INFO  [master-stack-1.ent.cloudera.com,60000,1366838923135.oldLogCleaner] org.apache.hadoop.hbase.master.cleaner.LogCleaner: master-stack-1.ent.cloudera.com,60000,1366838923135.oldLogCleaner exiting
> 2013-04-24 14:29:46,193 INFO  [master-stack-1.ent.cloudera.com,60000,1366838923135.archivedHFileCleaner] org.apache.hadoop.hbase.master.cleaner.HFileCleaner: master-stack-1.ent.cloudera.com,60000,1366838923135.archivedHFileCleaner exiting
> {code}
> ... but now it is stuck.
> We keep looping here:
> {code}
> "master-stack-1.ent.cloudera.com,60000,1366838923135" prio=10 tid=0x00007f154853f000 nid=0x18b in Object.wait() [0x00007f1545fde000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0x00000000c727d738> (a org.apache.hadoop.hbase.zookeeper.MetaRegionTracker)
>         at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:161)
>         - locked <0x00000000c727d738> (a org.apache.hadoop.hbase.zookeeper.MetaRegionTracker)
>         at org.apache.hadoop.hbase.zookeeper.MetaRegionTracker.waitMetaRegionLocation(MetaRegionTracker.java:105)
>         at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:250)
>         at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:299)
>         at org.apache.hadoop.hbase.master.HMaster.enableSSHandWaitForMeta(HMaster.java:905)
>         at org.apache.hadoop.hbase.master.HMaster.assignMeta(HMaster.java:879)
>         at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:764)
>         at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:522)
>         at java.lang.Thread.run(Thread.java:722)
> {code}
> Odd.  It is supposed to be checking the 'stopped' flag; maybe it has the wrong stop flag.
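The thread dump shows a timed Object.wait() inside blockUntilAvailable that keeps going back to sleep after shutdown was requested. One way to get that symptom, as suspected above, is a wait loop that re-checks a stop flag other than the one the shutdown path sets. Below is a minimal Java sketch of the intended behavior under that assumption: a blocking wait that exits when data arrives, when a shared stop flag is set, or on timeout. StoppableNodeTracker, nodeCreated, stop and this blockUntilAvailable(long) are hypothetical names for illustration only; this is not HBase's ZooKeeperNodeTracker code or the HBASE-8422 patch.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical sketch, not HBase code: a tracker whose blocking wait
 * gives up as soon as the shared stop flag is set.
 */
public class StoppableNodeTracker {
    // Must be the SAME flag the shutdown path sets, or the waiter never sees it.
    private final AtomicBoolean stopped;
    private byte[] data; // null until the watched node appears

    public StoppableNodeTracker(AtomicBoolean stopped) {
        this.stopped = stopped;
    }

    /** Called when the watched node is created: record data and wake waiters. */
    public synchronized void nodeCreated(byte[] newData) {
        this.data = newData;
        notifyAll();
    }

    /** Called by the shutdown path: set the shared flag, then wake waiters. */
    public synchronized void stop() {
        stopped.set(true);
        notifyAll();
    }

    /**
     * Waits until data is available, the stop flag is set, or timeoutMs
     * elapses (timeoutMs == 0 means wait forever). Returns null if no data.
     */
    public synchronized byte[] blockUntilAvailable(long timeoutMs) throws InterruptedException {
        final boolean forever = timeoutMs == 0;
        final long deadline = System.currentTimeMillis() + timeoutMs;
        long remaining = timeoutMs;
        while (data == null && !stopped.get() && (forever || remaining > 0)) {
            // Wake at least once a second so a flag set without notifyAll()
            // is still noticed on the next pass through the loop.
            wait(forever ? 1000 : Math.min(1000, remaining));
            remaining = deadline - System.currentTimeMillis();
        }
        return data;
    }

    public static void main(String[] args) throws Exception {
        AtomicBoolean stopped = new AtomicBoolean(false);
        StoppableNodeTracker tracker = new StoppableNodeTracker(stopped);
        Thread waiter = new Thread(() -> {
            try {
                byte[] result = tracker.blockUntilAvailable(0); // wait forever
                System.out.println("waiter returned, data=" + (result == null ? "null" : "present"));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        waiter.start();
        Thread.sleep(200);
        tracker.stop(); // shutdown requested: the waiter should exit promptly
        waiter.join(5000);
        System.out.println("waiter alive: " + waiter.isAlive());
    }
}
```

Running main should print "waiter returned, data=null" followed by "waiter alive: false". If the shutdown path instead set a different flag from the one passed to the constructor, the waiter would never see it and would keep waking every second and going back to sleep, much like the looping in the dump.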

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
