hbase-issues mailing list archives

From "Lars Hofhansl (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8422) Master won't go down. Stuck waiting on .META. to come on line.
Date Thu, 25 Apr 2013 22:40:18 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642322#comment-13642322 ]

Lars Hofhansl commented on HBASE-8422:
--------------------------------------

Does this have any negative side-effects? Right now the master would wait forever for a regionserver
to come online. What if (with this patch) a regionserver comes online much later (say an hour
later)? Will the master be able to continue?

Particularly worried about this part:
{code}
+    // If no region server is online then the master may get stuck waiting on -ROOT- and .META.
+    // to come online. See HBASE-8422.
+    if (this.catalogTracker != null && this.serverManager.getOnlineServers().isEmpty()) {
+      this.catalogTracker.stop();
+    }
{code}
(Is that even needed?)
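To make the side-effect question concrete, here is a minimal Java sketch under stated assumptions. `SketchNodeTracker` and `StopBeforeRegionServerDemo` are hypothetical classes, not HBase's `ZooKeeperNodeTracker` or `CatalogTracker`; they only model a tracker whose `stop()` wakes any thread blocked waiting for a location. If the guard fires while no regionserver is up, the waiter returns at once with nothing, so a regionserver that registers later is never seen by that wait.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified tracker; NOT HBase's ZooKeeperNodeTracker.
class SketchNodeTracker {
  private String data;      // e.g. a .META. server location; guarded by this
  private boolean stopped;  // guarded by this

  // Called when the znode shows up (a region server came online).
  synchronized void nodeCreated(String newData) {
    data = newData;
    notifyAll(); // wake any thread blocked in blockUntilAvailable
  }

  // Stands in for the patch's catalogTracker.stop(): releases all waiters.
  synchronized void stop() {
    stopped = true;
    notifyAll();
  }

  // Waits until data is set, the tracker is stopped, or timeoutMs elapses.
  // Returns whatever data is present at that point (null if none).
  synchronized String blockUntilAvailable(long timeoutMs) throws InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (data == null && !stopped) {
      long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
      if (remainingMs <= 0) {
        break; // timed out
      }
      wait(remainingMs);
    }
    return data;
  }
}

// Hypothetical scenario: stop the tracker while no region server is up,
// then bring one up "later".
class StopBeforeRegionServerDemo {
  static String run() throws InterruptedException {
    SketchNodeTracker tracker = new SketchNodeTracker();
    String[] seenByWaiter = new String[1];
    Thread waiter = new Thread(() -> {
      try {
        seenByWaiter[0] = tracker.blockUntilAvailable(60_000);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    waiter.start();
    tracker.stop();               // the guard from the patch fires
    waiter.join();                // returns promptly, not after 60 s
    tracker.nodeCreated("rs-1");  // a region server shows up later
    return seenByWaiter[0];       // null: the waiter already gave up
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("waiter saw: " + run());
  }
}
```

Running `main` prints `waiter saw: null`. Whether the real master then retries, or simply proceeds without .META., depends on code outside this excerpt, which is what the question above is asking.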

                
> Master won't go down.  Stuck waiting on .META. to come on line.
> ---------------------------------------------------------------
>
>                 Key: HBASE-8422
>                 URL: https://issues.apache.org/jira/browse/HBASE-8422
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.95.0
>            Reporter: stack
>            Assignee: rajeshbabu
>             Fix For: 0.98.0, 0.94.8, 0.95.1
>
>         Attachments: HBASE-8422_2.patch, HBASE-8422_3.patch, HBASE-8422_94.patch, HBASE-8422.patch
>
>
> Master came up w/ no regionservers. I then tried to shut it down. You can see below that it started to go down....
> {code}
> 2013-04-24 14:28:49,770 INFO  [IPC Server handler 7 on 60000] org.apache.hadoop.hbase.master.HMaster: Cluster shutdown requested
> 2013-04-24 14:28:49,815 INFO  [master-stack-1.ent.cloudera.com,60000,1366838923135] org.apache.hadoop.hbase.master.ServerManager: Finished waiting for region servers count to settle; checked in 0, slept for 2818 ms, expecting minimum of 1, maximum of 2147483647, master is stopped.
> 2013-04-24 14:28:49,815 WARN  [master-stack-1.ent.cloudera.com,60000,1366838923135] org.apache.hadoop.hbase.master.MasterFileSystem: Master stopped while splitting logs
> 2013-04-24 14:28:50,104 INFO  [stack-1.ent.cloudera.com,60000,1366838923135.splitLogManagerTimeoutMonitor] org.apache.hadoop.hbase.master.SplitLogManager$TimeoutMonitor: stack-1.ent.cloudera.com,60000,1366838923135.splitLogManagerTimeoutMonitor exiting
> 2013-04-24 14:28:50,850 INFO  [master-stack-1.ent.cloudera.com,60000,1366838923135] org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker: Unsetting META region location in ZooKeeper
> 2013-04-24 14:28:50,884 WARN  [master-stack-1.ent.cloudera.com,60000,1366838923135] org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node /hbase/meta-region-server already deleted, retry=false
> 2013-04-24 14:28:50,884 INFO  [master-stack-1.ent.cloudera.com,60000,1366838923135] org.apache.hadoop.hbase.master.AssignmentManager: Cluster shutdown is set; skipping assign of .META.,,1.1028785192
> 2013-04-24 14:28:50,884 INFO  [master-stack-1.ent.cloudera.com,60000,1366838923135] org.apache.hadoop.hbase.master.ServerManager: AssignmentManager hasn't finished failover cleanup
> 2013-04-24 14:29:46,188 INFO  [master-stack-1.ent.cloudera.com,60000,1366838923135.oldLogCleaner] org.apache.hadoop.hbase.master.cleaner.LogCleaner: master-stack-1.ent.cloudera.com,60000,1366838923135.oldLogCleaner exiting
> 2013-04-24 14:29:46,193 INFO  [master-stack-1.ent.cloudera.com,60000,1366838923135.archivedHFileCleaner] org.apache.hadoop.hbase.master.cleaner.HFileCleaner: master-stack-1.ent.cloudera.com,60000,1366838923135.archivedHFileCleaner exiting
> {code}
> ... but now it is stuck.
> We keep looping here:
> {code}
> "master-stack-1.ent.cloudera.com,60000,1366838923135" prio=10 tid=0x00007f154853f000 nid=0x18b in Object.wait() [0x00007f1545fde000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0x00000000c727d738> (a org.apache.hadoop.hbase.zookeeper.MetaRegionTracker)
>         at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:161)
>         - locked <0x00000000c727d738> (a org.apache.hadoop.hbase.zookeeper.MetaRegionTracker)
>         at org.apache.hadoop.hbase.zookeeper.MetaRegionTracker.waitMetaRegionLocation(MetaRegionTracker.java:105)
>         at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:250)
>         at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:299)
>         at org.apache.hadoop.hbase.master.HMaster.enableSSHandWaitForMeta(HMaster.java:905)
>         at org.apache.hadoop.hbase.master.HMaster.assignMeta(HMaster.java:879)
>         at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:764)
>         at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:522)
>         at java.lang.Thread.run(Thread.java:722)
> {code}
> Odd. It is supposed to be checking the 'stopped' flag; maybe it is checking the wrong stop flag.
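A minimal sketch of the "wrong stop flag" hypothesis, assuming the wait re-checks some stop condition between bounded timed waits (which would be consistent with the TIMED_WAITING state in the dump). `MetaWaitSketch` and `WrongStopFlagDemo` are hypothetical names, not HBase classes. A waiter that checks the flag set by cluster shutdown exits within one wait slice; a waiter that checks a flag nobody sets keeps looping until something else (here, an interrupt) frees it.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Hypothetical reduction of the wait in the stack trace; NOT HBase code.
class MetaWaitSketch {
  private final Object lock = new Object();
  private String metaLocation; // guarded by lock

  void setMetaLocation(String location) {
    synchronized (lock) {
      metaLocation = location;
      lock.notifyAll();
    }
  }

  // Waits in bounded slices (TIMED_WAITING on the monitor), re-checking
  // isStopped before each slice. Returns null if stopped first.
  String waitForMeta(BooleanSupplier isStopped, long sliceMs) throws InterruptedException {
    synchronized (lock) {
      while (metaLocation == null) {
        if (isStopped.getAsBoolean()) {
          return null;
        }
        lock.wait(sliceMs);
      }
      return metaLocation;
    }
  }
}

class WrongStopFlagDemo {
  // Returns {waiter on the shutdown flag exited, waiter on the other flag still stuck}.
  static boolean[] run() throws InterruptedException {
    MetaWaitSketch sketch = new MetaWaitSketch();
    AtomicBoolean masterStopped = new AtomicBoolean();  // set by shutdown
    AtomicBoolean trackerStopped = new AtomicBoolean(); // nobody sets this
    Thread rightFlag = waiter(sketch, masterStopped::get);
    Thread wrongFlag = waiter(sketch, trackerStopped::get);
    rightFlag.start();
    wrongFlag.start();

    masterStopped.set(true); // "Cluster shutdown requested"
    rightFlag.join(2_000);
    wrongFlag.join(500);
    boolean[] outcome = { !rightFlag.isAlive(), wrongFlag.isAlive() };

    wrongFlag.interrupt(); // without this (or a location) it never exits
    wrongFlag.join();
    return outcome;
  }

  private static Thread waiter(MetaWaitSketch sketch, BooleanSupplier isStopped) {
    Thread t = new Thread(() -> {
      try {
        sketch.waitForMeta(isStopped, 100);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    t.setDaemon(true);
    return t;
  }

  public static void main(String[] args) throws InterruptedException {
    boolean[] r = run();
    System.out.println("right flag exited: " + r[0] + ", wrong flag stuck: " + r[1]);
  }
}
```

The slice length and join timeouts are arbitrary illustration values. The point is only that a timed wait loop is as good as the flag it polls: if shutdown sets one flag and the loop polls another, the thread shows up in dumps exactly like the one above, waking periodically but never leaving.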

--
This message is automatically generated by JIRA.
