hbase-issues mailing list archives

From "HBase Review Board (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-3047) If new master crashes, restart is messy
Date Wed, 29 Sep 2010 04:01:33 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916007#action_12916007 ]

HBase Review Board commented on HBASE-3047:
-------------------------------------------

Message from: stack@duboce.net

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/915/#review1353
-----------------------------------------------------------


Here are a few comments on yours.

Actually, testing this patch on a cluster brought up some issues.  I think I should recast.
I have some ideas on how.  v2 coming.  Will incorporate your comments below.


trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<http://review.cloudera.org/r/915/#comment4495>

    I can change it (you got my intent, but it is still confusing, so I should change it).



trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
<http://review.cloudera.org/r/915/#comment4496>

    Yeah, what you say.  Let me fix up comments.



trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
<http://review.cloudera.org/r/915/#comment4497>

    will do


- stack





> If new master crashes, restart is messy
> ---------------------------------------
>
>                 Key: HBASE-3047
>                 URL: https://issues.apache.org/jira/browse/HBASE-3047
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>             Fix For: 0.90.0
>
>         Attachments: 3047.txt
>
>
> If the master crashes, the cluster-is-up flag is left stuck on.
> On restart of the cluster, regionservers may come up before the master.  They'll have
> registered themselves in zk by the time the master assumes its role, and the master will
> think it's joining an up-and-running cluster when in fact this is a fresh startup.  Another
> problem is that there'll be a bad root region location up in zk.  Same for meta, and at the
> moment we're not handling bad root and meta very well.
> Here's a sample of the kind of issues we're running into:
> {code}
> 2010-09-25 23:53:13,938 FATAL org.apache.hadoop.hbase.master.HMaster:
> Unhandled exception. Starting shutdown.
> java.io.IOException: Call to /10.20.20.188:60020 failed on local
> exception: java.io.IOException: Connection reset by peer
>    at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:781)
>    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:750)
>    at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:255)
>    at $Proxy1.getProtocolVersion(Unknown Source)
>    at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:412)
>    at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:388)
>    at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:435)
>    at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:345)
>    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:889)
>    at org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:350)
>    at org.apache.hadoop.hbase.catalog.CatalogTracker.getRootServerConnection(CatalogTracker.java:209)
>    at org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:241)
>    at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:286)
>    at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnectionDefault(CatalogTracker.java:326)
>    at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:157)
>    at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:140)
>    at org.apache.hadoop.hbase.master.AssignmentManager.rebuildUserRegions(AssignmentManager.java:753)
>    at org.apache.hadoop.hbase.master.AssignmentManager.processFailover(AssignmentManager.java:174)
>    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:314)
> Caused by: java.io.IOException: Connection reset by peer
>    at sun.nio.ch.FileDispatcher.read0(Native Method)
>    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
>    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
>    at sun.nio.ch.IOUtil.read(IOUtil.java:206)
> {code}
> Notice, we think it's a case of processFailover, so we think we can just scan meta to fix up
> our in-memory picture of the running cluster, only the scan of meta fails because meta
> is not assigned.
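
To make the failure mode above concrete, here is a minimal sketch of one way a master could tell a fresh startup from a failover without relying on the cluster-is-up flag or on regionservers merely being registered in zk. It is an assumption for illustration only, not what the attached patch does: `StartupModeSketch`, `StartupMode`, and `classify` are hypothetical names, and the input map stands in for whatever region counts the regionservers report when they check in.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names come from the HBase code base.
public class StartupModeSketch {

    enum StartupMode { FRESH_STARTUP, FAILOVER }

    // Assumption: each regionserver reports how many regions it is carrying.
    // If no checked-in server carries any region, nothing is deployed yet, so
    // treat this as a fresh startup even though servers are registered.
    static StartupMode classify(Map<String, Integer> regionsPerServer) {
        int totalRegions = 0;
        for (int count : regionsPerServer.values()) {
            totalRegions += count;
        }
        return totalRegions == 0 ? StartupMode.FRESH_STARTUP : StartupMode.FAILOVER;
    }

    public static void main(String[] args) {
        // Regionservers came up before the master but carry no regions.
        Map<String, Integer> restarted = new HashMap<>();
        restarted.put("rs1", 0);
        restarted.put("rs2", 0);
        System.out.println(classify(restarted));

        // A running cluster whose master is being replaced.
        Map<String, Integer> running = new HashMap<>();
        running.put("rs1", 12);
        running.put("rs2", 0);
        System.out.println(classify(running));
    }
}
```

With a check along these lines, the scenario in the stack trace (regionservers registered, meta not assigned) would be classified as a fresh startup, so the master would assign root and meta rather than try to scan meta as in processFailover.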

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

