hbase-issues mailing list archives

From "HBase Review Board (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-3010) Can't start/stop/start... cluster using new master
Date Fri, 17 Sep 2010 23:47:37 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910855#action_12910855 ]

HBase Review Board commented on HBASE-3010:
-------------------------------------------

Message from: "Todd Lipcon" <todd@cloudera.com>

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/873/#review1267
-----------------------------------------------------------

Ship it!



src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java
<http://review.cloudera.org/r/873/#comment4312>

    hrm, I guess that's a good idea, but something seems a little strange about this :)



src/main/java/org/apache/hadoop/hbase/master/HMaster.java
<http://review.cloudera.org/r/873/#comment4313>

    this should probably move down until after we're the active master


- Todd





> Can't start/stop/start... cluster using new master
> --------------------------------------------------
>
>                 Key: HBASE-3010
>                 URL: https://issues.apache.org/jira/browse/HBASE-3010
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>            Reporter: stack
>            Priority: Blocker
>             Fix For: 0.90.0
>
>
> Currently you can start a small cluster the first time on TRUNK -- i.e. with the new
> master -- but the second time you start it up you run into a couple of interesting issues:
> + The old root-region-location is still in place. It gets cleaned up later, but for a
> while on startup it does not have the 'right' address.
> + A regionserver (or a client) on startup creates a catalogtracker, a class that notices
> changes in the meta tables and keeps track of catalog table locations. Starting the
> catalogtracker results in a check for the current catalog locations. As part of this
> process, since root-region-location "exists", the catalogtracker tries to verify root's
> location by doing a noop against the root host; to do this it needs to do the initial rpc
> proxy setup. It can happen that the old root address was that of the current regionserver
> trying to initialize, so we'll be trying to connect to ourselves to verify the root
> location. We're doing this before we've set up the rpcserver and handlers -- so we block,
> and as it happens there is no timeout on proxy setup (Todd ran into this yesterday, I ran
> into it today -- it's easy to manufacture).
> + So the regionserver can't progress. Meantime the master can't progress because there
> are no regionservers checking in. And you can't shut it down because we're not looking at
> the right 'stop' flag.
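
The hang described above comes from a blocking setup call that has no timeout. As a rough illustration only, here is a minimal Java sketch of bounding such a call with a deadline. It is not the fix applied in HBASE-3010 and it does not use any HBase API: the class name BoundedVerify, the method verifyWithTimeout, and the "check" callable (a stand-in for the proxy setup plus noop against the root host) are all hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of HBase.
public class BoundedVerify {

    /**
     * Runs a possibly-blocking check on a daemon thread and gives up after
     * timeoutMs. Returns true only if the check finished in time and
     * reported success; a timeout or a failure both yield false.
     */
    public static boolean verifyWithTimeout(Callable<Boolean> check, long timeoutMs) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "verify-location");
            t.setDaemon(true); // never keep the JVM alive for a stuck check
            return t;
        });
        Future<Boolean> result = executor.submit(check);
        try {
            return Boolean.TRUE.equals(result.get(timeoutMs, TimeUnit.MILLISECONDS));
        } catch (TimeoutException | ExecutionException e) {
            return false; // treat "did not answer" and "failed" as unverified
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            return false;
        } finally {
            result.cancel(true);    // interrupt the worker if it is still blocked
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Stand-in for connecting to a server that is not yet serving requests.
        Callable<Boolean> hangs = () -> { Thread.sleep(60_000); return true; };
        System.out.println(verifyWithTimeout(hangs, 200));
        System.out.println(verifyWithTimeout(() -> true, 200));
    }
}
```

With a bound like this, a regionserver that cannot verify the stale root location would fall through and continue initializing instead of blocking forever. Interrupting the worker only helps if the underlying call responds to interruption, so a timeout at the socket or rpc layer would still be the more direct remedy.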

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

