hbase-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2441) ZK failures early in RS startup sequence cause infinite busy loop
Date Wed, 14 Apr 2010 01:25:52 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856719#action_12856719 ]

Todd Lipcon commented on HBASE-2441:

I think I caused this by starting an RS while the master was down and then killing ZK. First
it hit an NPE, because metrics had not yet been initialized when abort() was called:
2010-04-13 17:40:28,495 ERROR org.apache.zookeeper.ClientCnxn: Error while calling watcher

        at org.apache.hadoop.hbase.regionserver.HRegionServer.abort(HRegionServer.java:1263)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.process(HRegionServer.java:373)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:488)
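The NPE matters because of where it happens: if abort() dereferences state that is only created later in startup, the exception escapes before the stop flag is toggled. A minimal sketch of the ordering, assuming a hypothetical, heavily simplified server class (AbortOrderingSketch, MetricsSketch, initMetrics() and isStopRequested() are illustrative names, not HBase's actual API), sets the flag first and null-checks anything that may not exist yet:

```java
// Hypothetical stand-in for the region server's metrics object.
class MetricsSketch {
    String dump() {
        return "requests=0";
    }
}

// Hypothetical, simplified server: abort() can be called from a ZooKeeper
// watcher thread before startup has initialized every field.
class AbortOrderingSketch {
    // Written by the watcher thread, read by the main loop, hence volatile.
    private volatile boolean stopRequested = false;
    // Stays null until initMetrics() runs later in startup.
    private volatile MetricsSketch metrics;

    void initMetrics() {
        metrics = new MetricsSketch();
    }

    void abort(String reason) {
        // Toggle the stop flag first, so nothing below can prevent it.
        stopRequested = true;
        // Read the field once; it may still be null if abort() arrives early.
        MetricsSketch m = metrics;
        if (m != null) {
            System.err.println("Aborting (" + reason + "); metrics: " + m.dump());
        } else {
            System.err.println("Aborting (" + reason + ") before metrics were initialized");
        }
    }

    boolean isStopRequested() {
        return stopRequested;
    }

    public static void main(String[] args) {
        AbortOrderingSketch rs = new AbortOrderingSketch();
        rs.abort("ZooKeeper session expired"); // early abort, no NPE
        System.out.println("stopRequested=" + rs.isStopRequested()); // prints stopRequested=true
    }
}
```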
It then looped forever, logging:
2010-04-13 18:00:19,158 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: Start code already taken, trying another one
2010-04-13 18:00:19,158 WARN org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Failed to create /hbase/rs -- check quorum servers, currently=monster01.sf.cloudera.com:2222
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:809)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:837)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.ensureExists(ZooKeeperWrapper.java:405)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.writeRSLocation(ZooKeeperWrapper.java:586)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:1339)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:428)
        at java.lang.Thread.run(Thread.java:619)

> ZK failures early in RS startup sequence cause infinite busy loop
> -----------------------------------------------------------------
>                 Key: HBASE-2441
>                 URL: https://issues.apache.org/jira/browse/HBASE-2441
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.3
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
> If the RS loses its ZK session before it reports for duty, the abort() call will trigger
> an NPE, and then the stop boolean doesn't get toggled. The RS will then loop forever trying
> to register itself in the expired ZK session, and fill up the logs.
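The failure described above comes down to a registration retry loop whose only exit is the stop flag. A minimal sketch, assuming hypothetical names (ReportForDutySketch, Registrar, and a local SessionExpiredSketchException standing in for ZooKeeper's KeeperException.SessionExpiredException so the code compiles without ZooKeeper), shows two guards: the loop re-checks the flag on every iteration, and an expired session is treated as fatal rather than retried, since an expired ZooKeeper session cannot be revived and only a new client handle can reconnect.

```java
import java.io.IOException;

// Hypothetical local stand-in for KeeperException.SessionExpiredException.
class SessionExpiredSketchException extends Exception {
    SessionExpiredSketchException(String msg) {
        super(msg);
    }
}

// Hypothetical registration step (e.g. creating the RS node in ZooKeeper).
interface Registrar {
    void register() throws SessionExpiredSketchException, IOException;
}

class ReportForDutySketch {
    private volatile boolean stopRequested = false;

    void abort(String reason) {
        stopRequested = true; // set before anything that might throw
        System.err.println("Aborting: " + reason);
    }

    boolean isStopRequested() {
        return stopRequested;
    }

    // Returns true once registered, false if the loop ended because of a stop request.
    boolean reportForDuty(Registrar registrar, long retrySleepMs) {
        while (!stopRequested) {
            try {
                registrar.register();
                return true;
            } catch (SessionExpiredSketchException e) {
                // Retrying on a dead session can never succeed: abort instead of spinning.
                abort("ZooKeeper session expired: " + e.getMessage());
            } catch (IOException e) {
                // Possibly transient: back off before the next attempt.
                try {
                    Thread.sleep(retrySleepMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve the interrupt status
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ReportForDutySketch rs = new ReportForDutySketch();
        boolean registered = rs.reportForDuty(() -> {
            throw new SessionExpiredSketchException("/hbase/rs");
        }, 10);
        // prints registered=false, stopRequested=true
        System.out.println("registered=" + registered + ", stopRequested=" + rs.isStopRequested());
    }
}
```

Treating expiry as fatal is a design choice for this sketch; an alternative is to open a fresh ZooKeeper session and retry with it, but retrying on the expired session, as in the log above, can only loop.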
