hbase-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2441) ZK failures early in RS startup sequence cause infinite busy loop
Date Wed, 14 Apr 2010 01:25:52 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856719#action_12856719 ]

Todd Lipcon commented on HBASE-2441:
------------------------------------

I think I caused this by starting an RS while the master was down, and then killing ZK. First
I got an NPE because metrics wasn't initialized yet when abort() was called:
{code}
2010-04-13 17:40:28,495 ERROR org.apache.zookeeper.ClientCnxn: Error while calling watcher

java.lang.NullPointerException
        at org.apache.hadoop.hbase.regionserver.HRegionServer.abort(HRegionServer.java:1263)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.process(HRegionServer.java:373)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:488)
{code}
and then looped forever with:
{code}
2010-04-13 18:00:19,158 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: Start code already taken, trying another one
2010-04-13 18:00:19,158 WARN org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Failed to create /hbase/rs -- check quorum servers, currently=monster01.sf.cloudera.com:2222
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:809)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:837)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.ensureExists(ZooKeeperWrapper.java:405)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.writeRSLocation(ZooKeeperWrapper.java:586)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:1339)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:428)
        at java.lang.Thread.run(Thread.java:619)
{code}

> ZK failures early in RS startup sequence cause infinite busy loop
> -----------------------------------------------------------------
>
>                 Key: HBASE-2441
>                 URL: https://issues.apache.org/jira/browse/HBASE-2441
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.3
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>
> If the RS loses its ZK session before it reports for duty, the abort() call will trigger
> an NPE, and then the stop boolean doesn't get toggled. The RS will then loop forever trying
> to register itself in the expired ZK session, and fill up the logs.
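
To illustrate the failure mode described above, here is a minimal sketch, not the actual HRegionServer code. The class RegionServerSketch, its Metrics holder, and the methods abortBuggy(), abortSafe(), and reportForDuty(int) are all hypothetical names made up for this example. It assumes the NPE comes from touching an uninitialized metrics object before the stop flag is set, and shows one possible reordering that sets the flag first. The retry loop is capped by maxAttempts only so the sketch terminates.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for the region server; not the real HRegionServer. */
class RegionServerSketch {
    /** Hypothetical metrics holder; stays null until startup has progressed far enough. */
    static class Metrics {
        int aborts;
    }

    private final AtomicBoolean stopRequested = new AtomicBoolean(false);
    private volatile Metrics metrics; // never assigned here, to mimic early startup

    /** Buggy ordering: touches metrics first, so an NPE skips the stop flag. */
    void abortBuggy() {
        metrics.aborts++;         // throws NullPointerException while metrics is null
        stopRequested.set(true);  // never reached in that case
    }

    /** Safer ordering: set the stop flag first, then do best-effort bookkeeping. */
    void abortSafe() {
        stopRequested.set(true);
        Metrics m = metrics;
        if (m != null) {
            m.aborts++;
        }
    }

    boolean isStopRequested() {
        return stopRequested.get();
    }

    /**
     * Simulated registration loop: keeps retrying until the stop flag is set.
     * Each iteration stands in for a failed write to the expired session.
     * Returns the number of attempts made; maxAttempts bounds the sketch.
     */
    int reportForDuty(int maxAttempts) {
        int attempts = 0;
        while (!stopRequested.get() && attempts < maxAttempts) {
            attempts++;
        }
        return attempts;
    }
}
```

With abortBuggy(), the exception escapes before the flag is set, so reportForDuty() keeps spinning until its cap. With abortSafe(), the loop exits immediately. Whether the real fix should reorder abort(), guard the metrics access, or both is a judgment call for the actual patch.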

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
