hbase-issues mailing list archives

From "Nicolas Liochon (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-8871) The region server can crash at startup
Date Thu, 04 Jul 2013 13:25:47 GMT

     [ https://issues.apache.org/jira/browse/HBASE-8871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicolas Liochon updated HBASE-8871:
-----------------------------------

    Description: 
I get this stack trace when I start a fresh region server, roughly 5% of the time per region
server.

{code}
2013-07-04 12:00:22,609 FATAL [regionserver60020] regionserver.HRegionServer: ABORTING region server ip-10-137-7-67.ec2.internal,60020,1372939221819: Initialization of RS failed.  Hence aborting RS.
java.util.ConcurrentModificationException
	at java.util.Hashtable$Enumerator.next(Hashtable.java:1200)
	at org.apache.hadoop.conf.Configuration.iterator(Configuration.java:1820)
	at org.apache.hadoop.hbase.zookeeper.ZKConfig.makeZKProps(ZKConfig.java:92)
	at org.apache.hadoop.hbase.zookeeper.ZKConfig.getZKQuorumServersString(ZKConfig.java:267)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:158)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:133)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:667)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:647)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:778)
	at java.lang.Thread.run(Thread.java:722)
2013-07-04 12:00:22,614 FATAL [regionserver60020] regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: []
2013-07-04 12:00:22,614 INFO  [regionserver60020] regionserver.HRegionServer: STOPPED: Initialization of RS failed.  Hence aborting RS.
2013-07-04 12:00:22,616 FATAL [regionserver60020] regionserver.HRegionServer: ABORTING region server ip-10-137-7-67.ec2.internal,60020,1372939221819: Unhandled: null
java.lang.NullPointerException
	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:798)
	at java.lang.Thread.run(Thread.java:722)
2013-07-04 12:00:22,616 FATAL [regionserver60020] regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: []
2013-07-04 12:00:22,617 INFO  [regionserver60020] regionserver.HRegionServer: STOPPED: Unhandled: null
2013-07-04 12:00:22,767 INFO  [main] regionserver.ShutdownHook: Installed shutdown hook thread: Shutdownhook:regionserver60020
2013-07-04 12:00:22,768 ERROR [main] regionserver.HRegionServerCommandLine: Region server exiting
java.lang.RuntimeException: HRegionServer Aborted
	at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:66)
	at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:85)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:78)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:2309)
2013-07-04 12:00:22,770 INFO  [Thread-4] regionserver.ShutdownHook: Shutdown hook starting; hbase.shutdown.hook=true; fsShutdownHook=org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@21f0dbb9
{code}

There is one bug in the region server itself: it should not start the snapshot handler if the
server is supposed to stop:

{code}
      // start the snapshot handler, since the server is ready to run
      this.snapshotManager.start();
{code}
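
A minimal sketch of such a guard, for illustration only. It assumes the NullPointerException in the log comes from calling start() on a handler that was never created because initialization aborted; the report suggests this but does not state it. The class, field, and method names (StartupGuardSketch, SnapshotHandler, startSnapshotHandlerIfRunning) are hypothetical and are not taken from HRegionServer or from the attached patch.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class StartupGuardSketch {
  /** Hypothetical stand-in for the snapshot manager; it only records that start() ran. */
  static class SnapshotHandler {
    boolean started = false;

    void start() {
      started = true;
    }
  }

  private final AtomicBoolean stopped = new AtomicBoolean(false);
  SnapshotHandler snapshotHandler; // stays null when initialization aborts early

  /** Simulates initialization; on failure the server is marked as stopped. */
  void initialize(boolean fail) {
    if (fail) {
      stopped.set(true); // abort path: the handler is never created
      return;
    }
    snapshotHandler = new SnapshotHandler();
  }

  /** Starts the handler only if the server is still supposed to run. Returns true if started. */
  boolean startSnapshotHandlerIfRunning() {
    if (stopped.get() || snapshotHandler == null) {
      return false;
    }
    snapshotHandler.start();
    return true;
  }

  public static void main(String[] args) {
    StartupGuardSketch failed = new StartupGuardSketch();
    failed.initialize(true);
    System.out.println("started after failed init: " + failed.startSnapshotHandlerIfRunning());

    StartupGuardSketch ok = new StartupGuardSketch();
    ok.initialize(false);
    System.out.println("started after clean init: " + ok.startSnapshotHandlerIfRunning());
  }
}
```

Checking the stop flag before starting the handler keeps the abort path from dereferencing state that was never set up, so the original failure is the only error reported.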

The root cause is here in ZKConfig, in the iteration over the configuration:
{code}
    for (Entry<String, String> entry : conf) { // <=== BUG: the ConcurrentModificationException above is thrown here
      String key = entry.getKey();
      if (key.startsWith(HConstants.ZK_CFG_PROPERTY_PREFIX)) {
        String zkKey = key.substring(HConstants.ZK_CFG_PROPERTY_PREFIX_LEN);
        String value = entry.getValue();
        // If the value has variable substitutions, we need to do a get.
        if (value.contains(VARIABLE_START)) {
          value = conf.get(key);
        }
        zkProperties.put(zkKey, value);
      }
    }
{code}
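
The failure mode in the stack trace (Hashtable$Enumerator.next throwing ConcurrentModificationException) can be reproduced outside HBase. The sketch below is an illustration under stated assumptions, not the fix from the attached patch: it uses a plain java.util.Hashtable in place of Hadoop's Configuration, simulates the second thread with a write made in the middle of the loop, and shows one possible mitigation, copying the entries while holding the table's monitor. The class and method names (ConfIterationSketch, unsafeIterationFails, snapshot) are hypothetical.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

class ConfIterationSketch {
  /** Returns true if adding a key during iteration trips Hashtable's fail-fast check. */
  static boolean unsafeIterationFails(Hashtable<String, String> table) {
    try {
      for (Map.Entry<String, String> entry : table.entrySet()) {
        // Stands in for another thread adding a key while this loop is running.
        table.put("added.during.iteration", "x");
      }
      return false;
    } catch (ConcurrentModificationException expected) {
      return true;
    }
  }

  /** Copies all entries while holding the table's monitor, so no writer can interleave. */
  static Map<String, String> snapshot(Hashtable<String, String> table) {
    Map<String, String> copy = new HashMap<>();
    synchronized (table) { // Hashtable's own methods synchronize on the same monitor
      for (Map.Entry<String, String> entry : table.entrySet()) {
        copy.put(entry.getKey(), entry.getValue());
      }
    }
    return copy;
  }

  public static void main(String[] args) {
    Hashtable<String, String> table = new Hashtable<>();
    table.put("hbase.zookeeper.property.clientPort", "2181");
    table.put("hbase.zookeeper.property.tickTime", "2000");

    System.out.println("unsafe iteration failed: " + unsafeIterationFails(table));
    System.out.println("snapshot size: " + snapshot(table).size());
  }
}
```

Iterating over a private copy means later writes to the shared table cannot invalidate the iterator, at the cost of holding the lock for the duration of the copy.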

    
> The region server can crash at startup
> --------------------------------------
>
>                 Key: HBASE-8871
>                 URL: https://issues.apache.org/jira/browse/HBASE-8871
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.98.0, 0.95.2
>            Reporter: Nicolas Liochon
>            Assignee: Nicolas Liochon
>         Attachments: 8871.v1.patch
>
>
