hbase-user mailing list archives

From psy <psy...@163.com>
Subject HMaster and HRegionServer quit automatically a few seconds after HBase starts.
Date Tue, 15 Jul 2014 13:13:06 GMT
Hi, everyone. I'm a student and a beginner with HBase. For the past few
days I have had a problem running HBase on three machines. Hadoop runs
well, but when I start HBase, the "HMaster" on the master node and the
"HRegionServer" on the slave nodes quit after a few seconds. On the master
node, the jps output looks like this:

	hadoop@psyDebian:/opt$ jps
	5416 NameNode
	5647 SecondaryNameNode
	5505 DataNode
	398 Jps
	32745 HMaster
	32670 HQuorumPeer

and a short while later it looks like this:

	hadoop@psyDebian:/opt$ jps
	5416 NameNode
	5647 SecondaryNameNode
	5505 DataNode
	423 Jps
	32670 HQuorumPeer

the master log:

hadoop@psyDebian:/opt$ tail -n 30 /opt/hbase/logs/hbase-hadoop-master-psyDebian.log
2014-07-15 20:27:21,470 INFO  [main-SendThread(localhost:2181)]
zookeeper.ClientCnxn: Opening socket connection to server
localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL
(unknown error)
2014-07-15 20:27:21,471 INFO  [main-SendThread(localhost:2181)]
zookeeper.ClientCnxn: Socket connection established to
localhost/127.0.0.1:2181, initiating session
2014-07-15 20:27:21,471 INFO  [main-SendThread(localhost:2181)]
zookeeper.ClientCnxn: Unable to read additional data from server
sessionid 0x0, likely server has closed socket, closing socket
connection and attempting reconnect
2014-07-15 20:27:21,572 WARN  [main] zookeeper.RecoverableZooKeeper:
Possibly transient ZooKeeper,
quorum=centos1:2181,psyDebian:2181,centos2:2181,
exception=org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss for /hbase
2014-07-15 20:27:21,572 ERROR [main] zookeeper.RecoverableZooKeeper:
ZooKeeper create failed after 4 attempts
2014-07-15 20:27:21,572 ERROR [main] master.HMasterCommandLine: Master
exiting
java.lang.RuntimeException: Failed construction of Master: class
org.apache.hadoop.hbase.master.HMaster
	at
org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2789)
	at
org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:186)
	at
org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:135)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at
org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
	at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2803)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss for /hbase
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
	at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
	at
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.createNonSequential(RecoverableZooKeeper.java:489)
	at
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.create(RecoverableZooKeeper.java:468)
	at
org.apache.hadoop.hbase.zookeeper.ZKUtil.createWithParents(ZKUtil.java:1241)
	at
org.apache.hadoop.hbase.zookeeper.ZKUtil.createWithParents(ZKUtil.java:1219)
	at
org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.createBaseZNodes(ZooKeeperWatcher.java:174)
	at
org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:167)
	at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:481)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)
	at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:534)
	at
org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2784)



the "out" log:
hadoop@psyDebian:/opt$ tail /opt/hbase/logs/hbase-hadoop-master-psyDebian.out
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/opt/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
explanation.


the "zookeeper" log:
2014-07-15 20:48:20,572 INFO  [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:2181]
quorum.FollowerZooKeeperServer: Shutting down
2014-07-15 20:48:20,573 INFO  [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:2181]
server.ZooKeeperServer: shutting down
2014-07-15 20:48:20,573 INFO  [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:2181]
quorum.QuorumPeer: LOOKING
2014-07-15 20:48:20,574 INFO  [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:2181]
quorum.QuorumPeer: acceptedEpoch not found! Creating with a reasonable
default of 0. This should only happen when you are upgrading your
installation
2014-07-15 20:48:20,625 INFO  [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:2181]
quorum.FastLeaderElection: New election. My id =  0, proposed zxid=0x0
2014-07-15 20:48:20,626 INFO  [WorkerReceiver[myid=0]]
quorum.FastLeaderElection: Notification: 0 (n.leader), 0x0 (n.zxid),
0x57 (n.round), LOOKING (n.state), 0 (n.sid), 0x0 (n.peerEPoch), LOOKING
(my state)
2014-07-15 20:48:20,627 INFO  [WorkerReceiver[myid=0]]
quorum.FastLeaderElection: Notification: 2 (n.leader), 0x0 (n.zxid),
0x55 (n.round), LEADING (n.state), 2 (n.sid), 0x0 (n.peerEPoch), LOOKING
(my state)
2014-07-15 20:48:20,627 INFO  [WorkerReceiver[myid=0]]
quorum.FastLeaderElection: Notification: 1 (n.leader), 0x0 (n.zxid),
0x56 (n.round), LEADING (n.state), 1 (n.sid), 0x0 (n.peerEPoch), LOOKING
(my state)
2014-07-15 20:48:20,827 INFO  [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:2181]
quorum.FastLeaderElection: Notification time out: 400
2014-07-15 20:48:20,827 INFO  [WorkerReceiver[myid=0]]
quorum.FastLeaderElection: Notification: 0 (n.leader), 0x0 (n.zxid),
0x57 (n.round), LOOKING (n.state), 0 (n.sid), 0x0 (n.peerEPoch), LOOKING
(my state)
2014-07-15 20:48:20,828 INFO  [WorkerReceiver[myid=0]]
quorum.FastLeaderElection: Notification: 2 (n.leader), 0x0 (n.zxid),
0x55 (n.round), LEADING (n.state), 2 (n.sid), 0x0 (n.peerEPoch), LOOKING
(my state)
2014-07-15 20:48:20,828 INFO  [WorkerReceiver[myid=0]]
quorum.FastLeaderElection: Notification: 1 (n.leader), 0x0 (n.zxid),
0x56 (n.round), LEADING (n.state), 1 (n.sid), 0x0 (n.peerEPoch), LOOKING
(my state)
2014-07-15 20:48:21,229 INFO  [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:2181]
quorum.FastLeaderElection: Notification time out: 800
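
To make the election notifications in the ZooKeeper log above easier to compare, here is a minimal Python sketch that counts them by sender id, state, proposed leader, and round. The function name `summarize_notifications` and the sample text are illustrative assumptions; the regular expression only targets the notification wording shown in this log and may need adjusting for other ZooKeeper versions.

```python
import re
from collections import Counter

# Matches the FastLeaderElection notification wording seen in the log above.
NOTIFICATION = re.compile(
    r"Notification: (\d+) \(n\.leader\), (0x[0-9a-fA-F]+) \(n\.zxid\), "
    r"(0x[0-9a-fA-F]+) \(n\.round\), (\w+) \(n\.state\), (\d+) \(n\.sid\)"
)


def summarize_notifications(log_text):
    """Count (sender id, state, proposed leader, round) tuples in a log."""
    # Collapse all whitespace so entries wrapped across lines still match.
    flat = " ".join(log_text.split())
    counts = Counter()
    for leader, _zxid, rnd, state, sid in NOTIFICATION.findall(flat):
        counts[(int(sid), state, int(leader), int(rnd, 16))] += 1
    return counts


# Two entries copied from the log above (timestamps omitted).
sample = """
quorum.FastLeaderElection: Notification: 2 (n.leader), 0x0 (n.zxid),
0x55 (n.round), LEADING (n.state), 2 (n.sid), 0x0 (n.peerEPoch), LOOKING
(my state)
quorum.FastLeaderElection: Notification: 1 (n.leader), 0x0 (n.zxid),
0x56 (n.round), LEADING (n.state), 1 (n.sid), 0x0 (n.peerEPoch), LOOKING
(my state)
"""

for key, count in sorted(summarize_notifications(sample).items()):
    print(key, count)
```

Run against the full log, this shows at a glance that servers 1 and 2 each report LEADING for themselves in different rounds while server 0 stays in LOOKING.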


These are my configuration files:

core-site.xml:
<configuration>
	<property>
		<name>fs.default.name</name>
		<value>hdfs://psyDebian:9000</value>
	</property>

	<property>
		<name>hadoop.tmp.dir</name>
		<value>/home/hadoop/hadoop_tmp</value>
	</property>
</configuration>

hdfs-site.xml:
<configuration>
	<property>
		<name>dfs.datanode.data.dir</name>
		<value>/home/hadoop/hadoop_tmp/dfs/data</value>
	</property>

	<property>
		<name>dfs.namenode.name.dir</name>
		<value>/home/hadoop/hadoop_tmp/dfs/name</value>
	</property>

	<property>
		<name>dfs.replication</name>
		<value>3</value>
	</property>
</configuration>

hbase-site.xml:
<configuration>
	<property>
		<name>hbase.rootdir</name>
		<value>hdfs://psyDebian:9000/hbase</value>
	</property>

	<property>
		<name>hbase.cluster.distributed</name>
		<value>true</value>
	</property>

	<property>
		<name>hbase.master</name>
		<value>psyDebian:60000</value>
	</property>

	<property>
		<name>hbase.zookeeper.quorum</name>
		<value>psyDebian,centos1,centos2</value>
	</property>

	<property>
		<name>hbase.zookeeper.property.dataDir</name>
		<value>/home/hadoop/zookeeper_tmp</value>
	</property>

	<property>
		<name>zookeeper.session.timeout</name>
		<value>90000</value>
	</property>

	<property>
		<name>hbase.reginserver.restart.on.zk.expire</name>
		<value>true</value>
	</property>
</configuration>
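
One way to check whether each host in `hbase.zookeeper.quorum` is actually answering on the client port is to send ZooKeeper's `ruok` four-letter command, which a healthy server answers with `imok`. The following is a rough Python sketch under stated assumptions: the helper names `parse_quorum`, `ruok`, and `check_quorum` are made up for illustration, and port 2181 is taken from the master log above.

```python
import socket


def parse_quorum(quorum, default_port=2181):
    """Split a quorum string such as 'a,b:2181,c' into (host, port) pairs."""
    members = []
    for entry in quorum.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, _, port = entry.partition(":")
        members.append((host, int(port) if port else default_port))
    return members


def ruok(host, port, timeout=3.0):
    """Send 'ruok' to one server; return its reply, or None on any failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            reply = sock.recv(16).decode("ascii", "replace").strip()
            return reply or None
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return None


def check_quorum(quorum):
    """Map 'host:port' to the reply (or None) for every quorum member."""
    return {
        "%s:%d" % (host, port): ruok(host, port)
        for host, port in parse_quorum(quorum)
    }
```

Calling `check_quorum("psyDebian,centos1,centos2")` from each of the three machines returns one entry per server. A value of `None` means the connection failed or the server sent no reply, which can happen when a server is up but is not currently part of a working quorum.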



The master node runs Debian 7.5, and the two slaves both run CentOS 6.5.
Hadoop is 2.2.0 and HBase is 0.98.3. The clocks of the three machines are
synchronized and the firewalls (iptables) are disabled. The Java version
is java-1.6.0-openjdk. I'm not very familiar with HBase, so I can't
understand the errors in the logs, and I haven't found any useful
information on the Internet over the past few days. Could you help me, or
tell me what I should do to find the cause of this problem?
Thank you so much.


