hbase-user mailing list archives

From "who.cat" <who....@qq.com>
Subject HBase RegionServer crashed with no GC detected
Date Wed, 19 Oct 2016 02:58:54 GMT
Hi all:
I have a 4-node HDP big data cluster created by Ambari; the HBase version is 1.1.2.
While running YCSB as a benchmark, the RegionServer instance or the HMaster instance crashes.
The log shows:

---------------------log start ---------------------
2016-10-12 23:48:13,591 INFO  [main-SendThread(Node1:2181)] zookeeper.ClientCnxn: Unable to
read additional data from server sessionid 0x157b7f5f0bc0005, likely server has closed socket,
closing socket connection and attempting reconnect
2016-10-12 23:48:13,595 INFO  [HBase-Metrics2-1] impl.MetricsSinkAdapter: Sink timeline started
2016-10-12 23:48:13,606 INFO  [HBase-Metrics2-1] impl.MetricsSystemImpl: Scheduled snapshot
period at 10 second(s).
2016-10-12 23:48:13,606 INFO  [HBase-Metrics2-1] impl.MetricsSystemImpl: HBase metrics system
started
2016-10-12 23:48:14,496 INFO  [main-SendThread(Node4:2181)] zookeeper.ClientCnxn: Opening
socket connection to server Node4/1.1.6.104:2181. Will not attempt to authenticate using SASL
(unknown error)
2016-10-12 23:48:14,506 INFO  [main-SendThread(Node4:2181)] zookeeper.ClientCnxn: Socket connection
established to Node4/1.17.6.104:2181, initiating session
2016-10-12 23:48:14,517 INFO  [main-SendThread(Node4:2181)] zookeeper.ClientCnxn: Unable to
reconnect to ZooKeeper service, session 0x157b7f5f0bc0005 has expired, closing socket connection
2016-10-12 23:48:14,517 FATAL [main-EventThread] regionserver.HRegionServer: ABORTING region
server node1,16020,1476260847716: regionserver:16020-0x157b7f5f0bc0005, quorum=node2:2181,node1:2181,node4:2181,
baseZNode=/hbase-unsecure regionserver:16020-0x157b7f5f0bc0005 received expired from ZooKeeper,
aborting
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:585)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:517)
	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:534)
	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
2016-10-12 23:48:14,518 FATAL [main-EventThread] regionserver.HRegionServer: RegionServer
abort: loaded coprocessors are: [org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint]
---------------------log end---------------------

After checking the log, it appears that the RegionServer JVM paused for a long time, so the
ZooKeeper client could not send heartbeats and the session timed out, as described in the
reference guide: http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
So I read through the logs in detail looking for Java GC events, but no full GC occurred.
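
One way to look for pauses that the GC log does not explain is a sleep-based probe: a thread
sleeps for a fixed interval and reports how much longer than requested it actually took to
wake up. The sketch below only illustrates that idea (the same general approach as Hadoop's
JvmPauseMonitor). The class name, method names, and both thresholds are hypothetical and are
not taken from HBase.

```java
import java.util.concurrent.TimeUnit;

/** Minimal sketch of a sleep-based pause probe. All names and values are assumptions. */
public class PauseDetector {
    /** How long each probe sleeps (assumed value, tune as needed). */
    static final long SLEEP_MS = 500;
    /** Report a stall when the overshoot exceeds this (assumed value). */
    static final long WARN_THRESHOLD_MS = 1000;

    /** Time spent beyond the requested sleep, i.e. how long the thread was stalled. */
    static long extraPauseMillis(long startNanos, long endNanos, long sleepMs) {
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(endNanos - startNanos);
        return Math.max(0, elapsedMs - sleepMs);
    }

    /** Runs one probe: sleep, then measure how late the wake-up was. */
    static long probeOnce(long sleepMs) throws InterruptedException {
        long start = System.nanoTime();
        Thread.sleep(sleepMs);
        return extraPauseMillis(start, System.nanoTime(), sleepMs);
    }

    public static void main(String[] args) throws InterruptedException {
        // Number of probes; defaults to a short run. Pass a larger count to keep
        // the probe running for the duration of the benchmark.
        long probes = args.length > 0 ? Long.parseLong(args[0]) : 5;
        for (long i = 0; i < probes; i++) {
            long extra = probeOnce(SLEEP_MS);
            if (extra > WARN_THRESHOLD_MS) {
                // A stall seen here with no matching GC event points at the host
                // (swapping, I/O stalls, CPU starvation) rather than the collector.
                System.err.printf("Detected pause of approximately %d ms%n", extra);
            }
        }
    }
}
```

If a probe like this, run on the same node during the benchmark, reports stalls at the times
the sessions expire, the cause would be at the host level rather than inside one JVM's GC.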

I also found the same symptom in the DataNode instance.

The node OS is CentOS 7. I suspected the kernel futex bug, but after checking, that bug is
already fixed in my OS.
Is there any factor other than Java GC that could cause this problem?
Has anyone run into the same problem? Any ideas?
Thank you.