hbase-dev mailing list archives

From Bogdan Ghidireac <bog...@ecstend.com>
Subject zookeeper connection hangs during shutdown
Date Mon, 04 Apr 2011 09:30:27 GMT
Hi,

I have a cluster of 90 servers (HBase 0.90.1, Hadoop 0.20-append) that
runs a write-intensive MapReduce job. Occasionally one or more region
servers run out of memory and try to shut down, but the shutdown does
not always complete, so they get stuck.

If I dump the JVM threads from the console, it looks like the region
server is trying to close all ZooKeeper connections and blocks until
that completes:
HRegionServer.java:672 --> HConnectionManager.deleteConnection(conf, true);

"regionserver60020-EventThread" daemon prio=10 tid=0x000000005d0dc000 nid=0x7e1d waiting on condition [0x0000000042941000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000781c9ce00> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)

"regionserver60020" prio=10 tid=0x00002aaab023e000 nid=0x7e1b in Object.wait() [0x000000004273f000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x00000007be9ad330> (a org.apache.zookeeper.ClientCnxn$Packet)
	at java.lang.Object.wait(Object.java:485)
	at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1317)
	- locked <0x00000007be9ad330> (a org.apache.zookeeper.ClientCnxn$Packet)
	at org.apache.zookeeper.ClientCnxn.close(ClientCnxn.java:1295)
	at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:531)
	- locked <0x0000000781fae170> (a org.apache.zookeeper.ZooKeeper)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.close(ZooKeeperWatcher.java:399)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.close(HConnectionManager.java:1050)
	at org.apache.hadoop.hbase.client.HConnectionManager.deleteConnection(HConnectionManager.java:175)
	- locked <0x00000007801b6800> (a org.apache.hadoop.hbase.client.HConnectionManager$1)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
	at java.lang.Thread.run(Thread.java:662)

"main-EventThread" daemon prio=10 tid=0x00002aaab0540000 nid=0x7e10 waiting on condition [0x0000000042139000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000781faf3e0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)

Unfortunately, some connections are never closed, so the server never
shuts down.
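In the dump, the "regionserver60020" thread sits in Object.wait() on a ClientCnxn$Packet monitor inside submitRequest, i.e. it is waiting for the reply to the close request. The model below is a minimal sketch, assuming Java 8+ and assuming the client waits on the packet monitor until the packet is marked finished (consistent with the Object.wait() frame above). PacketWaitSketch, its Packet class and awaitFinished are hypothetical stand-ins, not ZooKeeper's actual code. It contrasts a bounded wait with the unbounded one that hangs when no reply ever arrives:

```java
import java.util.concurrent.TimeUnit;

public class PacketWaitSketch {
    // Hypothetical stand-in for a request packet; NOT ZooKeeper's ClientCnxn.Packet.
    static final class Packet {
        boolean finished; // guarded by the Packet's own monitor
    }

    // Bounded wait: returns true if the packet finished before the deadline,
    // false otherwise. A plain packet.wait() with no timeout would instead block
    // forever if the reply never comes.
    static boolean awaitFinished(Packet packet, long timeoutMs) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        synchronized (packet) {
            while (!packet.finished) {
                long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMs <= 0) {
                    return false; // deadline passed; give up instead of hanging
                }
                packet.wait(remainingMs); // never wait(0), which means "wait forever"
            }
            return true;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // No responder ever marks this packet finished, like a close reply that never arrives.
        Packet neverAnswered = new Packet();
        System.out.println("neverAnswered finished: " + awaitFinished(neverAnswered, 200));

        // A responder thread completes this packet and wakes the waiter.
        Packet answered = new Packet();
        Thread responder = new Thread(() -> {
            synchronized (answered) {
                answered.finished = true;
                answered.notifyAll();
            }
        });
        responder.start();
        System.out.println("answered finished: " + awaitFinished(answered, 5000));
        responder.join();
    }
}
```

Running main should print that the unanswered packet timed out (false) and the answered one finished (true). The loop re-checks the flag after every wakeup because Object.wait() can return spuriously.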

Is it possible to add a timeout and then force a System.exit()?
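A minimal sketch of the kind of timeout being asked about, assuming Java 8+ and assuming the blocking cleanup step can be wrapped in a Runnable. BoundedShutdown, runWithTimeout and the closeConnections stand-in are hypothetical illustrations, not HBase APIs; in a real region server the Runnable would wrap something like HConnectionManager.deleteConnection(conf, true). The cleanup runs on a separate daemon thread, the caller waits a bounded time, and on timeout it forces System.exit():

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public final class BoundedShutdown {
    // Runs cleanup on a daemon thread and waits at most timeoutMs for it.
    // Returns true if cleanup completed normally in time, false on timeout,
    // failure, or interruption of the caller.
    public static boolean runWithTimeout(Runnable cleanup, long timeoutMs) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-shutdown");
            t.setDaemon(true); // a stuck cleanup thread cannot by itself keep the JVM alive
            return t;
        });
        try {
            Future<?> result = executor.submit(cleanup);
            result.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException | ExecutionException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return false;
        } finally {
            // Interrupts the worker; code that ignores interrupts stays blocked.
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a connection close that never returns.
        Runnable closeConnections = () -> {
            try {
                Thread.sleep(Long.MAX_VALUE);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        if (!runWithTimeout(closeConnections, 1000)) {
            System.err.println("Connection cleanup timed out; forcing exit");
            System.exit(1);
        }
    }
}
```

System.exit() terminates the JVM regardless of any non-daemon threads still running, so it is the backstop when the cleanup ignores interrupts. The daemon flag only matters for a normal exit, where it keeps a hung cleanup thread from holding the process open.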

Bogdan
