zookeeper-user mailing list archives

From Jeremy Stribling <st...@nicira.com>
Subject uncaught exception handler
Date Tue, 03 Apr 2012 23:51:13 GMT
I'm curious about the origin of the uncaught exception handler that sits 
in NIOServerCnxn (looking at ZK 3.3.5).  It just logs the exception at 
error level.  I wonder if it would make sense to call System.exit(1) 
instead when the exception is an OutOfMemoryError (or perhaps any 
java.lang.Error, since those are not supposed to be caught).
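
To make the proposal concrete, here is a rough sketch of the kind of handler I have in mind. It is not the actual NIOServerCnxn code: the class name ExitOnErrorHandler and the injectable exit action are hypothetical, and System.err stands in for the real logger.

```java
import java.util.function.IntConsumer;

// Hypothetical handler illustrating the proposal; not ZooKeeper's code.
final class ExitOnErrorHandler implements Thread.UncaughtExceptionHandler {
    // The exit action is injectable (an assumption of this sketch) so the
    // handler can be exercised in a test without killing the JVM.
    private final IntConsumer exitAction;

    ExitOnErrorHandler() {
        this(System::exit);
    }

    ExitOnErrorHandler(IntConsumer exitAction) {
        this.exitAction = exitAction;
    }

    @Override
    public void uncaughtException(Thread t, Throwable e) {
        try {
            // Stand-in for the existing log.error call.
            System.err.println("Thread " + t + " died: " + e);
        } finally {
            // Run the exit action even if logging itself fails, which is
            // plausible once the heap is exhausted.
            if (e instanceof Error) {
                exitAction.accept(1);
            }
        }
    }
}
```

Such a handler could be installed per thread with Thread.setUncaughtExceptionHandler(...) or process-wide with Thread.setDefaultUncaughtExceptionHandler(...). Ordinary exceptions would still only be logged, as they are today.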

I ask because we embed ZooKeeper in a process where some other code can 
cause the JVM to hit its memory limit.  Instead of trying to soldier on 
in the face of adversity like this, it seems better for the whole 
process to come crashing down, so that whatever monitor process is in 
place can restart the JVM.  When the process just logs and ignores 
errors like this, it seems to leave the ZK servers unable to form a 
quorum, even though they are up and running.

Here's a sample backtrace I've seen:

2012-04-03 19:40:03,643 600695063 [QuorumPeer:/172.29.1.220:2888] ERROR org.apache.zookeeper.server.NIOServerCnxn  - Thread Thread[QuorumPeer:/172.29.1.220:2888,5,main] died
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:102)
        at org.apache.zookeeper.server.persistence.Util.readTxnBytes(Util.java:232)
        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:602)
        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:504)
        at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:131)
        at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:222)
        at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:242)
        at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:279)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:658)

Any thoughts?  Happy to create a JIRA and possibly a patch if there's 
interest.  Thanks,

Jeremy
