zookeeper-user mailing list archives

From Benjamin Jaton <benjamin.ja...@gmail.com>
Subject Unable to recover for 1 failing node in 3 nodes ensemble
Date Tue, 13 Jan 2015 01:15:20 GMT
Hello,

I have a 3-node ensemble that stopped working after node2 ran out of disk
space:

2015-01-12 11:44:52,398 [myid:2] - ERROR [SyncThread:2:SyncRequestProcessor@183] - Severe unrecoverable error, exiting
java.io.IOException: No space left on device
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:345)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
    at java.io.DataOutputStream.write(DataOutputStream.java:107)
    at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
    at org.apache.jute.BinaryOutputArchive.writeBuffer(BinaryOutputArchive.java:119)
    at org.apache.zookeeper.server.persistence.Util.writeTxnBytes(Util.java:277)
    at org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:224)
    at org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:314)
    at org.apache.zookeeper.server.ZKDatabase.append(ZKDatabase.java:476)
    at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:140)
2015-01-12 11:54:50,759 [myid:2] - ERROR [Thread-2:Util@239] - Last transaction was partial.

So far, it makes sense that node2 would just stop there.

But then I would expect node1 and node3 to pick up where node2 left off,
i.e. run a leader election and resume the ZK service.

However, that doesn't seem to be happening. The service doesn't come back
up, and I can't figure out what's wrong from the logs.
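
For context, a 3-node ensemble needs 2 voting members for a quorum, so node1
and node3 alone should in principle be able to elect a leader. One way to see
what each surviving node reports is to send it the "srvr" four-letter command
and read the "Mode:" line of the reply. The following is only a minimal
sketch: the host names, the client port 2181, and the helper names are
assumptions for illustration, not details taken from this ensemble.

```python
import socket


def four_letter(host, port, cmd="srvr", timeout=5.0):
    """Send a ZooKeeper four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(reply):
    """Return the value of the 'Mode:' line, or None if there is none.

    A node that is up but not part of a quorum typically answers without a
    'Mode:' line, so None is treated here as "not serving".
    """
    for line in reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def quorum_size(ensemble_size):
    """Smallest strict majority of the configured voting members."""
    return ensemble_size // 2 + 1


def check_ensemble(hosts, ensemble_size, probe=four_letter):
    """Probe each (host, port) pair and summarise what the nodes report."""
    modes = {}
    for host, port in hosts:
        try:
            modes[host] = parse_mode(probe(host, port))
        except OSError:  # connection refused, timeout, unknown host, ...
            modes[host] = None
    leaders = [h for h, m in modes.items() if m == "leader"]
    serving = [h for h, m in modes.items() if m in ("leader", "follower")]
    return {
        "modes": modes,
        "has_leader": len(leaders) == 1,
        "serving_quorum": len(serving) >= quorum_size(ensemble_size),
    }


# Hypothetical usage against the two surviving nodes:
#   print(check_ensemble([("node1", 2181), ("node3", 2181)], ensemble_size=3))
```

If both survivors come back without a mode, they are running but have not
formed a quorum, which would point at the election itself rather than at
client connectivity.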

Thanks for the help!
