zookeeper-user mailing list archives

From R Krishna <krishna...@gmail.com>
Subject regarding zookeeper cluster setup replication, config issues and inconsistent state
Date Fri, 13 May 2016 10:15:59 GMT
I just tried to set up a two-node ZooKeeper cluster for the first time, one
node for each broker of my two-broker Kafka cluster, and came across the
following issues:
1. Do we have to specify a different value in /var/lib/zookeeper/myid on
each server, even though they are separate machine instances?
2. Both servers kept reporting "Mode: standalone", even though I could see
connectivity between the two. Only after restarts did I see them go to
Follower/Leader.
/usr/share/zookeeper/bin/zkServer.sh status
    JMX enabled by default
    Using config: /etc/zookeeper/conf/zoo.cfg
    Mode: standalone
3. The data was completely inconsistent. I was able to connect to each node
and run all the netcat status commands from the other server without any
issue. However, the Kafka broker data was inconsistent and the brokers kept
failing. Is there a way to confirm whether both nodes are in sync and part
of the same cluster?
org.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /config/changes

4. Whenever I update the .cfg file, I cannot do a sudo
/usr/share/zookeeper/bin/zkServer.sh restart; I have to force-kill the PID,
after which another process comes up that reads the latest .cfg. Why is
this so?

5. I realized we need at least 3 servers to make an ensemble, so I created
another ZK host, added it to the .cfg, and force-killed the process so that
it would read the latest config. I then started getting the exceptions
below. Yes, this probably means I have run out of connections.
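On point 5: as I understand it, a ZooKeeper ensemble only serves requests while a strict majority of the servers listed in zoo.cfg is up. Two servers are technically an ensemble but tolerate no failure, three tolerate one, and during a restart you can only take down as many servers as the ensemble tolerates. A small Python sketch of that arithmetic (the function names are my own, not part of any ZooKeeper API):

```python
"""Sketch: majority-quorum arithmetic for a ZooKeeper ensemble (illustrative only)."""


def quorum_size(n: int) -> int:
    """Smallest strict majority of n voting servers."""
    if n < 1:
        raise ValueError("an ensemble needs at least one server")
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many servers may be down while the rest still form a majority."""
    return n - quorum_size(n)


def safe_to_stop(n: int, already_down: int) -> bool:
    """Hypothetical helper: does stopping one more server still leave a quorum?"""
    return n - already_down - 1 >= quorum_size(n)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(f"n={n}: quorum={quorum_size(n)}, tolerates {tolerated_failures(n)} down")
```

For example, tolerated_failures(2) is 0, which would mean that restarting either node of the original two-node setup leaves the ensemble without a quorum until it comes back.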

*And finally, how do I safely restart such a cluster when adding new nodes
and then force them to sync data?*
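For questions 1 and 2, and for adding nodes: as far as I understand replicated mode, every server's zoo.cfg carries the same list of server.N=host:2888:3888 lines, and each server's dataDir holds a myid file containing only its own N, so the value must differ between machines. A zoo.cfg with no server.N lines starts the server in standalone mode. The sketch below only renders those two pieces. The host names are hypothetical placeholders, 2888/3888 are the conventional quorum and election ports (3888 also shows up in the THIRD log below), and dataDir=/var/lib/zookeeper is assumed from the snapshot path in the logs.

```python
"""Sketch: render the ensemble part of zoo.cfg and each host's myid value.

Assumptions: hypothetical host names, conventional ports 2888/3888,
dataDir=/var/lib/zookeeper.
"""
from typing import Dict, List

PEER_PORT = 2888      # conventional port followers use to talk to the leader
ELECTION_PORT = 3888  # conventional leader-election port (cf. :3888 in the logs)


def ensemble_lines(hosts: List[str]) -> List[str]:
    """server.N=host:peer:election lines; identical in every server's zoo.cfg."""
    return [f"server.{sid}={host}:{PEER_PORT}:{ELECTION_PORT}"
            for sid, host in enumerate(hosts, start=1)]


def myid_values(hosts: List[str]) -> Dict[str, int]:
    """The id each host must write to <dataDir>/myid; it matches its server.N."""
    return {host: sid for sid, host in enumerate(hosts, start=1)}


if __name__ == "__main__":
    hosts = ["zk1.example.com", "zk2.example.com", "zk3.example.com"]  # hypothetical
    print("\n".join(ensemble_lines(hosts)))
    for host, sid in myid_values(hosts).items():
        print(f"{host}: echo {sid} > /var/lib/zookeeper/myid")
```

When a node is added, the new server.N line would have to go into zoo.cfg on every server, not only on the new one.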

MASTER (X.Y.Z.75):
3 09:56:03,823 - INFO  [main:FileSnap@83] - Reading snapshot
/var/lib/zookeeper/version-2/snapshot.30
2016-05-13 09:56:03,860 - ERROR [main:FileTxnSnapLog@210] - Parent /brokers/ids missing for /brokers/ids/2
2016-05-13 09:56:03,862 - ERROR [main:QuorumPeer@453] - Unable to load database on disk
java.io.IOException: Failed to process transaction type: 1 error: KeeperErrorCode = NoNode for /brokers/ids
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:153)
        at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
        at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417)
        at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:409)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:151)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/ids
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.processTransaction(FileTxnSnapLog.java:211)
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:151)
        ... 6 more
2016-05-13 09:56:03,865 - ERROR [main:QuorumPeerMain@89] - Unexpected exception, exiting abnormally
java.lang.RuntimeException: Unable to run quorum server
        at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:454)
        at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:409)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:151)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
Caused by: java.io.IOException: Failed to process transaction type: 1 error: KeeperErrorCode = NoNode for /brokers/ids
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:153)
        at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
        at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417)
        ... 4 more
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/ids
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.processTransaction(FileTxnSnapLog.java:211)
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:151)
        ... 6 more


2016-05-13 09:57:29,084 - ERROR [main:QuorumPeerMain@89] - Unexpected exception, exiting abnormally
java.lang.RuntimeException: Unable to run quorum server
        at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:454)
        at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:409)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:151)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
Caused by: java.io.IOException: Failed to process transaction type: 1 error: KeeperErrorCode = NoNode for /brokers/ids
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:153)
        at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
        at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417)
        ... 4 more
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/ids
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.processTransaction(FileTxnSnapLog.java:211)
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:151)
        ... 6 more


SECOND (X.Y.Z.76):
ING (n.state), 3 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)
2016-05-13 09:42:40,650 - WARN  [RecvWorker:1:QuorumCnxManager$RecvWorker@762] - Connection broken for id 1, my id = 2, error =
java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:392)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:747)
2016-05-13 09:42:40,650 - WARN  [RecvWorker:1:QuorumCnxManager$RecvWorker@765] - Interrupting SendWorker
2016-05-13 09:42:40,651 - WARN  [SendWorker:1:QuorumCnxManager$SendWorker@679] - Interrupted while waiting for message on queue
java.lang.InterruptedException
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2095)
        at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:389)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:831)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:62)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:667)
2016-05-13 09:42:40,651 - WARN  [SendWorker:1:QuorumCnxManager$SendWorker@688] - Send worker leaving threa


... then these:

==> /var/log/zookeeper/zookeeper.log <==
2016-05-13 10:01:20,334 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /X.Y.Z.75:58954
2016-05-13 10:01:20,334 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2016-05-13 10:01:20,335 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /X.Y.Z.75:58954 (no session established for client)

==> /home/kafka/kafka/kafka.log <==
[2016-05-13 10:01:20,412] INFO Opening socket connection to server X.Y.Z.75/X.Y.Z.75:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2016-05-13 10:01:20,413] INFO Socket connection established to X.Y.Z.75/X.Y.Z.75:2181, initiating session (org.apache.zookeeper.ClientCnxn)
[2016-05-13 10:01:20,637] WARN Session 0x254a9245fc00000 for server X.Y.Z.75/X.Y.Z.75:2181, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:192)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:384)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
[2016-05-13 10:01:21,782] INFO Opening socket connection to server X.Y.Z.76/X.Y.Z.76:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)





THIRD (added last):

LOWING (n.state), 2 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)
2016-05-13 03:03:39,540 - INFO  [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@774] - Notification time out: 25600
2016-05-13 03:03:39,569 - WARN  [WorkerSender[myid=3]:QuorumCnxManager@368] - Cannot open channel to 1 at election address /X.Y.Z.75:3888
java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:579)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:354)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:327)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:393)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:365)
        at java.lang.Thread.run(Thread.java:745)
2016-05-13 03:03:39,570 - INFO  [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 3 (n.leader), 0x100000052 (n.zxid), 0x108d2 (n.round), LOOKING (n.state), 3 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)
2016-05-13 03:03:39,596 - INFO  [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 3 (n.leader), 0x100000052 (n.zxid), 0x108d1 (n.round), FOLLOWING (n.state), 2 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)
2016-05-13 03:03:47,801 - INFO  [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x100000052 (n.zxid), 0x108d2 (n.round), LOOKING (n.state), 2 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)
2016-05-13 03:03:48,013 - INFO  [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x100000052 (n.zxid), 0x108d2 (n.round), LOOKING (n.state), 2 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)
2016-05-13 03:03:48,415 - INFO  [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x100000052 (n.zxid), 0x108d2 (n.round), LOOKING (n.state), 2 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)
2016-05-13 03:03:49,216 - INFO  [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x100000052 (n.zxid), 0x108d2 (n.round), LOOKING (n.state), 2 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)
2016-05-13 03:03:50,818 - INFO  [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x100000052 (n.zxid), 0x108d2 (n.round), LOOKING (n.state), 2 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)
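For point 3, one way to check whether the nodes form a single ensemble is the srvr four-letter command on the client port (the same thing "echo srvr | nc host 2181" does), whose reply includes "Mode:" and "Zxid:" lines. The Python sketch below queries each host and applies a rough heuristic of my own, not an official ZooKeeper check: exactly one leader, every other node a follower, and an identical Zxid. Zxids can legitimately differ for a moment under write load, so treat a mismatch as a hint rather than proof; a node answering "standalone", or refusing the connection, is a clearer sign it is not part of the ensemble. The host names are whatever you pass on the command line.

```python
"""Sketch: query ZooKeeper nodes with 'srvr' and compare Mode/Zxid (heuristic)."""
import socket
import sys
from typing import Dict, Iterable


def four_letter(host: str, port: int = 2181, cmd: str = "srvr",
                timeout: float = 5.0) -> str:
    """Send a four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the socket once it has replied
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_srvr(reply: str) -> Dict[str, str]:
    """Turn the 'Key: value' lines of a srvr reply into a dict."""
    fields = {}
    for line in reply.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def looks_in_sync(replies: Iterable[Dict[str, str]]) -> bool:
    """Heuristic: one leader, everyone else a follower, identical Zxid."""
    replies = list(replies)
    modes = [r.get("Mode") for r in replies]
    zxids = {r.get("Zxid") for r in replies}
    return (modes.count("leader") == 1
            and modes.count("follower") == len(replies) - 1
            and len(zxids) == 1)


if __name__ == "__main__":
    # Usage: python zk_sync_check.py host1 host2 host3
    hosts = sys.argv[1:]
    if hosts:
        replies = [parse_srvr(four_letter(h)) for h in hosts]
        for host, fields in zip(hosts, replies):
            print(host, fields.get("Mode"), fields.get("Zxid"))
        print("looks in sync:", looks_in_sync(replies))
```

Running it against all three hosts right after each restart would show whether the restarted node came back as follower (or leader) with the same Zxid as the others.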
