Subject: Re: regarding zookeper cluster setup replication, config issues and inconsistent state
From: Flavio Junqueira
Date: Sat, 14 May 2016 14:10:08 +0100
Message-Id: <6EB7ADB4-2F5D-4EB2-9F69-A0D6FFFDF86F@apache.org>
References: <3DC3202E-DCBC-498F-B555-3B2950B065B9@apache.org>
To: user@zookeeper.apache.org

If there is an active leader, then the ensemble is in sync. The simplest
way to check if the ensemble is working is to use zkCli to connect and
perform a few operations.
You can also use four-letter commands to check the health of the servers.

I didn't understand the point about a follower following. Perhaps it was
waiting for its connection to the leader to time out?

-Flavio

> On 14 May 2016, at 02:16, R Krishna wrote:
>
> Hmm, we are using the same version as 3.4.5 with similar explanation as
> this bug. Bad experience for a first time setup.
> https://issues.apache.org/jira/browse/ZOOKEEPER-1653
>
> So I finally stop ZooKeeper, clear all the data except the myid files as
> follows:
>
> Clean restart all three servers one by one:
> find /var/lib/zookeeper -name "*"
> cat /var/lib/zookeeper/myid
> rm -r /var/lib/zookeeper/ve*
>> /var/lib/zookeeper/zookeeper_server.pid
> ls -ltrh /var/lib/zookeeper/
>
> Q.) Everything seemed fine except on one of the followers I noticed
> following. What does this mean?
> Q.) How do I know the zookeeper cluster is stable and the data is in sync?:
>
> 2016-05-14 00:31:18,424 - WARN [RecvWorker:2:QuorumCnxManager$RecvWorker@762] - Connection broken for id 2, my id = 1, error =
> java.io.EOFException
>     at java.io.DataInputStream.readInt(DataInputStream.java:392)
>     at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:747)
> 2016-05-14 00:31:18,424 - WARN [RecvWorker:2:QuorumCnxManager$RecvWorker@765] - Interrupting SendWorker
> 2016-05-14 00:31:18,425 - WARN [SendWorker:2:QuorumCnxManager$SendWorker@679] - Interrupted while waiting for message on queue
> java.lang.InterruptedException
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2095)
>     at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:389)
>     at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:831)
>     at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:62)
>     at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:667)
> 2016-05-14 00:31:18,425 - WARN [SendWorker:2:QuorumCnxManager$SendWorker@688] - Send worker leaving thread
>
> More logs:
> kafka@awo-p05-kafk01:~$ echo stat | nc X.Y.Z.75 2181
> Zookeeper version: 3.4.5--1, built on 06/10/2013 17:26 GMT
> Clients:
> /X.Y.Z.77:33995[1](queued=0,recved=214,sent=216)
> /X.Y.Z.75:44172[0](queued=0,recved=1,sent=0)
>
> Latency min/avg/max: 0/0/19
> Received: 239
> Sent: 240
> Connections: 2
> Outstanding: 0
> Zxid: 0x100000021
> Mode: follower
> Node count: 19
> 2016-05-14 01:14:24,892 - INFO [Thread-19:NIOServerCnxn@1001] - Closed socket connection for client /X.Y.Z.75:44172 (no session established for client)
> 2016-05-14 01:14:24,892 - INFO [Thread-19:NIOServerCnxn@1001] - Closed socket connection for client /X.Y.Z.75:44172 (no session established for client)
> kafka@awo-p05-kafk01:~$ echo stat | nc X.Y.Z.76 2181
> echo stat | nc X.Y.Z.77 2181Zookeeper version: 3.4.5--1, built on 06/10/2013 17:26 GMT
> Clients:
> /X.Y.Z.75:53076[0](queued=0,recved=1,sent=0)
>
> Latency min/avg/max: 0/1/8
> Received: 36
> Sent: 35
> Connections: 1
> Outstanding: 0
> Zxid: 0x100000021
> Mode: follower
> Node count: 19
> kafka@awo-p05-kafk01:~$ echo stat | nc X.Y.Z.77 2181
> Zookeeper version: 3.4.5--1, built on 06/10/2013 17:26 GMT
> Clients:
> /X.Y.Z.75:55941[0](queued=0,recved=1,sent=0)
>
> Latency min/avg/max: 0/1/3
> Received: 27
> Sent: 26
> Connections: 1
> Outstanding: 0
> Zxid: 0x100000021
> Mode: leader
> Node count: 19
>
> On Fri, May 13, 2016 at 12:59 PM, R Krishna wrote:
>
>> As I said before, I cannot even restart one server, it automatically
>> brings up another process.
>>
>> I tried specifically setting the PID.
>> ps -aef | grep -i zoo
>> vim /var/lib/zookeeper/zookeeper_server.pid
>> sudo /usr/share/zookeeper/bin/zkServer.sh restart
>>
>> or stop, neither works. Is there a setting to shutdown ZooKeeper and bring
>> up one by one in 3 node cluster?
>>
>> On Fri, May 13, 2016 at 12:57 PM, R Krishna wrote:
>>
>>> I have a fairly simple config file (below), I tried to reboot the machine
>>> but server 75 never restarts properly by exposing LISTEN port on 3888 and
>>> obviously get 2016-05-13 12:54:58,555 - WARN
>>> [WorkerSender[myid=3]:QuorumCnxManager@368] - Cannot open channel to 1
>>> at election address /172.28.84.75:3888. Whereas 75 is unable to expose
>>> 3888 and unable to connect to other servers with those exceptions shown
>>> before.
>>>
>>> Yes, I chose a distinct id=1 to 3 for each server. How do you do a
>>> rolling restart? and where do you specify to take it easy if it cannot find
>>> all servers?
>>>
>>> # The number of milliseconds of each tick
>>> tickTime=2000
>>> # The number of ticks that the initial
>>> # synchronization phase can take
>>> initLimit=10
>>> # The number of ticks that can pass between
>>> # sending a request and getting an acknowledgement
>>> syncLimit=5
>>> # the directory where the snapshot is stored.
>>> dataDir=/var/lib/zookeeper
>>> # Place the dataLogDir to a separate physical disc for better performance
>>> # dataLogDir=/disk2/zookeeper
>>>
>>> # the port at which the clients will connect
>>> clientPort=2181
>>>
>>> # specify all zookeeper servers
>>> # The first port is used by followers to connect to the leader
>>> # The second one is used for leader election
>>> server.1=X.Y.Z.75:2888:3888
>>> server.2=X.Y.Z.76:2888:3888
>>> server.3=X.Y.Z.98:2888:3888
>>>
>>> On Fri, May 13, 2016 at 3:51 AM, Flavio Junqueira wrote:
>>>
>>>> Hi there,
>>>>
>>>> The myid needs to contain the id for each server in the ensemble, so
>>>> each server will have a distinct value in its myid file.
>>>>
>>>> The problem might be with your configuration file. I think you say that
>>>> you have specified the servers in the config file of each server, but
>>>> perhaps you want to have a look at the documentation to see if there is
>>>> anything you're missing. If you're not sure, please post it here.
>>>>
>>>> In the 3.4 branch of ZK, you have to do a rolling upgrade of the servers.
>>>>
>>>> -Flavio
>>>>
>>>>> On 13 May 2016, at 11:15, R Krishna wrote:
>>>>>
>>>>> Just tried to setup a 2 zookeeper cluster for the first time one each for
>>>>> my 2 Kafka broker cluster and came across following issues:
>>>>> 1. Do we have to specify a separate value in vim ./var/lib/zookeeper/myid
>>>>> although they are separate machine instances?
>>>>> 2. I kept seeing Mode:standalone between the two servers although I saw
>>>>> connectivity between these two. After restarts, I saw them go to
>>>>> Follower/Leader.
>>>>> /usr/share/zookeeper/bin/zkServer.sh status
>>>>> JMX enabled by default
>>>>> Using config: /etc/zookeeper/conf/zoo.cfg
>>>>> Mode: standalone
>>>>> 3. The data was completely inconsistent, I was able to connect to each one
>>>>> run the all netcat status commands from the other server without any issue.
>>>>> However, Kafka broker data was inconsistent and kept failing, is there a
>>>>> way to confirm if both nodes are in sync and part of same cluster?
>>>>> org.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /config/changes
>>>>>
>>>>> 4. Whenever I updated the .cfg file, I cannot do a sudo
>>>>> /usr/share/zookeeper/bin/zkServer.sh restart, I have to force kill the pid,
>>>>> in which case it brings up another process reading the latest .cfg, why is
>>>>> this so?
>>>>>
>>>>> 5. I realized we need at least 3 to make an ensemble, so I created and
>>>>> added another ZK host updated the .cfg and force killed the process so it
>>>>> reads the latest config and started getting these exceptions. Yes, this
>>>>> probably means I have run out of connections.
>>>>>
>>>>> *And finally, how do I safely restart such a cluster when adding new nodes
>>>>> and then force them to sync data?*
>>>>>
>>>>> MASTER: 75: ::::::::::::::::::::::::::::::::::::
>>>>> 3 09:56:03,823 - INFO [main:FileSnap@83] - Reading snapshot /var/lib/zookeeper/version-2/snapshot.30
>>>>> 2016-05-13 09:56:03,860 - ERROR [main:FileTxnSnapLog@210] - Parent /brokers/ids missing for /brokers/ids/2
>>>>> 2016-05-13 09:56:03,862 - ERROR [main:QuorumPeer@453] - Unable to load database on disk
>>>>> java.io.IOException: Failed to process transaction type: 1 error: KeeperErrorCode = NoNode for /brokers/ids
>>>>>     at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:153)
>>>>>     at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417)
>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:409)
>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:151)
>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
>>>>> Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/ids
>>>>>     at org.apache.zookeeper.server.persistence.FileTxnSnapLog.processTransaction(FileTxnSnapLog.java:211)
>>>>>     at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:151)
>>>>>     ... 6 more
>>>>> 2016-05-13 09:56:03,865 - ERROR [main:QuorumPeerMain@89] - Unexpected exception, exiting abnormally
>>>>> java.lang.RuntimeException: Unable to run quorum server
>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:454)
>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:409)
>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:151)
>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
>>>>> Caused by: java.io.IOException: Failed to process transaction type: 1 error: KeeperErrorCode = NoNode for /brokers/ids
>>>>>     at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:153)
>>>>>     at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417)
>>>>>     ... 4 more
>>>>> Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/ids
>>>>>     at org.apache.zookeeper.server.persistence.FileTxnSnapLog.processTransaction(FileTxnSnapLog.java:211)
>>>>>     at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:151)
>>>>>     ... 6 more
>>>>>
>>>>> 2016-05-13 09:57:29,084 - ERROR [main:QuorumPeerMain@89] - Unexpected exception, exiting abnormally
>>>>> java.lang.RuntimeException: Unable to run quorum server
>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:454)
>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:409)
>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:151)
>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
>>>>> Caused by: java.io.IOException: Failed to process transaction type: 1 error: KeeperErrorCode = NoNode for /brokers/ids
>>>>>     at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:153)
>>>>>     at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417)
>>>>>     ... 4 more
>>>>> Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/ids
>>>>>     at org.apache.zookeeper.server.persistence.FileTxnSnapLog.processTransaction(FileTxnSnapLog.java:211)
>>>>>     at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:151)
>>>>>     ... 6 more
>>>>>
>>>>> SECOND: 76 :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
>>>>> ING (n.state), 3 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)
>>>>> 2016-05-13 09:42:40,650 - WARN [RecvWorker:1:QuorumCnxManager$RecvWorker@762] - Connection broken for id 1, my id = 2, error =
>>>>> java.io.EOFException
>>>>>     at java.io.DataInputStream.readInt(DataInputStream.java:392)
>>>>>     at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:747)
>>>>> 2016-05-13 09:42:40,650 - WARN [RecvWorker:1:QuorumCnxManager$RecvWorker@765] - Interrupting SendWorker
>>>>> 2016-05-13 09:42:40,651 - WARN [SendWorker:1:QuorumCnxManager$SendWorker@679] - Interrupted while waiting for message on queue
>>>>> java.lang.InterruptedException
>>>>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
>>>>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2095)
>>>>>     at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:389)
>>>>>     at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:831)
>>>>>     at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:62)
>>>>>     at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:667)
>>>>> 2016-05-13 09:42:40,651 - WARN [SendWorker:1:QuorumCnxManager$SendWorker@688] - Send worker leaving threa
>>>>>
>>>>> ..... then these ...............
>>>>>
>>>>> ==> /var/log/zookeeper/zookeeper.log <==
>>>>> 2016-05-13 10:01:20,334 - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /X.Y.Z.75:58954
>>>>> 2016-05-13 10:01:20,334 - WARN [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
>>>>> 2016-05-13 10:01:20,335 - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /X.Y.Z.75:58954 (no session established for client)
>>>>>
>>>>> ==> /home/kafka/kafka/kafka.log <==
>>>>> [2016-05-13 10:01:20,412] INFO Opening socket connection to server X.Y.Z.75/X.Y.Z.75:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
>>>>> [2016-05-13 10:01:20,413] INFO Socket connection established to X.Y.Z.75/X.Y.Z.75:2181, initiating session (org.apache.zookeeper.ClientCnxn)
>>>>> [2016-05-13 10:01:20,637] WARN Session 0x254a9245fc00000 for server X.Y.Z.75/X.Y.Z.75:2181, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
>>>>> java.io.IOException: Connection reset by peer
>>>>>     at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>>>>>     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>>>>>     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>>>>>     at sun.nio.ch.IOUtil.read(IOUtil.java:192)
>>>>>     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:384)
>>>>>     at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
>>>>>     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
>>>>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>>>>> [2016-05-13 10:01:21,782] INFO Opening socket connection to server X.Y.Z.76/X.Y.Z.76:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
>>>>>
>>>>> THIRD - added last::::::::::::::::::::::::::::::::::::::::
>>>>>
>>>>> LOWING (n.state), 2 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)
>>>>> 2016-05-13 03:03:39,540 - INFO [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@774] - Notification time out: 25600
>>>>> 2016-05-13 03:03:39,569 - WARN [WorkerSender[myid=3]:QuorumCnxManager@368] - Cannot open channel to 1 at election address /X.Y.Z.75:3888
>>>>> java.net.ConnectException: Connection refused
>>>>>     at java.net.PlainSocketImpl.socketConnect(Native Method)
>>>>>     at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
>>>>>     at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
>>>>>     at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
>>>>>     at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>>>>>     at java.net.Socket.connect(Socket.java:579)
>>>>>     at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:354)
>>>>>     at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:327)
>>>>>     at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:393)
>>>>>     at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:365)
>>>>>     at java.lang.Thread.run(Thread.java:745)
>>>>> 2016-05-13 03:03:39,570 - INFO [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 3 (n.leader), 0x100000052 (n.zxid), 0x108d2 (n.round), LOOKING (n.state), 3 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)
>>>>> 2016-05-13 03:03:39,596 - INFO [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 3 (n.leader), 0x100000052 (n.zxid), 0x108d1 (n.round), FOLLOWING (n.state), 2 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)
>>>>> 2016-05-13 03:03:47,801 - INFO [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x100000052 (n.zxid), 0x108d2 (n.round), LOOKING (n.state), 2 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)
>>>>> 2016-05-13 03:03:48,013 - INFO [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x100000052 (n.zxid), 0x108d2 (n.round), LOOKING (n.state), 2 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)
>>>>> 2016-05-13 03:03:48,415 - INFO [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x100000052 (n.zxid), 0x108d2 (n.round), LOOKING (n.state), 2 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)
>>>>> 2016-05-13 03:03:49,216 - INFO [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x100000052 (n.zxid), 0x108d2 (n.round), LOOKING (n.state), 2 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)
>>>>> 2016-05-13 03:03:50,818 - INFO [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 2 (n.leader), 0x100000052 (n.zxid), 0x108d2 (n.round), LOOKING (n.state), 2 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)
>>>
>>> --
>>> Radha Krishna, Proddaturi
>>> 253-234-5657
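
A minimal sketch of the four-letter-command check mentioned at the top of this message, assuming each server answers `stat` on its client port (2181 in the quoted config), as the 3.4.5 servers in the quoted output do. The helper names (`four_letter`, `parse_stat`, `check_ensemble`) are hypothetical and not part of ZooKeeper. Matching zxids are only a hint: on a busy ensemble a follower can briefly lag the leader.

```python
import socket
from typing import Dict, List

# Fields of the `stat` reply that this sketch looks at.
WANTED = ("Mode", "Zxid", "Node count")


def four_letter(host: str, port: int = 2181, cmd: str = "stat",
                timeout: float = 5.0) -> str:
    """Send a four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_stat(reply: str) -> Dict[str, str]:
    """Pick the 'Key: value' lines listed in WANTED out of a stat reply."""
    fields = {}
    for line in reply.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in WANTED:
            fields[key] = value.strip()
    return fields


def check_ensemble(replies: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means nothing looked wrong.

    `replies` maps a server label to its raw `stat` reply.
    """
    stats = {name: parse_stat(text) for name, text in replies.items()}
    problems = []
    modes = {name: s.get("Mode") for name, s in stats.items()}
    leaders = [name for name, mode in modes.items() if mode == "leader"]
    if len(leaders) != 1:
        problems.append("expected exactly one leader, found %d" % len(leaders))
    for name, mode in modes.items():
        # "standalone" (or no Mode line at all) means the server is not
        # taking part in a quorum.
        if mode not in ("leader", "follower", "observer"):
            problems.append("%s reports mode %r" % (name, mode))
    zxids = {s.get("Zxid") for s in stats.values()}
    if len(zxids) > 1:
        # Heuristic only: zxids can differ briefly while writes are in flight.
        problems.append("zxids differ: %s" % sorted(str(z) for z in zxids))
    return problems
```

Against live servers this could be called as `check_ensemble({h: four_letter(h) for h in hosts})`, where `hosts` is a list of server addresses. Fed the three `stat` replies quoted above (two followers and one leader, all at Zxid 0x100000021), it returns an empty list, while a server reporting `Mode: standalone` is flagged.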