MIME-Version: 1.0
In-Reply-To: <1515547011855-0.post@n2.nabble.com>
References: <1515547011855-0.post@n2.nabble.com>
From: Andor Molnar
Date: Wed, 10 Jan 2018 10:05:20 +0100
Subject: Re: Unable to connect node to ensemble after restart of node zookeeper 3.4.6
To: user@zookeeper.apache.org
Content-Type: multipart/alternative; boundary="001a113598d0b4cd1b05626854c1"

--001a113598d0b4cd1b05626854c1
Content-Type: text/plain; charset="UTF-8"

Hi hkwan,

java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(SocketInputStream.java:197)
        at java.net.SocketInputStream.read(SocketInputStream.java:122)
        at java.net.SocketInputStream.read(SocketInputStream.java:211)
        at java.io.DataInputStream.readInt(DataInputStream.java:387)

This looks like a network issue to me.
Have you tried connecting a client from server 2 to the leader?

Regards,
Andor

On Wed, Jan 10, 2018 at 2:16 AM, hkwan wrote:

> I have a 3 node ensemble in production and after restarting one node it can
> no longer connect to the ensemble.
> I am getting this error below:
>
> 2018-01-10 00:49:32,492 [myid:2] - INFO [WorkerSender[myid=2]:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (3, 2)
> 2018-01-10 00:50:20,342 [myid:2] - WARN [RecvWorker:1:QuorumCnxManager$RecvWorker@780] - Connection broken for id 1, my id = 2, error =
> java.net.SocketException: Connection reset
>         at java.net.SocketInputStream.read(SocketInputStream.java:197)
>         at java.net.SocketInputStream.read(SocketInputStream.java:122)
>         at java.net.SocketInputStream.read(SocketInputStream.java:211)
>         at java.io.DataInputStream.readInt(DataInputStream.java:387)
>         at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:765)
> 2018-01-10 00:50:20,343 [myid:2] - WARN [RecvWorker:1:QuorumCnxManager$RecvWorker@783] - Interrupting SendWorker
> 2018-01-10 00:50:20,343 [myid:2] - WARN [SendWorker:1:QuorumCnxManager$SendWorker@697] - Interrupted while waiting for message on queue
> java.lang.InterruptedException
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2095)
>         at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:389)
>         at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:849)
>         at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:64)
>         at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:685)
> 2018-01-10 00:50:20,343 [myid:2] - WARN [SendWorker:1:QuorumCnxManager$SendWorker@706] - Send worker leaving thread
> 2018-01-10 00:50:32,491 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 60000
> 2018-01-10 00:50:32,493 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@597] - Notification: 1 (message format version), 2 (n.leader), 0x707e3e9a9 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x7 (n.peerEpoch) LOOKING (my state)
> 2018-01-10 00:50:32,495 [myid:2] - INFO [WorkerSender[myid=2]:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (3, 2)
> 2018-01-10 00:51:32,494 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 60000
> 2018-01-10 00:51:32,494 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@597] - Notification: 1 (message format version), 2 (n.leader), 0x707e3e9a9 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x7 (n.peerEpoch) LOOKING (my state)
> 2018-01-10 00:51:32,496 [myid:2] - INFO [WorkerSender[myid=2]:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (3, 2)
> 2018-01-10 00:52:19,126 [myid:2] - WARN [RecvWorker:1:QuorumCnxManager$RecvWorker@780] - Connection broken for id 1, my id = 2, error =
> java.net.SocketException: Connection reset
>         at java.net.SocketInputStream.read(SocketInputStream.java:197)
>         at java.net.SocketInputStream.read(SocketInputStream.java:122)
>         at java.net.SocketInputStream.read(SocketInputStream.java:211)
>         at java.io.DataInputStream.readInt(DataInputStream.java:387)
>         at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:765)
> 2018-01-10 00:52:19,127 [myid:2] - WARN [RecvWorker:1:QuorumCnxManager$RecvWorker@783] - Interrupting SendWorker
> 2018-01-10 00:52:19,127 [myid:2] - WARN [SendWorker:1:QuorumCnxManager$SendWorker@697] - Interrupted while waiting for message on queue
> java.lang.InterruptedException
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2095)
>         at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:389)
>         at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:849)
>         at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:64)
>         at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:685)
> 2018-01-10 00:52:19,128 [myid:2] - WARN [SendWorker:1:QuorumCnxManager$SendWorker@706] - Send worker leaving thread
> 2018-01-10 00:52:32,495 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 60000
> 2018-01-10 00:52:32,497 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@597] - Notification: 1 (message format version), 2 (n.leader), 0x707e3e9a9 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x7 (n.peerEpoch) LOOKING (my state)
> 2018-01-10 00:52:32,499 [myid:2] - INFO [WorkerSender[myid=2]:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (3, 2)
>
>
> my configuration on all three servers are:
>
> clientPort=2181
> dataDir=/var/opt/zookeeper/data
> tickTime=2000
> autopurge.purgeInterval=24
> initLimit=10
> syncLimit=5
> server.1=10.1.0.122:2888:3888
> server.2=10.1.1.75:2888:3888
> server.3=10.1.2.221:2888:3888
>
> server 3 is currently leader
> server 1 is currently follower
> server 2 currently cannot rejoin the ensemble
>
> myid files are correctly configured for all three servers. this is a
> production cluster so I would like to know if there was a way to force the
> node back into the cluster without anything drastic that would cause the
> quorum to be lost.
>
> --
> Sent from: http://zookeeper-user.578899.n2.nabble.com/
>
--001a113598d0b4cd1b05626854c1--
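
The connectivity check suggested in the reply can be sketched as follows. This is only a rough sketch, assuming Python 3 is available on server 2. The helper names (parse_servers, can_connect, check_peers) are hypothetical and not part of ZooKeeper; the addresses and ports are the ones from the quoted configuration. In a server.N=host:port1:port2 entry, the first port is used by followers to connect to the leader and the second is used for leader election, so the sketch probes the election port on every peer and the first port on the leader only.

```python
import socket

# Server lines copied from the configuration quoted in this thread.
ZOO_CFG = """\
server.1=10.1.0.122:2888:3888
server.2=10.1.1.75:2888:3888
server.3=10.1.2.221:2888:3888
"""


def parse_servers(cfg_text):
    """Return {sid: (host, quorum_port, election_port)} from server.N lines."""
    servers = {}
    for line in cfg_text.splitlines():
        line = line.strip()
        if not line.startswith("server.") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        sid = int(key[len("server."):])
        # Ignore anything after the two ports (for example a role suffix).
        host, quorum_port, election_port = value.split(":")[:3]
        servers[sid] = (host, int(quorum_port), int(election_port))
    return servers


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, resets, unreachable hosts and timeouts.
        return False


def check_peers(cfg_text, my_sid, leader_sid):
    """Probe each peer's election port, plus the leader's quorum port."""
    results = {}
    for sid, (host, quorum_port, election_port) in parse_servers(cfg_text).items():
        if sid == my_sid:
            continue
        results[(sid, election_port)] = can_connect(host, election_port)
        if sid == leader_sid:
            results[(sid, quorum_port)] = can_connect(host, quorum_port)
    return results
```

On server 2, check_peers(ZOO_CFG, my_sid=2, leader_sid=3) returns a dictionary keyed by (server id, port) with True or False for each probe. As far as I understand the "Have smaller server identifier" message, server 2 drops its own connection attempt and waits for server 3 to connect back, so it is probably worth running the same probe from server 3 against server 2 (my_sid=3) as well.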