From: upendar devu
Date: Thu, 25 Jan 2018 10:19:01 -0500
Subject: Re: Zookeeper -java.net.SocketException: Socket closed
To: user@zookeeper.apache.org

Thanks for sharing the analysis. The instances are running on EC2, and we
have Kafka, ZK, Storm and ES instances there as well, but we have not seen
such errors in those components. If there were network latency, there should
be socket errors in the other components too, since data is being processed
every second.

Let's hear from the ZooKeeper dev team; I hope they will respond.

On Thu, Jan 25, 2018 at 6:39 AM, Andor Molnar wrote:

> No, this is not the bug I was thinking of.
>
> Looks like your network connection is poor between the leader and the
> follower whose logs were attached. Do you have any other network
> monitoring tools in place or do you see any network related error messages
> in your kernel logs?
>
> Follower lost the connection to the leader:
> 2018-01-23 07:40:21,709 [myid:3] - WARN
> [SyncThread:3:SendAckRequestProcessor@64] - Closing connection to leader,
> exception during packet send
>
> ...and took ages to recover: 944 secs!!
> 2018-01-23 07:56:05,742 [myid:3] - INFO
> [QuorumPeer[myid=3]/XX.XX.XX:2181:Follower@63] - FOLLOWING - LEADER
> ELECTION TOOK - 944020
>
> Additionally, a disk write has taken too long as well:
> 2018-01-23 07:40:21,706 [myid:3] - WARN [SyncThread:3:FileTxnLog@334] -
> fsync-ing the write ahead log in SyncThread:3 took 13638ms which will
> adversely effect operation latency. See the ZooKeeper troubleshooting guide
>
> I believe this stuff is worth taking a closer look at, though I'm not an
> expert on ZooKeeper; maybe somebody else can give you more insight.
>
> Regards,
> Andor
>
>
> On Wed, Jan 24, 2018 at 7:47 PM, upendar devu wrote:
>
> > Thanks Andor for the reply.
> >
> > We are using ZooKeeper version 3.4.6 with 3 instances. Please see the
> > configuration below; I believe we are using the default configuration.
> > The ZK log is attached. The issue occurred at First Occurrence:
> > 01/23/2018 07:42:22, Last Occurrence: 01/23/2018 07:43:22.
> >
> > The issue occurs 3 to 4 times a month and resolves itself within a few
> > minutes, but it is really annoying our operations team. Please let me
> > know if you need any additional details.
> >
> >
> > # The number of milliseconds of each tick
> > tickTime=2000
> >
> > # The number of ticks that the initial synchronization phase can take
> > initLimit=10
> >
> > # The number of ticks that can pass between sending a request and getting
> > # an acknowledgement
> > syncLimit=5
> >
> > # The directory where the snapshot is stored.
> > dataDir=/opt/zookeeper/current/data
> >
> > # The port at which the clients will connect
> > clientPort=2181
> >
> > # This is the list of Zookeeper peers:
> > server.1=zookeeper1:2888:3888
> > server.2=zookeeper2:2888:3888
> > server.3=zookeeper3:2888:3888
> >
> > # The interface IP address(es) from which zookeeper will listen from
> > clientPortAddress=
> >
> > # The number of snapshots to retain in dataDir
> > autopurge.snapRetainCount=3
> >
> > # Purge task interval in hours
> > # Set to "0" to disable auto purge feature
> > autopurge.purgeInterval=1
> >
> >
> > On Wed, Jan 24, 2018 at 4:51 AM, Andor Molnar wrote:
> >
> >> Hi Upendar,
> >>
> >> Thanks for reporting the issue.
> >> I've a gut feeling which existing bug you've run into, but would you
> >> please share some more detail (version of ZK, log context, config
> >> files, etc.) to get confidence?
> >>
> >> Thanks,
> >> Andor
> >>
> >>
> >> On Wed, Jan 17, 2018 at 4:36 PM, upendar devu wrote:
> >>
> >> > We are getting the error below twice a month. Although it resolves
> >> > itself, can anyone explain why this error occurs and what needs to
> >> > be done to prevent it? Is this a common error that can be ignored?
> >> >
> >> > Please suggest.
> >> >
> >> >
> >> > 2018-01-16 20:36:17,378 [myid:2] - WARN
> >> > [RecvWorker:3:QuorumCnxManager$RecvWorker@780] - Connection broken for id
> >> > 3, my id = 2, error = java.net.SocketException: Socket closed
> >> >     at java.net.SocketInputStream.socketRead0(Native Method)
> >> >     at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> >> >     at java.net.SocketInputStream.read(SocketInputStream.java:171)
> >> >     at java.net.SocketInputStream.read(SocketInputStream.java:141)
> >> >     at java.net.SocketInputStream.read(SocketInputStream.java:224)
> >> >     at java.io.DataInputStream.readInt(DataInputStream.java:387)
> >> >     at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:765)
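
For reference, the limits in the quoted configuration are expressed in ticks, so the actual time windows are tickTime multiplied by initLimit and by syncLimit. The sketch below works out those windows and checks the fsync duration reported in the quoted log against the sync window. It is only an illustration: the function names (parse_config, quorum_timeouts_ms, slow_fsyncs) and the regular expression are my own assumptions, not part of ZooKeeper, and the log line is the one quoted above joined back onto a single line. A stall longer than the sync window may be related to the follower dropping its leader connection, but the code does not prove that.

```python
import re

# Settings copied from the configuration quoted in this thread.
CONFIG = """
tickTime=2000
initLimit=10
syncLimit=5
"""

def parse_config(text):
    """Parse key=value lines, skipping blanks and '#' comments (hypothetical helper)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings

def quorum_timeouts_ms(settings):
    """Convert the tick-based limits into milliseconds."""
    tick = int(settings["tickTime"])
    return {
        "init_ms": tick * int(settings["initLimit"]),
        "sync_ms": tick * int(settings["syncLimit"]),
    }

# Assumed pattern, based only on the warning text quoted in this thread.
FSYNC_RE = re.compile(r"fsync-ing the write ahead log in \S+ took (\d+)ms")

def slow_fsyncs(log_lines, threshold_ms):
    """Return fsync durations (ms) from the log that exceed threshold_ms."""
    durations = []
    for line in log_lines:
        match = FSYNC_RE.search(line)
        if match and int(match.group(1)) > threshold_ms:
            durations.append(int(match.group(1)))
    return durations

timeouts = quorum_timeouts_ms(parse_config(CONFIG))
log = [
    "2018-01-23 07:40:21,706 [myid:3] - WARN [SyncThread:3:FileTxnLog@334] - "
    "fsync-ing the write ahead log in SyncThread:3 took 13638ms which will "
    "adversely effect operation latency.",
]
print(timeouts)                                # {'init_ms': 20000, 'sync_ms': 10000}
print(slow_fsyncs(log, timeouts["sync_ms"]))   # [13638]
```

With the quoted settings the sync window is 10 seconds, and the reported fsync took about 13.6 seconds, so disk latency on the follower looks worth checking alongside the network.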