Mailing-List: contact user-help@zookeeper.apache.org; run by ezmlm
Reply-To: user@zookeeper.apache.org
From: Andor Molnar
Date: Thu, 25 Jan 2018 17:10:19 +0100
Subject: Re: Zookeeper -java.net.SocketException: Socket closed
To: user@zookeeper.apache.org
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="001a1137c6de33292005639c04d2"

--001a1137c6de33292005639c04d2
Content-Type: text/plain; charset="UTF-8"

Use EBS drives and make sure you allocate enough IOPS for the load.

Andor

On Thu, Jan 25, 2018 at 4:21 PM, upendar devu wrote:

> "a disk write has taken too long as well": I will check on this, thanks
> for finding it. The zk logs are really a bit difficult for me to
> understand.
>
> On Thu, Jan 25, 2018 at 10:19 AM, upendar devu wrote:
>
> > Thanks for sharing the analysis. The instances are running on EC2, and
> > we have kafka, zk, storm and es instances as well, but we have not seen
> > such errors in those components. If there were network latency, there
> > should be socket errors in the other components too, as data is being
> > processed every sec.
> >
> > Let's hear from the zookeeper dev team; hope they will respond.
> >
> > On Thu, Jan 25, 2018 at 6:39 AM, Andor Molnar wrote:
> >
> >> No, this is not the bug I was thinking of.
> >>
> >> Looks like your network connection is poor between the leader and the
> >> follower whose logs were attached. Do you have any other network
> >> monitoring tools in place, or do you see any network-related error
> >> messages in your kernel logs?
> >>
> >> Follower lost the connection to the leader:
> >> 2018-01-23 07:40:21,709 [myid:3] - WARN
> >> [SyncThread:3:SendAckRequestProcessor@64] - Closing connection to
> >> leader, exception during packet send
> >>
> >> ...and took ages to recover: 944 secs!!
> >> 2018-01-23 07:56:05,742 [myid:3] - INFO
> >> [QuorumPeer[myid=3]/XX.XX.XX:2181:Follower@63] - FOLLOWING - LEADER
> >> ELECTION TOOK - 944020
> >>
> >> Additionally, a disk write has taken too long as well:
> >> 2018-01-23 07:40:21,706 [myid:3] - WARN [SyncThread:3:FileTxnLog@334] -
> >> fsync-ing the write ahead log in SyncThread:3 took 13638ms which will
> >> adversely effect operation latency. See the ZooKeeper troubleshooting
> >> guide
> >>
> >> I believe this is worth taking a closer look at, though I'm not an
> >> expert on ZooKeeper; maybe somebody else can give you more insight.
> >>
> >> Regards,
> >> Andor
> >>
> >>
> >> On Wed, Jan 24, 2018 at 7:47 PM, upendar devu wrote:
> >>
> >> > Thanks Andor for the reply.
> >> >
> >> > We are using zookeeper version 3.4.6 with 3 instances. Please see the
> >> > configuration below; I believe we are using the default
> >> > configuration. I have attached the zk log. The issue occurred at
> >> > First Occurrence: 01/23/2018 07:42:22, Last Occurrence: 01/23/2018
> >> > 07:43:22.
> >> >
> >> > The issue occurs 3 to 4 times a month and resolves itself within a
> >> > few minutes, but it is really annoying our operations team. Please
> >> > let me know if you need any additional details.
> >> >
> >> > # The number of milliseconds of each tick
> >> > tickTime=2000
> >> >
> >> > # The number of ticks that the initial synchronization phase can take
> >> > initLimit=10
> >> >
> >> > # The number of ticks that can pass between sending a request and
> >> > # getting an acknowledgement
> >> > syncLimit=5
> >> >
> >> > # The directory where the snapshot is stored.
> >> > dataDir=/opt/zookeeper/current/data
> >> >
> >> > # The port at which the clients will connect
> >> > clientPort=2181
> >> >
> >> > # This is the list of Zookeeper peers:
> >> > server.1=zookeeper1:2888:3888
> >> > server.2=zookeeper2:2888:3888
> >> > server.3=zookeeper3:2888:3888
> >> >
> >> > # The interface IP address(es) on which zookeeper will listen
> >> > clientPortAddress=
> >> >
> >> > # The number of snapshots to retain in dataDir
> >> > autopurge.snapRetainCount=3
> >> >
> >> > # Purge task interval in hours
> >> > # Set to "0" to disable auto purge feature
> >> > autopurge.purgeInterval=1
> >> >
> >> >
> >> > On Wed, Jan 24, 2018 at 4:51 AM, Andor Molnar wrote:
> >> >
> >> >> Hi Upendar,
> >> >>
> >> >> Thanks for reporting the issue.
> >> >> I have a gut feeling about which existing bug you've run into, but
> >> >> would you please share some more detail (version of ZK, log context,
> >> >> config files, etc.) so I can be confident?
> >> >>
> >> >> Thanks,
> >> >> Andor
> >> >>
> >> >>
> >> >> On Wed, Jan 17, 2018 at 4:36 PM, upendar devu
> >> >> <devulapalli8@gmail.com> wrote:
> >> >>
> >> >> > We are getting the error below twice a month. It resolves itself,
> >> >> > but can anyone explain why this error occurs and what needs to be
> >> >> > done to prevent it? Is this a common error that can be ignored?
> >> >> >
> >> >> > Please suggest.
> >> >> >
> >> >> >
> >> >> > 2018-01-16 20:36:17,378 [myid:2] - WARN
> >> >> > [RecvWorker:3:QuorumCnxManager$RecvWorker@780] - Connection broken
> >> >> > for id 3, my id = 2, error =
> >> >> > java.net.SocketException: Socket closed
> >> >> >   at java.net.SocketInputStream.socketRead0(Native Method)
> >> >> >   at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> >> >> >   at java.net.SocketInputStream.read(SocketInputStream.java:171)
> >> >> >   at java.net.SocketInputStream.read(SocketInputStream.java:141)
> >> >> >   at java.net.SocketInputStream.read(SocketInputStream.java:224)
> >> >> >   at java.io.DataInputStream.readInt(DataInputStream.java:387)
> >> >> >   at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:765)
> >> >> >
> >> >>
> >> >
> >>
> >
>

--001a1137c6de33292005639c04d2--
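
A note on the numbers quoted in the thread: the config comments describe syncLimit as the number of ticks allowed between a request and its acknowledgement, so with tickTime=2000 and syncLimit=5 that window works out to 10000 ms, while the quoted fsync warning reports 13638 ms. The Python sketch below just does that arithmetic on the quoted values. It is not ZooKeeper code: the helper names (parse_cfg, sync_window_ms, fsync_ms) are made up for illustration, and it assumes the relevant window really is tickTime x syncLimit.

```python
import re

# Values copied from the zoo.cfg quoted in the thread.
ZOO_CFG = """\
# The number of milliseconds of each tick
tickTime=2000
initLimit=10
syncLimit=5
"""

# The fsync warning quoted in the thread, joined onto one line.
LOG_LINE = (
    "2018-01-23 07:40:21,706 [myid:3] - WARN [SyncThread:3:FileTxnLog@334] - "
    "fsync-ing the write ahead log in SyncThread:3 took 13638ms which will "
    "adversely effect operation latency. See the ZooKeeper troubleshooting guide"
)

# Matches the duration in the fsync warning text shown above.
FSYNC_RE = re.compile(r"fsync-ing the write ahead log in \S+ took (\d+)ms")


def parse_cfg(text):
    """Parse key=value lines, skipping blank lines and '#' comments."""
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        cfg[key.strip()] = value.strip()
    return cfg


def sync_window_ms(cfg):
    """Assumed sync window: tickTime (ms) multiplied by syncLimit (ticks)."""
    return int(cfg["tickTime"]) * int(cfg["syncLimit"])


def fsync_ms(log_line):
    """Return the fsync duration in ms, or None if the line has no fsync warning."""
    match = FSYNC_RE.search(log_line)
    return int(match.group(1)) if match else None


cfg = parse_cfg(ZOO_CFG)
window = sync_window_ms(cfg)
stall = fsync_ms(LOG_LINE)
print(f"sync window: {window} ms, fsync stall: {stall} ms, exceeded: {stall > window}")
```

If that reading of syncLimit is right, the 13.6 s fsync by itself is longer than the 10 s window, which would fit the advice about allocating enough disk IOPS. That is an inference from the quoted values, not something the logs prove.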