From: Michael Han
Date: Thu, 9 Mar 2017 21:30:42 -0800
Subject: Re: shutdown Observer
To: UserZooKeeper
Reply-To: user@zookeeper.apache.org
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=001a1144abc478eaea054a59ab4a

--001a1144abc478eaea054a59ab4a
Content-Type: text/plain; charset=UTF-8

It helps. An extreme case is a network partition, where packet loss is
100%. ZK relies on TCP for communication between quorum peers, so a lost
packet will be retransmitted by TCP; unless your network is partitioned
forever, the system will move forward once the partition heals. There is
no need to worry about a packet being lost forever, because of the TCP
delivery guarantee. In this case the timeout could be set to infinite
(pass 0 to setSoTimeout) so that socket I/O blocks indefinitely until the
partition heals. The socket timeout is really just there to give the ZK
server an opportunity to take action when we think we should bail out
under bad network conditions rather than block indefinitely, as ZK needs
to satisfy some basic liveness guarantees.

On Thu, Mar 9, 2017 at 3:12 PM, Jai Bheemsen Rao Dhanwada
<jaibheemsen@gmail.com> wrote:

> If there is packet loss, does increasing the initLimit value help?
>
> ref: http://efod.se/blog/archive/2013/02/09/zookeeper-initlimit
>
> Any thoughts?
>
> On Thu, Mar 9, 2017 at 10:12 AM, Dan Benediktson
> <dbenediktson@twitter.com.invalid> wrote:
>
> > It's also likely you have a fair bit of packet loss between your
> > datacenters, unless you know you have a solid network between them. If
> > your observers are falling offline "randomly", packet loss is a pretty
> > likely culprit.
> >
> > On Thu, Mar 9, 2017 at 9:54 AM, Michael Han wrote:
> >
> > > The log indicates that your server socket on observer timed out after
> > > syncing with leader. It could simply because that the latency between
> > > your DCs exceeds the socket timeout configuration ZK uses. The timeout
> > > is calculated as tickTime * syncLimit so you might want tweak these
> > > values to fit the latency between your DCs.
> > >
> > > On Thu, Mar 9, 2017 at 9:00 AM, rammohan ganapavarapu
> > > <rammohanganap@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > We have a multi data-center zk cluster with all the followers are in
> > > > one data-center and observers in other data-centers, for some reason
> > > > observers are going down with the following exception and i am not
> > > > sure what could be the reason and how to avoid this issue, any
> > > > thoughts?
> > > >
> > > > Ram
> > > >
> > > > 2017-03-09 09:00:18,305 - WARN [QuorumPeer[myid=41]/0:0:0:0:0:0:0:0:2181:Observer@79] - Exception when observing the leader
> > > > java.net.SocketTimeoutException: Read timed out
> > > >     at java.net.SocketInputStream.socketRead0(Native Method)
> > > >     at java.net.SocketInputStream.read(SocketInputStream.java:152)
> > > >     at java.net.SocketInputStream.read(SocketInputStream.java:122)
> > > >     at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
> > > >     at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
> > > >     at java.io.DataInputStream.readInt(DataInputStream.java:387)
> > > >     at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> > > >     at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
> > > >     at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
> > > >     at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152)
> > > >     at org.apache.zookeeper.server.quorum.Observer.observeLeader(Observer.java:75)
> > > >     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:727)
> > > > 2017-03-09 09:00:18,306 - INFO [QuorumPeer[myid=41]/0:0:0:0:0:0:0:0:2181:Observer@137] - shutdown called
> > > > java.lang.Exception: shutdown Observer
> > > >     at org.apache.zookeeper.server.quorum.Observer.shutdown(Observer.java:137)
> > >
> > > --
> > > Cheers
> > > Michael.

--
Cheers
Michael.

--001a1144abc478eaea054a59ab4a--
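
For readers of the archive, here is a minimal sketch of the socket read timeout discussed in this thread. It is not ZooKeeper's code: the class ReadTimeoutSketch and both helper methods are hypothetical names, the values 2000 and 5 are illustrative rather than taken from the poster's configuration, and a pair of loopback sockets stands in for the leader and the observer. It only shows how a timeout computed as tickTime * syncLimit (the formula quoted above) turns a silent peer into the "Read timed out" SocketTimeoutException seen in the log.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutSketch {

    // Hypothetical helper: read timeout in milliseconds, using the formula
    // quoted in the thread (tickTime * syncLimit).
    static int readTimeoutMs(int tickTimeMs, int syncLimit) {
        return tickTimeMs * syncLimit;
    }

    // Hypothetical helper: opens a loopback connection whose peer never
    // writes, then reports whether a blocking read hit the socket timeout.
    // Note: a timeout of 0 means "block forever", so this would never return.
    static boolean readTimesOut(int timeoutMs) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            client.setSoTimeout(timeoutMs);
            try {
                client.getInputStream().read(); // the peer stays silent
                return false;
            } catch (SocketTimeoutException e) {
                return true; // same exception type as in the observer log
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Illustrative values only; use the ones from your own zoo.cfg.
        System.out.println("tickTime * syncLimit = " + readTimeoutMs(2000, 5) + " ms");
        // A short timeout keeps the demonstration quick.
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```

If the latency or packet loss between data centers regularly delays packets for longer than the computed value, a larger tickTime or syncLimit widens the window before the read gives up, which is the tuning suggested earlier in the thread.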