Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 58D7E200B4F for ; Tue, 12 Jul 2016 01:43:10 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 57407160A7D; Mon, 11 Jul 2016 23:43:10 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id D0A91160A78 for ; Tue, 12 Jul 2016 01:43:08 +0200 (CEST) Received: (qmail 19092 invoked by uid 500); 11 Jul 2016 23:43:08 -0000 Mailing-List: contact user-help@curator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@curator.apache.org Delivered-To: mailing list user@curator.apache.org Received: (qmail 19081 invoked by uid 99); 11 Jul 2016 23:43:08 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 11 Jul 2016 23:43:07 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id 7D5A0C0718 for ; Mon, 11 Jul 2016 23:43:07 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 2.18 X-Spam-Level: ** X-Spam-Status: No, score=2.18 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, FREEMAIL_REPLY=1, HTML_MESSAGE=2, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_PASS=-0.001, WEIRD_PORT=0.001] autolearn=disabled Authentication-Results: spamd1-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx2-lw-us.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id UO6Dyu9FTOWf for ; Mon, 11 Jul 2016 23:43:00 +0000 (UTC) Received: from mail-oi0-f41.google.com (mail-oi0-f41.google.com [209.85.218.41]) by mx2-lw-us.apache.org (ASF Mail Server at mx2-lw-us.apache.org) with ESMTPS id 89A655FBBC for ; Mon, 11 Jul 2016 23:42:54 +0000 (UTC) Received: by mail-oi0-f41.google.com with SMTP id s66so170785606oif.1 for ; Mon, 11 Jul 2016 16:42:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to; bh=1zYW2IpOAh3wikKTMMBTuFiqLNwZ4nLLsVfeVjrL8uY=; b=e9oAKc0hCHd+fVUTaeG9WuUDTyuaWweyI7qxFPagyCv6Z31Zx0UMLM8ddBQp8IXsU1 +hsWrq3dghR+jvVqPsW7sMsjspddNQ8zhU+EWa/WV7uSnFbARS1DcqiJDqxzXYKHlci4 CqNnzO5NlMovU6OGsjesd6MYHp95KyWKwkfLpyoUfzNpOSPGky+eDs/eDQYjqyK1qFV5 Z81fTbkH4PIkSbK/HJQNv1TfO2sSyChOQHzPPmpMQAyrB4awKjfVHLeLGMtcQ6CNmpnF cQP8kyK+snhzgV5aurMkOvBYfVE5IOEpIkAusiQztfKJ9djc9EnUiEuKB6EXNDKXpTfY FfZw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to; bh=1zYW2IpOAh3wikKTMMBTuFiqLNwZ4nLLsVfeVjrL8uY=; b=A1QmxfEzTtubY+tY6M3RpABKM3wivVPsOXb+vyUNG0X5a+GLyiuVPg5yKwtqR3XdUj 3ElglBBhT31DoLqcap2FqvpSPadWzzU57lvsZFhIWdSgfQE4tku/Hqb2L6TM2l7h2UX2 oh/VFvnSSpdzvoDgCvSey/bW3HhLS07Bzax61a515ndGvZKii2aNPwQliQQVqZMVYaGW BZiysXGgGN4kKRN4bOOfAfDE2skL8fPP918gSDvHvpolQZCD2PjR6BUxOCCj1fZHw6Il oNa76icbWmDwyRQFlu5BjlQv9yvICA6Vzw4t3ZqORK6jLcAEj5nW6ZJBVF/NH92rQ0i4 CRrw== X-Gm-Message-State: ALyK8tK7lzF2vHS/IvCZe8eDbajF3sAsZEhrBsYCyL4oLYfSMTkdWyht/1cCZdQWhYOQsNX2I0lnhF7GZRKMLQ== X-Received: by 10.202.172.129 with SMTP id v123mr11396654oie.132.1468280573255; Mon, 11 Jul 2016 16:42:53 -0700 (PDT) MIME-Version: 1.0 Received: by 10.157.18.144 with HTTP; Mon, 11 Jul 2016 16:42:52 -0700 (PDT) In-Reply-To: References: From: Cameron McKenzie Date: Tue, 12 Jul 2016 09:42:52 +1000 Message-ID: Subject: Re: Curator never goes through when ZK is read-only To: user@curator.apache.org Content-Type: multipart/alternative; boundary=001a113cdd0affe58c053764b5fa archived-at: Mon, 11 Jul 2016 23:43:10 -0000 --001a113cdd0affe58c053764b5fa Content-Type: text/plain; charset=UTF-8 It looks like a connection issue based on the logs. When you confirmed that the connection was ok via the zkCli, was that running on the same host as the read only server? Can you run it from the same host that you're running the Java client from? On Tue, Jul 12, 2016 at 9:40 AM, Benjamin Jaton wrote: > Well I was trying to let it run and it eventually fails, with a stack: > > Successfully established the connection with ZooKeeper > 2016-07-11 16:28:21,943 INFO org.apache.zookeeper.ClientCnxn:1251 - > Client session timed out, have not heard from server in 22519ms for > sessionid 0x0, closing socket connection and attempting reconnect > 2016-07-11 16:28:22,046 INFO org.apache.zookeeper.ClientCnxn:1138 - > Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will not > attempt to authenticate using SASL (unknown error) > 2016-07-11 16:28:22,050 INFO org.apache.zookeeper.ClientCnxn:980 - Socket > connection established, initiating session, client: /10.11.12.4:54702, > server: QA-E8WIN11/10.11.8.236:2181 > 2016-07-11 16:28:22,058 INFO org.apache.zookeeper.ClientCnxn:1400 - > Session establishment complete on server QA-E8WIN11/10.11.8.236:2181, > sessionid = 0x100024f09360080, negotiated timeout = 45000 (READ-ONLY mode) > 2016-07-11 16:28:22,061 INFO org.apache.zookeeper.ClientCnxn:1297 - > Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 200 > 2016-07-11 16:28:22,066 INFO > org.apache.curator.framework.state.ConnectionStateManager:228 - State > change: READ_ONLY > 2016-07-11 16:30:29,312 INFO org.apache.zookeeper.ClientCnxn:1253 - > Unable to read additional data from server sessionid 0x100024f09360080, > likely server has closed socket, closing socket connection and attempting > reconnect > 2016-07-11 16:30:29,416 INFO > org.apache.curator.framework.state.ConnectionStateManager:228 - State > change: SUSPENDED > 2016-07-11 16:30:31,031 INFO org.apache.zookeeper.ClientCnxn:1138 - > Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will not > attempt to authenticate using SASL (unknown error) > 2016-07-11 16:30:31,032 INFO org.apache.zookeeper.ClientCnxn:980 - Socket > connection established, initiating session, client: /10.11.12.4:55156, > server: QA-E8WIN11/10.11.8.236:2181 > 2016-07-11 16:30:31,034 INFO org.apache.zookeeper.ClientCnxn:1400 - > Session establishment complete on server QA-E8WIN11/10.11.8.236:2181, > sessionid = 0x100024f09360083, negotiated timeout = 45000 (READ-ONLY mode) > 2016-07-11 16:30:31,034 INFO org.apache.zookeeper.ClientCnxn:1297 - > Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 400 > 2016-07-11 16:30:31,035 INFO > org.apache.curator.framework.state.ConnectionStateManager:228 - State > change: READ_ONLY > 2016-07-11 16:32:38,335 INFO org.apache.zookeeper.ClientCnxn:1253 - > Unable to read additional data from server sessionid 0x100024f09360083, > likely server has closed socket, closing socket connection and attempting > reconnect > 2016-07-11 16:32:38,441 INFO > org.apache.curator.framework.state.ConnectionStateManager:228 - State > change: SUSPENDED > 2016-07-11 16:32:40,425 INFO org.apache.zookeeper.ClientCnxn:1138 - > Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will not > attempt to authenticate using SASL (unknown error) > 2016-07-11 16:32:40,426 INFO org.apache.zookeeper.ClientCnxn:980 - Socket > connection established, initiating session, client: /10.11.12.4:55637, > server: QA-E8WIN11/10.11.8.236:2181 > 2016-07-11 16:32:40,428 INFO org.apache.zookeeper.ClientCnxn:1400 - > Session establishment complete on server QA-E8WIN11/10.11.8.236:2181, > sessionid = 0x100024f09360087, negotiated timeout = 45000 (READ-ONLY mode) > 2016-07-11 16:32:40,428 INFO org.apache.zookeeper.ClientCnxn:1297 - > Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 800 > 2016-07-11 16:32:40,428 INFO > org.apache.curator.framework.state.ConnectionStateManager:228 - State > change: READ_ONLY > 2016-07-11 16:34:47,615 INFO org.apache.zookeeper.ClientCnxn:1253 - > Unable to read additional data from server sessionid 0x100024f09360087, > likely server has closed socket, closing socket connection and attempting > reconnect > 2016-07-11 16:34:47,716 INFO > org.apache.curator.framework.state.ConnectionStateManager:228 - State > change: SUSPENDED > 2016-07-11 16:34:49,557 INFO org.apache.zookeeper.ClientCnxn:1138 - > Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will not > attempt to authenticate using SASL (unknown error) > 2016-07-11 16:34:49,558 INFO org.apache.zookeeper.ClientCnxn:980 - Socket > connection established, initiating session, client: /10.11.12.4:56094, > server: QA-E8WIN11/10.11.8.236:2181 > 2016-07-11 16:34:49,560 INFO org.apache.zookeeper.ClientCnxn:1400 - > Session establishment complete on server QA-E8WIN11/10.11.8.236:2181, > sessionid = 0x100024f09360089, negotiated timeout = 45000 (READ-ONLY mode) > 2016-07-11 16:34:49,560 INFO org.apache.zookeeper.ClientCnxn:1297 - > Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 1600 > 2016-07-11 16:34:49,560 INFO > org.apache.curator.framework.state.ConnectionStateManager:228 - State > change: READ_ONLY > 2016-07-11 16:36:56,895 INFO org.apache.zookeeper.ClientCnxn:1253 - > Unable to read additional data from server sessionid 0x100024f09360089, > likely server has closed socket, closing socket connection and attempting > reconnect > Exception in thread "main" 2016-07-11 16:36:56,996 INFO > org.apache.curator.framework.state.ConnectionStateManager:228 - State > change: SUSPENDED > org.apache.zookeeper.KeeperException$ConnectionLossException: > KeeperErrorCode = ConnectionLoss for / > at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) > at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) > at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1753) > at > org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:310) > at > org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:299) > at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108) > at > org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:295) > at > org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:287) > at > org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:34) > at TestCuratorReadOnly.main(TestCuratorReadOnly.java:23) > > On Mon, Jul 11, 2016 at 4:39 PM, Cameron McKenzie > wrote: > >> So, it appears that the initial connection occurs ok, but then something >> goes wrong >> >> 2016-07-11 15:49:15,712 INFO org.apache.zookeeper.ClientCnxn:1253 - >> Unable to read additional data from server sessionid 0x100024f0936005b, >> likely server has closed socket, closing socket connection and attempting >> reconnect >> >> Then a reconnection occurs ok, and then the cycle starts again. >> >> Can you see where in the stack the getData() call is hanging? >> >> On Tue, Jul 12, 2016 at 9:35 AM, Benjamin Jaton > > wrote: >> >>> The blockUntilConnectedOrTimedOut() does return, we see the line: >>> "Successfully established the connection with ZooKeeper" >>> which happens right after. >>> >>> The getData() is where it hangs. >>> >>> We see this first: >>> 2016-07-11 15:47:08,420 INFO >>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>> change: READ_ONLY >>> >>> and then we see it again 2 minutes later: >>> 2016-07-11 15:49:17,316 INFO >>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>> change: READ_ONLY >>> >>> and then again 2 minutes after that: >>> 2016-07-11 15:51:26,470 INFO >>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>> change: READ_ONLY >>> >>> Then I stopped it. >>> >>> On Mon, Jul 11, 2016 at 4:13 PM, Cameron McKenzie < >>> mckenzie.cam@gmail.com> wrote: >>> >>>> hey Ben, >>>> From the logs it looks like the connection is established at the end? >>>> >>>> Are you saying that the blockUntilConnectedOrTimedOut() does not return >>>> when this occurs? >>>> cheers >>>> >>>> >>>> On Tue, Jul 12, 2016 at 9:10 AM, Benjamin Jaton < >>>> bjaton@radiantlogic.com> wrote: >>>> >>>>> ZK is 3.5.1-alpha >>>>> Curator is 2.11.0 >>>>> >>>>> On Mon, Jul 11, 2016 at 4:01 PM, Benjamin Jaton < >>>>> bjaton@radiantlogic.com> wrote: >>>>> >>>>>> I have a 3 nodes ZK cluster , readonly mode is enabled. >>>>>> 2 nodes are down, so the third one is in read-only (verified by using >>>>>> the ZK API manually). >>>>>> >>>>>> I'm using this piece of code: >>>>>> >>>>>> Builder curatorClientBuilder = CuratorFrameworkFactory.builder() >>>>>> .connectString("QA-E8WIN11:2181,QA-E8WIN12:2181") >>>>>> .sessionTimeoutMs(45000).connectionTimeoutMs(15000) >>>>>> .retryPolicy(new RetryNTimes(3, 5000)).canBeReadOnly(true); >>>>>> >>>>>> CuratorFramework client = curatorClientBuilder.build(); >>>>>> client.start(); >>>>>> client.getZookeeperClient().blockUntilConnectedOrTimedOut(); >>>>>> System.out.println("Successfully established the connection with >>>>>> ZooKeeper"); >>>>>> client.getData().forPath("/"); >>>>>> System.out.println("Done."); >>>>>> >>>>>> When curator pick the host that is UP first, it goes through very >>>>>> quickly. When it picks the host that is down first (QA-E8WIN12), it never >>>>>> goes through. >>>>>> >>>>>> Am I missing something here? >>>>>> >>>>>> Below is the log: >>>>>> >>>>>> 2016-07-11 15:46:45,733 INFO >>>>>> org.apache.curator.framework.imps.CuratorFrameworkImpl:235 - Starting >>>>>> 2016-07-11 15:46:45,743 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>> Client environment:zookeeper.version=3.5.1-alpha-1748895, built on >>>>>> 06/17/2016 22:24 GMT >>>>>> 2016-07-11 15:46:45,743 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>> Client environment:host.name=w-rli09-ben >>>>>> 2016-07-11 15:46:45,744 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>> Client environment:java.version=1.8.0_11 >>>>>> 2016-07-11 15:46:45,744 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>> Client environment:java.vendor=Oracle Corporation >>>>>> 2016-07-11 15:46:45,744 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>> Client environment:java.home=/usr/local/apps/jdk1.8.0_11/jre >>>>>> 2016-07-11 15:46:45,744 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>> Client environment:java.class.path=(..) >>>>>> 2016-07-11 15:46:45,746 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>> Client environment:java.io.tmpdir=/tmp >>>>>> 2016-07-11 15:46:45,746 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>> Client environment:java.compiler= >>>>>> 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>> Client environment:os.name=Linux >>>>>> 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>> Client environment:os.arch=amd64 >>>>>> 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>> Client environment:os.version=3.8.0-44-generic >>>>>> 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>> Client environment:user.name=benji >>>>>> 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>> Client environment:user.home=(..) >>>>>> 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>> Client environment:user.dir=(..) >>>>>> 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>> Client environment:os.memory.free=385MB >>>>>> 2016-07-11 15:46:45,748 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>> Client environment:os.memory.max=5358MB >>>>>> 2016-07-11 15:46:45,748 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>> Client environment:os.memory.total=397MB >>>>>> 2016-07-11 15:46:45,750 INFO org.apache.zookeeper.ZooKeeper:716 - >>>>>> Initiating client connection, connectString=QA-E8WIN11:2181,QA-E8WIN12:2181 >>>>>> sessionTimeout=45000 watcher=org.apache.curator.ConnectionState@3e96bacf >>>>>> 2016-07-11 15:46:45,769 INFO org.apache.zookeeper.ClientCnxn:1138 - >>>>>> Opening socket connection to server QA-E8WIN12/10.11.8.232:2181. >>>>>> Will not attempt to authenticate using SASL (unknown error) >>>>>> Successfully established the connection with ZooKeeper >>>>>> 2016-07-11 15:47:08,300 INFO org.apache.zookeeper.ClientCnxn:1251 - >>>>>> Client session timed out, have not heard from server in 22534ms for >>>>>> sessionid 0x0, closing socket connection and attempting reconnect >>>>>> 2016-07-11 15:47:08,403 INFO org.apache.zookeeper.ClientCnxn:1138 - >>>>>> Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. >>>>>> Will not attempt to authenticate using SASL (unknown error) >>>>>> 2016-07-11 15:47:08,406 INFO org.apache.zookeeper.ClientCnxn:980 - >>>>>> Socket connection established, initiating session, client: / >>>>>> 10.11.12.4:47004, server: QA-E8WIN11/10.11.8.236:2181 >>>>>> 2016-07-11 15:47:08,413 INFO org.apache.zookeeper.ClientCnxn:1400 - >>>>>> Session establishment complete on server QA-E8WIN11/10.11.8.236:2181, >>>>>> sessionid = 0x100024f0936005b, negotiated timeout = 45000 (READ-ONLY mode) >>>>>> 2016-07-11 15:47:08,415 INFO org.apache.zookeeper.ClientCnxn:1297 - >>>>>> Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout >>>>>> 200 >>>>>> 2016-07-11 15:47:08,420 INFO >>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>>> change: READ_ONLY >>>>>> 2016-07-11 15:49:15,712 INFO org.apache.zookeeper.ClientCnxn:1253 - >>>>>> Unable to read additional data from server sessionid 0x100024f0936005b, >>>>>> likely server has closed socket, closing socket connection and attempting >>>>>> reconnect >>>>>> 2016-07-11 15:49:15,815 INFO >>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>>> change: SUSPENDED >>>>>> 2016-07-11 15:49:17,313 INFO org.apache.zookeeper.ClientCnxn:1138 - >>>>>> Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. >>>>>> Will not attempt to authenticate using SASL (unknown error) >>>>>> 2016-07-11 15:49:17,314 INFO org.apache.zookeeper.ClientCnxn:980 - >>>>>> Socket connection established, initiating session, client: / >>>>>> 10.11.12.4:47295, server: QA-E8WIN11/10.11.8.236:2181 >>>>>> 2016-07-11 15:49:17,316 INFO org.apache.zookeeper.ClientCnxn:1400 - >>>>>> Session establishment complete on server QA-E8WIN11/10.11.8.236:2181, >>>>>> sessionid = 0x100024f0936005f, negotiated timeout = 45000 (READ-ONLY mode) >>>>>> 2016-07-11 15:49:17,316 INFO org.apache.zookeeper.ClientCnxn:1297 - >>>>>> Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout >>>>>> 400 >>>>>> 2016-07-11 15:49:17,316 INFO >>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>>> change: READ_ONLY >>>>>> 2016-07-11 15:51:24,607 INFO org.apache.zookeeper.ClientCnxn:1253 - >>>>>> Unable to read additional data from server sessionid 0x100024f0936005f, >>>>>> likely server has closed socket, closing socket connection and attempting >>>>>> reconnect >>>>>> 2016-07-11 15:51:24,713 INFO >>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>>> change: SUSPENDED >>>>>> 2016-07-11 15:51:26,467 INFO org.apache.zookeeper.ClientCnxn:1138 - >>>>>> Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. >>>>>> Will not attempt to authenticate using SASL (unknown error) >>>>>> 2016-07-11 15:51:26,468 INFO org.apache.zookeeper.ClientCnxn:980 - >>>>>> Socket connection established, initiating session, client: / >>>>>> 10.11.12.4:47576, server: QA-E8WIN11/10.11.8.236:2181 >>>>>> 2016-07-11 15:51:26,470 INFO org.apache.zookeeper.ClientCnxn:1400 - >>>>>> Session establishment complete on server QA-E8WIN11/10.11.8.236:2181, >>>>>> sessionid = 0x100024f09360061, negotiated timeout = 45000 (READ-ONLY mode) >>>>>> 2016-07-11 15:51:26,470 INFO org.apache.zookeeper.ClientCnxn:1297 - >>>>>> Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout >>>>>> 800 >>>>>> 2016-07-11 15:51:26,470 INFO >>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>>> change: READ_ONLY >>>>>> (...) >>>>>> >>>>>> Thanks >>>>>> Ben >>>>>> >>>>>> >>>>> >>>> >>> >> > --001a113cdd0affe58c053764b5fa Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
It looks like a connection issue based on the logs. When y= ou confirmed that the connection was ok via the zkCli, was that running on = the same host as the read only server? Can you run it from the same host th= at you're running the Java client from?

On Tue, Jul 12, 2016 at 9:40 AM, Benjamin J= aton <benjamin.jaton@gmail.com> wrote:
Well I was trying to let it run and it= eventually fails, with a stack:

Successfully estab= lished the connection with ZooKeeper
2016-07-11 16:28:21,943 INFO= =C2=A0 org.apache.zookeeper.ClientCnxn:1251 - Client session timed out, hav= e not heard from server in 22519ms for sessionid 0x0, closing socket connec= tion and attempting reconnect
2016-07-11 16:28:22,046 INFO=C2=A0 org.apa= che.zookeeper.ClientCnxn:1138 - Opening socket connection to server QA-E8WI= N11/10.11.8.236:2181<= /a>. Will not attempt to authenticate using SASL (unknown error)
2016-07= -11 16:28:22,050 INFO=C2=A0 org.apache.zookeeper.ClientCnxn:980 - Socket co= nnection established, initiating session, client: /
10.11.12.4:54702, server: QA-E8WIN11/10.11.8.236:2181
2= 016-07-11 16:28:22,058 INFO=C2=A0 org.apache.zookeeper.ClientCnxn:1400 - Se= ssion establishment complete on server QA-E8WIN11/10.11.8.236:2181, sessionid =3D 0x100024f0= 9360080, negotiated timeout =3D 45000 (READ-ONLY mode)
2016-07-11 16:28:= 22,061 INFO=C2=A0 org.apache.zookeeper.ClientCnxn:1297 - Checking server QA= -E8WIN12/10.11.8.232:= 2181 for being r/w. Timeout 200
2016-07-11 16:28:22,066 INFO=C2=A0 o= rg.apache.curator.framework.state.ConnectionStateManager:228 - State change= : READ_ONLY
2016-07-11 16:30:29,312 INFO=C2=A0 org.apache.zookeeper.Clie= ntCnxn:1253 - Unable to read additional data from server sessionid 0x100024= f09360080, likely server has closed socket, closing socket connection and a= ttempting reconnect
2016-07-11 16:30:29,416 INFO=C2=A0 org.apache.curato= r.framework.state.ConnectionStateManager:228 - State change: SUSPENDED
2= 016-07-11 16:30:31,031 INFO=C2=A0 org.apache.zookeeper.ClientCnxn:1138 - Op= ening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will not attempt to authentic= ate using SASL (unknown error)
2016-07-11 16:30:31,032 INFO=C2=A0 org.ap= ache.zookeeper.ClientCnxn:980 - Socket connection established, initiating s= ession, client: /10.1= 1.12.4:55156, server: QA-E8WIN11/10.11.8.236:2181
2016-07-11 16:30:31,034 INFO=C2=A0 = org.apache.zookeeper.ClientCnxn:1400 - Session establishment complete on se= rver QA-E8WIN11/10.11= .8.236:2181, sessionid =3D 0x100024f09360083, negotiated timeout =3D 45= 000 (READ-ONLY mode)
2016-07-11 16:30:31,034 INFO=C2=A0 org.apache.zooke= eper.ClientCnxn:1297 - Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 40= 0
2016-07-11 16:30:31,035 INFO=C2=A0 org.apache.curator.framework.state.= ConnectionStateManager:228 - State change: READ_ONLY
2016-07-11 16:32:38= ,335 INFO=C2=A0 org.apache.zookeeper.ClientCnxn:1253 - Unable to read addit= ional data from server sessionid 0x100024f09360083, likely server has close= d socket, closing socket connection and attempting reconnect
2016-07-11 = 16:32:38,441 INFO=C2=A0 org.apache.curator.framework.state.ConnectionStateM= anager:228 - State change: SUSPENDED
2016-07-11 16:32:40,425 INFO=C2=A0 = org.apache.zookeeper.ClientCnxn:1138 - Opening socket connection to server = QA-E8WIN11/10.11.8.23= 6:2181. Will not attempt to authenticate using SASL (unknown error)
= 2016-07-11 16:32:40,426 INFO=C2=A0 org.apache.zookeeper.ClientCnxn:980 - So= cket connection established, initiating session, client: /10.11.12.4:55637, server: QA-E8WIN= 11/10.11.8.236:2181
2016-07-11 16:32:40,428 INFO=C2=A0 org.apache.zookeeper.ClientCnxn:14= 00 - Session establishment complete on server QA-E8WIN11/
10.11.8.236:2181, sessionid =3D 0x1= 00024f09360087, negotiated timeout =3D 45000 (READ-ONLY mode)
2016-07-11= 16:32:40,428 INFO=C2=A0 org.apache.zookeeper.ClientCnxn:1297 - Checking se= rver QA-E8WIN12/10.11= .8.232:2181 for being r/w. Timeout 800
2016-07-11 16:32:40,428 INFO= =C2=A0 org.apache.curator.framework.state.ConnectionStateManager:228 - Stat= e change: READ_ONLY
2016-07-11 16:34:47,615 INFO=C2=A0 org.apache.zookee= per.ClientCnxn:1253 - Unable to read additional data from server sessionid = 0x100024f09360087, likely server has closed socket, closing socket connecti= on and attempting reconnect
2016-07-11 16:34:47,716 INFO=C2=A0 org.apach= e.curator.framework.state.ConnectionStateManager:228 - State change: SUSPEN= DED
2016-07-11 16:34:49,557 INFO=C2=A0 org.apache.zookeeper.ClientCnxn:1= 138 - Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will not attempt to a= uthenticate using SASL (unknown error)
2016-07-11 16:34:49,558 INFO=C2= =A0 org.apache.zookeeper.ClientCnxn:980 - Socket connection established, in= itiating session, client: /10.11.12.4:56094, server: QA-E8WIN11/10.11.8.236:2181
2016-07-11 16:34:49,560 I= NFO=C2=A0 org.apache.zookeeper.ClientCnxn:1400 - Session establishment comp= lete on server QA-E8WIN11/10.11.8.236:2181, sessionid =3D 0x100024f09360089, negotiated time= out =3D 45000 (READ-ONLY mode)
2016-07-11 16:34:49,560 INFO=C2=A0 org.ap= ache.zookeeper.ClientCnxn:1297 - Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. = Timeout 1600
2016-07-11 16:34:49,560 INFO=C2=A0 org.apache.curator.frame= work.state.ConnectionStateManager:228 - State change: READ_ONLY
2016-07-= 11 16:36:56,895 INFO=C2=A0 org.apache.zookeeper.ClientCnxn:1253 - Unable to= read additional data from server sessionid 0x100024f09360089, likely serve= r has closed socket, closing socket connection and attempting reconnect
= Exception in thread "main" 2016-07-11 16:36:56,996 INFO=C2=A0 org= .apache.curator.framework.state.ConnectionStateManager:228 - State change: = SUSPENDED
org.apache.zookeeper.KeeperException$ConnectionLossException: = KeeperErrorCode =3D ConnectionLoss for /
=C2=A0=C2=A0=C2=A0 at org.apach= e.zookeeper.KeeperException.create(KeeperException.java:99)
=C2=A0=C2=A0= =C2=A0 at org.apache.zookeeper.KeeperException.create(KeeperException.java:= 51)
=C2=A0=C2=A0=C2=A0 at org.apache.zookeeper.ZooKeeper.getData(ZooKeep= er.java:1753)
=C2=A0=C2=A0=C2=A0 at org.apache.curator.framework.imps.Ge= tDataBuilderImpl$4.call(GetDataBuilderImpl.java:310)
=C2=A0=C2=A0=C2=A0 = at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuild= erImpl.java:299)
=C2=A0=C2=A0=C2=A0 at org.apache.curator.RetryLoop.call= WithRetry(RetryLoop.java:108)
=C2=A0=C2=A0=C2=A0 at org.apache.curator.f= ramework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:2= 95)
=C2=A0=C2=A0=C2=A0 at org.apache.curator.framework.imps.GetDataBuild= erImpl.forPath(GetDataBuilderImpl.java:287)
=C2=A0=C2=A0=C2=A0 at org.ap= ache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.j= ava:34)
=C2=A0=C2=A0=C2=A0 at TestCuratorReadOnly.main(TestCuratorReadOn= ly.java:23)

On Mon, Jul 11, 2016 at 4:39 PM= , Cameron McKenzie <mckenzie.cam@gmail.com> wrote:
<= blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1px= #ccc solid;padding-left:1ex">
So, it appears that the init= ial connection occurs ok, but then something goes wrong

2016-07-11 15:49:15,712 INFO =C2=A0o= rg.apache.zookeeper.ClientCnxn:1253= - Unable to read additional data from server sessionid 0x100024f0936005b, = likely server has closed socket, closing socket connection and attempting r= econnect

<= /div>
Then a reconnection occur= s ok, and then the cycle starts again.

Can yo= u see where in the stack the getData() call is hanging?
<= div>

On Tue, = Jul 12, 2016 at 9:35 AM, Benjamin Jaton <benjamin.jaton@gmail.com> wrote:
The blockUntilConnectedOrTimedOut() does return,= we see the line:
"Successfully established the connection wi= th ZooKeeper"
which happens right after.

= The getData() is where it hangs.

We see this first:
2= 016-07-11 15:47:08,420 INFO=C2=A0 org.apache.curator.framework.state.Connec= tionStateManager:228 - State change: READ_ONLY

and then= we see it again 2 minutes later:
2016-07-11 15:49:17,316 INFO=C2= =A0 org.apache.curator.framework.state.ConnectionStateManager:228 - State c= hange: READ_ONLY

and then again 2 minutes after that:
2016-07-11 15:51:26,470 INFO=C2=A0 org.apache.curator.framework.sta= te.ConnectionStateManager:228 - State change: READ_ONLY

Then I stopped it.

On Mon, Jul 11, 2016 at 4:13 PM, Cameron M= cKenzie <mckenzie.cam@gmail.com> wrote:
hey Ben,
From the logs it looks lik= e the connection is established at the end?

Are yo= u saying that the blockUntilConnectedOrTimedOut() does not return when this= occurs?=C2=A0
cheers


On Tue, Jul 12, 2016 at = 9:10 AM, Benjamin Jaton <bjaton@radiantlogic.com> wrot= e:
ZK is 3.5.1-alphaCurator is 2.11.0

On Mon, Jul 11, 2016 at 4:01 PM, Benjamin Jaton <bjaton@radiantlogic.com> wrote:
I have a 3 nodes ZK cluster , readonly mode is en= abled.
2 nodes are down, so the third one is in read-only (verified by = using the ZK API manually).

I'm using this pie= ce of code:

Builder curatorClientBuilder =3D CuratorFrameworkFactory.build= er()
.connectStri= ng("QA-E8WIN11:2181,QA-E8WIN12:2181")
.sessionTimeoutMs(45000).connectionTimeoutMs= (15000)
.retryPol= icy(new RetryNTimes(3, 5000)).canBeReadOnly(true);

CuratorFramework client =3D = curatorClientBuilder.build();
client.start();
= client.getZookeeperClient().blockUntilConnectedOrTimedOut();
<= div> System.out.println("= Successfully established the connection with ZooKeeper");
client.getData().forPath("/");
System.out.println("Do= ne.");

When curator pick the host that = is UP first, it goes through very quickly. When it picks the host that is d= own first (QA-E8WIN12), it never goes through.

Am = I missing something here?

Below is the log:
<= div>
2016-07-11 15:46:45,733 INFO =C2=A0org.apache.curat= or.framework.imps.CuratorFrameworkImpl:235 - Starting
2016-07-11 = 15:46:45,743 INFO =C2=A0org.apache.zookeeper.ZooKeeper:109 - Client environ= ment:zookeeper.version=3D3.5.1-alpha-1748895, built on 06/17/2016 22:24 GMT=
2016-07-11 15:46:45,743 INFO =C2=A0org.apache.zookeeper.ZooKeepe= r:109 - Client environment:h= ost.name=3Dw-rli09-ben
2016-07-11 15:46:45,744 INFO =C2=A0org= .apache.zookeeper.ZooKeeper:109 - Client environment:java.version=3D1.8.0_1= 1
2016-07-11 15:46:45,744 INFO =C2=A0org.apache.zookeeper.ZooKeep= er:109 - Client environment:java.vendor=3DOracle Corporation
2016= -07-11 15:46:45,744 INFO =C2=A0org.apache.zookeeper.ZooKeeper:109 - Client = environment:java.home=3D/usr/local/apps/jdk1.8.0_11/jre
2016-07-1= 1 15:46:45,744 INFO =C2=A0org.apache.zookeeper.ZooKeeper:109 - Client envir= onment:java.class.path=3D(..)
2016-07-11 15:46:45,746 INFO =C2=A0= org.apache.zookeeper.ZooKeeper:109 - Client environment:java.io.tmpdir=3D/t= mp
2016-07-11 15:46:45,746 INFO =C2=A0org.apache.zookeeper.ZooKee= per:109 - Client environment:java.compiler=3D<NA>
2016-07-1= 1 15:46:45,747 INFO =C2=A0org.apache.zookeeper.ZooKeeper:109 - Client envir= onment:os.name=3DLinux
2016-07-11 15:46:45,747 INFO =C2=A0org.apache.zookeeper.ZooKeeper:10= 9 - Client environment:os.arch=3Damd64
2016-07-11 15:46:45,747 IN= FO =C2=A0org.apache.zookeeper.ZooKeeper:109 - Client environment:os.version= =3D3.8.0-44-generic
2016-07-11 15:46:45,747 INFO =C2=A0org.apache= .zookeeper.ZooKeeper:109 - Client environment:user.name=3Dbenji
2016-07-11 15:46:45,747 I= NFO =C2=A0org.apache.zookeeper.ZooKeeper:109 - Client environment:user.home= =3D(..)
2016-07-11 15:46:45,747 INFO =C2=A0org.apache.zookeeper.Z= ooKeeper:109 - Client environment:user.dir=3D(..)
2016-07-11 15:4= 6:45,747 INFO =C2=A0org.apache.zookeeper.ZooKeeper:109 - Client environment= :os.memory.free=3D385MB
2016-07-11 15:46:45,748 INFO =C2=A0org.ap= ache.zookeeper.ZooKeeper:109 - Client environment:os.memory.max=3D5358MB
2016-07-11 15:46:45,748 INFO =C2=A0org.apache.zookeeper.ZooKeeper:1= 09 - Client environment:os.memory.total=3D397MB
2016-07-11 15:46:= 45,750 INFO =C2=A0org.apache.zookeeper.ZooKeeper:716 - Initiating client co= nnection, connectString=3DQA-E8WIN11:2181,QA-E8WIN12:2181 sessionTimeout=3D= 45000 watcher=3Dorg.apache.curator.ConnectionState@3e96bacf
2016-= 07-11 15:46:45,769 INFO =C2=A0org.apache.zookeeper.ClientCnxn:1138 - Openin= g socket connection to server QA-E8WIN12/10.11.8.232:2181. Will not attempt to authenticate = using SASL (unknown error)
Successfully established the connectio= n with ZooKeeper
2016-07-11 15:47:08,300 INFO =C2=A0org.apache.zo= okeeper.ClientCnxn:1251 - Client session timed out, have not heard from ser= ver in 22534ms for sessionid 0x0, closing socket connection and attempting = reconnect
2016-07-11 15:47:08,403 INFO =C2=A0org.apache.zookeeper= .ClientCnxn:1138 - Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will n= ot attempt to authenticate using SASL (unknown error)
2016-07-11 = 15:47:08,406 INFO =C2=A0org.apache.zookeeper.ClientCnxn:980 - Socket connec= tion established, initiating session, client: /10.11.12.4:47004, server: QA-E8WIN11/10.11.8.236:2181
2016-07-11 15:47:08,413 INFO =C2=A0org.apache.zookeeper.ClientCnxn:1400 = - Session establishment complete on server QA-E8WIN11/10.11.8.236:2181, sessionid =3D 0x1000= 24f0936005b, negotiated timeout =3D 45000 (READ-ONLY mode)
2016-0= 7-11 15:47:08,415 INFO =C2=A0org.apache.zookeeper.ClientCnxn:1297 - Checkin= g server QA-E8WIN12/1= 0.11.8.232:2181 for being r/w. Timeout 200
2016-07-11 15:47:0= 8,420 INFO =C2=A0org.apache.curator.framework.state.ConnectionStateManager:= 228 - State change: READ_ONLY
2016-07-11 15:49:15,712 INFO =C2=A0= org.apache.zookeeper.ClientCnxn:1253 - Unable to read additional data from = server sessionid 0x100024f0936005b, likely server has closed socket, closin= g socket connection and attempting reconnect
2016-07-11 15:49:15,= 815 INFO =C2=A0org.apache.curator.framework.state.ConnectionStateManager:22= 8 - State change: SUSPENDED
2016-07-11 15:49:17,313 INFO =C2=A0or= g.apache.zookeeper.ClientCnxn:1138 - Opening socket connection to server QA= -E8WIN11/10.11.8.236:= 2181. Will not attempt to authenticate using SASL (unknown error)
=
2016-07-11 15:49:17,314 INFO =C2=A0org.apache.zookeeper.ClientCnxn:980= - Socket connection established, initiating session, client: /10.11.12.4:47295, server: QA-= E8WIN11/10.11.8.236:2= 181
2016-07-11 15:49:17,316 INFO =C2=A0org.apache.zookeeper.C= lientCnxn:1400 - Session establishment complete on server QA-E8WIN11/10.11.8.236:2181, sessi= onid =3D 0x100024f0936005f, negotiated timeout =3D 45000 (READ-ONLY mode)
2016-07-11 15:49:17,316 INFO =C2=A0org.apache.zookeeper.ClientCnxn= :1297 - Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 400
201= 6-07-11 15:49:17,316 INFO =C2=A0org.apache.curator.framework.state.Connecti= onStateManager:228 - State change: READ_ONLY
2016-07-11 15:51:24,= 607 INFO =C2=A0org.apache.zookeeper.ClientCnxn:1253 - Unable to read additi= onal data from server sessionid 0x100024f0936005f, likely server has closed= socket, closing socket connection and attempting reconnect
2016-= 07-11 15:51:24,713 INFO =C2=A0org.apache.curator.framework.state.Connection= StateManager:228 - State change: SUSPENDED
2016-07-11 15:51:26,46= 7 INFO =C2=A0org.apache.zookeeper.ClientCnxn:1138 - Opening socket connecti= on to server QA-E8WIN11/10.11.8.236:2181. Will not attempt to authenticate using SASL (unkno= wn error)
2016-07-11 15:51:26,468 INFO =C2=A0org.apache.zookeeper= .ClientCnxn:980 - Socket connection established, initiating session, client= : /10.11.12.4:47576, server: QA-E8WIN11/10.11.8.236:2181
2016-07-11 15:51:26,470 INFO =C2=A0org.apa= che.zookeeper.ClientCnxn:1400 - Session establishment complete on server QA= -E8WIN11/10.11.8.236:= 2181, sessionid =3D 0x100024f09360061, negotiated timeout =3D 45000 (RE= AD-ONLY mode)
2016-07-11 15:51:26,470 INFO =C2=A0org.apache.zooke= eper.ClientCnxn:1297 - Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 80= 0
2016-07-11 15:51:26,470 INFO =C2=A0org.apache.curator.framework= .state.ConnectionStateManager:228 - State change: READ_ONLY
(...)

Thanks
Ben







--001a113cdd0affe58c053764b5fa--