Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 08F9A200B4F for ; Tue, 12 Jul 2016 01:41:19 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 0793F160A78; Mon, 11 Jul 2016 23:41:19 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 82DA6160A7D for ; Tue, 12 Jul 2016 01:41:17 +0200 (CEST) Received: (qmail 10964 invoked by uid 500); 11 Jul 2016 23:41:16 -0000 Mailing-List: contact user-help@curator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@curator.apache.org Delivered-To: mailing list user@curator.apache.org Received: (qmail 10951 invoked by uid 99); 11 Jul 2016 23:41:16 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 11 Jul 2016 23:41:16 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id 2761FC7259 for ; Mon, 11 Jul 2016 23:41:16 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 2.18 X-Spam-Level: ** X-Spam-Status: No, score=2.18 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, FREEMAIL_REPLY=1, HTML_MESSAGE=2, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_PASS=-0.001, WEIRD_PORT=0.001] autolearn=disabled Authentication-Results: spamd1-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id ldbZjktf2Aiv for ; Mon, 11 Jul 2016 23:41:14 +0000 (UTC) Received: from mail-lf0-f47.google.com (mail-lf0-f47.google.com [209.85.215.47]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id 4CCD55FD86 for ; Mon, 11 Jul 2016 23:41:08 +0000 (UTC) Received: by mail-lf0-f47.google.com with SMTP id b199so2333719lfe.0 for ; Mon, 11 Jul 2016 16:41:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to; bh=qQ6Aq9l4D6O9G2oXa1py5kUQi95SHJDnaOr8cjEgwy8=; b=ewl2x2TNzTqWLTYjDx/OL5dC+L9e43eqhC06KlTaPJLZ+il2YWiG8Fy9+NfVLcMsLw MMJQyox9ZUK+ZcHkOGpGdHkQ8j/zfxnSmpfb+0gAH3hGXvKEz/86fJEr43ZpppH0Y271 nhoY90tpHZWzPxYrdjE0YoY1a8Te9Dd+BkZeexk3nWZtVECgw8DwQwe/ywBBL2JUKMsl Y6m8uwUm+iRNEi0E1vyNBQ81lIYKPHso67H+Hb/eMU+Qrvd/kSeEfe3KGm+qYB+cYBq9 XmpDA4Ap4WZ9M8ZyNwcUxkQct58lI1FKuuHC41Fg5Dlq3wj1jnb/e9MU8bBPTlg9JCTf 7Eog== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to; bh=qQ6Aq9l4D6O9G2oXa1py5kUQi95SHJDnaOr8cjEgwy8=; b=TFA7KbNGtVGBXE866X0cxsR77M7qUf3IqfhVnQTv/fv9fWsV8CthSReWCjpf+Saaky Di43v3Ud63qnl6aYoAU4QVWcaHtkGuAeiv6J3BIhY56V6fvTQSGnXRfATR3J4FH3ZrRU +PCXzxEqlhnT+XYuiMD0Mnj7WX8xoNDfZRzfmdQJJImalZvN5E0sqMYv4w6/6wAJXaWv LekzvoUJIRE9rOUvHzT5NikMY3A3hgQb9RFgF4OtYmFWACviKslHmmAuCfKnqTbfsNJl H39MuxvtPArG53V2tRi6EVgfFTy41MnOmAYS9uCl/jJIvcu2yDLGWyORak0KXBFUBgXL bSUg== X-Gm-Message-State: ALyK8tLiH/xemvYNmUL06cYGw5uJxh4rTc14OOVwZlynrobhijxqBA2pov8DeRPNvuBDGaTV7cB+5g+JOy/YRw== X-Received: by 10.25.38.213 with SMTP id m204mr6895276lfm.107.1468280454687; Mon, 11 Jul 2016 16:40:54 -0700 (PDT) MIME-Version: 1.0 Received: by 10.25.78.2 with HTTP; Mon, 11 Jul 2016 16:40:53 -0700 (PDT) In-Reply-To: References: From: Benjamin Jaton Date: Mon, 11 Jul 2016 16:40:53 -0700 Message-ID: Subject: Re: Curator never goes through when ZK is read-only To: user@curator.apache.org Content-Type: multipart/alternative; boundary=001a11403d04eeb6b1053764ae3a archived-at: Mon, 11 Jul 2016 23:41:19 -0000 --001a11403d04eeb6b1053764ae3a Content-Type: text/plain; charset=UTF-8 Well I was trying to let it run and it eventually fails, with a stack: Successfully established the connection with ZooKeeper 2016-07-11 16:28:21,943 INFO org.apache.zookeeper.ClientCnxn:1251 - Client session timed out, have not heard from server in 22519ms for sessionid 0x0, closing socket connection and attempting reconnect 2016-07-11 16:28:22,046 INFO org.apache.zookeeper.ClientCnxn:1138 - Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will not attempt to authenticate using SASL (unknown error) 2016-07-11 16:28:22,050 INFO org.apache.zookeeper.ClientCnxn:980 - Socket connection established, initiating session, client: /10.11.12.4:54702, server: QA-E8WIN11/10.11.8.236:2181 2016-07-11 16:28:22,058 INFO org.apache.zookeeper.ClientCnxn:1400 - Session establishment complete on server QA-E8WIN11/10.11.8.236:2181, sessionid = 0x100024f09360080, negotiated timeout = 45000 (READ-ONLY mode) 2016-07-11 16:28:22,061 INFO org.apache.zookeeper.ClientCnxn:1297 - Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 200 2016-07-11 16:28:22,066 INFO org.apache.curator.framework.state.ConnectionStateManager:228 - State change: READ_ONLY 2016-07-11 16:30:29,312 INFO org.apache.zookeeper.ClientCnxn:1253 - Unable to read additional data from server sessionid 0x100024f09360080, likely server has closed socket, closing socket connection and attempting reconnect 2016-07-11 16:30:29,416 INFO org.apache.curator.framework.state.ConnectionStateManager:228 - State change: SUSPENDED 2016-07-11 16:30:31,031 INFO org.apache.zookeeper.ClientCnxn:1138 - Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will not attempt to authenticate using SASL (unknown error) 2016-07-11 16:30:31,032 INFO org.apache.zookeeper.ClientCnxn:980 - Socket connection established, initiating session, client: /10.11.12.4:55156, server: QA-E8WIN11/10.11.8.236:2181 2016-07-11 16:30:31,034 INFO org.apache.zookeeper.ClientCnxn:1400 - Session establishment complete on server QA-E8WIN11/10.11.8.236:2181, sessionid = 0x100024f09360083, negotiated timeout = 45000 (READ-ONLY mode) 2016-07-11 16:30:31,034 INFO org.apache.zookeeper.ClientCnxn:1297 - Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 400 2016-07-11 16:30:31,035 INFO org.apache.curator.framework.state.ConnectionStateManager:228 - State change: READ_ONLY 2016-07-11 16:32:38,335 INFO org.apache.zookeeper.ClientCnxn:1253 - Unable to read additional data from server sessionid 0x100024f09360083, likely server has closed socket, closing socket connection and attempting reconnect 2016-07-11 16:32:38,441 INFO org.apache.curator.framework.state.ConnectionStateManager:228 - State change: SUSPENDED 2016-07-11 16:32:40,425 INFO org.apache.zookeeper.ClientCnxn:1138 - Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will not attempt to authenticate using SASL (unknown error) 2016-07-11 16:32:40,426 INFO org.apache.zookeeper.ClientCnxn:980 - Socket connection established, initiating session, client: /10.11.12.4:55637, server: QA-E8WIN11/10.11.8.236:2181 2016-07-11 16:32:40,428 INFO org.apache.zookeeper.ClientCnxn:1400 - Session establishment complete on server QA-E8WIN11/10.11.8.236:2181, sessionid = 0x100024f09360087, negotiated timeout = 45000 (READ-ONLY mode) 2016-07-11 16:32:40,428 INFO org.apache.zookeeper.ClientCnxn:1297 - Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 800 2016-07-11 16:32:40,428 INFO org.apache.curator.framework.state.ConnectionStateManager:228 - State change: READ_ONLY 2016-07-11 16:34:47,615 INFO org.apache.zookeeper.ClientCnxn:1253 - Unable to read additional data from server sessionid 0x100024f09360087, likely server has closed socket, closing socket connection and attempting reconnect 2016-07-11 16:34:47,716 INFO org.apache.curator.framework.state.ConnectionStateManager:228 - State change: SUSPENDED 2016-07-11 16:34:49,557 INFO org.apache.zookeeper.ClientCnxn:1138 - Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will not attempt to authenticate using SASL (unknown error) 2016-07-11 16:34:49,558 INFO org.apache.zookeeper.ClientCnxn:980 - Socket connection established, initiating session, client: /10.11.12.4:56094, server: QA-E8WIN11/10.11.8.236:2181 2016-07-11 16:34:49,560 INFO org.apache.zookeeper.ClientCnxn:1400 - Session establishment complete on server QA-E8WIN11/10.11.8.236:2181, sessionid = 0x100024f09360089, negotiated timeout = 45000 (READ-ONLY mode) 2016-07-11 16:34:49,560 INFO org.apache.zookeeper.ClientCnxn:1297 - Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 1600 2016-07-11 16:34:49,560 INFO org.apache.curator.framework.state.ConnectionStateManager:228 - State change: READ_ONLY 2016-07-11 16:36:56,895 INFO org.apache.zookeeper.ClientCnxn:1253 - Unable to read additional data from server sessionid 0x100024f09360089, likely server has closed socket, closing socket connection and attempting reconnect Exception in thread "main" 2016-07-11 16:36:56,996 INFO org.apache.curator.framework.state.ConnectionStateManager:228 - State change: SUSPENDED org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for / at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1753) at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:310) at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:299) at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108) at org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:295) at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:287) at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:34) at TestCuratorReadOnly.main(TestCuratorReadOnly.java:23) On Mon, Jul 11, 2016 at 4:39 PM, Cameron McKenzie wrote: > So, it appears that the initial connection occurs ok, but then something > goes wrong > > 2016-07-11 15:49:15,712 INFO org.apache.zookeeper.ClientCnxn:1253 - > Unable to read additional data from server sessionid 0x100024f0936005b, > likely server has closed socket, closing socket connection and attempting > reconnect > > Then a reconnection occurs ok, and then the cycle starts again. > > Can you see where in the stack the getData() call is hanging? > > On Tue, Jul 12, 2016 at 9:35 AM, Benjamin Jaton > wrote: > >> The blockUntilConnectedOrTimedOut() does return, we see the line: >> "Successfully established the connection with ZooKeeper" >> which happens right after. >> >> The getData() is where it hangs. >> >> We see this first: >> 2016-07-11 15:47:08,420 INFO >> org.apache.curator.framework.state.ConnectionStateManager:228 - State >> change: READ_ONLY >> >> and then we see it again 2 minutes later: >> 2016-07-11 15:49:17,316 INFO >> org.apache.curator.framework.state.ConnectionStateManager:228 - State >> change: READ_ONLY >> >> and then again 2 minutes after that: >> 2016-07-11 15:51:26,470 INFO >> org.apache.curator.framework.state.ConnectionStateManager:228 - State >> change: READ_ONLY >> >> Then I stopped it. >> >> On Mon, Jul 11, 2016 at 4:13 PM, Cameron McKenzie > > wrote: >> >>> hey Ben, >>> From the logs it looks like the connection is established at the end? >>> >>> Are you saying that the blockUntilConnectedOrTimedOut() does not return >>> when this occurs? >>> cheers >>> >>> >>> On Tue, Jul 12, 2016 at 9:10 AM, Benjamin Jaton >> > wrote: >>> >>>> ZK is 3.5.1-alpha >>>> Curator is 2.11.0 >>>> >>>> On Mon, Jul 11, 2016 at 4:01 PM, Benjamin Jaton < >>>> bjaton@radiantlogic.com> wrote: >>>> >>>>> I have a 3 nodes ZK cluster , readonly mode is enabled. >>>>> 2 nodes are down, so the third one is in read-only (verified by using >>>>> the ZK API manually). >>>>> >>>>> I'm using this piece of code: >>>>> >>>>> Builder curatorClientBuilder = CuratorFrameworkFactory.builder() >>>>> .connectString("QA-E8WIN11:2181,QA-E8WIN12:2181") >>>>> .sessionTimeoutMs(45000).connectionTimeoutMs(15000) >>>>> .retryPolicy(new RetryNTimes(3, 5000)).canBeReadOnly(true); >>>>> >>>>> CuratorFramework client = curatorClientBuilder.build(); >>>>> client.start(); >>>>> client.getZookeeperClient().blockUntilConnectedOrTimedOut(); >>>>> System.out.println("Successfully established the connection with >>>>> ZooKeeper"); >>>>> client.getData().forPath("/"); >>>>> System.out.println("Done."); >>>>> >>>>> When curator pick the host that is UP first, it goes through very >>>>> quickly. When it picks the host that is down first (QA-E8WIN12), it never >>>>> goes through. >>>>> >>>>> Am I missing something here? >>>>> >>>>> Below is the log: >>>>> >>>>> 2016-07-11 15:46:45,733 INFO >>>>> org.apache.curator.framework.imps.CuratorFrameworkImpl:235 - Starting >>>>> 2016-07-11 15:46:45,743 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>> Client environment:zookeeper.version=3.5.1-alpha-1748895, built on >>>>> 06/17/2016 22:24 GMT >>>>> 2016-07-11 15:46:45,743 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>> Client environment:host.name=w-rli09-ben >>>>> 2016-07-11 15:46:45,744 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>> Client environment:java.version=1.8.0_11 >>>>> 2016-07-11 15:46:45,744 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>> Client environment:java.vendor=Oracle Corporation >>>>> 2016-07-11 15:46:45,744 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>> Client environment:java.home=/usr/local/apps/jdk1.8.0_11/jre >>>>> 2016-07-11 15:46:45,744 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>> Client environment:java.class.path=(..) >>>>> 2016-07-11 15:46:45,746 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>> Client environment:java.io.tmpdir=/tmp >>>>> 2016-07-11 15:46:45,746 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>> Client environment:java.compiler= >>>>> 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>> Client environment:os.name=Linux >>>>> 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>> Client environment:os.arch=amd64 >>>>> 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>> Client environment:os.version=3.8.0-44-generic >>>>> 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>> Client environment:user.name=benji >>>>> 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>> Client environment:user.home=(..) >>>>> 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>> Client environment:user.dir=(..) >>>>> 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>> Client environment:os.memory.free=385MB >>>>> 2016-07-11 15:46:45,748 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>> Client environment:os.memory.max=5358MB >>>>> 2016-07-11 15:46:45,748 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>> Client environment:os.memory.total=397MB >>>>> 2016-07-11 15:46:45,750 INFO org.apache.zookeeper.ZooKeeper:716 - >>>>> Initiating client connection, connectString=QA-E8WIN11:2181,QA-E8WIN12:2181 >>>>> sessionTimeout=45000 watcher=org.apache.curator.ConnectionState@3e96bacf >>>>> 2016-07-11 15:46:45,769 INFO org.apache.zookeeper.ClientCnxn:1138 - >>>>> Opening socket connection to server QA-E8WIN12/10.11.8.232:2181. Will >>>>> not attempt to authenticate using SASL (unknown error) >>>>> Successfully established the connection with ZooKeeper >>>>> 2016-07-11 15:47:08,300 INFO org.apache.zookeeper.ClientCnxn:1251 - >>>>> Client session timed out, have not heard from server in 22534ms for >>>>> sessionid 0x0, closing socket connection and attempting reconnect >>>>> 2016-07-11 15:47:08,403 INFO org.apache.zookeeper.ClientCnxn:1138 - >>>>> Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will >>>>> not attempt to authenticate using SASL (unknown error) >>>>> 2016-07-11 15:47:08,406 INFO org.apache.zookeeper.ClientCnxn:980 - >>>>> Socket connection established, initiating session, client: / >>>>> 10.11.12.4:47004, server: QA-E8WIN11/10.11.8.236:2181 >>>>> 2016-07-11 15:47:08,413 INFO org.apache.zookeeper.ClientCnxn:1400 - >>>>> Session establishment complete on server QA-E8WIN11/10.11.8.236:2181, >>>>> sessionid = 0x100024f0936005b, negotiated timeout = 45000 (READ-ONLY mode) >>>>> 2016-07-11 15:47:08,415 INFO org.apache.zookeeper.ClientCnxn:1297 - >>>>> Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 200 >>>>> 2016-07-11 15:47:08,420 INFO >>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>> change: READ_ONLY >>>>> 2016-07-11 15:49:15,712 INFO org.apache.zookeeper.ClientCnxn:1253 - >>>>> Unable to read additional data from server sessionid 0x100024f0936005b, >>>>> likely server has closed socket, closing socket connection and attempting >>>>> reconnect >>>>> 2016-07-11 15:49:15,815 INFO >>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>> change: SUSPENDED >>>>> 2016-07-11 15:49:17,313 INFO org.apache.zookeeper.ClientCnxn:1138 - >>>>> Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will >>>>> not attempt to authenticate using SASL (unknown error) >>>>> 2016-07-11 15:49:17,314 INFO org.apache.zookeeper.ClientCnxn:980 - >>>>> Socket connection established, initiating session, client: / >>>>> 10.11.12.4:47295, server: QA-E8WIN11/10.11.8.236:2181 >>>>> 2016-07-11 15:49:17,316 INFO org.apache.zookeeper.ClientCnxn:1400 - >>>>> Session establishment complete on server QA-E8WIN11/10.11.8.236:2181, >>>>> sessionid = 0x100024f0936005f, negotiated timeout = 45000 (READ-ONLY mode) >>>>> 2016-07-11 15:49:17,316 INFO org.apache.zookeeper.ClientCnxn:1297 - >>>>> Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 400 >>>>> 2016-07-11 15:49:17,316 INFO >>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>> change: READ_ONLY >>>>> 2016-07-11 15:51:24,607 INFO org.apache.zookeeper.ClientCnxn:1253 - >>>>> Unable to read additional data from server sessionid 0x100024f0936005f, >>>>> likely server has closed socket, closing socket connection and attempting >>>>> reconnect >>>>> 2016-07-11 15:51:24,713 INFO >>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>> change: SUSPENDED >>>>> 2016-07-11 15:51:26,467 INFO org.apache.zookeeper.ClientCnxn:1138 - >>>>> Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will >>>>> not attempt to authenticate using SASL (unknown error) >>>>> 2016-07-11 15:51:26,468 INFO org.apache.zookeeper.ClientCnxn:980 - >>>>> Socket connection established, initiating session, client: / >>>>> 10.11.12.4:47576, server: QA-E8WIN11/10.11.8.236:2181 >>>>> 2016-07-11 15:51:26,470 INFO org.apache.zookeeper.ClientCnxn:1400 - >>>>> Session establishment complete on server QA-E8WIN11/10.11.8.236:2181, >>>>> sessionid = 0x100024f09360061, negotiated timeout = 45000 (READ-ONLY mode) >>>>> 2016-07-11 15:51:26,470 INFO org.apache.zookeeper.ClientCnxn:1297 - >>>>> Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 800 >>>>> 2016-07-11 15:51:26,470 INFO >>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>> change: READ_ONLY >>>>> (...) >>>>> >>>>> Thanks >>>>> Ben >>>>> >>>>> >>>> >>> >> > --001a11403d04eeb6b1053764ae3a Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
Well I was trying to let it run and it eventually fails, w= ith a stack:

Successfully established the connection with ZooKeeper<= br>2016-07-11 16:28:21,943 INFO=C2=A0 org.apache.zookeeper.ClientCnxn:1251 = - Client session timed out, have not heard from server in 22519ms for sessi= onid 0x0, closing socket connection and attempting reconnect
2016-07-11 = 16:28:22,046 INFO=C2=A0 org.apache.zookeeper.ClientCnxn:1138 - Opening sock= et connection to server QA-E8WIN11/10.1= 1.8.236:2181. Will not attempt to authenticate using SASL (unknown erro= r)
2016-07-11 16:28:22,050 INFO=C2=A0 org.apache.zookeeper.ClientCnxn:98= 0 - Socket connection established, initiating session, client: /10.11.12.4:54702, server: QA-E8WIN11/10.11.8.236:2181
2016-07-11 16:28:22,05= 8 INFO=C2=A0 org.apache.zookeeper.ClientCnxn:1400 - Session establishment c= omplete on server QA-E8WIN11/10.11.8.23= 6:2181, sessionid =3D 0x100024f09360080, negotiated timeout =3D 45000 (= READ-ONLY mode)
2016-07-11 16:28:22,061 INFO=C2=A0 org.apache.zookeeper.= ClientCnxn:1297 - Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 200
2016-07-11 16:28:2= 2,066 INFO=C2=A0 org.apache.curator.framework.state.ConnectionStateManager:= 228 - State change: READ_ONLY
2016-07-11 16:30:29,312 INFO=C2=A0 org.apa= che.zookeeper.ClientCnxn:1253 - Unable to read additional data from server = sessionid 0x100024f09360080, likely server has closed socket, closing socke= t connection and attempting reconnect
2016-07-11 16:30:29,416 INFO=C2=A0= org.apache.curator.framework.state.ConnectionStateManager:228 - State chan= ge: SUSPENDED
2016-07-11 16:30:31,031 INFO=C2=A0 org.apache.zookeeper.Cl= ientCnxn:1138 - Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will not attempt to authentic= ate using SASL (unknown error)
2016-07-11 16:30:31,032 INFO=C2=A0 org.ap= ache.zookeeper.ClientCnxn:980 - Socket connection established, initiating s= ession, client: /10.11.12.4:55156, = server: QA-E8WIN11/10.11.8.236:2181=
2016-07-11 16:30:31,034 INFO=C2=A0 org.apache.zookeeper.ClientCnxn:1400= - Session establishment complete on server QA-E8WIN11/10.11.8.236:2181, sessionid =3D 0x100024f09360083, nego= tiated timeout =3D 45000 (READ-ONLY mode)
2016-07-11 16:30:31,034 INFO= =C2=A0 org.apache.zookeeper.ClientCnxn:1297 - Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeo= ut 400
2016-07-11 16:30:31,035 INFO=C2=A0 org.apache.curator.framework.s= tate.ConnectionStateManager:228 - State change: READ_ONLY
2016-07-11 16:= 32:38,335 INFO=C2=A0 org.apache.zookeeper.ClientCnxn:1253 - Unable to read = additional data from server sessionid 0x100024f09360083, likely server has = closed socket, closing socket connection and attempting reconnect
2016-0= 7-11 16:32:38,441 INFO=C2=A0 org.apache.curator.framework.state.ConnectionS= tateManager:228 - State change: SUSPENDED
2016-07-11 16:32:40,425 INFO= =C2=A0 org.apache.zookeeper.ClientCnxn:1138 - Opening socket connection to = server QA-E8WIN11/10.11.8.236:2181.= Will not attempt to authenticate using SASL (unknown error)
2016-07-11 = 16:32:40,426 INFO=C2=A0 org.apache.zookeeper.ClientCnxn:980 - Socket connec= tion established, initiating session, client: /10.11.12.4:55637, server: QA-E8WIN11/10.11.8.236:2181
2016-07-11 16:32:40,428 INFO=C2=A0 org.ap= ache.zookeeper.ClientCnxn:1400 - Session establishment complete on server Q= A-E8WIN11/10.11.8.236:2181, session= id =3D 0x100024f09360087, negotiated timeout =3D 45000 (READ-ONLY mode)
= 2016-07-11 16:32:40,428 INFO=C2=A0 org.apache.zookeeper.ClientCnxn:1297 - C= hecking server QA-E8WIN12/10.11.8.232:2= 181 for being r/w. Timeout 800
2016-07-11 16:32:40,428 INFO=C2=A0 or= g.apache.curator.framework.state.ConnectionStateManager:228 - State change:= READ_ONLY
2016-07-11 16:34:47,615 INFO=C2=A0 org.apache.zookeeper.Clien= tCnxn:1253 - Unable to read additional data from server sessionid 0x100024f= 09360087, likely server has closed socket, closing socket connection and at= tempting reconnect
2016-07-11 16:34:47,716 INFO=C2=A0 org.apache.curator= .framework.state.ConnectionStateManager:228 - State change: SUSPENDED
20= 16-07-11 16:34:49,557 INFO=C2=A0 org.apache.zookeeper.ClientCnxn:1138 - Ope= ning socket connection to server QA-E8WIN11/10.11.8.236:2181. Will not attempt to authenticate using SASL (unk= nown error)
2016-07-11 16:34:49,558 INFO=C2=A0 org.apache.zookeeper.Clie= ntCnxn:980 - Socket connection established, initiating session, client: /10.11.12.4:56094, server: QA-E8WIN11/= 10.11.8.236:2181
2016-07-11 16:3= 4:49,560 INFO=C2=A0 org.apache.zookeeper.ClientCnxn:1400 - Session establis= hment complete on server QA-E8WIN11/10.= 11.8.236:2181, sessionid =3D 0x100024f09360089, negotiated timeout =3D = 45000 (READ-ONLY mode)
2016-07-11 16:34:49,560 INFO=C2=A0 org.apache.zoo= keeper.ClientCnxn:1297 - Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 1600
2016-07-11= 16:34:49,560 INFO=C2=A0 org.apache.curator.framework.state.ConnectionState= Manager:228 - State change: READ_ONLY
2016-07-11 16:36:56,895 INFO=C2=A0= org.apache.zookeeper.ClientCnxn:1253 - Unable to read additional data from= server sessionid 0x100024f09360089, likely server has closed socket, closi= ng socket connection and attempting reconnect
Exception in thread "= main" 2016-07-11 16:36:56,996 INFO=C2=A0 org.apache.curator.framework.= state.ConnectionStateManager:228 - State change: SUSPENDED
org.apache.zo= okeeper.KeeperException$ConnectionLossException: KeeperErrorCode =3D Connec= tionLoss for /
=C2=A0=C2=A0=C2=A0 at org.apache.zookeeper.KeeperExceptio= n.create(KeeperException.java:99)
=C2=A0=C2=A0=C2=A0 at org.apache.zooke= eper.KeeperException.create(KeeperException.java:51)
=C2=A0=C2=A0=C2=A0 = at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1753)
=C2=A0=C2= =A0=C2=A0 at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(Ge= tDataBuilderImpl.java:310)
=C2=A0=C2=A0=C2=A0 at org.apache.curator.fram= ework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:299)
=C2=A0= =C2=A0=C2=A0 at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:1= 08)
=C2=A0=C2=A0=C2=A0 at org.apache.curator.framework.imps.GetDataBuild= erImpl.pathInForeground(GetDataBuilderImpl.java:295)
=C2=A0=C2=A0=C2=A0 = at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuil= derImpl.java:287)
=C2=A0=C2=A0=C2=A0 at org.apache.curator.framework.imp= s.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:34)
=C2=A0=C2=A0=C2= =A0 at TestCuratorReadOnly.main(TestCuratorReadOnly.java:23)

On Mon, Jul 11, 2016 a= t 4:39 PM, Cameron McKenzie <mckenzie.cam@gmail.com> wr= ote:
So, it appears that= the initial connection occurs ok, but then something goes wrong

2016-07-11 15:49= :15,712 INFO =C2=A0org.apache.zookeeper.ClientCnxn:1253 - Unable to read additional data from server sessionid= 0x100024f0936005b, likely server has closed socket, closing socket connect= ion and attempting reconnect

Then = a reconnection occurs ok, and then the cycle starts again.

Can you see where in the stack the getData() call is hanging?=

On Tue, Jul 12, 2016 at 9:35 AM, = Benjamin Jaton <benjamin.jaton@gmail.com> wrote:
<= blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1px= #ccc solid;padding-left:1ex">
The blockUntilConnectedOrTimedOut() does return, we see the line:"Successfully established the connection with ZooKeeper"
which happens right after.

The getData() is where it= hangs.

We see this first:
2016-07-11 15:47:08,420 IN= FO=C2=A0 org.apache.curator.framework.state.ConnectionStateManager:228 - St= ate change: READ_ONLY

and then we see it again 2 minute= s later:
2016-07-11 15:49:17,316 INFO=C2=A0 org.apache.curator.fra= mework.state.ConnectionStateManager:228 - State change: READ_ONLY

and then again 2 minutes after that:
2016-07-11 15:51:2= 6,470 INFO=C2=A0 org.apache.curator.framework.state.ConnectionStateManager:= 228 - State change: READ_ONLY

Then I stopped it.

On Mon, Jul 11, 2016 at 4:13 PM, Cameron McKenzie = <mckenzie.ca= m@gmail.com> wrote:
hey Ben,
From the logs it looks like the connection is establ= ished at the end?

Are you saying that the blockUnt= ilConnectedOrTimedOut() does not return when this occurs?=C2=A0
c= heers


On Tue, Jul 12, 2016 at 9:10 AM, Benjamin Jaton <bjaton@radiantlogic.com> wrote:
ZK is 3.5.1-alpha
Curator is 2.11.0

On = Mon, Jul 11, 2016 at 4:01 PM, Benjamin Jaton <bjaton@radiantlogic.c= om> wrote:
I have a 3 nodes ZK cluster , readonly mode is enabled.
2 nodes are do= wn, so the third one is in read-only (verified by using the ZK API manually= ).

I'm using this piece of code:
Builder cura= torClientBuilder =3D CuratorFrameworkFactory.builder()
.connectString("QA-E8WIN11:2181,= QA-E8WIN12:2181")
<= /span>.sessionTimeoutMs(45000).connectionTimeoutMs(15000)
.retryPolicy(new RetryNTimes(3, 50= 00)).canBeReadOnly(true);

CuratorFramework client =3D curatorClientBuilder.buil= d();
client.start()= ;
client.getZookeep= erClient().blockUntilConnectedOrTimedOut();
System.out.println("Successfully established = the connection with ZooKeeper");
client.getData().forPath("/");
System.out.println("Done.");

When curator pick the host that is UP first, it goes thro= ugh very quickly. When it picks the host that is down first (QA-E8WIN12), i= t never goes through.

Am I missing something here?=

Below is the log:

2= 016-07-11 15:46:45,733 INFO =C2=A0org.apache.curator.framework.imps.Curator= FrameworkImpl:235 - Starting
2016-07-11 15:46:45,743 INFO =C2=A0o= rg.apache.zookeeper.ZooKeeper:109 - Client environment:zookeeper.version=3D= 3.5.1-alpha-1748895, built on 06/17/2016 22:24 GMT
2016-07-11 15:= 46:45,743 INFO =C2=A0org.apache.zookeeper.ZooKeeper:109 - Client environmen= t:host.name=3Dw-rli09-be= n
2016-07-11 15:46:45,744 INFO =C2=A0org.apache.zookeeper.ZooKeep= er:109 - Client environment:java.version=3D1.8.0_11
2016-07-11 15= :46:45,744 INFO =C2=A0org.apache.zookeeper.ZooKeeper:109 - Client environme= nt:java.vendor=3DOracle Corporation
2016-07-11 15:46:45,744 INFO = =C2=A0org.apache.zookeeper.ZooKeeper:109 - Client environment:java.home=3D/= usr/local/apps/jdk1.8.0_11/jre
2016-07-11 15:46:45,744 INFO =C2= =A0org.apache.zookeeper.ZooKeeper:109 - Client environment:java.class.path= =3D(..)
2016-07-11 15:46:45,746 INFO =C2=A0org.apache.zookeeper.Z= ooKeeper:109 - Client environment:java.io.tmpdir=3D/tmp
2016-07-1= 1 15:46:45,746 INFO =C2=A0org.apache.zookeeper.ZooKeeper:109 - Client envir= onment:java.compiler=3D<NA>
2016-07-11 15:46:45,747 INFO = =C2=A0org.apache.zookeeper.ZooKeeper:109 - Client environment:os.name=3DLinux
2016-07-11 15= :46:45,747 INFO =C2=A0org.apache.zookeeper.ZooKeeper:109 - Client environme= nt:os.arch=3Damd64
2016-07-11 15:46:45,747 INFO =C2=A0org.apache.= zookeeper.ZooKeeper:109 - Client environment:os.version=3D3.8.0-44-generic<= /div>
2016-07-11 15:46:45,747 INFO =C2=A0org.apache.zookeeper.ZooKeeper= :109 - Client environment:us= er.name=3Dbenji
2016-07-11 15:46:45,747 INFO =C2=A0org.apache= .zookeeper.ZooKeeper:109 - Client environment:user.home=3D(..)
20= 16-07-11 15:46:45,747 INFO =C2=A0org.apache.zookeeper.ZooKeeper:109 - Clien= t environment:user.dir=3D(..)
2016-07-11 15:46:45,747 INFO =C2=A0= org.apache.zookeeper.ZooKeeper:109 - Client environment:os.memory.free=3D38= 5MB
2016-07-11 15:46:45,748 INFO =C2=A0org.apache.zookeeper.ZooKe= eper:109 - Client environment:os.memory.max=3D5358MB
2016-07-11 1= 5:46:45,748 INFO =C2=A0org.apache.zookeeper.ZooKeeper:109 - Client environm= ent:os.memory.total=3D397MB
2016-07-11 15:46:45,750 INFO =C2=A0or= g.apache.zookeeper.ZooKeeper:716 - Initiating client connection, connectStr= ing=3DQA-E8WIN11:2181,QA-E8WIN12:2181 sessionTimeout=3D45000 watcher=3Dorg.= apache.curator.ConnectionState@3e96bacf
2016-07-11 15:46:45,769 I= NFO =C2=A0org.apache.zookeeper.ClientCnxn:1138 - Opening socket connection = to server QA-E8WIN12/= 10.11.8.232:2181. Will not attempt to authenticate using SASL (unknown = error)
Successfully established the connection with ZooKeeper
2016-07-11 15:47:08,300 INFO =C2=A0org.apache.zookeeper.ClientCnxn:1= 251 - Client session timed out, have not heard from server in 22534ms for s= essionid 0x0, closing socket connection and attempting reconnect
= 2016-07-11 15:47:08,403 INFO =C2=A0org.apache.zookeeper.ClientCnxn:1138 - O= pening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will not attempt to authenti= cate using SASL (unknown error)
2016-07-11 15:47:08,406 INFO =C2= =A0org.apache.zookeeper.ClientCnxn:980 - Socket connection established, ini= tiating session, client: /10.11.12.4:47004, server: QA-E8WIN11/10.11.8.236:2181
2016-07-11 15:47:08= ,413 INFO =C2=A0org.apache.zookeeper.ClientCnxn:1400 - Session establishmen= t complete on server QA-E8WIN11/10.11.8.236:2181, sessionid =3D 0x100024f0936005b, negotiat= ed timeout =3D 45000 (READ-ONLY mode)
2016-07-11 15:47:08,415 INF= O =C2=A0org.apache.zookeeper.ClientCnxn:1297 - Checking server QA-E8WIN12/<= a href=3D"http://10.11.8.232:2181" target=3D"_blank">10.11.8.232:2181 f= or being r/w. Timeout 200
2016-07-11 15:47:08,420 INFO =C2=A0org.= apache.curator.framework.state.ConnectionStateManager:228 - State change: R= EAD_ONLY
2016-07-11 15:49:15,712 INFO =C2=A0org.apache.zookeeper.= ClientCnxn:1253 - Unable to read additional data from server sessionid 0x10= 0024f0936005b, likely server has closed socket, closing socket connection a= nd attempting reconnect
2016-07-11 15:49:15,815 INFO =C2=A0org.ap= ache.curator.framework.state.ConnectionStateManager:228 - State change: SUS= PENDED
2016-07-11 15:49:17,313 INFO =C2=A0org.apache.zookeeper.Cl= ientCnxn:1138 - Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will not at= tempt to authenticate using SASL (unknown error)
2016-07-11 15:49= :17,314 INFO =C2=A0org.apache.zookeeper.ClientCnxn:980 - Socket connection = established, initiating session, client: /10.11.12.4:47295, server: QA-E8WIN11/10.11.8.236:2181
201= 6-07-11 15:49:17,316 INFO =C2=A0org.apache.zookeeper.ClientCnxn:1400 - Sess= ion establishment complete on server QA-E8WIN11/10.11.8.236:2181, sessionid =3D 0x100024f093= 6005f, negotiated timeout =3D 45000 (READ-ONLY mode)
2016-07-11 1= 5:49:17,316 INFO =C2=A0org.apache.zookeeper.ClientCnxn:1297 - Checking serv= er QA-E8WIN12/10.11.8= .232:2181 for being r/w. Timeout 400
2016-07-11 15:49:17,316 = INFO =C2=A0org.apache.curator.framework.state.ConnectionStateManager:228 - = State change: READ_ONLY
2016-07-11 15:51:24,607 INFO =C2=A0org.ap= ache.zookeeper.ClientCnxn:1253 - Unable to read additional data from server= sessionid 0x100024f0936005f, likely server has closed socket, closing sock= et connection and attempting reconnect
2016-07-11 15:51:24,713 IN= FO =C2=A0org.apache.curator.framework.state.ConnectionStateManager:228 - St= ate change: SUSPENDED
2016-07-11 15:51:26,467 INFO =C2=A0org.apac= he.zookeeper.ClientCnxn:1138 - Opening socket connection to server QA-E8WIN= 11/10.11.8.236:2181. Will not attempt to authenticate using SASL (unknown error)
<= div>2016-07-11 15:51:26,470 INFO =C2=A0org.apache.zookeeper.ClientCnxn:1297= - Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 800
2016-07-= 11 15:51:26,470 INFO =C2=A0org.apache.curator.framework.state.ConnectionSta= teManager:228 - State change: READ_ONLY
(...)
Thanks
Ben






--001a11403d04eeb6b1053764ae3a--