Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 89DC7200B58 for ; Wed, 13 Jul 2016 01:12:07 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 88639160A75; Tue, 12 Jul 2016 23:12:07 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 8A324160A56 for ; Wed, 13 Jul 2016 01:12:05 +0200 (CEST) Received: (qmail 93287 invoked by uid 500); 12 Jul 2016 23:12:04 -0000 Mailing-List: contact user-help@curator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@curator.apache.org Delivered-To: mailing list user@curator.apache.org Received: (qmail 93277 invoked by uid 99); 12 Jul 2016 23:12:04 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd3-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 12 Jul 2016 23:12:04 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id 1ACA61809EB for ; Tue, 12 Jul 2016 23:12:04 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 2.18 X-Spam-Level: ** X-Spam-Status: No, score=2.18 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, FREEMAIL_REPLY=1, HTML_MESSAGE=2, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_PASS=-0.001, WEIRD_PORT=0.001] autolearn=disabled Authentication-Results: spamd3-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024) with ESMTP id POo-PYixNj3s for ; Tue, 12 Jul 2016 23:11:57 +0000 (UTC) Received: from mail-lf0-f53.google.com (mail-lf0-f53.google.com [209.85.215.53]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTPS id 76D6960D10 for ; Tue, 12 Jul 2016 23:11:56 +0000 (UTC) Received: by mail-lf0-f53.google.com with SMTP id b199so25567688lfe.0 for ; Tue, 12 Jul 2016 16:11:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to; bh=Gao7YrldqOp8/N+fqcvEB3BA5pK6uLIbsNKiZqHoXiE=; b=nH4PC22mNC6pY2RVp94jqHxJV/AIlnZ0YMH7KluaJSJljC57TgxMBSccq5OiFeyuZe A6ng0pHxE0Pkz2oHLxsUlws2kp23oubYFzwPMWoVWyJtcN3PS0ZY0ahFAT/dzVfni2xU bJKvXJxpq8qQW9codz6AX2/quAek5RuhYG8oN8IPWRtZalp7myaJdgyvWvag7PdKEOTI L/fVGn/5IvoK8OorIbc07gTQyhp4+A2RMAKBfRbLe7twrroyato9fq0TUOd/uoIC4v7M JWh8xmpDJ6LIIx81UrAmJwQroF+oSLlSswmgJxImw8xoSMQaZsU2ug6UpvKh+ayZ3oy3 DQgg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to; bh=Gao7YrldqOp8/N+fqcvEB3BA5pK6uLIbsNKiZqHoXiE=; b=HkrxrHnZaOH58v01RIo/biBwftepF3XQpCElrBcXjG6Q9xS7vSrFtI4ofx2Tnqw6ln Rsl95QozEF9cnlnG+lyOwupkNzmuz2+IoAptBllnsl/cE7wFyNp9G67Vo458vKJuki3P CmPlUemOZumHcxxABxQXVJXaxAAS/G9t4c6X6e6F3u27zTvlzA5DFQ71REcR2tbeX/x+ bFvN+77wu+eHi3/swzD99YnBv6NkDceqm5bo3MWHL2b2KjE1+u4uud0VCQS3BaEz3p0+ lKYRSd0fBCauKj4FtcBTSV2v5OORGnH1PU2V9wjJogVcYOyJC3Ce/xOmS+iwqGq1x1/x A87g== X-Gm-Message-State: ALyK8tJClsE7FV25IbaeH3Dw9Gx1APc5+FaPhvNkeBMB241ir9Uv065fMisxRJY0khHm+39M/t4UaBK/8XBbnA== X-Received: by 10.25.89.8 with SMTP id n8mr2476586lfb.225.1468365115487; Tue, 12 Jul 2016 16:11:55 -0700 (PDT) MIME-Version: 1.0 Received: by 10.25.78.2 with HTTP; Tue, 12 Jul 2016 16:11:54 -0700 (PDT) In-Reply-To: References: From: Benjamin Jaton Date: Tue, 12 Jul 2016 16:11:54 -0700 Message-ID: Subject: Re: Curator never goes through when ZK is read-only To: user@curator.apache.org Content-Type: multipart/alternative; boundary=001a114210b41bfc3105377865de archived-at: Tue, 12 Jul 2016 23:12:07 -0000 --001a114210b41bfc3105377865de Content-Type: text/plain; charset=UTF-8 Cameron, you seem to be right, the simple zookeeper client also fails sometimes. I will bring it up to the ZK team, thank you! On Tue, Jul 12, 2016 at 3:09 PM, Cameron McKenzie wrote: > I'm at a bit of a loss then. Looking at the stack trace, it's in the ZK > code (not the Curator code) that it appears to be blocking. Maybe try and > create a test that removes Curator and just uses the ZK client directly and > see if you still have the same problem? > > On Wed, Jul 13, 2016 at 7:54 AM, Benjamin Jaton > wrote: > >> Yes, I mean the ping goes through properly >> >> On Tue, Jul 12, 2016 at 2:50 PM, Cameron McKenzie > > wrote: >> >>> Are the 'dead' hosts reachable? I seem to recall having problems with >>> the ZK client blocking for a long time if you had a host configured that >>> was resolvable but not reachable. >>> >>> On Wed, Jul 13, 2016 at 4:30 AM, Benjamin Jaton < >>> benjamin.jaton@gmail.com> wrote: >>> >>>> More to the story: >>>> >>>> While it consistently works on the same machine than the sole running >>>> ZK node when I specify only 2 of the 3 servers in the connection string, >>>> it doesn't work when I specify 3 hosts: >>>> >>>> >>>> 2016-07-12 11:23:46 INFO o.a.z.ZooKeeper:716 - Initiating client >>>> connection, connectString=QA-E8WIN11:2181,QA-E8WIN12:2181,QA-E8WIN13:2181 >>>> sessionTimeout=45000 watcher=org.apache.curator.ConnectionState@c86b9e3 >>>> 2016-07-12 11:23:46 INFO o.a.z.ClientCnxn:1138 - Opening socket >>>> connection to server QA-E8WIN13/10.11.8.230:2181. Will not attempt to >>>> authenticate using SASL (unknown error) >>>> 2016-07-12 11:24:01 INFO o.a.z.ClientCnxn:1251 - Client session timed >>>> out, have not heard from server in 15018ms for sessionid 0x0, closing >>>> socket connection and attempting reconnect >>>> Successfully established the connection with ZooKeeper >>>> 2016-07-12 11:24:01 INFO o.a.z.ClientCnxn:1138 - Opening socket >>>> connection to server QA-E8WIN12/10.11.8.232:2181. Will not attempt to >>>> authenticate using SASL (unknown error) >>>> 2016-07-12 11:24:16 INFO o.a.z.ClientCnxn:1251 - Client session timed >>>> out, have not heard from server in 15000ms for sessionid 0x0, closing >>>> socket connection and attempting reconnect >>>> 2016-07-12 11:24:16 INFO o.a.z.ClientCnxn:1138 - Opening socket >>>> connection to server QA-E8WIN11/10.11.8.236:2181. Will not attempt to >>>> authenticate using SASL (unknown error) >>>> 2016-07-12 11:24:16 INFO o.a.z.ClientCnxn:980 - Socket connection >>>> established, initiating session, client: /10.11.8.236:50423, server: >>>> QA-E8WIN11/10.11.8.236:2181 >>>> 2016-07-12 11:24:16 INFO o.a.z.ClientCnxn:1400 - Session establishment >>>> complete on server QA-E8WIN11/10.11.8.236:2181, sessionid = >>>> 0x10003c8af64001b, negotiated timeout = 45000 (READ-ONLY mode) >>>> 2016-07-12 11:24:16 INFO o.a.z.ClientCnxn:1297 - Checking server >>>> QA-E8WIN13/10.11.8.230:2181 for being r/w. Timeout 200 >>>> 2016-07-12 11:24:16 INFO o.a.c.f.s.ConnectionStateManager:228 - State >>>> change: READ_ONLY >>>> 2016-07-12 11:24:37 INFO o.a.z.ClientCnxn:1297 - Checking server >>>> QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 400 >>>> 2016-07-12 11:24:58 INFO o.a.z.ClientCnxn:1251 - Client session timed >>>> out, have not heard from server in 42023ms for sessionid 0x10003c8af64001b, >>>> closing socket connection and attempting reconnect >>>> 2016-07-12 11:24:58 INFO o.a.c.f.s.ConnectionStateManager:228 - State >>>> change: SUSPENDED >>>> 2016-07-12 11:25:00 INFO o.a.z.ClientCnxn:1138 - Opening socket >>>> connection to server QA-E8WIN11/10.11.8.236:2181. Will not attempt to >>>> authenticate using SASL (unknown error) >>>> 2016-07-12 11:25:00 INFO o.a.z.ClientCnxn:980 - Socket connection >>>> established, initiating session, client: /10.11.8.236:50428, server: >>>> QA-E8WIN11/10.11.8.236:2181 >>>> 2016-07-12 11:25:00 INFO o.a.z.ClientCnxn:1400 - Session establishment >>>> complete on server QA-E8WIN11/10.11.8.236:2181, sessionid = >>>> 0x10003c8af64001c, negotiated timeout = 45000 (READ-ONLY mode) >>>> 2016-07-12 11:25:00 INFO o.a.z.ClientCnxn:1297 - Checking server >>>> QA-E8WIN13/10.11.8.230:2181 for being r/w. Timeout 800 >>>> 2016-07-12 11:25:00 INFO o.a.c.f.s.ConnectionStateManager:228 - State >>>> change: READ_ONLY >>>> 2016-07-12 11:25:21 INFO o.a.z.ClientCnxn:1297 - Checking server >>>> QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 1600 >>>> 2016-07-12 11:25:42 INFO o.a.z.ClientCnxn:1251 - Client session timed >>>> out, have not heard from server in 42011ms for sessionid 0x10003c8af64001c, >>>> closing socket connection and attempting reconnect >>>> 2016-07-12 11:25:42 INFO o.a.c.f.s.ConnectionStateManager:228 - State >>>> change: SUSPENDED >>>> 2016-07-12 11:25:44 INFO o.a.z.ClientCnxn:1138 - Opening socket >>>> connection to server QA-E8WIN11/10.11.8.236:2181. Will not attempt to >>>> authenticate using SASL (unknown error) >>>> 2016-07-12 11:25:44 INFO o.a.z.ClientCnxn:980 - Socket connection >>>> established, initiating session, client: /10.11.8.236:50435, server: >>>> QA-E8WIN11/10.11.8.236:2181 >>>> 2016-07-12 11:25:44 INFO o.a.z.ClientCnxn:1400 - Session establishment >>>> complete on server QA-E8WIN11/10.11.8.236:2181, sessionid = >>>> 0x10003c8af64001d, negotiated timeout = 45000 (READ-ONLY mode) >>>> 2016-07-12 11:25:44 INFO o.a.z.ClientCnxn:1297 - Checking server >>>> QA-E8WIN13/10.11.8.230:2181 for being r/w. Timeout 3200 >>>> 2016-07-12 11:25:44 INFO o.a.c.f.s.ConnectionStateManager:228 - State >>>> change: READ_ONLY >>>> 2016-07-12 11:26:05 INFO o.a.z.ClientCnxn:1297 - Checking server >>>> QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 6400 >>>> 2016-07-12 11:26:26 INFO o.a.z.ClientCnxn:1251 - Client session timed >>>> out, have not heard from server in 42003ms for sessionid 0x10003c8af64001d, >>>> closing socket connection and attempting reconnect >>>> 2016-07-12 11:26:26 INFO o.a.c.f.s.ConnectionStateManager:228 - State >>>> change: SUSPENDED >>>> Exception in thread "main" >>>> org.apache.zookeeper.KeeperException$ConnectionLossException: >>>> KeeperErrorCode = ConnectionLoss for / >>>> at >>>> org.apache.zookeeper.KeeperException.create(KeeperException.java:99) >>>> at >>>> org.apache.zookeeper.KeeperException.create(KeeperException.java:51) >>>> at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1753) >>>> at >>>> org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:310) >>>> at >>>> org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:299) >>>> at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108) >>>> at >>>> org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:295) >>>> at >>>> org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:287) >>>> at >>>> org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:34) >>>> at test.Test.main(Test.java:25) >>>> >>>> >>>> That seems to suggest that adding 1 host to check somehow messes up >>>> something in the timeouts. >>>> I am suspicious of those: >>>> 2016-07-12 11:25:44 INFO o.a.z.ClientCnxn:1297 - Checking server >>>> QA-E8WIN13/10.11.8.230:2181 for being r/w. Timeout 3200 >>>> They seem to be "blocking" the existing connection we have to the >>>> healthy ZK server, which goes stale. Is that possible? >>>> >>>> >>>> On Tue, Jul 12, 2016 at 11:17 AM, Benjamin Jaton < >>>> bjaton@radiantlogic.com> wrote: >>>> >>>>> I did not use the zkCli tool but the ZK java API directly. And other >>>>> tools like ZooInspector (contrib). They were run from a remote machine. >>>>> I ran it on the same machine and the one read only server is running >>>>> properly, I can browse it using those methods. >>>>> >>>>> Trying to make sense of those client-server interactions is difficult >>>>> but here is what I got: >>>>> >>>>> Client: >>>>> 2016-07-12 10:55:01,508 INFO org.apache.zookeeper.ClientCnxn:1138 - *Opening >>>>> socket connection to server QA-E8WIN11*/10.11.8.236:2181. Will not >>>>> attempt to authenticate using SASL (unknown error) >>>>> 2016-07-12 10:55:01,512 INFO org.apache.zookeeper.ClientCnxn:980 - *Socket >>>>> connection established*, initiating session, client: /10.11.12.4:60586, >>>>> server: QA-E8WIN11/10.11.8.236:2181 >>>>> 2016-07-12 10:55:01,521 INFO org.apache.zookeeper.ClientCnxn:1400 - *Session >>>>> establishment complete on server QA-E8WIN11*/10.11.8.236:2181, *sessionid >>>>> = 0x10003c8af640007*, negotiated timeout = 45000 (READ-ONLY mode) >>>>> 2016-07-12 10:55:01,523 INFO org.apache.zookeeper.ClientCnxn:1297 - >>>>> Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 200 >>>>> 2016-07-12 10:55:01,528 INFO >>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - *State >>>>> change: READ_ONLY* >>>>> 2016-07-12 10:57:08,800 INFO org.apache.zookeeper.ClientCnxn:1253 - *Unable >>>>> to read additional data from server sessionid 0x10003c8af640007, likely >>>>> server has closed socket, closing socket connection and attempting >>>>> reconnect* >>>>> 2016-07-12 10:57:08,903 INFO >>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - *State >>>>> change: SUSPENDED* >>>>> >>>>> >>>>> Server: >>>>> 2016-07-12T10:55:56,468 [myid:] - INFO [ProcessThread(sid:1 >>>>> cport:-1)::ZooKeeperServer@678] - *Established session >>>>> 0x10003c8af640007 with negotiated timeout 45000* for client / >>>>> 10.11.12.4:60586 >>>>> (...) >>>>> 2016-07-12T10:56:41,756 [myid:] - INFO >>>>> [SessionTracker:ZooKeeperServer@384] - *Expiring session >>>>> 0x10003c8af640007, timeout of 45000ms exceeded* >>>>> >>>>> >>>>> Note: Don't worry about the time being about 1 min off between the >>>>> machines. >>>>> >>>>> What I don't understand is why we are losing the read-only connection >>>>> we successfully established. ZK ro server says it timed out? >>>>> >>>>> Now interestingly, when I run it on the same machine it works and >>>>> completes pretty quickly: >>>>> >>>>> >>>>> 2016-07-12 10:43:49 INFO o.a.z.ZooKeeper:716 - Initiating client >>>>> connection, connectString=QA-E8WIN11:2181,QA-E8WIN12:2181 >>>>> sessionTimeout=45000 watcher=org.apache.curator.ConnectionState@e4487af >>>>> 2016-07-12 10:43:49 INFO o.a.z.ClientCnxn:1138 - Opening socket >>>>> connection to server QA-E8WIN12/10.11.8.232:2181. Will not attempt to >>>>> authenticate using SASL (unknown error) >>>>> Successfully established the connection with ZooKeeper >>>>> 2016-07-12 10:44:10 WARN o.a.z.ClientCnxn:1257 - Session 0x0 for >>>>> server null, unexpected error, closing socket connection and attempting >>>>> reconnect >>>>> java.net.ConnectException: Connection timed out: no further information >>>>> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) >>>>> ~[?:1.8.0_91] >>>>> at >>>>> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) >>>>> ~[?:1.8.0_91] >>>>> at >>>>> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355) >>>>> ~[zookeeper-3.5.0-alpha.jar:3.5.2-alpha-SNAPSHOT-1702518] >>>>> at >>>>> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1236) >>>>> [zookeeper-3.5.0-alpha.jar:3.5.2-alpha-SNAPSHOT-1702518] >>>>> 2016-07-12 10:44:11 INFO o.a.z.ClientCnxn:1138 - Opening socket >>>>> connection to server QA-E8WIN11/10.11.8.236:2181. Will not attempt to >>>>> authenticate using SASL (unknown error) >>>>> 2016-07-12 10:44:11 INFO o.a.z.ClientCnxn:980 - Socket connection >>>>> established, initiating session, client: /10.11.8.236:50147, server: >>>>> QA-E8WIN11/10.11.8.236:2181 >>>>> 2016-07-12 10:44:11 INFO o.a.z.ClientCnxn:1400 - Session >>>>> establishment complete on server QA-E8WIN11/10.11.8.236:2181, >>>>> sessionid = 0x10003b315b30012, negotiated timeout = 45000 (READ-ONLY mode) >>>>> 2016-07-12 10:44:11 INFO o.a.z.ClientCnxn:1297 - Checking server >>>>> QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 200 >>>>> 2016-07-12 10:44:11 INFO o.a.c.f.s.ConnectionStateManager:228 - State >>>>> change: READ_ONLY >>>>> 2016-07-12 10:44:32 INFO o.a.z.ClientCnxn:1297 - Checking server >>>>> QA-E8WIN11/10.11.8.236:2181 for being r/w. Timeout 400 >>>>> Done. >>>>> >>>>> >>>>> On Mon, Jul 11, 2016 at 4:42 PM, Cameron McKenzie < >>>>> mckenzie.cam@gmail.com> wrote: >>>>> >>>>>> It looks like a connection issue based on the logs. When you >>>>>> confirmed that the connection was ok via the zkCli, was that running on the >>>>>> same host as the read only server? Can you run it from the same host that >>>>>> you're running the Java client from? >>>>>> >>>>>> On Tue, Jul 12, 2016 at 9:40 AM, Benjamin Jaton < >>>>>> benjamin.jaton@gmail.com> wrote: >>>>>> >>>>>>> Well I was trying to let it run and it eventually fails, with a >>>>>>> stack: >>>>>>> >>>>>>> Successfully established the connection with ZooKeeper >>>>>>> 2016-07-11 16:28:21,943 INFO org.apache.zookeeper.ClientCnxn:1251 - >>>>>>> Client session timed out, have not heard from server in 22519ms for >>>>>>> sessionid 0x0, closing socket connection and attempting reconnect >>>>>>> 2016-07-11 16:28:22,046 INFO org.apache.zookeeper.ClientCnxn:1138 - >>>>>>> Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. >>>>>>> Will not attempt to authenticate using SASL (unknown error) >>>>>>> 2016-07-11 16:28:22,050 INFO org.apache.zookeeper.ClientCnxn:980 - >>>>>>> Socket connection established, initiating session, client: / >>>>>>> 10.11.12.4:54702, server: QA-E8WIN11/10.11.8.236:2181 >>>>>>> 2016-07-11 16:28:22,058 INFO org.apache.zookeeper.ClientCnxn:1400 - >>>>>>> Session establishment complete on server QA-E8WIN11/10.11.8.236:2181, >>>>>>> sessionid = 0x100024f09360080, negotiated timeout = 45000 (READ-ONLY mode) >>>>>>> 2016-07-11 16:28:22,061 INFO org.apache.zookeeper.ClientCnxn:1297 - >>>>>>> Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout >>>>>>> 200 >>>>>>> 2016-07-11 16:28:22,066 INFO >>>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>>>> change: READ_ONLY >>>>>>> 2016-07-11 16:30:29,312 INFO org.apache.zookeeper.ClientCnxn:1253 - >>>>>>> Unable to read additional data from server sessionid 0x100024f09360080, >>>>>>> likely server has closed socket, closing socket connection and attempting >>>>>>> reconnect >>>>>>> 2016-07-11 16:30:29,416 INFO >>>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>>>> change: SUSPENDED >>>>>>> 2016-07-11 16:30:31,031 INFO org.apache.zookeeper.ClientCnxn:1138 - >>>>>>> Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. >>>>>>> Will not attempt to authenticate using SASL (unknown error) >>>>>>> 2016-07-11 16:30:31,032 INFO org.apache.zookeeper.ClientCnxn:980 - >>>>>>> Socket connection established, initiating session, client: / >>>>>>> 10.11.12.4:55156, server: QA-E8WIN11/10.11.8.236:2181 >>>>>>> 2016-07-11 16:30:31,034 INFO org.apache.zookeeper.ClientCnxn:1400 - >>>>>>> Session establishment complete on server QA-E8WIN11/10.11.8.236:2181, >>>>>>> sessionid = 0x100024f09360083, negotiated timeout = 45000 (READ-ONLY mode) >>>>>>> 2016-07-11 16:30:31,034 INFO org.apache.zookeeper.ClientCnxn:1297 - >>>>>>> Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout >>>>>>> 400 >>>>>>> 2016-07-11 16:30:31,035 INFO >>>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>>>> change: READ_ONLY >>>>>>> 2016-07-11 16:32:38,335 INFO org.apache.zookeeper.ClientCnxn:1253 - >>>>>>> Unable to read additional data from server sessionid 0x100024f09360083, >>>>>>> likely server has closed socket, closing socket connection and attempting >>>>>>> reconnect >>>>>>> 2016-07-11 16:32:38,441 INFO >>>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>>>> change: SUSPENDED >>>>>>> 2016-07-11 16:32:40,425 INFO org.apache.zookeeper.ClientCnxn:1138 - >>>>>>> Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. >>>>>>> Will not attempt to authenticate using SASL (unknown error) >>>>>>> 2016-07-11 16:32:40,426 INFO org.apache.zookeeper.ClientCnxn:980 - >>>>>>> Socket connection established, initiating session, client: / >>>>>>> 10.11.12.4:55637, server: QA-E8WIN11/10.11.8.236:2181 >>>>>>> 2016-07-11 16:32:40,428 INFO org.apache.zookeeper.ClientCnxn:1400 - >>>>>>> Session establishment complete on server QA-E8WIN11/10.11.8.236:2181, >>>>>>> sessionid = 0x100024f09360087, negotiated timeout = 45000 (READ-ONLY mode) >>>>>>> 2016-07-11 16:32:40,428 INFO org.apache.zookeeper.ClientCnxn:1297 - >>>>>>> Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout >>>>>>> 800 >>>>>>> 2016-07-11 16:32:40,428 INFO >>>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>>>> change: READ_ONLY >>>>>>> 2016-07-11 16:34:47,615 INFO org.apache.zookeeper.ClientCnxn:1253 - >>>>>>> Unable to read additional data from server sessionid 0x100024f09360087, >>>>>>> likely server has closed socket, closing socket connection and attempting >>>>>>> reconnect >>>>>>> 2016-07-11 16:34:47,716 INFO >>>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>>>> change: SUSPENDED >>>>>>> 2016-07-11 16:34:49,557 INFO org.apache.zookeeper.ClientCnxn:1138 - >>>>>>> Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. >>>>>>> Will not attempt to authenticate using SASL (unknown error) >>>>>>> 2016-07-11 16:34:49,558 INFO org.apache.zookeeper.ClientCnxn:980 - >>>>>>> Socket connection established, initiating session, client: / >>>>>>> 10.11.12.4:56094, server: QA-E8WIN11/10.11.8.236:2181 >>>>>>> 2016-07-11 16:34:49,560 INFO org.apache.zookeeper.ClientCnxn:1400 - >>>>>>> Session establishment complete on server QA-E8WIN11/10.11.8.236:2181, >>>>>>> sessionid = 0x100024f09360089, negotiated timeout = 45000 (READ-ONLY mode) >>>>>>> 2016-07-11 16:34:49,560 INFO org.apache.zookeeper.ClientCnxn:1297 - >>>>>>> Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout >>>>>>> 1600 >>>>>>> 2016-07-11 16:34:49,560 INFO >>>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>>>> change: READ_ONLY >>>>>>> 2016-07-11 16:36:56,895 INFO org.apache.zookeeper.ClientCnxn:1253 - >>>>>>> Unable to read additional data from server sessionid 0x100024f09360089, >>>>>>> likely server has closed socket, closing socket connection and attempting >>>>>>> reconnect >>>>>>> Exception in thread "main" 2016-07-11 16:36:56,996 INFO >>>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>>>> change: SUSPENDED >>>>>>> org.apache.zookeeper.KeeperException$ConnectionLossException: >>>>>>> KeeperErrorCode = ConnectionLoss for / >>>>>>> at >>>>>>> org.apache.zookeeper.KeeperException.create(KeeperException.java:99) >>>>>>> at >>>>>>> org.apache.zookeeper.KeeperException.create(KeeperException.java:51) >>>>>>> at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1753) >>>>>>> at >>>>>>> org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:310) >>>>>>> at >>>>>>> org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:299) >>>>>>> at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108) >>>>>>> at >>>>>>> org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:295) >>>>>>> at >>>>>>> org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:287) >>>>>>> at >>>>>>> org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:34) >>>>>>> at TestCuratorReadOnly.main(TestCuratorReadOnly.java:23) >>>>>>> >>>>>>> On Mon, Jul 11, 2016 at 4:39 PM, Cameron McKenzie < >>>>>>> mckenzie.cam@gmail.com> wrote: >>>>>>> >>>>>>>> So, it appears that the initial connection occurs ok, but then >>>>>>>> something goes wrong >>>>>>>> >>>>>>>> 2016-07-11 15:49:15,712 INFO org.apache.zookeeper.ClientCnxn:1253 >>>>>>>> - Unable to read additional data from server sessionid 0x100024f0936005b, >>>>>>>> likely server has closed socket, closing socket connection and attempting >>>>>>>> reconnect >>>>>>>> >>>>>>>> Then a reconnection occurs ok, and then the cycle starts again. >>>>>>>> >>>>>>>> Can you see where in the stack the getData() call is hanging? >>>>>>>> >>>>>>>> On Tue, Jul 12, 2016 at 9:35 AM, Benjamin Jaton < >>>>>>>> benjamin.jaton@gmail.com> wrote: >>>>>>>> >>>>>>>>> The blockUntilConnectedOrTimedOut() does return, we see the line: >>>>>>>>> "Successfully established the connection with ZooKeeper" >>>>>>>>> which happens right after. >>>>>>>>> >>>>>>>>> The getData() is where it hangs. >>>>>>>>> >>>>>>>>> We see this first: >>>>>>>>> 2016-07-11 15:47:08,420 INFO >>>>>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>>>>>> change: READ_ONLY >>>>>>>>> >>>>>>>>> and then we see it again 2 minutes later: >>>>>>>>> 2016-07-11 15:49:17,316 INFO >>>>>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>>>>>> change: READ_ONLY >>>>>>>>> >>>>>>>>> and then again 2 minutes after that: >>>>>>>>> 2016-07-11 15:51:26,470 INFO >>>>>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>>>>>> change: READ_ONLY >>>>>>>>> >>>>>>>>> Then I stopped it. >>>>>>>>> >>>>>>>>> On Mon, Jul 11, 2016 at 4:13 PM, Cameron McKenzie < >>>>>>>>> mckenzie.cam@gmail.com> wrote: >>>>>>>>> >>>>>>>>>> hey Ben, >>>>>>>>>> From the logs it looks like the connection is established at the >>>>>>>>>> end? >>>>>>>>>> >>>>>>>>>> Are you saying that the blockUntilConnectedOrTimedOut() does not >>>>>>>>>> return when this occurs? >>>>>>>>>> cheers >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Tue, Jul 12, 2016 at 9:10 AM, Benjamin Jaton < >>>>>>>>>> bjaton@radiantlogic.com> wrote: >>>>>>>>>> >>>>>>>>>>> ZK is 3.5.1-alpha >>>>>>>>>>> Curator is 2.11.0 >>>>>>>>>>> >>>>>>>>>>> On Mon, Jul 11, 2016 at 4:01 PM, Benjamin Jaton < >>>>>>>>>>> bjaton@radiantlogic.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> I have a 3 nodes ZK cluster , readonly mode is enabled. >>>>>>>>>>>> 2 nodes are down, so the third one is in read-only (verified by >>>>>>>>>>>> using the ZK API manually). >>>>>>>>>>>> >>>>>>>>>>>> I'm using this piece of code: >>>>>>>>>>>> >>>>>>>>>>>> Builder curatorClientBuilder = CuratorFrameworkFactory.builder() >>>>>>>>>>>> .connectString("QA-E8WIN11:2181,QA-E8WIN12:2181") >>>>>>>>>>>> .sessionTimeoutMs(45000).connectionTimeoutMs(15000) >>>>>>>>>>>> .retryPolicy(new RetryNTimes(3, 5000)).canBeReadOnly(true); >>>>>>>>>>>> >>>>>>>>>>>> CuratorFramework client = curatorClientBuilder.build(); >>>>>>>>>>>> client.start(); >>>>>>>>>>>> client.getZookeeperClient().blockUntilConnectedOrTimedOut(); >>>>>>>>>>>> System.out.println("Successfully established the connection >>>>>>>>>>>> with ZooKeeper"); >>>>>>>>>>>> client.getData().forPath("/"); >>>>>>>>>>>> System.out.println("Done."); >>>>>>>>>>>> >>>>>>>>>>>> When curator pick the host that is UP first, it goes through >>>>>>>>>>>> very quickly. When it picks the host that is down first (QA-E8WIN12), it >>>>>>>>>>>> never goes through. >>>>>>>>>>>> >>>>>>>>>>>> Am I missing something here? >>>>>>>>>>>> >>>>>>>>>>>> Below is the log: >>>>>>>>>>>> >>>>>>>>>>>> 2016-07-11 15:46:45,733 INFO >>>>>>>>>>>> org.apache.curator.framework.imps.CuratorFrameworkImpl:235 - Starting >>>>>>>>>>>> 2016-07-11 15:46:45,743 INFO >>>>>>>>>>>> org.apache.zookeeper.ZooKeeper:109 - Client >>>>>>>>>>>> environment:zookeeper.version=3.5.1-alpha-1748895, built on 06/17/2016 >>>>>>>>>>>> 22:24 GMT >>>>>>>>>>>> 2016-07-11 15:46:45,743 INFO >>>>>>>>>>>> org.apache.zookeeper.ZooKeeper:109 - Client environment: >>>>>>>>>>>> host.name=w-rli09-ben >>>>>>>>>>>> 2016-07-11 15:46:45,744 INFO >>>>>>>>>>>> org.apache.zookeeper.ZooKeeper:109 - Client >>>>>>>>>>>> environment:java.version=1.8.0_11 >>>>>>>>>>>> 2016-07-11 15:46:45,744 INFO >>>>>>>>>>>> org.apache.zookeeper.ZooKeeper:109 - Client environment:java.vendor=Oracle >>>>>>>>>>>> Corporation >>>>>>>>>>>> 2016-07-11 15:46:45,744 INFO >>>>>>>>>>>> org.apache.zookeeper.ZooKeeper:109 - Client >>>>>>>>>>>> environment:java.home=/usr/local/apps/jdk1.8.0_11/jre >>>>>>>>>>>> 2016-07-11 15:46:45,744 INFO >>>>>>>>>>>> org.apache.zookeeper.ZooKeeper:109 - Client >>>>>>>>>>>> environment:java.class.path=(..) >>>>>>>>>>>> 2016-07-11 15:46:45,746 INFO >>>>>>>>>>>> org.apache.zookeeper.ZooKeeper:109 - Client environment:java.io.tmpdir=/tmp >>>>>>>>>>>> 2016-07-11 15:46:45,746 INFO >>>>>>>>>>>> org.apache.zookeeper.ZooKeeper:109 - Client environment:java.compiler= >>>>>>>>>>>> 2016-07-11 15:46:45,747 INFO >>>>>>>>>>>> org.apache.zookeeper.ZooKeeper:109 - Client environment: >>>>>>>>>>>> os.name=Linux >>>>>>>>>>>> 2016-07-11 15:46:45,747 INFO >>>>>>>>>>>> org.apache.zookeeper.ZooKeeper:109 - Client environment:os.arch=amd64 >>>>>>>>>>>> 2016-07-11 15:46:45,747 INFO >>>>>>>>>>>> org.apache.zookeeper.ZooKeeper:109 - Client >>>>>>>>>>>> environment:os.version=3.8.0-44-generic >>>>>>>>>>>> 2016-07-11 15:46:45,747 INFO >>>>>>>>>>>> org.apache.zookeeper.ZooKeeper:109 - Client environment: >>>>>>>>>>>> user.name=benji >>>>>>>>>>>> 2016-07-11 15:46:45,747 INFO >>>>>>>>>>>> org.apache.zookeeper.ZooKeeper:109 - Client environment:user.home=(..) >>>>>>>>>>>> 2016-07-11 15:46:45,747 INFO >>>>>>>>>>>> org.apache.zookeeper.ZooKeeper:109 - Client environment:user.dir=(..) >>>>>>>>>>>> 2016-07-11 15:46:45,747 INFO >>>>>>>>>>>> org.apache.zookeeper.ZooKeeper:109 - Client >>>>>>>>>>>> environment:os.memory.free=385MB >>>>>>>>>>>> 2016-07-11 15:46:45,748 INFO >>>>>>>>>>>> org.apache.zookeeper.ZooKeeper:109 - Client >>>>>>>>>>>> environment:os.memory.max=5358MB >>>>>>>>>>>> 2016-07-11 15:46:45,748 INFO >>>>>>>>>>>> org.apache.zookeeper.ZooKeeper:109 - Client >>>>>>>>>>>> environment:os.memory.total=397MB >>>>>>>>>>>> 2016-07-11 15:46:45,750 INFO >>>>>>>>>>>> org.apache.zookeeper.ZooKeeper:716 - Initiating client connection, >>>>>>>>>>>> connectString=QA-E8WIN11:2181,QA-E8WIN12:2181 sessionTimeout=45000 >>>>>>>>>>>> watcher=org.apache.curator.ConnectionState@3e96bacf >>>>>>>>>>>> 2016-07-11 15:46:45,769 INFO >>>>>>>>>>>> org.apache.zookeeper.ClientCnxn:1138 - Opening socket connection to server >>>>>>>>>>>> QA-E8WIN12/10.11.8.232:2181. Will not attempt to authenticate >>>>>>>>>>>> using SASL (unknown error) >>>>>>>>>>>> Successfully established the connection with ZooKeeper >>>>>>>>>>>> 2016-07-11 15:47:08,300 INFO >>>>>>>>>>>> org.apache.zookeeper.ClientCnxn:1251 - Client session timed out, have not >>>>>>>>>>>> heard from server in 22534ms for sessionid 0x0, closing socket connection >>>>>>>>>>>> and attempting reconnect >>>>>>>>>>>> 2016-07-11 15:47:08,403 INFO >>>>>>>>>>>> org.apache.zookeeper.ClientCnxn:1138 - Opening socket connection to server >>>>>>>>>>>> QA-E8WIN11/10.11.8.236:2181. Will not attempt to authenticate >>>>>>>>>>>> using SASL (unknown error) >>>>>>>>>>>> 2016-07-11 15:47:08,406 INFO >>>>>>>>>>>> org.apache.zookeeper.ClientCnxn:980 - Socket connection established, >>>>>>>>>>>> initiating session, client: /10.11.12.4:47004, server: >>>>>>>>>>>> QA-E8WIN11/10.11.8.236:2181 >>>>>>>>>>>> 2016-07-11 15:47:08,413 INFO >>>>>>>>>>>> org.apache.zookeeper.ClientCnxn:1400 - Session establishment complete on >>>>>>>>>>>> server QA-E8WIN11/10.11.8.236:2181, sessionid = >>>>>>>>>>>> 0x100024f0936005b, negotiated timeout = 45000 (READ-ONLY mode) >>>>>>>>>>>> 2016-07-11 15:47:08,415 INFO >>>>>>>>>>>> org.apache.zookeeper.ClientCnxn:1297 - Checking server QA-E8WIN12/ >>>>>>>>>>>> 10.11.8.232:2181 for being r/w. Timeout 200 >>>>>>>>>>>> 2016-07-11 15:47:08,420 INFO >>>>>>>>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>>>>>>>>> change: READ_ONLY >>>>>>>>>>>> 2016-07-11 15:49:15,712 INFO >>>>>>>>>>>> org.apache.zookeeper.ClientCnxn:1253 - Unable to read additional data from >>>>>>>>>>>> server sessionid 0x100024f0936005b, likely server has closed socket, >>>>>>>>>>>> closing socket connection and attempting reconnect >>>>>>>>>>>> 2016-07-11 15:49:15,815 INFO >>>>>>>>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>>>>>>>>> change: SUSPENDED >>>>>>>>>>>> 2016-07-11 15:49:17,313 INFO >>>>>>>>>>>> org.apache.zookeeper.ClientCnxn:1138 - Opening socket connection to server >>>>>>>>>>>> QA-E8WIN11/10.11.8.236:2181. Will not attempt to authenticate >>>>>>>>>>>> using SASL (unknown error) >>>>>>>>>>>> 2016-07-11 15:49:17,314 INFO >>>>>>>>>>>> org.apache.zookeeper.ClientCnxn:980 - Socket connection established, >>>>>>>>>>>> initiating session, client: /10.11.12.4:47295, server: >>>>>>>>>>>> QA-E8WIN11/10.11.8.236:2181 >>>>>>>>>>>> 2016-07-11 15:49:17,316 INFO >>>>>>>>>>>> org.apache.zookeeper.ClientCnxn:1400 - Session establishment complete on >>>>>>>>>>>> server QA-E8WIN11/10.11.8.236:2181, sessionid = >>>>>>>>>>>> 0x100024f0936005f, negotiated timeout = 45000 (READ-ONLY mode) >>>>>>>>>>>> 2016-07-11 15:49:17,316 INFO >>>>>>>>>>>> org.apache.zookeeper.ClientCnxn:1297 - Checking server QA-E8WIN12/ >>>>>>>>>>>> 10.11.8.232:2181 for being r/w. Timeout 400 >>>>>>>>>>>> 2016-07-11 15:49:17,316 INFO >>>>>>>>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>>>>>>>>> change: READ_ONLY >>>>>>>>>>>> 2016-07-11 15:51:24,607 INFO >>>>>>>>>>>> org.apache.zookeeper.ClientCnxn:1253 - Unable to read additional data from >>>>>>>>>>>> server sessionid 0x100024f0936005f, likely server has closed socket, >>>>>>>>>>>> closing socket connection and attempting reconnect >>>>>>>>>>>> 2016-07-11 15:51:24,713 INFO >>>>>>>>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>>>>>>>>> change: SUSPENDED >>>>>>>>>>>> 2016-07-11 15:51:26,467 INFO >>>>>>>>>>>> org.apache.zookeeper.ClientCnxn:1138 - Opening socket connection to server >>>>>>>>>>>> QA-E8WIN11/10.11.8.236:2181. Will not attempt to authenticate >>>>>>>>>>>> using SASL (unknown error) >>>>>>>>>>>> 2016-07-11 15:51:26,468 INFO >>>>>>>>>>>> org.apache.zookeeper.ClientCnxn:980 - Socket connection established, >>>>>>>>>>>> initiating session, client: /10.11.12.4:47576, server: >>>>>>>>>>>> QA-E8WIN11/10.11.8.236:2181 >>>>>>>>>>>> 2016-07-11 15:51:26,470 INFO >>>>>>>>>>>> org.apache.zookeeper.ClientCnxn:1400 - Session establishment complete on >>>>>>>>>>>> server QA-E8WIN11/10.11.8.236:2181, sessionid = >>>>>>>>>>>> 0x100024f09360061, negotiated timeout = 45000 (READ-ONLY mode) >>>>>>>>>>>> 2016-07-11 15:51:26,470 INFO >>>>>>>>>>>> org.apache.zookeeper.ClientCnxn:1297 - Checking server QA-E8WIN12/ >>>>>>>>>>>> 10.11.8.232:2181 for being r/w. Timeout 800 >>>>>>>>>>>> 2016-07-11 15:51:26,470 INFO >>>>>>>>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>>>>>>>>> change: READ_ONLY >>>>>>>>>>>> (...) >>>>>>>>>>>> >>>>>>>>>>>> Thanks >>>>>>>>>>>> Ben >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> > --001a114210b41bfc3105377865de Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
Cameron, you seem to be right, the simple zookeeper c= lient also fails sometimes.
I will bring it up to the ZK team, tha= nk you!

= On Tue, Jul 12, 2016 at 3:09 PM, Cameron McKenzie <mckenzie.cam@gmail= .com> wrote:
I'm at a bit of a loss then. Looking at the stack trace, it's in= the ZK code (not the Curator code) that it appears to be blocking. Maybe t= ry and create a test that removes Curator and just uses the ZK client direc= tly and see if you still have the same problem?
=

On Wed, Jul 13, 2016 at 7:54 AM, Benjamin Jaton <benjamin.jaton@gm= ail.com> wrote:
Yes, I mean the ping goes through properly

On Tue, Jul 12, 2016 at = 2:50 PM, Cameron McKenzie <mckenzie.cam@gmail.com> wrot= e:
Are the 'dead'= ; hosts reachable? I seem to recall having problems with the ZK client bloc= king for a long time if you had a host configured that was resolvable but n= ot reachable.

On Wed, Jul 13, 2016 at 4:30 AM, Benjamin Jaton <benjam= in.jaton@gmail.com> wrote:
=
More to the story:

While= it consistently works on the same machine than the sole running ZK node wh= en I specify only 2 of the 3 servers in the connection string,
it = doesn't work when I specify 3 hosts:


2016-07-12 11:23:46 INF= O=C2=A0 o.a.z.ZooKeeper:716 - Initiating client connection, connectString= =3DQA-E8WIN11:2181,QA-E8WIN12:2181,QA-E8WIN13:2181 sessionTimeout=3D45000 w= atcher=3Dorg.apache.curator.ConnectionState@c86b9e3
2016-07-12 11:23:46 = INFO=C2=A0 o.a.z.ClientCnxn:1138 - Opening socket connection to server QA-E= 8WIN13/10.11.8.230:21= 81. Will not attempt to authenticate using SASL (unknown error)
2016= -07-12 11:24:01 INFO=C2=A0 o.a.z.ClientCnxn:1251 - Client session timed out= , have not heard from server in 15018ms for sessionid 0x0, closing socket c= onnection and attempting reconnect
Successfully established the co= nnection with ZooKeeper
2016-07-12 11:24:01 INFO=C2=A0 o.a.z.Clie= ntCnxn:1138 - Opening socket connection to server QA-E8WIN12/10.11.8.232:2181. Will not atte= mpt to authenticate using SASL (unknown error)
2016-07-12 11:24:16 INFO= =C2=A0 o.a.z.ClientCnxn:1251 - Client session timed out, have not heard fro= m server in 15000ms for sessionid 0x0, closing socket connection and attemp= ting reconnect
2016-07-12 11:24:16 INFO=C2=A0 o.a.z.ClientCnxn:1138 - Op= ening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will not attempt to authentic= ate using SASL (unknown error)
2016-07-12 11:24:16 INFO=C2=A0 o.a.z.Clie= ntCnxn:980 - Socket connection established, initiating session, client: /10.11.8.236:50423,= server: QA-E8WIN11/1= 0.11.8.236:2181
2016-07-12 11:24:16 INFO=C2=A0 o.a.z.ClientCnxn:1400= - Session establishment complete on server QA-E8WIN11/10.11.8.236:2181, sessionid =3D 0x100= 03c8af64001b, negotiated timeout =3D 45000 (READ-ONLY mode)
2016-07-12 1= 1:24:16 INFO=C2=A0 o.a.z.ClientCnxn:1297 - Checking server QA-E8WIN13/10.11.8.230:2181 for b= eing r/w. Timeout 200
2016-07-12 11:24:16 INFO=C2=A0 o.a.c.f.s.Connectio= nStateManager:228 - State change: READ_ONLY
2016-07-12 11:24:37 INFO=C2= =A0 o.a.z.ClientCnxn:1297 - Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeo= ut 400
2016-07-12 11:24:58 INFO=C2=A0 o.a.z.ClientCnxn:1251 - Client ses= sion timed out, have not heard from server in 42023ms for sessionid 0x10003= c8af64001b, closing socket connection and attempting reconnect
2016-07-1= 2 11:24:58 INFO=C2=A0 o.a.c.f.s.ConnectionStateManager:228 - State change: = SUSPENDED
2016-07-12 11:25:00 INFO=C2=A0 o.a.z.ClientCnxn:1138 - Opening= socket connection to server QA-E8WIN11/10.11.8.236:2181. Will not attempt to authenticate u= sing SASL (unknown error)
2016-07-12 11:25:00 INFO=C2=A0 o.a.z.ClientCnx= n:980 - Socket connection established, initiating session, client: /10.11.8.236:50428, serv= er: QA-E8WIN11/10.11.= 8.236:2181
2016-07-12 11:25:00 INFO=C2=A0 o.a.z.ClientCnxn:1400 - Se= ssion establishment complete on server QA-E8WIN11/10.11.8.236:2181, sessionid =3D 0x10003c8a= f64001c, negotiated timeout =3D 45000 (READ-ONLY mode)
2016-07-12 11:25:= 00 INFO=C2=A0 o.a.z.ClientCnxn:1297 - Checking server QA-E8WIN13/10.11.8.230:2181 for being = r/w. Timeout 800
2016-07-12 11:25:00 INFO=C2=A0 o.a.c.f.s.ConnectionStat= eManager:228 - State change: READ_ONLY
2016-07-12 11:25:21 INFO=C2=A0 o.= a.z.ClientCnxn:1297 - Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 160= 0
2016-07-12 11:25:42 INFO=C2=A0 o.a.z.ClientCnxn:1251 - Client session = timed out, have not heard from server in 42011ms for sessionid 0x10003c8af6= 4001c, closing socket connection and attempting reconnect
2016-07-12 11:= 25:42 INFO=C2=A0 o.a.c.f.s.ConnectionStateManager:228 - State change: SUSPE= NDED
2016-07-12 11:25:44 INFO=C2=A0 o.a.z.ClientCnxn:1138 - Opening sock= et connection to server QA-E8WIN11/10.11.8.236:2181. Will not attempt to authenticate using = SASL (unknown error)
2016-07-12 11:25:44 INFO=C2=A0 o.a.z.ClientCnxn:980= - Socket connection established, initiating session, client: /10.11.8.236:50435, server: Q= A-E8WIN11/10.11.8.236= :2181
2016-07-12 11:25:44 INFO=C2=A0 o.a.z.ClientCnxn:1400 - Session= establishment complete on server QA-E8WIN11/10.11.8.236:2181, sessionid =3D 0x10003c8af6400= 1d, negotiated timeout =3D 45000 (READ-ONLY mode)
2016-07-12 11:25:44 IN= FO=C2=A0 o.a.z.ClientCnxn:1297 - Checking server QA-E8WIN13/10.11.8.230:2181 for being r/w. = Timeout 3200
2016-07-12 11:25:44 INFO=C2=A0 o.a.c.f.s.ConnectionStateMan= ager:228 - State change: READ_ONLY
2016-07-12 11:26:05 INFO=C2=A0 o.a.z.= ClientCnxn:1297 - Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 64002016-07-12 11:26:26 INFO=C2=A0 o.a.z.ClientCnxn:1251 - Client session time= d out, have not heard from server in 42003ms for sessionid 0x10003c8af64001= d, closing socket connection and attempting reconnect
2016-07-12 11:26:2= 6 INFO=C2=A0 o.a.c.f.s.ConnectionStateManager:228 - State change: SUSPENDED=
Exception in thread "main" org.apache.zookeeper.KeeperExcepti= on$ConnectionLossException: KeeperErrorCode =3D ConnectionLoss for /<= br>=C2=A0=C2=A0=C2=A0 at org.apache.zookeeper.KeeperException.create(Keeper= Exception.java:99)
=C2=A0=C2=A0=C2=A0 at org.apache.zookeeper.KeeperExce= ption.create(KeeperException.java:51)
=C2=A0=C2=A0=C2=A0 at org.apache.z= ookeeper.ZooKeeper.getData(ZooKeeper.java:1753)
=C2=A0=C2=A0=C2=A0 at or= g.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImp= l.java:310)
=C2=A0=C2=A0=C2=A0 at org.apache.curator.framework.imps.GetD= ataBuilderImpl$4.call(GetDataBuilderImpl.java:299)
=C2=A0=C2=A0=C2=A0 at= org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108)
=C2=A0= =C2=A0=C2=A0 at org.apache.curator.framework.imps.GetDataBuilderImpl.pathIn= Foreground(GetDataBuilderImpl.java:295)
=C2=A0=C2=A0=C2=A0 at org.apache= .curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:= 287)
=C2=A0=C2=A0=C2=A0 at org.apache.curator.framework.imps.GetDataBuil= derImpl.forPath(GetDataBuilderImpl.java:34)
=C2=A0=C2=A0=C2=A0 at= test.Test.main(Test.java:25)


That seems to suggest = that adding 1 host to check somehow messes up something in the timeouts.
I am suspicious of those:
2016-07-12 11:25:44 INFO=C2=A0 o.a.= z.ClientCnxn:1297 - Checking server QA-E8WIN13/10.11.8.230:2181 for being r/w. Timeout 3200<= br>
They seem to be "blocking" the existing connection = we have to the healthy ZK server, which goes stale. Is that possible?


On Tue, Jul 12, 2016 at 11:17 AM, Benjamin Jaton <bjaton@radiantlogic.com> wrote:
I did not use the zkCli tool but the ZK java AP= I directly. And other tools like ZooInspector (contrib). They were run from= a remote machine.
I ran it on the same machine and the one read only s= erver is running properly, I can browse it using those methods.

Trying to make sens= e of those client-server interactions is difficult but here is what I got:<= /div>

Client= :
2016-07-12 10:= 55:01,508 INFO =C2=A0org.apache.zookeeper.ClientCnxn:1138 - Opening sock= et connection to server QA-E8WIN11/10.11.8.236:2181. Will not attempt to authenticate us= ing SASL (unknown error)
2016-07-12 10:55:0= 1,512 INFO =C2=A0org.apache.zookeeper.ClientCnxn:980 - Socket connection= established, initiating session, client: /10.11.12.4:60586, server: QA-E8WIN11/10.11.8.236:2181
2016-07-12 10:55:01,521 INFO =C2=A0org.apache.zook= eeper.ClientCnxn:1400 - Session establishment complete on server QA-E8WI= N11/10.11.8.236:2= 181, sessionid =3D 0x10003c8af640007, negotiated timeout =3D 450= 00 (READ-ONLY mode)
2016-07-12 10:55:01,523= INFO =C2=A0org.apache.zookeeper.ClientCnxn:1297 - Checking server QA-E8WIN= 12/10.11.8.232:2181 for being r/w. Timeout 200


(...)
2016-07-12T10:56:41,756 [myid:] - INFO =C2=A0[SessionTracker:ZooKe= eperServer@384] - Expiring session 0x10003c8af640007, timeout of 45000ms= exceeded


Note: Don't worry about the time being about 1 min off between t= he machines.

What I don't understand is why we are losing the read-only conne= ction we successfully established. ZK ro server says it timed out?

Now interestin= gly, when I run it on the same machine it works and completes pretty quickl= y:


2016-07-12 10:= 43:49 INFO =C2=A0o.a.z.ZooKeeper:716 - Initiating client connection, connec= tString=3DQA-E8WIN11:2181,QA-E8WIN12:2181 sessionTimeout=3D45000 watcher=3D= org.apache.curator.ConnectionState@e4487af
= 2016-07-12 10:43:49 INFO =C2=A0o.a.z.ClientCnxn:1138 - Opening socket conne= ction to server QA-E8WIN12/10.11.8.232:2181. Will not attempt to authenticate using SASL (un= known error)
Successfully established= the connection with ZooKeeper
2016-= 07-12 10:44:10 WARN =C2=A0o.a.z.ClientCnxn:1257 - Session 0x0 for server nu= ll, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection timed o= ut: no further information
at sun.nio.ch.SocketChannelImpl.checkConnect(N= ative Method) ~[?:1.8.0_91]
at sun.nio.ch.SocketChannelImpl.finishConnect= (SocketChannelImpl.java:717) ~[?:1.8.0_91]
= at org.apache.zookeeper.Client= CnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355) ~[zookeeper-3.5.0-a= lpha.jar:3.5.2-alpha-SNAPSHOT-1702518]
at org.apache.zookeeper.ClientCnxn= $SendThread.run(ClientCnxn.java:1236) [zookeeper-3.5.0-alpha.jar:3.5.2-alph= a-SNAPSHOT-1702518]
2016-07-12 10:44:11 INF= O =C2=A0o.a.z.ClientCnxn:1138 - Opening socket connection to server QA-E8WI= N11/10.11.8.236:2181<= /a>. Will not attempt to authenticate using SASL (unknown error)
2016-07-12 10:44:11 INFO =C2=A0o.a.= z.ClientCnxn:1400 - Session establishment complete on server QA-E8WIN11/10.11.8.236:2181, se= ssionid =3D 0x10003b315b30012, negotiated timeout =3D 45000 (READ-ONLY mode= )
2016-07-12 10:44:11 INFO =C2=A0o.a.z.Clie= ntCnxn:1297 - Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 200
2016-07-12 10:44:11 INFO =C2=A0o.a.c.f.s.Connectio= nStateManager:228 - State change: READ_ONLY
2016-07-12 10:44:32 INFO =C2=A0o.a.z.ClientCnxn:1297 - Checking server QA-= E8WIN11/10.11.8.236:2= 181 for being r/w. Timeout 400
Done.


On Mon, Jul 11, 2016 at 4:42 PM, Cam= eron McKenzie <mckenzie.cam@gmail.com> wrote:
It looks like a connection issue based on the l= ogs. When you confirmed that the connection was ok via the zkCli, was that = running on the same host as the read only server? Can you run it from the s= ame host that you're running the Java client from?

On Tue, Jul 12, 2016 a= t 9:40 AM, Benjamin Jaton <benjamin.jaton@gmail.com> = wrote:
Well I was trying to let it run= and it eventually fails, with a stack:

Successfully establish= ed the connection with ZooKeeper
2016-07-11 16:28:21,943 INFO=C2= =A0 org.apache.zookeeper.ClientCnxn:1251 - Client session timed out, have n= ot heard from server in 22519ms for sessionid 0x0, closing socket connectio= n and attempting reconnect
2016-07-11 16:28:22,046 INFO=C2=A0 org.apache= .zookeeper.ClientCnxn:1138 - Opening socket connection to server QA-E8WIN11= /10.11.8.236:2181= . Will not attempt to authenticate using SASL (unknown error)
2016-07-11= 16:28:22,050 INFO=C2=A0 org.apache.zookeeper.ClientCnxn:980 - Socket conne= ction established, initiating session, client: /10.11.12.4:54702, server: QA-E8WIN11/10.11.8.236:2181
2016= -07-11 16:28:22,058 INFO=C2=A0 org.apache.zookeeper.ClientCnxn:1400 - Sessi= on establishment complete on server QA-E8WIN11/10.11.8.236:2181, sessionid =3D 0x100024f0936= 0080, negotiated timeout =3D 45000 (READ-ONLY mode)
2016-07-11 16:28:22,= 061 INFO=C2=A0 org.apache.zookeeper.ClientCnxn:1297 - Checking server QA-E8= WIN12/10.11.8.232:218= 1 for being r/w. Timeout 200
2016-07-11 16:28:22,066 INFO=C2=A0 org.= apache.curator.framework.state.ConnectionStateManager:228 - State change: R= EAD_ONLY
2016-07-11 16:30:29,312 INFO=C2=A0 org.apache.zookeeper.ClientC= nxn:1253 - Unable to read additional data from server sessionid 0x100024f09= 360080, likely server has closed socket, closing socket connection and atte= mpting reconnect
2016-07-11 16:30:29,416 INFO=C2=A0 org.apache.curator.f= ramework.state.ConnectionStateManager:228 - State change: SUSPENDED
2016= -07-11 16:30:31,031 INFO=C2=A0 org.apache.zookeeper.ClientCnxn:1138 - Openi= ng socket connection to server QA-E8WIN11/10.11.8.236:2181. Will not attempt to authenticate= using SASL (unknown error)
2016-07-11 16:30:31,032 INFO=C2=A0 org.apach= e.zookeeper.ClientCnxn:980 - Socket connection established, initiating sess= ion, client: /10.11.1= 2.4:55156, server: QA-E8WIN11/10.11.8.236:2181
2016-07-11 16:30:31,034 INFO=C2=A0 org= .apache.zookeeper.ClientCnxn:1400 - Session establishment complete on serve= r QA-E8WIN11/10.11.8.= 236:2181, sessionid =3D 0x100024f09360083, negotiated timeout =3D 45000= (READ-ONLY mode)
2016-07-11 16:30:31,034 INFO=C2=A0 org.apache.zookeepe= r.ClientCnxn:1297 - Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 4002016-07-11 16:30:31,035 INFO=C2=A0 org.apache.curator.framework.state.Con= nectionStateManager:228 - State change: READ_ONLY
2016-07-11 16:32:38,33= 5 INFO=C2=A0 org.apache.zookeeper.ClientCnxn:1253 - Unable to read addition= al data from server sessionid 0x100024f09360083, likely server has closed s= ocket, closing socket connection and attempting reconnect
2016-07-11 16:= 32:38,441 INFO=C2=A0 org.apache.curator.framework.state.ConnectionStateMana= ger:228 - State change: SUSPENDED
2016-07-11 16:32:40,425 INFO=C2=A0 org= .apache.zookeeper.ClientCnxn:1138 - Opening socket connection to server QA-= E8WIN11/10.11.8.236:2= 181. Will not attempt to authenticate using SASL (unknown error)
201= 6-07-11 16:32:40,426 INFO=C2=A0 org.apache.zookeeper.ClientCnxn:980 - Socke= t connection established, initiating session, client: /10.11.12.4:55637, server: QA-E8WIN11/= 10.11.8.236:2181<= br>2016-07-11 16:32:40,428 INFO=C2=A0 org.apache.zookeeper.ClientCnxn:1400 = - Session establishment complete on server QA-E8WIN11/10.11.8.236:2181, sessionid =3D 0x1000= 24f09360087, negotiated timeout =3D 45000 (READ-ONLY mode)
2016-07-11 16= :32:40,428 INFO=C2=A0 org.apache.zookeeper.ClientCnxn:1297 - Checking serve= r QA-E8WIN12/10.11.8.= 232:2181 for being r/w. Timeout 800
2016-07-11 16:32:40,428 INFO=C2= =A0 org.apache.curator.framework.state.ConnectionStateManager:228 - State c= hange: READ_ONLY
2016-07-11 16:34:47,615 INFO=C2=A0 org.apache.zookeeper= .ClientCnxn:1253 - Unable to read additional data from server sessionid 0x1= 00024f09360087, likely server has closed socket, closing socket connection = and attempting reconnect
2016-07-11 16:34:47,716 INFO=C2=A0 org.apache.c= urator.framework.state.ConnectionStateManager:228 - State change: SUSPENDED=
2016-07-11 16:34:49,557 INFO=C2=A0 org.apache.zookeeper.ClientCnxn:1138= - Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will not attempt to auth= enticate using SASL (unknown error)
2016-07-11 16:34:49,558 INFO=C2=A0 o= rg.apache.zookeeper.ClientCnxn:980 - Socket connection established, initiat= ing session, client: /10.11.12.4:56094, server: QA-E8WIN11/10.11.8.236:2181
2016-07-11 16:34:49,560 INFO= =C2=A0 org.apache.zookeeper.ClientCnxn:1400 - Session establishment complet= e on server QA-E8WIN11/10.11.8.236:2181, sessionid =3D 0x100024f09360089, negotiated timeout= =3D 45000 (READ-ONLY mode)
2016-07-11 16:34:49,560 INFO=C2=A0 org.apach= e.zookeeper.ClientCnxn:1297 - Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Tim= eout 1600
2016-07-11 16:34:49,560 INFO=C2=A0 org.apache.curator.framewor= k.state.ConnectionStateManager:228 - State change: READ_ONLY
2016-07-11 = 16:36:56,895 INFO=C2=A0 org.apache.zookeeper.ClientCnxn:1253 - Unable to re= ad additional data from server sessionid 0x100024f09360089, likely server h= as closed socket, closing socket connection and attempting reconnect
Exc= eption in thread "main" 2016-07-11 16:36:56,996 INFO=C2=A0 org.ap= ache.curator.framework.state.ConnectionStateManager:228 - State change: SUS= PENDED
org.apache.zookeeper.KeeperException$ConnectionLossException: Kee= perErrorCode =3D ConnectionLoss for /
=C2=A0=C2=A0=C2=A0 at org.apache.z= ookeeper.KeeperException.create(KeeperException.java:99)
=C2=A0=C2=A0=C2= =A0 at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)=
=C2=A0=C2=A0=C2=A0 at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.= java:1753)
=C2=A0=C2=A0=C2=A0 at org.apache.curator.framework.imps.GetDa= taBuilderImpl$4.call(GetDataBuilderImpl.java:310)
=C2=A0=C2=A0=C2=A0 at = org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderI= mpl.java:299)
=C2=A0=C2=A0=C2=A0 at org.apache.curator.RetryLoop.callWit= hRetry(RetryLoop.java:108)
=C2=A0=C2=A0=C2=A0 at org.apache.curator.fram= ework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:295)=
=C2=A0=C2=A0=C2=A0 at org.apache.curator.framework.imps.GetDataBuilderI= mpl.forPath(GetDataBuilderImpl.java:287)
=C2=A0=C2=A0=C2=A0 at org.apach= e.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java= :34)
=C2=A0=C2=A0=C2=A0 at TestCuratorReadOnly.main(TestCuratorReadOnly.= java:23)

On Mon, Jul 11, 2016 at 4:39 PM, Cameron McKenzie <mckenzi= e.cam@gmail.com> wrote:
= So, it appears that the initial connection occurs ok, but then something go= es wrong

2016-07-= 11 15:49:15,712 INFO =C2=A0org.apache.zookeeper.ClientCnxn:1253 - Unable to read additional data from server s= essionid 0x100024f0936005b, likely server has closed socket, closing socket= connection and attempting reconnect

Then a reconnection occurs ok, and then the cycle starts again.

Can you see where in the stack the getData() call is = hanging?

On Tue, Jul 12, 2016 at 9:35 AM, Benjamin Jaton <benjamin.jaton@gmail.com> wrote:
The blockUntilConnectedOrTimedOut(= ) does return, we see the line:
"Successfully established the= connection with ZooKeeper"
which happens right after.=

The getData() is where it hangs.

We see this fir= st:
2016-07-11 15:47:08,420 INFO=C2=A0 org.apache.curator.framewor= k.state.ConnectionStateManager:228 - State change: READ_ONLY

=
and then we see it again 2 minutes later:
2016-07-11 15:49:1= 7,316 INFO=C2=A0 org.apache.curator.framework.state.ConnectionStateManager:= 228 - State change: READ_ONLY

and then again 2 minutes = after that:
2016-07-11 15:51:26,470 INFO=C2=A0 org.apache.curator.= framework.state.ConnectionStateManager:228 - State change: READ_ONLY
Then I stopped it.

On Mon, Jul 11, 2016 at 4:1= 3 PM, Cameron McKenzie <mckenzie.cam@gmail.com> wrote:<= br>
hey Ben,
From the logs it looks li= ke the connection is established at the end?

Are y= ou saying that the blockUntilConnectedOrTimedOut() does not return when thi= s occurs?=C2=A0
cheers


On Tue, Jul 12, 2016 at= 9:10 AM, Benjamin Jaton <bjaton@radiantlogic.com> wro= te:
ZK is 3.5.1-alpha
Curator is 2= .11.0

On Mon, Jul 11, 2016 at 4:01 PM, Benjamin Jaton <bjaton@ra= diantlogic.com> wrote:
I= have a 3 nodes ZK cluster , readonly mode is enabled.
2 nodes are down= , so the third one is in read-only (verified by using the ZK API manually).=

I'm using this piece of code:

<= /div>
Builder curato= rClientBuilder =3D CuratorFrameworkFactory.builder()
.connectString("QA-E8WIN11:2181,Q= A-E8WIN12:2181")
.sessionTimeoutMs(45000).connectionTimeoutMs(15000)
.retryPolicy(new RetryNTimes(3, 500= 0)).canBeReadOnly(true);

CuratorFramework client =3D curatorClientBuilder.build= ();
client.start();=
client.getZookeepe= rClient().blockUntilConnectedOrTimedOut();
System.out.println("Successfully established t= he connection with ZooKeeper");
= client.getData().forPath("/");
System.out.println("Done.");

When curator pick the host that is UP first, it goes throu= gh very quickly. When it picks the host that is down first (QA-E8WIN12), it= never goes through.

Am I missing something here?<= /div>

Below is the log:

20= 16-07-11 15:46:45,733 INFO =C2=A0org.apache.curator.framework.imps.CuratorF= rameworkImpl:235 - Starting
2016-07-11 15:46:45,743 INFO =C2=A0or= g.apache.zookeeper.ZooKeeper:109 - Client environment:zookeeper.version=3D3= .5.1-alpha-1748895, built on 06/17/2016 22:24 GMT
2016-07-11 15:4= 6:45,743 INFO =C2=A0org.apache.zookeeper.ZooKeeper:109 - Client environment= :host.name=3Dw-rli09-ben=
2016-07-11 15:46:45,744 INFO =C2=A0org.apache.zookeeper.ZooKeepe= r:109 - Client environment:java.version=3D1.8.0_11
2016-07-11 15:= 46:45,744 INFO =C2=A0org.apache.zookeeper.ZooKeeper:109 - Client environmen= t:java.vendor=3DOracle Corporation
2016-07-11 15:46:45,744 INFO = =C2=A0org.apache.zookeeper.ZooKeeper:109 - Client environment:java.home=3D/= usr/local/apps/jdk1.8.0_11/jre
2016-07-11 15:46:45,744 INFO =C2= =A0org.apache.zookeeper.ZooKeeper:109 - Client environment:java.class.path= =3D(..)
2016-07-11 15:46:45,746 INFO =C2=A0org.apache.zookeeper.Z= ooKeeper:109 - Client environment:java.io.tmpdir=3D/tmp
2016-07-1= 1 15:46:45,746 INFO =C2=A0org.apache.zookeeper.ZooKeeper:109 - Client envir= onment:java.compiler=3D<NA>
2016-07-11 15:46:45,747 INFO = =C2=A0org.apache.zookeeper.ZooKeeper:109 - Client environment:os.name=3DLinux
2016-07-11 15= :46:45,747 INFO =C2=A0org.apache.zookeeper.ZooKeeper:109 - Client environme= nt:os.arch=3Damd64
2016-07-11 15:46:45,747 INFO =C2=A0org.apache.= zookeeper.ZooKeeper:109 - Client environment:os.version=3D3.8.0-44-generic<= /div>
2016-07-11 15:46:45,747 INFO =C2=A0org.apache.zookeeper.ZooKeeper= :109 - Client environment:us= er.name=3Dbenji
2016-07-11 15:46:45,747 INFO =C2=A0org.apache= .zookeeper.ZooKeeper:109 - Client environment:user.home=3D(..)
20= 16-07-11 15:46:45,747 INFO =C2=A0org.apache.zookeeper.ZooKeeper:109 - Clien= t environment:user.dir=3D(..)
2016-07-11 15:46:45,747 INFO =C2=A0= org.apache.zookeeper.ZooKeeper:109 - Client environment:os.memory.free=3D38= 5MB
2016-07-11 15:46:45,748 INFO =C2=A0org.apache.zookeeper.ZooKe= eper:109 - Client environment:os.memory.max=3D5358MB
2016-07-11 1= 5:46:45,748 INFO =C2=A0org.apache.zookeeper.ZooKeeper:109 - Client environm= ent:os.memory.total=3D397MB
2016-07-11 15:46:45,750 INFO =C2=A0or= g.apache.zookeeper.ZooKeeper:716 - Initiating client connection, connectStr= ing=3DQA-E8WIN11:2181,QA-E8WIN12:2181 sessionTimeout=3D45000 watcher=3Dorg.= apache.curator.ConnectionState@3e96bacf
2016-07-11 15:46:45,769 I= NFO =C2=A0org.apache.zookeeper.ClientCnxn:1138 - Opening socket connection = to server QA-E8WIN12/= 10.11.8.232:2181. Will not attempt to authenticate using SASL (unknown = error)
Successfully established the connection with ZooKeeper
2016-07-11 15:47:08,300 INFO =C2=A0org.apache.zookeeper.ClientCnxn:1= 251 - Client session timed out, have not heard from server in 22534ms for s= essionid 0x0, closing socket connection and attempting reconnect
= 2016-07-11 15:47:08,403 INFO =C2=A0org.apache.zookeeper.ClientCnxn:1138 - O= pening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will not attempt to authenti= cate using SASL (unknown error)
2016-07-11 15:47:08,406 INFO =C2= =A0org.apache.zookeeper.ClientCnxn:980 - Socket connection established, ini= tiating session, client: /10.11.12.4:47004, server: QA-E8WIN11/10.11.8.236:2181
2016-07-11 15:47:08= ,413 INFO =C2=A0org.apache.zookeeper.ClientCnxn:1400 - Session establishmen= t complete on server QA-E8WIN11/10.11.8.236:2181, sessionid =3D 0x100024f0936005b, negotiat= ed timeout =3D 45000 (READ-ONLY mode)
2016-07-11 15:47:08,415 INF= O =C2=A0org.apache.zookeeper.ClientCnxn:1297 - Checking server QA-E8WIN12/<= a href=3D"http://10.11.8.232:2181" target=3D"_blank">10.11.8.232:2181 f= or being r/w. Timeout 200
2016-07-11 15:47:08,420 INFO =C2=A0org.= apache.curator.framework.state.ConnectionStateManager:228 - State change: R= EAD_ONLY
2016-07-11 15:49:15,712 INFO =C2=A0org.apache.zookeeper.= ClientCnxn:1253 - Unable to read additional data from server sessionid 0x10= 0024f0936005b, likely server has closed socket, closing socket connection a= nd attempting reconnect
2016-07-11 15:49:15,815 INFO =C2=A0org.ap= ache.curator.framework.state.ConnectionStateManager:228 - State change: SUS= PENDED
2016-07-11 15:49:17,313 INFO =C2=A0org.apache.zookeeper.Cl= ientCnxn:1138 - Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will not at= tempt to authenticate using SASL (unknown error)
2016-07-11 15:49= :17,314 INFO =C2=A0org.apache.zookeeper.ClientCnxn:980 - Socket connection = established, initiating session, client: /10.11.12.4:47295, server: QA-E8WIN11/10.11.8.236:2181
201= 6-07-11 15:49:17,316 INFO =C2=A0org.apache.zookeeper.ClientCnxn:1400 - Sess= ion establishment complete on server QA-E8WIN11/10.11.8.236:2181, sessionid =3D 0x100024f093= 6005f, negotiated timeout =3D 45000 (READ-ONLY mode)
2016-07-11 1= 5:49:17,316 INFO =C2=A0org.apache.zookeeper.ClientCnxn:1297 - Checking serv= er QA-E8WIN12/10.11.8= .232:2181 for being r/w. Timeout 400
2016-07-11 15:49:17,316 = INFO =C2=A0org.apache.curator.framework.state.ConnectionStateManager:228 - = State change: READ_ONLY
2016-07-11 15:51:24,607 INFO =C2=A0org.ap= ache.zookeeper.ClientCnxn:1253 - Unable to read additional data from server= sessionid 0x100024f0936005f, likely server has closed socket, closing sock= et connection and attempting reconnect
2016-07-11 15:51:24,713 IN= FO =C2=A0org.apache.curator.framework.state.ConnectionStateManager:228 - St= ate change: SUSPENDED
2016-07-11 15:51:26,467 INFO =C2=A0org.apac= he.zookeeper.ClientCnxn:1138 - Opening socket connection to server QA-E8WIN= 11/10.11.8.236:2181. Will not attempt to authenticate using SASL (unknown error)
<= div>2016-07-11 15:51:26,470 INFO =C2=A0org.apache.zookeeper.ClientCnxn:1297= - Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 800
2016-07-= 11 15:51:26,470 INFO =C2=A0org.apache.curator.framework.state.ConnectionSta= teManager:228 - State change: READ_ONLY
(...)
Thanks
Ben













--001a114210b41bfc3105377865de--