curator-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jordan Zimmerman <jor...@jordanzimmerman.com>
Subject Re: Curator never goes through when ZK is read-only
Date Thu, 06 Oct 2016 21:12:49 GMT
I don’t see it in the code. The connectionTimeoutMs is total time for connection to any node.

Did you post your test somewhere? Maybe I can try to debug it.

-JZ

> On Oct 6, 2016, at 11:11 PM, Benjamin Jaton <benjamin.jaton@gmail.com> wrote:
> 
> I was thinking that connectionTimeoutMs is the time it would take for 1 node to timeout.
> But then once you've checked 1 node, waitTimeMs becomes < 0 and the while loop exits without checking the other nodes.
> 
> But that's just a guess, maybe the problem is somewhere else.
> 
> On Thu, Oct 6, 2016 at 2:04 PM, Jordan Zimmerman <jordan@jordanzimmerman.com <mailto:jordan@jordanzimmerman.com>> wrote:
> > It seems like it should be more than that since connectionTimeoutMs will be the time it takes to check only 1 node.
> 
> Why do you think it will only check 1 node? I haven’t looked at that code in a long time but a quick check suggests to me that internalBlockUntilConnectedOrTimedOut() waits for a successful connection irrespective of which node it is.
> 
>> On Jul 30, 2016, at 12:06 AM, Benjamin Jaton <benjamin.jaton@gmail.com <mailto:benjamin.jaton@gmail.com>> wrote:
>> 
>> The internalBlockUntilConnectedOrTimedOut() method will timeout after connectionTimeoutMs.
>> 
>> It seems like it should be more than that since connectionTimeoutMs will be the time it takes to check only 1 node.
>> 
>> Thoughts?
>> 
>> On Fri, Jul 29, 2016 at 2:55 PM, Benjamin Jaton <benjamin.jaton@gmail.com <mailto:benjamin.jaton@gmail.com>> wrote:
>> Adding the end of the log:
>> 
>> 570438 [main] DEBUG org.apache.curator.RetryLoop  - Retry policy not allowing retry
>> 570438 [main-EventThread] INFO org.apache.curator.framework.state.ConnectionStateManager  - State change: SUSPENDED
>> Exception in thread "main" org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /
>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1956)
>>     at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:310)
>>     at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:299)
>>     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108)
>>     at org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:295)
>>     at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:287)
>>     at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:34)
>>     at testzkclient.TestCuratorReadOnlyClient.main(TestCuratorReadOnlyClient.java:25)
>> 
>> On Fri, Jul 29, 2016 at 2:49 PM, Benjamin Jaton <benjamin.jaton@gmail.com <mailto:benjamin.jaton@gmail.com>> wrote:
>> I was making a mistake in the ZK test, the ZK client works properly ( I was not waiting for the watcher to give me a ConnectedReadOnly event ).
>> 
>> So I retried with zk 3.5.2 and curator 2.11.0, I have more logs (see below)
>> 
>> 10.11.9.70:2181 <http://10.11.9.70:2181/> is the functioning node,
>> 10.11.157.4:2181 <http://10.11.157.4:2181/> and 10.11.157.210:2181 <http://10.11.157.210:2181/> don't exist.
>> 
>> I stopped capturing at 300sec.
>> 
>> This line:
>> 15043 [main] DEBUG org.apache.curator.CuratorZookeeperClient  - blockUntilConnectedOrTimedOut() end. isConnected: false
>> is not right. Why did it stop blocking?
>> Is there a way to get more info?
>> 
>> 0 [main] INFO org.apache.curator.framework.imps.CuratorFrameworkImpl  - Starting
>> 4 [main] DEBUG org.apache.curator.CuratorZookeeperClient  - Starting
>> 4 [main] DEBUG org.apache.curator.ConnectionState  - Starting
>> 4 [main] DEBUG org.apache.curator.ConnectionState  - reset
>> 12 [main] INFO org.apache.zookeeper.ZooKeeper  - Client environment:zookeeper.version=3.5.2-alpha-1753710, built on 07/21/2016 16:24 GMT
>> 12 [main] INFO org.apache.zookeeper.ZooKeeper  - Client environment:host.name <http://host.name/>=w-rli09-ben
>> 12 [main] INFO org.apache.zookeeper.ZooKeeper  - Client environment:java.version=1.8.0_11
>> 12 [main] INFO org.apache.zookeeper.ZooKeeper  - Client environment:java.vendor=Oracle Corporation
>> 12 [main] INFO org.apache.zookeeper.ZooKeeper  - Client environment:java.home=/usr/local/apps/jdk1.8.0_11/jre
>> 12 [main] INFO org.apache.zookeeper.ZooKeeper  - Client environment:java.class.path=/home/benji/workspace-trunk/testzkclient/bin:/home/benji/workspace-trunk/zookeeper-3.5.2/build/lib/commons-cli-1.2.jar:/home/benji/workspace-trunk/zookeeper-3.5.2/build/lib/jackson-core-asl-1.9.11.jar:/home/benji/workspace-trunk/zookeeper-3.5.2/build/lib/jackson-mapper-asl-1.9.11.jar:/home/benji/workspace-trunk/zookeeper-3.5.2/build/lib/javacc.jar:/home/benji/workspace-trunk/zookeeper-3.5.2/build/lib/jetty-6.1.26.jar:/home/benji/workspace-trunk/zookeeper-3.5.2/build/lib/jetty-util-6.1.26.jar:/home/benji/workspace-trunk/zookeeper-3.5.2/build/lib/jline-2.11.jar:/home/benji/workspace-trunk/zookeeper-3.5.2/build/lib/log4j-1.2.17.jar:/home/benji/workspace-trunk/zookeeper-3.5.2/build/lib/netty-3.10.5.Final.jar:/home/benji/workspace-trunk/zookeeper-3.5.2/build/lib/servlet-api-2.5-20081211.jar:/home/benji/workspace-trunk/zookeeper-3.5.2/build/lib/slf4j-api-1.7.5.jar:/home/benji/workspace-trunk/zookeeper-3.5.2/build/lib/slf4j-log4j12-1.7.5.jar:/home/benji/workspace-trunk/zookeeper-3.5.2/build/zookeeper-3.5.2-alpha.jar:/home/benji/workspace-trunk/apache-curator-3.2.0/guava-16.0.1.jar:/home/benji/workspace-trunk/apache-curator-3.2.0/curator-client-2.11.0.jar:/home/benji/workspace-trunk/apache-curator-3.2.0/curator-framework-2.11.0.jar
>> 12 [main] INFO org.apache.zookeeper.ZooKeeper  - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
>> 12 [main] INFO org.apache.zookeeper.ZooKeeper  - Client environment:java.io.tmpdir=/tmp
>> 12 [main] INFO org.apache.zookeeper.ZooKeeper  - Client environment:java.compiler=<NA>
>> 12 [main] INFO org.apache.zookeeper.ZooKeeper  - Client environment:os.name <http://os.name/>=Linux
>> 13 [main] INFO org.apache.zookeeper.ZooKeeper  - Client environment:os.arch=amd64
>> 13 [main] INFO org.apache.zookeeper.ZooKeeper  - Client environment:os.version=3.8.0-44-generic
>> 13 [main] INFO org.apache.zookeeper.ZooKeeper  - Client environment:user.name <http://user.name/>=benji
>> 13 [main] INFO org.apache.zookeeper.ZooKeeper  - Client environment:user.home=/home/benji
>> 13 [main] INFO org.apache.zookeeper.ZooKeeper  - Client environment:user.dir=/home/benji/workspace-trunk/testzkclient
>> 13 [main] INFO org.apache.zookeeper.ZooKeeper  - Client environment:os.memory.free=378MB
>> 13 [main] INFO org.apache.zookeeper.ZooKeeper  - Client environment:os.memory.max=5358MB
>> 13 [main] INFO org.apache.zookeeper.ZooKeeper  - Client environment:os.memory.total=381MB
>> 16 [main] INFO org.apache.zookeeper.ZooKeeper  - Initiating client connection, connectString=10.11.9.70:2181 <http://10.11.9.70:2181/>,10.11.157.4:2181 <http://10.11.157.4:2181/>,10.11.157.210:2181 <http://10.11.157.210:2181/> sessionTimeout=45000 watcher=org.apache.curator.ConnectionState@1ff8b8f
>> 27 [main] DEBUG org.apache.zookeeper.ClientCnxnSocket  - jute.maxbuffer is 4194304
>> 39 [main] DEBUG org.apache.curator.CuratorZookeeperClient  - blockUntilConnectedOrTimedOut() start
>> 39 [main-SendThread(10.11.157.210:2181 <http://10.11.157.210:2181/>)] INFO org.apache.zookeeper.ClientCnxn  - Opening socket connection to server 10.11.157.210/10.11.157.210:2181 <http://10.11.157.210/10.11.157.210:2181>. Will not attempt to authenticate using SASL (unknown error)
>> 15043 [main] DEBUG org.apache.curator.CuratorZookeeperClient  - blockUntilConnectedOrTimedOut() end. isConnected: false
>> Successfully established the connection with ZooKeeper
>> 15058 [main-SendThread(10.11.157.210:2181 <http://10.11.157.210:2181/>)] WARN org.apache.zookeeper.ClientCnxn  - Client session timed out, have not heard from server in 15023ms for sessionid 0x0
>> 15058 [main-SendThread(10.11.157.210:2181 <http://10.11.157.210:2181/>)] INFO org.apache.zookeeper.ClientCnxn  - Client session timed out, have not heard from server in 15023ms for sessionid 0x0, closing socket connection and attempting reconnect
>> 15059 [main-SendThread(10.11.157.210:2181 <http://10.11.157.210:2181/>)] DEBUG org.apache.zookeeper.ClientCnxnSocketNIO  - Ignoring exception during shutdown input
>> java.net.SocketException: Socket is not connected
>>     at sun.nio.ch.Net <http://sun.nio.ch.net/>.translateToSocketException(Net.java:143)
>>     at sun.nio.ch.Net <http://sun.nio.ch.net/>.translateException(Net.java:177)
>>     at sun.nio.ch.Net <http://sun.nio.ch.net/>.translateException(Net.java:183)
>>     at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:404)
>>     at org.apache.zookeeper.ClientCnxnSocketNIO.cleanup(ClientCnxnSocketNIO.java:198)
>>     at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1325)
>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1245)
>> Caused by: java.nio.channels.NotYetConnectedException
>>     at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:777)
>>     at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:402)
>>     ... 3 more
>> 15062 [main-SendThread(10.11.157.210:2181 <http://10.11.157.210:2181/>)] DEBUG org.apache.zookeeper.ClientCnxnSocketNIO  - Ignoring exception during shutdown output
>> java.net.SocketException: Socket is not connected
>>     at sun.nio.ch.Net <http://sun.nio.ch.net/>.translateToSocketException(Net.java:143)
>>     at sun.nio.ch.Net <http://sun.nio.ch.net/>.translateException(Net.java:177)
>>     at sun.nio.ch.Net <http://sun.nio.ch.net/>.translateException(Net.java:183)
>>     at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:412)
>>     at org.apache.zookeeper.ClientCnxnSocketNIO.cleanup(ClientCnxnSocketNIO.java:205)
>>     at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1325)
>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1245)
>> Caused by: java.nio.channels.NotYetConnectedException
>>     at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:794)
>>     at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:410)
>>     ... 3 more
>> 15169 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] INFO org.apache.zookeeper.ClientCnxn  - Opening socket connection to server 10.11.9.70/10.11.9.70:2181 <http://10.11.9.70/10.11.9.70:2181>. Will not attempt to authenticate using SASL (unknown error)
>> 15170 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] INFO org.apache.zookeeper.ClientCnxn  - Socket connection established, initiating session, client: /10.11.12.4:44742 <http://10.11.12.4:44742/>, server: 10.11.9.70/10.11.9.70:2181 <http://10.11.9.70/10.11.9.70:2181>
>> 15173 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] DEBUG org.apache.zookeeper.ClientCnxn  - Session establishment request sent on 10.11.9.70/10.11.9.70:2181 <http://10.11.9.70/10.11.9.70:2181>
>> 15180 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] INFO org.apache.zookeeper.ClientCnxn  - Session establishment complete on server 10.11.9.70/10.11.9.70:2181 <http://10.11.9.70/10.11.9.70:2181>, sessionid = 0x10003fa439a0060, negotiated timeout = 40000 (READ-ONLY mode)
>> 15182 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] INFO org.apache.zookeeper.ClientCnxn  - Checking server 10.11.157.4/10.11.157.4:2181 <http://10.11.157.4/10.11.157.4:2181> for being r/w. Timeout 200
>> 15187 [main-EventThread] INFO org.apache.curator.framework.state.ConnectionStateManager  - State change: READ_ONLY
>> 142434 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] INFO org.apache.zookeeper.ClientCnxn  - Unable to read additional data from server sessionid 0x10003fa439a0060, likely server has closed socket, closing socket connection and attempting reconnect
>> 142538 [main-EventThread] INFO org.apache.curator.framework.state.ConnectionStateManager  - State change: SUSPENDED
>> 142545 [main] DEBUG org.apache.curator.RetryLoop  - Retry-able exception received
>> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /
>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1956)
>>     at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:310)
>>     at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:299)
>>     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108)
>>     at org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:295)
>>     at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:287)
>>     at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:34)
>>     at testzkclient.TestCuratorReadOnlyClient.main(TestCuratorReadOnlyClient.java:25)
>> 143211 [main-SendThread(10.11.157.210:2181 <http://10.11.157.210:2181/>)] INFO org.apache.zookeeper.ClientCnxn  - Opening socket connection to server 10.11.157.210/10.11.157.210:2181 <http://10.11.157.210/10.11.157.210:2181>. Will not attempt to authenticate using SASL (unknown error)
>> 147546 [main] DEBUG org.apache.curator.RetryLoop  - Retrying operation
>> 156556 [main-SendThread(10.11.157.210:2181 <http://10.11.157.210:2181/>)] WARN org.apache.zookeeper.ClientCnxn  - Client session timed out, have not heard from server in 14019ms for sessionid 0x10003fa439a0060
>> 156557 [main-SendThread(10.11.157.210:2181 <http://10.11.157.210:2181/>)] INFO org.apache.zookeeper.ClientCnxn  - Client session timed out, have not heard from server in 14019ms for sessionid 0x10003fa439a0060, closing socket connection and attempting reconnect
>> 156557 [main-SendThread(10.11.157.210:2181 <http://10.11.157.210:2181/>)] DEBUG org.apache.zookeeper.ClientCnxnSocketNIO  - Ignoring exception during shutdown input
>> java.net.SocketException: Socket is not connected
>>     at sun.nio.ch.Net <http://sun.nio.ch.net/>.translateToSocketException(Net.java:143)
>>     at sun.nio.ch.Net <http://sun.nio.ch.net/>.translateException(Net.java:177)
>>     at sun.nio.ch.Net <http://sun.nio.ch.net/>.translateException(Net.java:183)
>>     at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:404)
>>     at org.apache.zookeeper.ClientCnxnSocketNIO.cleanup(ClientCnxnSocketNIO.java:198)
>>     at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1325)
>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1245)
>> Caused by: java.nio.channels.NotYetConnectedException
>>     at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:777)
>>     at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:402)
>>     ... 3 more
>> 156557 [main-SendThread(10.11.157.210:2181 <http://10.11.157.210:2181/>)] DEBUG org.apache.zookeeper.ClientCnxnSocketNIO  - Ignoring exception during shutdown output
>> java.net.SocketException: Socket is not connected
>>     at sun.nio.ch.Net <http://sun.nio.ch.net/>.translateToSocketException(Net.java:143)
>>     at sun.nio.ch.Net <http://sun.nio.ch.net/>.translateException(Net.java:177)
>>     at sun.nio.ch.Net <http://sun.nio.ch.net/>.translateException(Net.java:183)
>>     at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:412)
>>     at org.apache.zookeeper.ClientCnxnSocketNIO.cleanup(ClientCnxnSocketNIO.java:205)
>>     at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1325)
>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1245)
>> Caused by: java.nio.channels.NotYetConnectedException
>>     at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:794)
>>     at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:410)
>>     ... 3 more
>> 157544 [Curator-Framework-0] ERROR org.apache.curator.ConnectionState  - Connection timed out for connection string (10.11.9.70:2181 <http://10.11.9.70:2181/>,10.11.157.4:2181 <http://10.11.157.4:2181/>,10.11.157.210:2181 <http://10.11.157.210:2181/>) and timeout (15000) / elapsed (15006)
>> org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
>>     at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:197)
>>     at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88)
>>     at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:116)
>>     at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:835)
>>     at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809)
>>     at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64)
>>     at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267)
>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>     at java.lang.Thread.run(Thread.java:745)
>> 158163 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] INFO org.apache.zookeeper.ClientCnxn  - Opening socket connection to server 10.11.9.70/10.11.9.70:2181 <http://10.11.9.70/10.11.9.70:2181>. Will not attempt to authenticate using SASL (unknown error)
>> 158164 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] INFO org.apache.zookeeper.ClientCnxn  - Socket connection established, initiating session, client: /10.11.12.4:45070 <http://10.11.12.4:45070/>, server: 10.11.9.70/10.11.9.70:2181 <http://10.11.9.70/10.11.9.70:2181>
>> 158164 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] DEBUG org.apache.zookeeper.ClientCnxn  - Session establishment request sent on 10.11.9.70/10.11.9.70:2181 <http://10.11.9.70/10.11.9.70:2181>
>> 158166 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] INFO org.apache.zookeeper.ClientCnxn  - Session establishment complete on server 10.11.9.70/10.11.9.70:2181 <http://10.11.9.70/10.11.9.70:2181>, sessionid = 0x10003fa439a0061, negotiated timeout = 40000 (READ-ONLY mode)
>> 158166 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] INFO org.apache.zookeeper.ClientCnxn  - Checking server 10.11.157.4/10.11.157.4:2181 <http://10.11.157.4/10.11.157.4:2181> for being r/w. Timeout 400
>> 158167 [main-EventThread] INFO org.apache.curator.framework.state.ConnectionStateManager  - State change: READ_ONLY
>> 285409 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] INFO org.apache.zookeeper.ClientCnxn  - Unable to read additional data from server sessionid 0x10003fa439a0061, likely server has closed socket, closing socket connection and attempting reconnect
>> 285510 [main] DEBUG org.apache.curator.RetryLoop  - Retry-able exception received
>> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /
>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1956)
>>     at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:310)
>>     at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:299)
>>     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108)
>>     at org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:295)
>>     at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:287)
>>     at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:34)
>>     at testzkclient.TestCuratorReadOnlyClient.main(TestCuratorReadOnlyClient.java:25)
>> 285516 [main-EventThread] INFO org.apache.curator.framework.state.ConnectionStateManager  - State change: SUSPENDED
>> 286312 [main-SendThread(10.11.157.210:2181 <http://10.11.157.210:2181/>)] INFO org.apache.zookeeper.ClientCnxn  - Opening socket connection to server 10.11.157.210/10.11.157.210:2181 <http://10.11.157.210/10.11.157.210:2181>. Will not attempt to authenticate using SASL (unknown error)
>> 290511 [main] DEBUG org.apache.curator.RetryLoop  - Retrying operation
>> 295516 [Curator-Framework-0] INFO org.apache.curator.framework.state.ConnectionStateManager  - State change: LOST
>> 295517 [Curator-Framework-0] ERROR org.apache.curator.framework.imps.CuratorFrameworkImpl  - Background operation retry gave up
>> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>>     at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:728)
>>     at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:857)
>>     at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809)
>>     at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64)
>>     at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267)
>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>     at java.lang.Thread.run(Thread.java:745)
>> 295518 [Curator-Framework-0] ERROR org.apache.curator.framework.imps.CuratorFrameworkImpl  - Background retry gave up
>> org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
>>     at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:838)
>>     at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809)
>>     at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64)
>>     at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267)
>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>     at java.lang.Thread.run(Thread.java:745)
>> 299659 [main-SendThread(10.11.157.210:2181 <http://10.11.157.210:2181/>)] WARN org.apache.zookeeper.ClientCnxn  - Client session timed out, have not heard from server in 14149ms for sessionid 0x10003fa439a0061
>> 299659 [main-SendThread(10.11.157.210:2181 <http://10.11.157.210:2181/>)] INFO org.apache.zookeeper.ClientCnxn  - Client session timed out, have not heard from server in 14149ms for sessionid 0x10003fa439a0061, closing socket connection and attempting reconnect
>> 299660 [main-SendThread(10.11.157.210:2181 <http://10.11.157.210:2181/>)] DEBUG org.apache.zookeeper.ClientCnxnSocketNIO  - Ignoring exception during shutdown input
>> java.net.SocketException: Socket is not connected
>>     at sun.nio.ch.Net <http://sun.nio.ch.net/>.translateToSocketException(Net.java:143)
>>     at sun.nio.ch.Net <http://sun.nio.ch.net/>.translateException(Net.java:177)
>>     at sun.nio.ch.Net <http://sun.nio.ch.net/>.translateException(Net.java:183)
>>     at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:404)
>>     at org.apache.zookeeper.ClientCnxnSocketNIO.cleanup(ClientCnxnSocketNIO.java:198)
>>     at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1325)
>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1245)
>> Caused by: java.nio.channels.NotYetConnectedException
>>     at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:777)
>>     at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:402)
>>     ... 3 more
>> 299660 [main-SendThread(10.11.157.210:2181 <http://10.11.157.210:2181/>)] DEBUG org.apache.zookeeper.ClientCnxnSocketNIO  - Ignoring exception during shutdown output
>> java.net.SocketException: Socket is not connected
>>     at sun.nio.ch.Net <http://sun.nio.ch.net/>.translateToSocketException(Net.java:143)
>>     at sun.nio.ch.Net <http://sun.nio.ch.net/>.translateException(Net.java:177)
>>     at sun.nio.ch.Net <http://sun.nio.ch.net/>.translateException(Net.java:183)
>>     at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:412)
>>     at org.apache.zookeeper.ClientCnxnSocketNIO.cleanup(ClientCnxnSocketNIO.java:205)
>>     at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1325)
>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1245)
>> Caused by: java.nio.channels.NotYetConnectedException
>>     at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:794)
>>     at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:410)
>>     ... 3 more
>> 300520 [Curator-Framework-0] ERROR org.apache.curator.ConnectionState  - Connection timed out for connection string (10.11.9.70:2181 <http://10.11.9.70:2181/>,10.11.157.4:2181 <http://10.11.157.4:2181/>,10.11.157.210:2181 <http://10.11.157.210:2181/>) and timeout (15000) / elapsed (15004)
>> org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
>>     at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:197)
>>     at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88)
>>     at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:116)
>>     at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:835)
>>     at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809)
>>     at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64)
>>     at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267)
>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>     at java.lang.Thread.run(Thread.java:745)
>> 301077 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] INFO org.apache.zookeeper.ClientCnxn  - Opening socket connection to server 10.11.9.70/10.11.9.70:2181 <http://10.11.9.70/10.11.9.70:2181>. Will not attempt to authenticate using SASL (unknown error)
>> 301078 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] INFO org.apache.zookeeper.ClientCnxn  - Socket connection established, initiating session, client: /10.11.12.4:45385 <http://10.11.12.4:45385/>, server: 10.11.9.70/10.11.9.70:2181 <http://10.11.9.70/10.11.9.70:2181>
>> 301078 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] DEBUG org.apache.zookeeper.ClientCnxn  - Session establishment request sent on 10.11.9.70/10.11.9.70:2181 <http://10.11.9.70/10.11.9.70:2181>
>> 301081 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] INFO org.apache.zookeeper.ClientCnxn  - Session establishment complete on server 10.11.9.70/10.11.9.70:2181 <http://10.11.9.70/10.11.9.70:2181>, sessionid = 0x10003fa439a0062, negotiated timeout = 40000 (READ-ONLY mode)
>> 301082 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] INFO org.apache.zookeeper.ClientCnxn  - Checking server 10.11.157.4/10.11.157.4:2181 <http://10.11.157.4/10.11.157.4:2181> for being r/w. Timeout 800
>> 301082 [main-EventThread] INFO org.apache.curator.framework.state.ConnectionStateManager  - State change: READ_ONLY
>> 
>> 
>> On Tue, Jul 12, 2016 at 4:11 PM, Benjamin Jaton <benjamin.jaton@gmail.com <mailto:benjamin.jaton@gmail.com>> wrote:
>> Cameron, you seem to be right, the simple zookeeper client also fails sometimes.
>> I will bring it up to the ZK team, thank you!
>> 
>> On Tue, Jul 12, 2016 at 3:09 PM, Cameron McKenzie <mckenzie.cam@gmail.com <mailto:mckenzie.cam@gmail.com>> wrote:
>> I'm at a bit of a loss then. Looking at the stack trace, it's in the ZK code (not the Curator code) that it appears to be blocking. Maybe try and create a test that removes Curator and just uses the ZK client directly and see if you still have the same problem?
>> 
>> On Wed, Jul 13, 2016 at 7:54 AM, Benjamin Jaton <benjamin.jaton@gmail.com <mailto:benjamin.jaton@gmail.com>> wrote:
>> Yes, I mean the ping goes through properly
>> 
>> On Tue, Jul 12, 2016 at 2:50 PM, Cameron McKenzie <mckenzie.cam@gmail.com <mailto:mckenzie.cam@gmail.com>> wrote:
>> Are the 'dead' hosts reachable? I seem to recall having problems with the ZK client blocking for a long time if you had a host configured that was resolvable but not reachable.
>> 
>> On Wed, Jul 13, 2016 at 4:30 AM, Benjamin Jaton <benjamin.jaton@gmail.com <mailto:benjamin.jaton@gmail.com>> wrote:
>> More to the story: 
>> 
>> While it consistently works on the same machine than the sole running ZK node when I specify only 2 of the 3 servers in the connection string,
>> it doesn't work when I specify 3 hosts:
>> 
>> 
>> 2016-07-12 11:23:46 INFO  o.a.z.ZooKeeper:716 - Initiating client connection, connectString=QA-E8WIN11:2181,QA-E8WIN12:2181,QA-E8WIN13:2181 sessionTimeout=45000 watcher=org.apache.curator.ConnectionState@c86b9e3
>> 2016-07-12 11:23:46 INFO  o.a.z.ClientCnxn:1138 - Opening socket connection to server QA-E8WIN13/10.11.8.230:2181 <http://10.11.8.230:2181/>. Will not attempt to authenticate using SASL (unknown error)
>> 2016-07-12 11:24:01 INFO  o.a.z.ClientCnxn:1251 - Client session timed out, have not heard from server in 15018ms for sessionid 0x0, closing socket connection and attempting reconnect
>> Successfully established the connection with ZooKeeper
>> 2016-07-12 11:24:01 INFO  o.a.z.ClientCnxn:1138 - Opening socket connection to server QA-E8WIN12/10.11.8.232:2181 <http://10.11.8.232:2181/>. Will not attempt to authenticate using SASL (unknown error)
>> 2016-07-12 11:24:16 INFO  o.a.z.ClientCnxn:1251 - Client session timed out, have not heard from server in 15000ms for sessionid 0x0, closing socket connection and attempting reconnect
>> 2016-07-12 11:24:16 INFO  o.a.z.ClientCnxn:1138 - Opening socket connection to server QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>. Will not attempt to authenticate using SASL (unknown error)
>> 2016-07-12 11:24:16 INFO  o.a.z.ClientCnxn:980 - Socket connection established, initiating session, client: /10.11.8.236:50423 <http://10.11.8.236:50423/>, server: QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>
>> 2016-07-12 11:24:16 INFO  o.a.z.ClientCnxn:1400 - Session establishment complete on server QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>, sessionid = 0x10003c8af64001b, negotiated timeout = 45000 (READ-ONLY mode)
>> 2016-07-12 11:24:16 INFO  o.a.z.ClientCnxn:1297 - Checking server QA-E8WIN13/10.11.8.230:2181 <http://10.11.8.230:2181/> for being r/w. Timeout 200
>> 2016-07-12 11:24:16 INFO  o.a.c.f.s.ConnectionStateManager:228 - State change: READ_ONLY
>> 2016-07-12 11:24:37 INFO  o.a.z.ClientCnxn:1297 - Checking server QA-E8WIN12/10.11.8.232:2181 <http://10.11.8.232:2181/> for being r/w. Timeout 400
>> 2016-07-12 11:24:58 INFO  o.a.z.ClientCnxn:1251 - Client session timed out, have not heard from server in 42023ms for sessionid 0x10003c8af64001b, closing socket connection and attempting reconnect
>> 2016-07-12 11:24:58 INFO  o.a.c.f.s.ConnectionStateManager:228 - State change: SUSPENDED
>> 2016-07-12 11:25:00 INFO  o.a.z.ClientCnxn:1138 - Opening socket connection to server QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>. Will not attempt to authenticate using SASL (unknown error)
>> 2016-07-12 11:25:00 INFO  o.a.z.ClientCnxn:980 - Socket connection established, initiating session, client: /10.11.8.236:50428 <http://10.11.8.236:50428/>, server: QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>
>> 2016-07-12 11:25:00 INFO  o.a.z.ClientCnxn:1400 - Session establishment complete on server QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>, sessionid = 0x10003c8af64001c, negotiated timeout = 45000 (READ-ONLY mode)
>> 2016-07-12 11:25:00 INFO  o.a.z.ClientCnxn:1297 - Checking server QA-E8WIN13/10.11.8.230:2181 <http://10.11.8.230:2181/> for being r/w. Timeout 800
>> 2016-07-12 11:25:00 INFO  o.a.c.f.s.ConnectionStateManager:228 - State change: READ_ONLY
>> 2016-07-12 11:25:21 INFO  o.a.z.ClientCnxn:1297 - Checking server QA-E8WIN12/10.11.8.232:2181 <http://10.11.8.232:2181/> for being r/w. Timeout 1600
>> 2016-07-12 11:25:42 INFO  o.a.z.ClientCnxn:1251 - Client session timed out, have not heard from server in 42011ms for sessionid 0x10003c8af64001c, closing socket connection and attempting reconnect
>> 2016-07-12 11:25:42 INFO  o.a.c.f.s.ConnectionStateManager:228 - State change: SUSPENDED
>> 2016-07-12 11:25:44 INFO  o.a.z.ClientCnxn:1138 - Opening socket connection to server QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>. Will not attempt to authenticate using SASL (unknown error)
>> 2016-07-12 11:25:44 INFO  o.a.z.ClientCnxn:980 - Socket connection established, initiating session, client: /10.11.8.236:50435 <http://10.11.8.236:50435/>, server: QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>
>> 2016-07-12 11:25:44 INFO  o.a.z.ClientCnxn:1400 - Session establishment complete on server QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>, sessionid = 0x10003c8af64001d, negotiated timeout = 45000 (READ-ONLY mode)
>> 2016-07-12 11:25:44 INFO  o.a.z.ClientCnxn:1297 - Checking server QA-E8WIN13/10.11.8.230:2181 <http://10.11.8.230:2181/> for being r/w. Timeout 3200
>> 2016-07-12 11:25:44 INFO  o.a.c.f.s.ConnectionStateManager:228 - State change: READ_ONLY
>> 2016-07-12 11:26:05 INFO  o.a.z.ClientCnxn:1297 - Checking server QA-E8WIN12/10.11.8.232:2181 <http://10.11.8.232:2181/> for being r/w. Timeout 6400
>> 2016-07-12 11:26:26 INFO  o.a.z.ClientCnxn:1251 - Client session timed out, have not heard from server in 42003ms for sessionid 0x10003c8af64001d, closing socket connection and attempting reconnect
>> 2016-07-12 11:26:26 INFO  o.a.c.f.s.ConnectionStateManager:228 - State change: SUSPENDED
>> Exception in thread "main" org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /
>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1753)
>>     at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:310)
>>     at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:299)
>>     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108)
>>     at org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:295)
>>     at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:287)
>>     at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:34)
>>     at test.Test.main(Test.java:25)
>> 
>> 
>> That seems to suggest that adding 1 host to check somehow messes up something in the timeouts.
>> I am suspicious of those:
>> 2016-07-12 11:25:44 INFO  o.a.z.ClientCnxn:1297 - Checking server QA-E8WIN13/10.11.8.230:2181 <http://10.11.8.230:2181/> for being r/w. Timeout 3200
>> They seem to be "blocking" the existing connection we have to the healthy ZK server, which goes stale. Is that possible?
>> 
>> 
>> On Tue, Jul 12, 2016 at 11:17 AM, Benjamin Jaton <bjaton@radiantlogic.com <mailto:bjaton@radiantlogic.com>> wrote:
>> I did not use the zkCli tool but the ZK java API directly. And other tools like ZooInspector (contrib). They were run from a remote machine.
>> I ran it on the same machine and the one read only server is running properly, I can browse it using those methods.
>> 
>> Trying to make sense of those client-server interactions is difficult but here is what I got:
>> 
>> Client:
>> 2016-07-12 10:55:01,508 INFO  org.apache.zookeeper.ClientCnxn:1138 - Opening socket connection to server QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>. Will not attempt to authenticate using SASL (unknown error)
>> 2016-07-12 10:55:01,512 INFO  org.apache.zookeeper.ClientCnxn:980 - Socket connection established, initiating session, client: /10.11.12.4:60586 <http://10.11.12.4:60586/>, server: QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>
>> 2016-07-12 10:55:01,521 INFO  org.apache.zookeeper.ClientCnxn:1400 - Session establishment complete on server QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>, sessionid = 0x10003c8af640007, negotiated timeout = 45000 (READ-ONLY mode)
>> 2016-07-12 10:55:01,523 INFO  org.apache.zookeeper.ClientCnxn:1297 - Checking server QA-E8WIN12/10.11.8.232:2181 <http://10.11.8.232:2181/> for being r/w. Timeout 200
>> 2016-07-12 10:55:01,528 INFO  org.apache.curator.framework.state.ConnectionStateManager:228 - State change: READ_ONLY
>> 2016-07-12 10:57:08,800 INFO  org.apache.zookeeper.ClientCnxn:1253 - Unable to read additional data from server sessionid 0x10003c8af640007, likely server has closed socket, closing socket connection and attempting reconnect
>> 2016-07-12 10:57:08,903 INFO  org.apache.curator.framework.state.ConnectionStateManager:228 - State change: SUSPENDED
>> 
>> 
>> Server:
>> 2016-07-12T10:55:56,468 [myid:] - INFO  [ProcessThread(sid:1 cport:-1)::ZooKeeperServer@678] - Established session 0x10003c8af640007 with negotiated timeout 45000 for client /10.11.12.4:60586 <http://10.11.12.4:60586/>
>> (...)
>> 2016-07-12T10:56:41,756 [myid:] - INFO  [SessionTracker:ZooKeeperServer@384] - Expiring session 0x10003c8af640007, timeout of 45000ms exceeded
>> 
>> 
>> Note: Don't worry about the time being about 1 min off between the machines.
>> 
>> What I don't understand is why we are losing the read-only connection we successfully established. ZK ro server says it timed out?
>> 
>> Now interestingly, when I run it on the same machine it works and completes pretty quickly:
>> 
>> 
>> 2016-07-12 10:43:49 INFO  o.a.z.ZooKeeper:716 - Initiating client connection, connectString=QA-E8WIN11:2181,QA-E8WIN12:2181 sessionTimeout=45000 watcher=org.apache.curator.ConnectionState@e4487af
>> 2016-07-12 10:43:49 INFO  o.a.z.ClientCnxn:1138 - Opening socket connection to server QA-E8WIN12/10.11.8.232:2181 <http://10.11.8.232:2181/>. Will not attempt to authenticate using SASL (unknown error)
>> Successfully established the connection with ZooKeeper
>> 2016-07-12 10:44:10 WARN  o.a.z.ClientCnxn:1257 - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
>> java.net.ConnectException: Connection timed out: no further information
>> 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_91]
>> 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[?:1.8.0_91]
>> 	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355) ~[zookeeper-3.5.0-alpha.jar:3.5.2-alpha-SNAPSHOT-1702518]
>> 	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1236) [zookeeper-3.5.0-alpha.jar:3.5.2-alpha-SNAPSHOT-1702518]
>> 2016-07-12 10:44:11 INFO  o.a.z.ClientCnxn:1138 - Opening socket connection to server QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>. Will not attempt to authenticate using SASL (unknown error)
>> 2016-07-12 10:44:11 INFO  o.a.z.ClientCnxn:980 - Socket connection established, initiating session, client: /10.11.8.236:50147 <http://10.11.8.236:50147/>, server: QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>
>> 2016-07-12 10:44:11 INFO  o.a.z.ClientCnxn:1400 - Session establishment complete on server QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>, sessionid = 0x10003b315b30012, negotiated timeout = 45000 (READ-ONLY mode)
>> 2016-07-12 10:44:11 INFO  o.a.z.ClientCnxn:1297 - Checking server QA-E8WIN12/10.11.8.232:2181 <http://10.11.8.232:2181/> for being r/w. Timeout 200
>> 2016-07-12 10:44:11 INFO  o.a.c.f.s.ConnectionStateManager:228 - State change: READ_ONLY
>> 2016-07-12 10:44:32 INFO  o.a.z.ClientCnxn:1297 - Checking server QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/> for being r/w. Timeout 400
>> Done.
>> 
>> 
>> On Mon, Jul 11, 2016 at 4:42 PM, Cameron McKenzie <mckenzie.cam@gmail.com <mailto:mckenzie.cam@gmail.com>> wrote:
>> It looks like a connection issue based on the logs. When you confirmed that the connection was ok via the zkCli, was that running on the same host as the read only server? Can you run it from the same host that you're running the Java client from?
>> 
>> On Tue, Jul 12, 2016 at 9:40 AM, Benjamin Jaton <benjamin.jaton@gmail.com <mailto:benjamin.jaton@gmail.com>> wrote:
>> Well I was trying to let it run and it eventually fails, with a stack:
>> 
>> Successfully established the connection with ZooKeeper
>> 2016-07-11 16:28:21,943 INFO  org.apache.zookeeper.ClientCnxn:1251 - Client session timed out, have not heard from server in 22519ms for sessionid 0x0, closing socket connection and attempting reconnect
>> 2016-07-11 16:28:22,046 INFO  org.apache.zookeeper.ClientCnxn:1138 - Opening socket connection to server QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>. Will not attempt to authenticate using SASL (unknown error)
>> 2016-07-11 16:28:22,050 INFO  org.apache.zookeeper.ClientCnxn:980 - Socket connection established, initiating session, client: /10.11.12.4:54702 <http://10.11.12.4:54702/>, server: QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>
>> 2016-07-11 16:28:22,058 INFO  org.apache.zookeeper.ClientCnxn:1400 - Session establishment complete on server QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>, sessionid = 0x100024f09360080, negotiated timeout = 45000 (READ-ONLY mode)
>> 2016-07-11 16:28:22,061 INFO  org.apache.zookeeper.ClientCnxn:1297 - Checking server QA-E8WIN12/10.11.8.232:2181 <http://10.11.8.232:2181/> for being r/w. Timeout 200
>> 2016-07-11 16:28:22,066 INFO  org.apache.curator.framework.state.ConnectionStateManager:228 - State change: READ_ONLY
>> 2016-07-11 16:30:29,312 INFO  org.apache.zookeeper.ClientCnxn:1253 - Unable to read additional data from server sessionid 0x100024f09360080, likely server has closed socket, closing socket connection and attempting reconnect
>> 2016-07-11 16:30:29,416 INFO  org.apache.curator.framework.state.ConnectionStateManager:228 - State change: SUSPENDED
>> 2016-07-11 16:30:31,031 INFO  org.apache.zookeeper.ClientCnxn:1138 - Opening socket connection to server QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>. Will not attempt to authenticate using SASL (unknown error)
>> 2016-07-11 16:30:31,032 INFO  org.apache.zookeeper.ClientCnxn:980 - Socket connection established, initiating session, client: /10.11.12.4:55156 <http://10.11.12.4:55156/>, server: QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>
>> 2016-07-11 16:30:31,034 INFO  org.apache.zookeeper.ClientCnxn:1400 - Session establishment complete on server QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>, sessionid = 0x100024f09360083, negotiated timeout = 45000 (READ-ONLY mode)
>> 2016-07-11 16:30:31,034 INFO  org.apache.zookeeper.ClientCnxn:1297 - Checking server QA-E8WIN12/10.11.8.232:2181 <http://10.11.8.232:2181/> for being r/w. Timeout 400
>> 2016-07-11 16:30:31,035 INFO  org.apache.curator.framework.state.ConnectionStateManager:228 - State change: READ_ONLY
>> 2016-07-11 16:32:38,335 INFO  org.apache.zookeeper.ClientCnxn:1253 - Unable to read additional data from server sessionid 0x100024f09360083, likely server has closed socket, closing socket connection and attempting reconnect
>> 2016-07-11 16:32:38,441 INFO  org.apache.curator.framework.state.ConnectionStateManager:228 - State change: SUSPENDED
>> 2016-07-11 16:32:40,425 INFO  org.apache.zookeeper.ClientCnxn:1138 - Opening socket connection to server QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>. Will not attempt to authenticate using SASL (unknown error)
>> 2016-07-11 16:32:40,426 INFO  org.apache.zookeeper.ClientCnxn:980 - Socket connection established, initiating session, client: /10.11.12.4:55637 <http://10.11.12.4:55637/>, server: QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>
>> 2016-07-11 16:32:40,428 INFO  org.apache.zookeeper.ClientCnxn:1400 - Session establishment complete on server QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>, sessionid = 0x100024f09360087, negotiated timeout = 45000 (READ-ONLY mode)
>> 2016-07-11 16:32:40,428 INFO  org.apache.zookeeper.ClientCnxn:1297 - Checking server QA-E8WIN12/10.11.8.232:2181 <http://10.11.8.232:2181/> for being r/w. Timeout 800
>> 2016-07-11 16:32:40,428 INFO  org.apache.curator.framework.state.ConnectionStateManager:228 - State change: READ_ONLY
>> 2016-07-11 16:34:47,615 INFO  org.apache.zookeeper.ClientCnxn:1253 - Unable to read additional data from server sessionid 0x100024f09360087, likely server has closed socket, closing socket connection and attempting reconnect
>> 2016-07-11 16:34:47,716 INFO  org.apache.curator.framework.state.ConnectionStateManager:228 - State change: SUSPENDED
>> 2016-07-11 16:34:49,557 INFO  org.apache.zookeeper.ClientCnxn:1138 - Opening socket connection to server QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>. Will not attempt to authenticate using SASL (unknown error)
>> 2016-07-11 16:34:49,558 INFO  org.apache.zookeeper.ClientCnxn:980 - Socket connection established, initiating session, client: /10.11.12.4:56094 <http://10.11.12.4:56094/>, server: QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>
>> 2016-07-11 16:34:49,560 INFO  org.apache.zookeeper.ClientCnxn:1400 - Session establishment complete on server QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>, sessionid = 0x100024f09360089, negotiated timeout = 45000 (READ-ONLY mode)
>> 2016-07-11 16:34:49,560 INFO  org.apache.zookeeper.ClientCnxn:1297 - Checking server QA-E8WIN12/10.11.8.232:2181 <http://10.11.8.232:2181/> for being r/w. Timeout 1600
>> 2016-07-11 16:34:49,560 INFO  org.apache.curator.framework.state.ConnectionStateManager:228 - State change: READ_ONLY
>> 2016-07-11 16:36:56,895 INFO  org.apache.zookeeper.ClientCnxn:1253 - Unable to read additional data from server sessionid 0x100024f09360089, likely server has closed socket, closing socket connection and attempting reconnect
>> Exception in thread "main" 2016-07-11 16:36:56,996 INFO  org.apache.curator.framework.state.ConnectionStateManager:228 - State change: SUSPENDED
>> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /
>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1753)
>>     at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:310)
>>     at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:299)
>>     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108)
>>     at org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:295)
>>     at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:287)
>>     at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:34)
>>     at TestCuratorReadOnly.main(TestCuratorReadOnly.java:23)
>> 
>> On Mon, Jul 11, 2016 at 4:39 PM, Cameron McKenzie <mckenzie.cam@gmail.com <mailto:mckenzie.cam@gmail.com>> wrote:
>> So, it appears that the initial connection occurs ok, but then something goes wrong
>> 
>> 2016-07-11 15:49:15,712 INFO  org.apache.zookeeper.ClientCnxn:1253 - Unable to read additional data from server sessionid 0x100024f0936005b, likely server has closed socket, closing socket connection and attempting reconnect
>> 
>> Then a reconnection occurs ok, and then the cycle starts again.
>> 
>> Can you see where in the stack the getData() call is hanging?
>> 
>> On Tue, Jul 12, 2016 at 9:35 AM, Benjamin Jaton <benjamin.jaton@gmail.com <mailto:benjamin.jaton@gmail.com>> wrote:
>> The blockUntilConnectedOrTimedOut() does return, we see the line:
>> "Successfully established the connection with ZooKeeper"
>> which happens right after.
>> 
>> The getData() is where it hangs.
>> 
>> We see this first:
>> 2016-07-11 15:47:08,420 INFO  org.apache.curator.framework.state.ConnectionStateManager:228 - State change: READ_ONLY
>> 
>> and then we see it again 2 minutes later:
>> 2016-07-11 15:49:17,316 INFO  org.apache.curator.framework.state.ConnectionStateManager:228 - State change: READ_ONLY
>> 
>> and then again 2 minutes after that:
>> 2016-07-11 15:51:26,470 INFO  org.apache.curator.framework.state.ConnectionStateManager:228 - State change: READ_ONLY
>> 
>> Then I stopped it.
>> 
>> On Mon, Jul 11, 2016 at 4:13 PM, Cameron McKenzie <mckenzie.cam@gmail.com <mailto:mckenzie.cam@gmail.com>> wrote:
>> hey Ben,
>> From the logs it looks like the connection is established at the end?
>> 
>> Are you saying that the blockUntilConnectedOrTimedOut() does not return when this occurs? 
>> cheers
>> 
>> 
>> On Tue, Jul 12, 2016 at 9:10 AM, Benjamin Jaton <bjaton@radiantlogic.com <mailto:bjaton@radiantlogic.com>> wrote:
>> ZK is 3.5.1-alpha
>> Curator is 2.11.0
>> 
>> On Mon, Jul 11, 2016 at 4:01 PM, Benjamin Jaton <bjaton@radiantlogic.com <mailto:bjaton@radiantlogic.com>> wrote:
>> I have a 3 nodes ZK cluster , readonly mode is enabled.
>> 2 nodes are down, so the third one is in read-only (verified by using the ZK API manually).
>> 
>> I'm using this piece of code:
>> 
>> 		Builder curatorClientBuilder = CuratorFrameworkFactory.builder()
>> 				.connectString("QA-E8WIN11:2181,QA-E8WIN12:2181")
>> 				.sessionTimeoutMs(45000).connectionTimeoutMs(15000)
>> 				.retryPolicy(new RetryNTimes(3, 5000)).canBeReadOnly(true);
>> 
>> 		CuratorFramework client = curatorClientBuilder.build();
>> 		client.start();
>> 		client.getZookeeperClient().blockUntilConnectedOrTimedOut();
>> 		System.out.println("Successfully established the connection with ZooKeeper");
>> 		
>> 		client.getData().forPath("/");
>> 		System.out.println("Done.");
>> 
>> When curator pick the host that is UP first, it goes through very quickly. When it picks the host that is down first (QA-E8WIN12), it never goes through.
>> 
>> Am I missing something here?
>> 
>> Below is the log:
>> 
>> 2016-07-11 15:46:45,733 INFO  org.apache.curator.framework.imps.CuratorFrameworkImpl:235 - Starting
>> 2016-07-11 15:46:45,743 INFO  org.apache.zookeeper.ZooKeeper:109 - Client environment:zookeeper.version=3.5.1-alpha-1748895, built on 06/17/2016 22:24 GMT
>> 2016-07-11 15:46:45,743 INFO  org.apache.zookeeper.ZooKeeper:109 - Client environment:host.name <http://host.name/>=w-rli09-ben
>> 2016-07-11 15:46:45,744 INFO  org.apache.zookeeper.ZooKeeper:109 - Client environment:java.version=1.8.0_11
>> 2016-07-11 15:46:45,744 INFO  org.apache.zookeeper.ZooKeeper:109 - Client environment:java.vendor=Oracle Corporation
>> 2016-07-11 15:46:45,744 INFO  org.apache.zookeeper.ZooKeeper:109 - Client environment:java.home=/usr/local/apps/jdk1.8.0_11/jre
>> 2016-07-11 15:46:45,744 INFO  org.apache.zookeeper.ZooKeeper:109 - Client environment:java.class.path=(..)
>> 2016-07-11 15:46:45,746 INFO  org.apache.zookeeper.ZooKeeper:109 - Client environment:java.io.tmpdir=/tmp
>> 2016-07-11 15:46:45,746 INFO  org.apache.zookeeper.ZooKeeper:109 - Client environment:java.compiler=<NA>
>> 2016-07-11 15:46:45,747 INFO  org.apache.zookeeper.ZooKeeper:109 - Client environment:os.name <http://os.name/>=Linux
>> 2016-07-11 15:46:45,747 INFO  org.apache.zookeeper.ZooKeeper:109 - Client environment:os.arch=amd64
>> 2016-07-11 15:46:45,747 INFO  org.apache.zookeeper.ZooKeeper:109 - Client environment:os.version=3.8.0-44-generic
>> 2016-07-11 15:46:45,747 INFO  org.apache.zookeeper.ZooKeeper:109 - Client environment:user.name <http://user.name/>=benji
>> 2016-07-11 15:46:45,747 INFO  org.apache.zookeeper.ZooKeeper:109 - Client environment:user.home=(..)
>> 2016-07-11 15:46:45,747 INFO  org.apache.zookeeper.ZooKeeper:109 - Client environment:user.dir=(..)
>> 2016-07-11 15:46:45,747 INFO  org.apache.zookeeper.ZooKeeper:109 - Client environment:os.memory.free=385MB
>> 2016-07-11 15:46:45,748 INFO  org.apache.zookeeper.ZooKeeper:109 - Client environment:os.memory.max=5358MB
>> 2016-07-11 15:46:45,748 INFO  org.apache.zookeeper.ZooKeeper:109 - Client environment:os.memory.total=397MB
>> 2016-07-11 15:46:45,750 INFO  org.apache.zookeeper.ZooKeeper:716 - Initiating client connection, connectString=QA-E8WIN11:2181,QA-E8WIN12:2181 sessionTimeout=45000 watcher=org.apache.curator.ConnectionState@3e96bacf
>> 2016-07-11 15:46:45,769 INFO  org.apache.zookeeper.ClientCnxn:1138 - Opening socket connection to server QA-E8WIN12/10.11.8.232:2181 <http://10.11.8.232:2181/>. Will not attempt to authenticate using SASL (unknown error)
>> Successfully established the connection with ZooKeeper
>> 2016-07-11 15:47:08,300 INFO  org.apache.zookeeper.ClientCnxn:1251 - Client session timed out, have not heard from server in 22534ms for sessionid 0x0, closing socket connection and attempting reconnect
>> 2016-07-11 15:47:08,403 INFO  org.apache.zookeeper.ClientCnxn:1138 - Opening socket connection to server QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>. Will not attempt to authenticate using SASL (unknown error)
>> 2016-07-11 15:47:08,406 INFO  org.apache.zookeeper.ClientCnxn:980 - Socket connection established, initiating session, client: /10.11.12.4:47004 <http://10.11.12.4:47004/>, server: QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>
>> 2016-07-11 15:47:08,413 INFO  org.apache.zookeeper.ClientCnxn:1400 - Session establishment complete on server QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>, sessionid = 0x100024f0936005b, negotiated timeout = 45000 (READ-ONLY mode)
>> 2016-07-11 15:47:08,415 INFO  org.apache.zookeeper.ClientCnxn:1297 - Checking server QA-E8WIN12/10.11.8.232:2181 <http://10.11.8.232:2181/> for being r/w. Timeout 200
>> 2016-07-11 15:47:08,420 INFO  org.apache.curator.framework.state.ConnectionStateManager:228 - State change: READ_ONLY
>> 2016-07-11 15:49:15,712 INFO  org.apache.zookeeper.ClientCnxn:1253 - Unable to read additional data from server sessionid 0x100024f0936005b, likely server has closed socket, closing socket connection and attempting reconnect
>> 2016-07-11 15:49:15,815 INFO  org.apache.curator.framework.state.ConnectionStateManager:228 - State change: SUSPENDED
>> 2016-07-11 15:49:17,313 INFO  org.apache.zookeeper.ClientCnxn:1138 - Opening socket connection to server QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>. Will not attempt to authenticate using SASL (unknown error)
>> 2016-07-11 15:49:17,314 INFO  org.apache.zookeeper.ClientCnxn:980 - Socket connection established, initiating session, client: /10.11.12.4:47295 <http://10.11.12.4:47295/>, server: QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>
>> 2016-07-11 15:49:17,316 INFO  org.apache.zookeeper.ClientCnxn:1400 - Session establishment complete on server QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>, sessionid = 0x100024f0936005f, negotiated timeout = 45000 (READ-ONLY mode)
>> 2016-07-11 15:49:17,316 INFO  org.apache.zookeeper.ClientCnxn:1297 - Checking server QA-E8WIN12/10.11.8.232:2181 <http://10.11.8.232:2181/> for being r/w. Timeout 400
>> 2016-07-11 15:49:17,316 INFO  org.apache.curator.framework.state.ConnectionStateManager:228 - State change: READ_ONLY
>> 2016-07-11 15:51:24,607 INFO  org.apache.zookeeper.ClientCnxn:1253 - Unable to read additional data from server sessionid 0x100024f0936005f, likely server has closed socket, closing socket connection and attempting reconnect
>> 2016-07-11 15:51:24,713 INFO  org.apache.curator.framework.state.ConnectionStateManager:228 - State change: SUSPENDED
>> 2016-07-11 15:51:26,467 INFO  org.apache.zookeeper.ClientCnxn:1138 - Opening socket connection to server QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>. Will not attempt to authenticate using SASL (unknown error)
>> 2016-07-11 15:51:26,468 INFO  org.apache.zookeeper.ClientCnxn:980 - Socket connection established, initiating session, client: /10.11.12.4:47576 <http://10.11.12.4:47576/>, server: QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>
>> 2016-07-11 15:51:26,470 INFO  org.apache.zookeeper.ClientCnxn:1400 - Session establishment complete on server QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>, sessionid = 0x100024f09360061, negotiated timeout = 45000 (READ-ONLY mode)
>> 2016-07-11 15:51:26,470 INFO  org.apache.zookeeper.ClientCnxn:1297 - Checking server QA-E8WIN12/10.11.8.232:2181 <http://10.11.8.232:2181/> for being r/w. Timeout 800
>> 2016-07-11 15:51:26,470 INFO  org.apache.curator.framework.state.ConnectionStateManager:228 - State change: READ_ONLY
>> (...)
>> 
>> Thanks
>> Ben
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
> 
> 


Mime
View raw message