curator-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Benjamin Jaton <benjamin.ja...@gmail.com>
Subject Re: Curator never goes through when ZK is read-only
Date Tue, 12 Jul 2016 23:11:54 GMT
Cameron, you seem to be right, the simple zookeeper client also fails
sometimes.
I will bring it up to the ZK team, thank you!

On Tue, Jul 12, 2016 at 3:09 PM, Cameron McKenzie <mckenzie.cam@gmail.com>
wrote:

> I'm at a bit of a loss then. Looking at the stack trace, it's in the ZK
> code (not the Curator code) that it appears to be blocking. Maybe try and
> create a test that removes Curator and just uses the ZK client directly and
> see if you still have the same problem?
>
> On Wed, Jul 13, 2016 at 7:54 AM, Benjamin Jaton <benjamin.jaton@gmail.com>
> wrote:
>
>> Yes, I mean the ping goes through properly
>>
>> On Tue, Jul 12, 2016 at 2:50 PM, Cameron McKenzie <mckenzie.cam@gmail.com
>> > wrote:
>>
>>> Are the 'dead' hosts reachable? I seem to recall having problems with
>>> the ZK client blocking for a long time if you had a host configured that
>>> was resolvable but not reachable.
>>>
>>> On Wed, Jul 13, 2016 at 4:30 AM, Benjamin Jaton <
>>> benjamin.jaton@gmail.com> wrote:
>>>
>>>> More to the story:
>>>>
>>>> While it consistently works on the same machine than the sole running
>>>> ZK node when I specify only 2 of the 3 servers in the connection string,
>>>> it doesn't work when I specify 3 hosts:
>>>>
>>>>
>>>> 2016-07-12 11:23:46 INFO  o.a.z.ZooKeeper:716 - Initiating client
>>>> connection, connectString=QA-E8WIN11:2181,QA-E8WIN12:2181,QA-E8WIN13:2181
>>>> sessionTimeout=45000 watcher=org.apache.curator.ConnectionState@c86b9e3
>>>> 2016-07-12 11:23:46 INFO  o.a.z.ClientCnxn:1138 - Opening socket
>>>> connection to server QA-E8WIN13/10.11.8.230:2181. Will not attempt to
>>>> authenticate using SASL (unknown error)
>>>> 2016-07-12 11:24:01 INFO  o.a.z.ClientCnxn:1251 - Client session timed
>>>> out, have not heard from server in 15018ms for sessionid 0x0, closing
>>>> socket connection and attempting reconnect
>>>> Successfully established the connection with ZooKeeper
>>>> 2016-07-12 11:24:01 INFO  o.a.z.ClientCnxn:1138 - Opening socket
>>>> connection to server QA-E8WIN12/10.11.8.232:2181. Will not attempt to
>>>> authenticate using SASL (unknown error)
>>>> 2016-07-12 11:24:16 INFO  o.a.z.ClientCnxn:1251 - Client session timed
>>>> out, have not heard from server in 15000ms for sessionid 0x0, closing
>>>> socket connection and attempting reconnect
>>>> 2016-07-12 11:24:16 INFO  o.a.z.ClientCnxn:1138 - Opening socket
>>>> connection to server QA-E8WIN11/10.11.8.236:2181. Will not attempt to
>>>> authenticate using SASL (unknown error)
>>>> 2016-07-12 11:24:16 INFO  o.a.z.ClientCnxn:980 - Socket connection
>>>> established, initiating session, client: /10.11.8.236:50423, server:
>>>> QA-E8WIN11/10.11.8.236:2181
>>>> 2016-07-12 11:24:16 INFO  o.a.z.ClientCnxn:1400 - Session establishment
>>>> complete on server QA-E8WIN11/10.11.8.236:2181, sessionid =
>>>> 0x10003c8af64001b, negotiated timeout = 45000 (READ-ONLY mode)
>>>> 2016-07-12 11:24:16 INFO  o.a.z.ClientCnxn:1297 - Checking server
>>>> QA-E8WIN13/10.11.8.230:2181 for being r/w. Timeout 200
>>>> 2016-07-12 11:24:16 INFO  o.a.c.f.s.ConnectionStateManager:228 - State
>>>> change: READ_ONLY
>>>> 2016-07-12 11:24:37 INFO  o.a.z.ClientCnxn:1297 - Checking server
>>>> QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 400
>>>> 2016-07-12 11:24:58 INFO  o.a.z.ClientCnxn:1251 - Client session timed
>>>> out, have not heard from server in 42023ms for sessionid 0x10003c8af64001b,
>>>> closing socket connection and attempting reconnect
>>>> 2016-07-12 11:24:58 INFO  o.a.c.f.s.ConnectionStateManager:228 - State
>>>> change: SUSPENDED
>>>> 2016-07-12 11:25:00 INFO  o.a.z.ClientCnxn:1138 - Opening socket
>>>> connection to server QA-E8WIN11/10.11.8.236:2181. Will not attempt to
>>>> authenticate using SASL (unknown error)
>>>> 2016-07-12 11:25:00 INFO  o.a.z.ClientCnxn:980 - Socket connection
>>>> established, initiating session, client: /10.11.8.236:50428, server:
>>>> QA-E8WIN11/10.11.8.236:2181
>>>> 2016-07-12 11:25:00 INFO  o.a.z.ClientCnxn:1400 - Session establishment
>>>> complete on server QA-E8WIN11/10.11.8.236:2181, sessionid =
>>>> 0x10003c8af64001c, negotiated timeout = 45000 (READ-ONLY mode)
>>>> 2016-07-12 11:25:00 INFO  o.a.z.ClientCnxn:1297 - Checking server
>>>> QA-E8WIN13/10.11.8.230:2181 for being r/w. Timeout 800
>>>> 2016-07-12 11:25:00 INFO  o.a.c.f.s.ConnectionStateManager:228 - State
>>>> change: READ_ONLY
>>>> 2016-07-12 11:25:21 INFO  o.a.z.ClientCnxn:1297 - Checking server
>>>> QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 1600
>>>> 2016-07-12 11:25:42 INFO  o.a.z.ClientCnxn:1251 - Client session timed
>>>> out, have not heard from server in 42011ms for sessionid 0x10003c8af64001c,
>>>> closing socket connection and attempting reconnect
>>>> 2016-07-12 11:25:42 INFO  o.a.c.f.s.ConnectionStateManager:228 - State
>>>> change: SUSPENDED
>>>> 2016-07-12 11:25:44 INFO  o.a.z.ClientCnxn:1138 - Opening socket
>>>> connection to server QA-E8WIN11/10.11.8.236:2181. Will not attempt to
>>>> authenticate using SASL (unknown error)
>>>> 2016-07-12 11:25:44 INFO  o.a.z.ClientCnxn:980 - Socket connection
>>>> established, initiating session, client: /10.11.8.236:50435, server:
>>>> QA-E8WIN11/10.11.8.236:2181
>>>> 2016-07-12 11:25:44 INFO  o.a.z.ClientCnxn:1400 - Session establishment
>>>> complete on server QA-E8WIN11/10.11.8.236:2181, sessionid =
>>>> 0x10003c8af64001d, negotiated timeout = 45000 (READ-ONLY mode)
>>>> 2016-07-12 11:25:44 INFO  o.a.z.ClientCnxn:1297 - Checking server
>>>> QA-E8WIN13/10.11.8.230:2181 for being r/w. Timeout 3200
>>>> 2016-07-12 11:25:44 INFO  o.a.c.f.s.ConnectionStateManager:228 - State
>>>> change: READ_ONLY
>>>> 2016-07-12 11:26:05 INFO  o.a.z.ClientCnxn:1297 - Checking server
>>>> QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 6400
>>>> 2016-07-12 11:26:26 INFO  o.a.z.ClientCnxn:1251 - Client session timed
>>>> out, have not heard from server in 42003ms for sessionid 0x10003c8af64001d,
>>>> closing socket connection and attempting reconnect
>>>> 2016-07-12 11:26:26 INFO  o.a.c.f.s.ConnectionStateManager:228 - State
>>>> change: SUSPENDED
>>>> Exception in thread "main"
>>>> org.apache.zookeeper.KeeperException$ConnectionLossException:
>>>> KeeperErrorCode = ConnectionLoss for /
>>>>     at
>>>> org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>>>>     at
>>>> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>>>>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1753)
>>>>     at
>>>> org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:310)
>>>>     at
>>>> org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:299)
>>>>     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108)
>>>>     at
>>>> org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:295)
>>>>     at
>>>> org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:287)
>>>>     at
>>>> org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:34)
>>>>     at test.Test.main(Test.java:25)
>>>>
>>>>
>>>> That seems to suggest that adding 1 host to check somehow messes up
>>>> something in the timeouts.
>>>> I am suspicious of those:
>>>> 2016-07-12 11:25:44 INFO  o.a.z.ClientCnxn:1297 - Checking server
>>>> QA-E8WIN13/10.11.8.230:2181 for being r/w. Timeout 3200
>>>> They seem to be "blocking" the existing connection we have to the
>>>> healthy ZK server, which goes stale. Is that possible?
>>>>
>>>>
>>>> On Tue, Jul 12, 2016 at 11:17 AM, Benjamin Jaton <
>>>> bjaton@radiantlogic.com> wrote:
>>>>
>>>>> I did not use the zkCli tool but the ZK java API directly. And other
>>>>> tools like ZooInspector (contrib). They were run from a remote machine.
>>>>> I ran it on the same machine and the one read only server is running
>>>>> properly, I can browse it using those methods.
>>>>>
>>>>> Trying to make sense of those client-server interactions is difficult
>>>>> but here is what I got:
>>>>>
>>>>> Client:
>>>>> 2016-07-12 10:55:01,508 INFO  org.apache.zookeeper.ClientCnxn:1138 -
*Opening
>>>>> socket connection to server QA-E8WIN11*/10.11.8.236:2181. Will not
>>>>> attempt to authenticate using SASL (unknown error)
>>>>> 2016-07-12 10:55:01,512 INFO  org.apache.zookeeper.ClientCnxn:980 - *Socket
>>>>> connection established*, initiating session, client: /10.11.12.4:60586,
>>>>> server: QA-E8WIN11/10.11.8.236:2181
>>>>> 2016-07-12 10:55:01,521 INFO  org.apache.zookeeper.ClientCnxn:1400 -
*Session
>>>>> establishment complete on server QA-E8WIN11*/10.11.8.236:2181, *sessionid
>>>>> = 0x10003c8af640007*, negotiated timeout = 45000 (READ-ONLY mode)
>>>>> 2016-07-12 10:55:01,523 INFO  org.apache.zookeeper.ClientCnxn:1297 -
>>>>> Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 200
>>>>> 2016-07-12 10:55:01,528 INFO
>>>>>  org.apache.curator.framework.state.ConnectionStateManager:228 - *State
>>>>> change: READ_ONLY*
>>>>> 2016-07-12 10:57:08,800 INFO  org.apache.zookeeper.ClientCnxn:1253 -
*Unable
>>>>> to read additional data from server sessionid 0x10003c8af640007, likely
>>>>> server has closed socket, closing socket connection and attempting
>>>>> reconnect*
>>>>> 2016-07-12 10:57:08,903 INFO
>>>>>  org.apache.curator.framework.state.ConnectionStateManager:228 - *State
>>>>> change: SUSPENDED*
>>>>>
>>>>>
>>>>> Server:
>>>>> 2016-07-12T10:55:56,468 [myid:] - INFO  [ProcessThread(sid:1
>>>>> cport:-1)::ZooKeeperServer@678] - *Established session
>>>>> 0x10003c8af640007 with negotiated timeout 45000* for client /
>>>>> 10.11.12.4:60586
>>>>> (...)
>>>>> 2016-07-12T10:56:41,756 [myid:] - INFO
>>>>>  [SessionTracker:ZooKeeperServer@384] - *Expiring session
>>>>> 0x10003c8af640007, timeout of 45000ms exceeded*
>>>>>
>>>>>
>>>>> Note: Don't worry about the time being about 1 min off between the
>>>>> machines.
>>>>>
>>>>> What I don't understand is why we are losing the read-only connection
>>>>> we successfully established. ZK ro server says it timed out?
>>>>>
>>>>> Now interestingly, when I run it on the same machine it works and
>>>>> completes pretty quickly:
>>>>>
>>>>>
>>>>> 2016-07-12 10:43:49 INFO  o.a.z.ZooKeeper:716 - Initiating client
>>>>> connection, connectString=QA-E8WIN11:2181,QA-E8WIN12:2181
>>>>> sessionTimeout=45000 watcher=org.apache.curator.ConnectionState@e4487af
>>>>> 2016-07-12 10:43:49 INFO  o.a.z.ClientCnxn:1138 - Opening socket
>>>>> connection to server QA-E8WIN12/10.11.8.232:2181. Will not attempt to
>>>>> authenticate using SASL (unknown error)
>>>>> Successfully established the connection with ZooKeeper
>>>>> 2016-07-12 10:44:10 WARN  o.a.z.ClientCnxn:1257 - Session 0x0 for
>>>>> server null, unexpected error, closing socket connection and attempting
>>>>> reconnect
>>>>> java.net.ConnectException: Connection timed out: no further information
>>>>> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>> ~[?:1.8.0_91]
>>>>> at
>>>>> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>>>>> ~[?:1.8.0_91]
>>>>> at
>>>>> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355)
>>>>> ~[zookeeper-3.5.0-alpha.jar:3.5.2-alpha-SNAPSHOT-1702518]
>>>>> at
>>>>> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1236)
>>>>> [zookeeper-3.5.0-alpha.jar:3.5.2-alpha-SNAPSHOT-1702518]
>>>>> 2016-07-12 10:44:11 INFO  o.a.z.ClientCnxn:1138 - Opening socket
>>>>> connection to server QA-E8WIN11/10.11.8.236:2181. Will not attempt to
>>>>> authenticate using SASL (unknown error)
>>>>> 2016-07-12 10:44:11 INFO  o.a.z.ClientCnxn:980 - Socket connection
>>>>> established, initiating session, client: /10.11.8.236:50147, server:
>>>>> QA-E8WIN11/10.11.8.236:2181
>>>>> 2016-07-12 10:44:11 INFO  o.a.z.ClientCnxn:1400 - Session
>>>>> establishment complete on server QA-E8WIN11/10.11.8.236:2181,
>>>>> sessionid = 0x10003b315b30012, negotiated timeout = 45000 (READ-ONLY
mode)
>>>>> 2016-07-12 10:44:11 INFO  o.a.z.ClientCnxn:1297 - Checking server
>>>>> QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 200
>>>>> 2016-07-12 10:44:11 INFO  o.a.c.f.s.ConnectionStateManager:228 - State
>>>>> change: READ_ONLY
>>>>> 2016-07-12 10:44:32 INFO  o.a.z.ClientCnxn:1297 - Checking server
>>>>> QA-E8WIN11/10.11.8.236:2181 for being r/w. Timeout 400
>>>>> Done.
>>>>>
>>>>>
>>>>> On Mon, Jul 11, 2016 at 4:42 PM, Cameron McKenzie <
>>>>> mckenzie.cam@gmail.com> wrote:
>>>>>
>>>>>> It looks like a connection issue based on the logs. When you
>>>>>> confirmed that the connection was ok via the zkCli, was that running
on the
>>>>>> same host as the read only server? Can you run it from the same host
that
>>>>>> you're running the Java client from?
>>>>>>
>>>>>> On Tue, Jul 12, 2016 at 9:40 AM, Benjamin Jaton <
>>>>>> benjamin.jaton@gmail.com> wrote:
>>>>>>
>>>>>>> Well I was trying to let it run and it eventually fails, with
a
>>>>>>> stack:
>>>>>>>
>>>>>>> Successfully established the connection with ZooKeeper
>>>>>>> 2016-07-11 16:28:21,943 INFO  org.apache.zookeeper.ClientCnxn:1251
-
>>>>>>> Client session timed out, have not heard from server in 22519ms
for
>>>>>>> sessionid 0x0, closing socket connection and attempting reconnect
>>>>>>> 2016-07-11 16:28:22,046 INFO  org.apache.zookeeper.ClientCnxn:1138
-
>>>>>>> Opening socket connection to server QA-E8WIN11/10.11.8.236:2181.
>>>>>>> Will not attempt to authenticate using SASL (unknown error)
>>>>>>> 2016-07-11 16:28:22,050 INFO  org.apache.zookeeper.ClientCnxn:980
-
>>>>>>> Socket connection established, initiating session, client: /
>>>>>>> 10.11.12.4:54702, server: QA-E8WIN11/10.11.8.236:2181
>>>>>>> 2016-07-11 16:28:22,058 INFO  org.apache.zookeeper.ClientCnxn:1400
-
>>>>>>> Session establishment complete on server QA-E8WIN11/10.11.8.236:2181,
>>>>>>> sessionid = 0x100024f09360080, negotiated timeout = 45000 (READ-ONLY
mode)
>>>>>>> 2016-07-11 16:28:22,061 INFO  org.apache.zookeeper.ClientCnxn:1297
-
>>>>>>> Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout
>>>>>>> 200
>>>>>>> 2016-07-11 16:28:22,066 INFO
>>>>>>> org.apache.curator.framework.state.ConnectionStateManager:228
- State
>>>>>>> change: READ_ONLY
>>>>>>> 2016-07-11 16:30:29,312 INFO  org.apache.zookeeper.ClientCnxn:1253
-
>>>>>>> Unable to read additional data from server sessionid 0x100024f09360080,
>>>>>>> likely server has closed socket, closing socket connection and
attempting
>>>>>>> reconnect
>>>>>>> 2016-07-11 16:30:29,416 INFO
>>>>>>> org.apache.curator.framework.state.ConnectionStateManager:228
- State
>>>>>>> change: SUSPENDED
>>>>>>> 2016-07-11 16:30:31,031 INFO  org.apache.zookeeper.ClientCnxn:1138
-
>>>>>>> Opening socket connection to server QA-E8WIN11/10.11.8.236:2181.
>>>>>>> Will not attempt to authenticate using SASL (unknown error)
>>>>>>> 2016-07-11 16:30:31,032 INFO  org.apache.zookeeper.ClientCnxn:980
-
>>>>>>> Socket connection established, initiating session, client: /
>>>>>>> 10.11.12.4:55156, server: QA-E8WIN11/10.11.8.236:2181
>>>>>>> 2016-07-11 16:30:31,034 INFO  org.apache.zookeeper.ClientCnxn:1400
-
>>>>>>> Session establishment complete on server QA-E8WIN11/10.11.8.236:2181,
>>>>>>> sessionid = 0x100024f09360083, negotiated timeout = 45000 (READ-ONLY
mode)
>>>>>>> 2016-07-11 16:30:31,034 INFO  org.apache.zookeeper.ClientCnxn:1297
-
>>>>>>> Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout
>>>>>>> 400
>>>>>>> 2016-07-11 16:30:31,035 INFO
>>>>>>> org.apache.curator.framework.state.ConnectionStateManager:228
- State
>>>>>>> change: READ_ONLY
>>>>>>> 2016-07-11 16:32:38,335 INFO  org.apache.zookeeper.ClientCnxn:1253
-
>>>>>>> Unable to read additional data from server sessionid 0x100024f09360083,
>>>>>>> likely server has closed socket, closing socket connection and
attempting
>>>>>>> reconnect
>>>>>>> 2016-07-11 16:32:38,441 INFO
>>>>>>> org.apache.curator.framework.state.ConnectionStateManager:228
- State
>>>>>>> change: SUSPENDED
>>>>>>> 2016-07-11 16:32:40,425 INFO  org.apache.zookeeper.ClientCnxn:1138
-
>>>>>>> Opening socket connection to server QA-E8WIN11/10.11.8.236:2181.
>>>>>>> Will not attempt to authenticate using SASL (unknown error)
>>>>>>> 2016-07-11 16:32:40,426 INFO  org.apache.zookeeper.ClientCnxn:980
-
>>>>>>> Socket connection established, initiating session, client: /
>>>>>>> 10.11.12.4:55637, server: QA-E8WIN11/10.11.8.236:2181
>>>>>>> 2016-07-11 16:32:40,428 INFO  org.apache.zookeeper.ClientCnxn:1400
-
>>>>>>> Session establishment complete on server QA-E8WIN11/10.11.8.236:2181,
>>>>>>> sessionid = 0x100024f09360087, negotiated timeout = 45000 (READ-ONLY
mode)
>>>>>>> 2016-07-11 16:32:40,428 INFO  org.apache.zookeeper.ClientCnxn:1297
-
>>>>>>> Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout
>>>>>>> 800
>>>>>>> 2016-07-11 16:32:40,428 INFO
>>>>>>> org.apache.curator.framework.state.ConnectionStateManager:228
- State
>>>>>>> change: READ_ONLY
>>>>>>> 2016-07-11 16:34:47,615 INFO  org.apache.zookeeper.ClientCnxn:1253
-
>>>>>>> Unable to read additional data from server sessionid 0x100024f09360087,
>>>>>>> likely server has closed socket, closing socket connection and
attempting
>>>>>>> reconnect
>>>>>>> 2016-07-11 16:34:47,716 INFO
>>>>>>> org.apache.curator.framework.state.ConnectionStateManager:228
- State
>>>>>>> change: SUSPENDED
>>>>>>> 2016-07-11 16:34:49,557 INFO  org.apache.zookeeper.ClientCnxn:1138
-
>>>>>>> Opening socket connection to server QA-E8WIN11/10.11.8.236:2181.
>>>>>>> Will not attempt to authenticate using SASL (unknown error)
>>>>>>> 2016-07-11 16:34:49,558 INFO  org.apache.zookeeper.ClientCnxn:980
-
>>>>>>> Socket connection established, initiating session, client: /
>>>>>>> 10.11.12.4:56094, server: QA-E8WIN11/10.11.8.236:2181
>>>>>>> 2016-07-11 16:34:49,560 INFO  org.apache.zookeeper.ClientCnxn:1400
-
>>>>>>> Session establishment complete on server QA-E8WIN11/10.11.8.236:2181,
>>>>>>> sessionid = 0x100024f09360089, negotiated timeout = 45000 (READ-ONLY
mode)
>>>>>>> 2016-07-11 16:34:49,560 INFO  org.apache.zookeeper.ClientCnxn:1297
-
>>>>>>> Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout
>>>>>>> 1600
>>>>>>> 2016-07-11 16:34:49,560 INFO
>>>>>>> org.apache.curator.framework.state.ConnectionStateManager:228
- State
>>>>>>> change: READ_ONLY
>>>>>>> 2016-07-11 16:36:56,895 INFO  org.apache.zookeeper.ClientCnxn:1253
-
>>>>>>> Unable to read additional data from server sessionid 0x100024f09360089,
>>>>>>> likely server has closed socket, closing socket connection and
attempting
>>>>>>> reconnect
>>>>>>> Exception in thread "main" 2016-07-11 16:36:56,996 INFO
>>>>>>> org.apache.curator.framework.state.ConnectionStateManager:228
- State
>>>>>>> change: SUSPENDED
>>>>>>> org.apache.zookeeper.KeeperException$ConnectionLossException:
>>>>>>> KeeperErrorCode = ConnectionLoss for /
>>>>>>>     at
>>>>>>> org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>>>>>>>     at
>>>>>>> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>>>>>>>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1753)
>>>>>>>     at
>>>>>>> org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:310)
>>>>>>>     at
>>>>>>> org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:299)
>>>>>>>     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108)
>>>>>>>     at
>>>>>>> org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:295)
>>>>>>>     at
>>>>>>> org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:287)
>>>>>>>     at
>>>>>>> org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:34)
>>>>>>>     at TestCuratorReadOnly.main(TestCuratorReadOnly.java:23)
>>>>>>>
>>>>>>> On Mon, Jul 11, 2016 at 4:39 PM, Cameron McKenzie <
>>>>>>> mckenzie.cam@gmail.com> wrote:
>>>>>>>
>>>>>>>> So, it appears that the initial connection occurs ok, but
then
>>>>>>>> something goes wrong
>>>>>>>>
>>>>>>>> 2016-07-11 15:49:15,712 INFO  org.apache.zookeeper.ClientCnxn:1253
>>>>>>>> - Unable to read additional data from server sessionid 0x100024f0936005b,
>>>>>>>> likely server has closed socket, closing socket connection
and attempting
>>>>>>>> reconnect
>>>>>>>>
>>>>>>>> Then a reconnection occurs ok, and then the cycle starts
again.
>>>>>>>>
>>>>>>>> Can you see where in the stack the getData() call is hanging?
>>>>>>>>
>>>>>>>> On Tue, Jul 12, 2016 at 9:35 AM, Benjamin Jaton <
>>>>>>>> benjamin.jaton@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> The blockUntilConnectedOrTimedOut() does return, we see
the line:
>>>>>>>>> "Successfully established the connection with ZooKeeper"
>>>>>>>>> which happens right after.
>>>>>>>>>
>>>>>>>>> The getData() is where it hangs.
>>>>>>>>>
>>>>>>>>> We see this first:
>>>>>>>>> 2016-07-11 15:47:08,420 INFO
>>>>>>>>> org.apache.curator.framework.state.ConnectionStateManager:228
- State
>>>>>>>>> change: READ_ONLY
>>>>>>>>>
>>>>>>>>> and then we see it again 2 minutes later:
>>>>>>>>> 2016-07-11 15:49:17,316 INFO
>>>>>>>>> org.apache.curator.framework.state.ConnectionStateManager:228
- State
>>>>>>>>> change: READ_ONLY
>>>>>>>>>
>>>>>>>>> and then again 2 minutes after that:
>>>>>>>>> 2016-07-11 15:51:26,470 INFO
>>>>>>>>> org.apache.curator.framework.state.ConnectionStateManager:228
- State
>>>>>>>>> change: READ_ONLY
>>>>>>>>>
>>>>>>>>> Then I stopped it.
>>>>>>>>>
>>>>>>>>> On Mon, Jul 11, 2016 at 4:13 PM, Cameron McKenzie <
>>>>>>>>> mckenzie.cam@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> hey Ben,
>>>>>>>>>> From the logs it looks like the connection is established
at the
>>>>>>>>>> end?
>>>>>>>>>>
>>>>>>>>>> Are you saying that the blockUntilConnectedOrTimedOut()
does not
>>>>>>>>>> return when this occurs?
>>>>>>>>>> cheers
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Jul 12, 2016 at 9:10 AM, Benjamin Jaton <
>>>>>>>>>> bjaton@radiantlogic.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> ZK is 3.5.1-alpha
>>>>>>>>>>> Curator is 2.11.0
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jul 11, 2016 at 4:01 PM, Benjamin Jaton
<
>>>>>>>>>>> bjaton@radiantlogic.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I have a 3 nodes ZK cluster , readonly mode
is enabled.
>>>>>>>>>>>> 2 nodes are down, so the third one is in
read-only (verified by
>>>>>>>>>>>> using the ZK API manually).
>>>>>>>>>>>>
>>>>>>>>>>>> I'm using this piece of code:
>>>>>>>>>>>>
>>>>>>>>>>>> Builder curatorClientBuilder = CuratorFrameworkFactory.builder()
>>>>>>>>>>>> .connectString("QA-E8WIN11:2181,QA-E8WIN12:2181")
>>>>>>>>>>>> .sessionTimeoutMs(45000).connectionTimeoutMs(15000)
>>>>>>>>>>>> .retryPolicy(new RetryNTimes(3, 5000)).canBeReadOnly(true);
>>>>>>>>>>>>
>>>>>>>>>>>> CuratorFramework client = curatorClientBuilder.build();
>>>>>>>>>>>> client.start();
>>>>>>>>>>>> client.getZookeeperClient().blockUntilConnectedOrTimedOut();
>>>>>>>>>>>> System.out.println("Successfully established
the connection
>>>>>>>>>>>> with ZooKeeper");
>>>>>>>>>>>> client.getData().forPath("/");
>>>>>>>>>>>> System.out.println("Done.");
>>>>>>>>>>>>
>>>>>>>>>>>> When curator pick the host that is UP first,
it goes through
>>>>>>>>>>>> very quickly. When it picks the host that
is down first (QA-E8WIN12), it
>>>>>>>>>>>> never goes through.
>>>>>>>>>>>>
>>>>>>>>>>>> Am I missing something here?
>>>>>>>>>>>>
>>>>>>>>>>>> Below is the log:
>>>>>>>>>>>>
>>>>>>>>>>>> 2016-07-11 15:46:45,733 INFO
>>>>>>>>>>>>  org.apache.curator.framework.imps.CuratorFrameworkImpl:235
- Starting
>>>>>>>>>>>> 2016-07-11 15:46:45,743 INFO
>>>>>>>>>>>>  org.apache.zookeeper.ZooKeeper:109 - Client
>>>>>>>>>>>> environment:zookeeper.version=3.5.1-alpha-1748895,
built on 06/17/2016
>>>>>>>>>>>> 22:24 GMT
>>>>>>>>>>>> 2016-07-11 15:46:45,743 INFO
>>>>>>>>>>>>  org.apache.zookeeper.ZooKeeper:109 - Client
environment:
>>>>>>>>>>>> host.name=w-rli09-ben
>>>>>>>>>>>> 2016-07-11 15:46:45,744 INFO
>>>>>>>>>>>>  org.apache.zookeeper.ZooKeeper:109 - Client
>>>>>>>>>>>> environment:java.version=1.8.0_11
>>>>>>>>>>>> 2016-07-11 15:46:45,744 INFO
>>>>>>>>>>>>  org.apache.zookeeper.ZooKeeper:109 - Client
environment:java.vendor=Oracle
>>>>>>>>>>>> Corporation
>>>>>>>>>>>> 2016-07-11 15:46:45,744 INFO
>>>>>>>>>>>>  org.apache.zookeeper.ZooKeeper:109 - Client
>>>>>>>>>>>> environment:java.home=/usr/local/apps/jdk1.8.0_11/jre
>>>>>>>>>>>> 2016-07-11 15:46:45,744 INFO
>>>>>>>>>>>>  org.apache.zookeeper.ZooKeeper:109 - Client
>>>>>>>>>>>> environment:java.class.path=(..)
>>>>>>>>>>>> 2016-07-11 15:46:45,746 INFO
>>>>>>>>>>>>  org.apache.zookeeper.ZooKeeper:109 - Client
environment:java.io.tmpdir=/tmp
>>>>>>>>>>>> 2016-07-11 15:46:45,746 INFO
>>>>>>>>>>>>  org.apache.zookeeper.ZooKeeper:109 - Client
environment:java.compiler=<NA>
>>>>>>>>>>>> 2016-07-11 15:46:45,747 INFO
>>>>>>>>>>>>  org.apache.zookeeper.ZooKeeper:109 - Client
environment:
>>>>>>>>>>>> os.name=Linux
>>>>>>>>>>>> 2016-07-11 15:46:45,747 INFO
>>>>>>>>>>>>  org.apache.zookeeper.ZooKeeper:109 - Client
environment:os.arch=amd64
>>>>>>>>>>>> 2016-07-11 15:46:45,747 INFO
>>>>>>>>>>>>  org.apache.zookeeper.ZooKeeper:109 - Client
>>>>>>>>>>>> environment:os.version=3.8.0-44-generic
>>>>>>>>>>>> 2016-07-11 15:46:45,747 INFO
>>>>>>>>>>>>  org.apache.zookeeper.ZooKeeper:109 - Client
environment:
>>>>>>>>>>>> user.name=benji
>>>>>>>>>>>> 2016-07-11 15:46:45,747 INFO
>>>>>>>>>>>>  org.apache.zookeeper.ZooKeeper:109 - Client
environment:user.home=(..)
>>>>>>>>>>>> 2016-07-11 15:46:45,747 INFO
>>>>>>>>>>>>  org.apache.zookeeper.ZooKeeper:109 - Client
environment:user.dir=(..)
>>>>>>>>>>>> 2016-07-11 15:46:45,747 INFO
>>>>>>>>>>>>  org.apache.zookeeper.ZooKeeper:109 - Client
>>>>>>>>>>>> environment:os.memory.free=385MB
>>>>>>>>>>>> 2016-07-11 15:46:45,748 INFO
>>>>>>>>>>>>  org.apache.zookeeper.ZooKeeper:109 - Client
>>>>>>>>>>>> environment:os.memory.max=5358MB
>>>>>>>>>>>> 2016-07-11 15:46:45,748 INFO
>>>>>>>>>>>>  org.apache.zookeeper.ZooKeeper:109 - Client
>>>>>>>>>>>> environment:os.memory.total=397MB
>>>>>>>>>>>> 2016-07-11 15:46:45,750 INFO
>>>>>>>>>>>>  org.apache.zookeeper.ZooKeeper:716 - Initiating
client connection,
>>>>>>>>>>>> connectString=QA-E8WIN11:2181,QA-E8WIN12:2181
sessionTimeout=45000
>>>>>>>>>>>> watcher=org.apache.curator.ConnectionState@3e96bacf
>>>>>>>>>>>> 2016-07-11 15:46:45,769 INFO
>>>>>>>>>>>>  org.apache.zookeeper.ClientCnxn:1138 - Opening
socket connection to server
>>>>>>>>>>>> QA-E8WIN12/10.11.8.232:2181. Will not attempt
to authenticate
>>>>>>>>>>>> using SASL (unknown error)
>>>>>>>>>>>> Successfully established the connection with
ZooKeeper
>>>>>>>>>>>> 2016-07-11 15:47:08,300 INFO
>>>>>>>>>>>>  org.apache.zookeeper.ClientCnxn:1251 - Client
session timed out, have not
>>>>>>>>>>>> heard from server in 22534ms for sessionid
0x0, closing socket connection
>>>>>>>>>>>> and attempting reconnect
>>>>>>>>>>>> 2016-07-11 15:47:08,403 INFO
>>>>>>>>>>>>  org.apache.zookeeper.ClientCnxn:1138 - Opening
socket connection to server
>>>>>>>>>>>> QA-E8WIN11/10.11.8.236:2181. Will not attempt
to authenticate
>>>>>>>>>>>> using SASL (unknown error)
>>>>>>>>>>>> 2016-07-11 15:47:08,406 INFO
>>>>>>>>>>>>  org.apache.zookeeper.ClientCnxn:980 - Socket
connection established,
>>>>>>>>>>>> initiating session, client: /10.11.12.4:47004,
server:
>>>>>>>>>>>> QA-E8WIN11/10.11.8.236:2181
>>>>>>>>>>>> 2016-07-11 15:47:08,413 INFO
>>>>>>>>>>>>  org.apache.zookeeper.ClientCnxn:1400 - Session
establishment complete on
>>>>>>>>>>>> server QA-E8WIN11/10.11.8.236:2181, sessionid
=
>>>>>>>>>>>> 0x100024f0936005b, negotiated timeout = 45000
(READ-ONLY mode)
>>>>>>>>>>>> 2016-07-11 15:47:08,415 INFO
>>>>>>>>>>>>  org.apache.zookeeper.ClientCnxn:1297 - Checking
server QA-E8WIN12/
>>>>>>>>>>>> 10.11.8.232:2181 for being r/w. Timeout 200
>>>>>>>>>>>> 2016-07-11 15:47:08,420 INFO
>>>>>>>>>>>>  org.apache.curator.framework.state.ConnectionStateManager:228
- State
>>>>>>>>>>>> change: READ_ONLY
>>>>>>>>>>>> 2016-07-11 15:49:15,712 INFO
>>>>>>>>>>>>  org.apache.zookeeper.ClientCnxn:1253 - Unable
to read additional data from
>>>>>>>>>>>> server sessionid 0x100024f0936005b, likely
server has closed socket,
>>>>>>>>>>>> closing socket connection and attempting
reconnect
>>>>>>>>>>>> 2016-07-11 15:49:15,815 INFO
>>>>>>>>>>>>  org.apache.curator.framework.state.ConnectionStateManager:228
- State
>>>>>>>>>>>> change: SUSPENDED
>>>>>>>>>>>> 2016-07-11 15:49:17,313 INFO
>>>>>>>>>>>>  org.apache.zookeeper.ClientCnxn:1138 - Opening
socket connection to server
>>>>>>>>>>>> QA-E8WIN11/10.11.8.236:2181. Will not attempt
to authenticate
>>>>>>>>>>>> using SASL (unknown error)
>>>>>>>>>>>> 2016-07-11 15:49:17,314 INFO
>>>>>>>>>>>>  org.apache.zookeeper.ClientCnxn:980 - Socket
connection established,
>>>>>>>>>>>> initiating session, client: /10.11.12.4:47295,
server:
>>>>>>>>>>>> QA-E8WIN11/10.11.8.236:2181
>>>>>>>>>>>> 2016-07-11 15:49:17,316 INFO
>>>>>>>>>>>>  org.apache.zookeeper.ClientCnxn:1400 - Session
establishment complete on
>>>>>>>>>>>> server QA-E8WIN11/10.11.8.236:2181, sessionid
=
>>>>>>>>>>>> 0x100024f0936005f, negotiated timeout = 45000
(READ-ONLY mode)
>>>>>>>>>>>> 2016-07-11 15:49:17,316 INFO
>>>>>>>>>>>>  org.apache.zookeeper.ClientCnxn:1297 - Checking
server QA-E8WIN12/
>>>>>>>>>>>> 10.11.8.232:2181 for being r/w. Timeout 400
>>>>>>>>>>>> 2016-07-11 15:49:17,316 INFO
>>>>>>>>>>>>  org.apache.curator.framework.state.ConnectionStateManager:228
- State
>>>>>>>>>>>> change: READ_ONLY
>>>>>>>>>>>> 2016-07-11 15:51:24,607 INFO
>>>>>>>>>>>>  org.apache.zookeeper.ClientCnxn:1253 - Unable
to read additional data from
>>>>>>>>>>>> server sessionid 0x100024f0936005f, likely
server has closed socket,
>>>>>>>>>>>> closing socket connection and attempting
reconnect
>>>>>>>>>>>> 2016-07-11 15:51:24,713 INFO
>>>>>>>>>>>>  org.apache.curator.framework.state.ConnectionStateManager:228
- State
>>>>>>>>>>>> change: SUSPENDED
>>>>>>>>>>>> 2016-07-11 15:51:26,467 INFO
>>>>>>>>>>>>  org.apache.zookeeper.ClientCnxn:1138 - Opening
socket connection to server
>>>>>>>>>>>> QA-E8WIN11/10.11.8.236:2181. Will not attempt
to authenticate
>>>>>>>>>>>> using SASL (unknown error)
>>>>>>>>>>>> 2016-07-11 15:51:26,468 INFO
>>>>>>>>>>>>  org.apache.zookeeper.ClientCnxn:980 - Socket
connection established,
>>>>>>>>>>>> initiating session, client: /10.11.12.4:47576,
server:
>>>>>>>>>>>> QA-E8WIN11/10.11.8.236:2181
>>>>>>>>>>>> 2016-07-11 15:51:26,470 INFO
>>>>>>>>>>>>  org.apache.zookeeper.ClientCnxn:1400 - Session
establishment complete on
>>>>>>>>>>>> server QA-E8WIN11/10.11.8.236:2181, sessionid
=
>>>>>>>>>>>> 0x100024f09360061, negotiated timeout = 45000
(READ-ONLY mode)
>>>>>>>>>>>> 2016-07-11 15:51:26,470 INFO
>>>>>>>>>>>>  org.apache.zookeeper.ClientCnxn:1297 - Checking
server QA-E8WIN12/
>>>>>>>>>>>> 10.11.8.232:2181 for being r/w. Timeout 800
>>>>>>>>>>>> 2016-07-11 15:51:26,470 INFO
>>>>>>>>>>>>  org.apache.curator.framework.state.ConnectionStateManager:228
- State
>>>>>>>>>>>> change: READ_ONLY
>>>>>>>>>>>> (...)
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks
>>>>>>>>>>>> Ben
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Mime
View raw message