curator-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Cameron McKenzie <mckenzie....@gmail.com>
Subject Re: Curator never goes through when ZK is read-only
Date Tue, 12 Jul 2016 21:50:19 GMT
Are the 'dead' hosts reachable? I seem to recall having problems with the
ZK client blocking for a long time if you had a host configured that was
resolvable but not reachable.

On Wed, Jul 13, 2016 at 4:30 AM, Benjamin Jaton <benjamin.jaton@gmail.com>
wrote:

> More to the story:
>
> While it consistently works on the same machine than the sole running ZK
> node when I specify only 2 of the 3 servers in the connection string,
> it doesn't work when I specify 3 hosts:
>
>
> 2016-07-12 11:23:46 INFO  o.a.z.ZooKeeper:716 - Initiating client
> connection, connectString=QA-E8WIN11:2181,QA-E8WIN12:2181,QA-E8WIN13:2181
> sessionTimeout=45000 watcher=org.apache.curator.ConnectionState@c86b9e3
> 2016-07-12 11:23:46 INFO  o.a.z.ClientCnxn:1138 - Opening socket
> connection to server QA-E8WIN13/10.11.8.230:2181. Will not attempt to
> authenticate using SASL (unknown error)
> 2016-07-12 11:24:01 INFO  o.a.z.ClientCnxn:1251 - Client session timed
> out, have not heard from server in 15018ms for sessionid 0x0, closing
> socket connection and attempting reconnect
> Successfully established the connection with ZooKeeper
> 2016-07-12 11:24:01 INFO  o.a.z.ClientCnxn:1138 - Opening socket
> connection to server QA-E8WIN12/10.11.8.232:2181. Will not attempt to
> authenticate using SASL (unknown error)
> 2016-07-12 11:24:16 INFO  o.a.z.ClientCnxn:1251 - Client session timed
> out, have not heard from server in 15000ms for sessionid 0x0, closing
> socket connection and attempting reconnect
> 2016-07-12 11:24:16 INFO  o.a.z.ClientCnxn:1138 - Opening socket
> connection to server QA-E8WIN11/10.11.8.236:2181. Will not attempt to
> authenticate using SASL (unknown error)
> 2016-07-12 11:24:16 INFO  o.a.z.ClientCnxn:980 - Socket connection
> established, initiating session, client: /10.11.8.236:50423, server:
> QA-E8WIN11/10.11.8.236:2181
> 2016-07-12 11:24:16 INFO  o.a.z.ClientCnxn:1400 - Session establishment
> complete on server QA-E8WIN11/10.11.8.236:2181, sessionid =
> 0x10003c8af64001b, negotiated timeout = 45000 (READ-ONLY mode)
> 2016-07-12 11:24:16 INFO  o.a.z.ClientCnxn:1297 - Checking server
> QA-E8WIN13/10.11.8.230:2181 for being r/w. Timeout 200
> 2016-07-12 11:24:16 INFO  o.a.c.f.s.ConnectionStateManager:228 - State
> change: READ_ONLY
> 2016-07-12 11:24:37 INFO  o.a.z.ClientCnxn:1297 - Checking server
> QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 400
> 2016-07-12 11:24:58 INFO  o.a.z.ClientCnxn:1251 - Client session timed
> out, have not heard from server in 42023ms for sessionid 0x10003c8af64001b,
> closing socket connection and attempting reconnect
> 2016-07-12 11:24:58 INFO  o.a.c.f.s.ConnectionStateManager:228 - State
> change: SUSPENDED
> 2016-07-12 11:25:00 INFO  o.a.z.ClientCnxn:1138 - Opening socket
> connection to server QA-E8WIN11/10.11.8.236:2181. Will not attempt to
> authenticate using SASL (unknown error)
> 2016-07-12 11:25:00 INFO  o.a.z.ClientCnxn:980 - Socket connection
> established, initiating session, client: /10.11.8.236:50428, server:
> QA-E8WIN11/10.11.8.236:2181
> 2016-07-12 11:25:00 INFO  o.a.z.ClientCnxn:1400 - Session establishment
> complete on server QA-E8WIN11/10.11.8.236:2181, sessionid =
> 0x10003c8af64001c, negotiated timeout = 45000 (READ-ONLY mode)
> 2016-07-12 11:25:00 INFO  o.a.z.ClientCnxn:1297 - Checking server
> QA-E8WIN13/10.11.8.230:2181 for being r/w. Timeout 800
> 2016-07-12 11:25:00 INFO  o.a.c.f.s.ConnectionStateManager:228 - State
> change: READ_ONLY
> 2016-07-12 11:25:21 INFO  o.a.z.ClientCnxn:1297 - Checking server
> QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 1600
> 2016-07-12 11:25:42 INFO  o.a.z.ClientCnxn:1251 - Client session timed
> out, have not heard from server in 42011ms for sessionid 0x10003c8af64001c,
> closing socket connection and attempting reconnect
> 2016-07-12 11:25:42 INFO  o.a.c.f.s.ConnectionStateManager:228 - State
> change: SUSPENDED
> 2016-07-12 11:25:44 INFO  o.a.z.ClientCnxn:1138 - Opening socket
> connection to server QA-E8WIN11/10.11.8.236:2181. Will not attempt to
> authenticate using SASL (unknown error)
> 2016-07-12 11:25:44 INFO  o.a.z.ClientCnxn:980 - Socket connection
> established, initiating session, client: /10.11.8.236:50435, server:
> QA-E8WIN11/10.11.8.236:2181
> 2016-07-12 11:25:44 INFO  o.a.z.ClientCnxn:1400 - Session establishment
> complete on server QA-E8WIN11/10.11.8.236:2181, sessionid =
> 0x10003c8af64001d, negotiated timeout = 45000 (READ-ONLY mode)
> 2016-07-12 11:25:44 INFO  o.a.z.ClientCnxn:1297 - Checking server
> QA-E8WIN13/10.11.8.230:2181 for being r/w. Timeout 3200
> 2016-07-12 11:25:44 INFO  o.a.c.f.s.ConnectionStateManager:228 - State
> change: READ_ONLY
> 2016-07-12 11:26:05 INFO  o.a.z.ClientCnxn:1297 - Checking server
> QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 6400
> 2016-07-12 11:26:26 INFO  o.a.z.ClientCnxn:1251 - Client session timed
> out, have not heard from server in 42003ms for sessionid 0x10003c8af64001d,
> closing socket connection and attempting reconnect
> 2016-07-12 11:26:26 INFO  o.a.c.f.s.ConnectionStateManager:228 - State
> change: SUSPENDED
> Exception in thread "main"
> org.apache.zookeeper.KeeperException$ConnectionLossException:
> KeeperErrorCode = ConnectionLoss for /
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1753)
>     at
> org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:310)
>     at
> org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:299)
>     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108)
>     at
> org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:295)
>     at
> org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:287)
>     at
> org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:34)
>     at test.Test.main(Test.java:25)
>
>
> That seems to suggest that adding 1 host to check somehow messes up
> something in the timeouts.
> I am suspicious of those:
> 2016-07-12 11:25:44 INFO  o.a.z.ClientCnxn:1297 - Checking server
> QA-E8WIN13/10.11.8.230:2181 for being r/w. Timeout 3200
> They seem to be "blocking" the existing connection we have to the healthy
> ZK server, which goes stale. Is that possible?
>
>
> On Tue, Jul 12, 2016 at 11:17 AM, Benjamin Jaton <bjaton@radiantlogic.com>
> wrote:
>
>> I did not use the zkCli tool but the ZK java API directly. And other
>> tools like ZooInspector (contrib). They were run from a remote machine.
>> I ran it on the same machine and the one read only server is running
>> properly, I can browse it using those methods.
>>
>> Trying to make sense of those client-server interactions is difficult but
>> here is what I got:
>>
>> Client:
>> 2016-07-12 10:55:01,508 INFO  org.apache.zookeeper.ClientCnxn:1138 - *Opening
>> socket connection to server QA-E8WIN11*/10.11.8.236:2181. Will not
>> attempt to authenticate using SASL (unknown error)
>> 2016-07-12 10:55:01,512 INFO  org.apache.zookeeper.ClientCnxn:980 - *Socket
>> connection established*, initiating session, client: /10.11.12.4:60586,
>> server: QA-E8WIN11/10.11.8.236:2181
>> 2016-07-12 10:55:01,521 INFO  org.apache.zookeeper.ClientCnxn:1400 - *Session
>> establishment complete on server QA-E8WIN11*/10.11.8.236:2181, *sessionid
>> = 0x10003c8af640007*, negotiated timeout = 45000 (READ-ONLY mode)
>> 2016-07-12 10:55:01,523 INFO  org.apache.zookeeper.ClientCnxn:1297 -
>> Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 200
>> 2016-07-12 10:55:01,528 INFO
>>  org.apache.curator.framework.state.ConnectionStateManager:228 - *State
>> change: READ_ONLY*
>> 2016-07-12 10:57:08,800 INFO  org.apache.zookeeper.ClientCnxn:1253 - *Unable
>> to read additional data from server sessionid 0x10003c8af640007, likely
>> server has closed socket, closing socket connection and attempting
>> reconnect*
>> 2016-07-12 10:57:08,903 INFO
>>  org.apache.curator.framework.state.ConnectionStateManager:228 - *State
>> change: SUSPENDED*
>>
>>
>> Server:
>> 2016-07-12T10:55:56,468 [myid:] - INFO  [ProcessThread(sid:1
>> cport:-1)::ZooKeeperServer@678] - *Established session 0x10003c8af640007
>> with negotiated timeout 45000* for client /10.11.12.4:60586
>> (...)
>> 2016-07-12T10:56:41,756 [myid:] - INFO
>>  [SessionTracker:ZooKeeperServer@384] - *Expiring session
>> 0x10003c8af640007, timeout of 45000ms exceeded*
>>
>>
>> Note: Don't worry about the time being about 1 min off between the
>> machines.
>>
>> What I don't understand is why we are losing the read-only connection we
>> successfully established. ZK ro server says it timed out?
>>
>> Now interestingly, when I run it on the same machine it works and
>> completes pretty quickly:
>>
>>
>> 2016-07-12 10:43:49 INFO  o.a.z.ZooKeeper:716 - Initiating client
>> connection, connectString=QA-E8WIN11:2181,QA-E8WIN12:2181
>> sessionTimeout=45000 watcher=org.apache.curator.ConnectionState@e4487af
>> 2016-07-12 10:43:49 INFO  o.a.z.ClientCnxn:1138 - Opening socket
>> connection to server QA-E8WIN12/10.11.8.232:2181. Will not attempt to
>> authenticate using SASL (unknown error)
>> Successfully established the connection with ZooKeeper
>> 2016-07-12 10:44:10 WARN  o.a.z.ClientCnxn:1257 - Session 0x0 for server
>> null, unexpected error, closing socket connection and attempting reconnect
>> java.net.ConnectException: Connection timed out: no further information
>> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_91]
>> at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>> ~[?:1.8.0_91]
>> at
>> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355)
>> ~[zookeeper-3.5.0-alpha.jar:3.5.2-alpha-SNAPSHOT-1702518]
>> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1236)
>> [zookeeper-3.5.0-alpha.jar:3.5.2-alpha-SNAPSHOT-1702518]
>> 2016-07-12 10:44:11 INFO  o.a.z.ClientCnxn:1138 - Opening socket
>> connection to server QA-E8WIN11/10.11.8.236:2181. Will not attempt to
>> authenticate using SASL (unknown error)
>> 2016-07-12 10:44:11 INFO  o.a.z.ClientCnxn:980 - Socket connection
>> established, initiating session, client: /10.11.8.236:50147, server:
>> QA-E8WIN11/10.11.8.236:2181
>> 2016-07-12 10:44:11 INFO  o.a.z.ClientCnxn:1400 - Session establishment
>> complete on server QA-E8WIN11/10.11.8.236:2181, sessionid =
>> 0x10003b315b30012, negotiated timeout = 45000 (READ-ONLY mode)
>> 2016-07-12 10:44:11 INFO  o.a.z.ClientCnxn:1297 - Checking server
>> QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 200
>> 2016-07-12 10:44:11 INFO  o.a.c.f.s.ConnectionStateManager:228 - State
>> change: READ_ONLY
>> 2016-07-12 10:44:32 INFO  o.a.z.ClientCnxn:1297 - Checking server
>> QA-E8WIN11/10.11.8.236:2181 for being r/w. Timeout 400
>> Done.
>>
>>
>> On Mon, Jul 11, 2016 at 4:42 PM, Cameron McKenzie <mckenzie.cam@gmail.com
>> > wrote:
>>
>>> It looks like a connection issue based on the logs. When you confirmed
>>> that the connection was ok via the zkCli, was that running on the same host
>>> as the read only server? Can you run it from the same host that you're
>>> running the Java client from?
>>>
>>> On Tue, Jul 12, 2016 at 9:40 AM, Benjamin Jaton <
>>> benjamin.jaton@gmail.com> wrote:
>>>
>>>> Well I was trying to let it run and it eventually fails, with a stack:
>>>>
>>>> Successfully established the connection with ZooKeeper
>>>> 2016-07-11 16:28:21,943 INFO  org.apache.zookeeper.ClientCnxn:1251 -
>>>> Client session timed out, have not heard from server in 22519ms for
>>>> sessionid 0x0, closing socket connection and attempting reconnect
>>>> 2016-07-11 16:28:22,046 INFO  org.apache.zookeeper.ClientCnxn:1138 -
>>>> Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will
>>>> not attempt to authenticate using SASL (unknown error)
>>>> 2016-07-11 16:28:22,050 INFO  org.apache.zookeeper.ClientCnxn:980 -
>>>> Socket connection established, initiating session, client: /
>>>> 10.11.12.4:54702, server: QA-E8WIN11/10.11.8.236:2181
>>>> 2016-07-11 16:28:22,058 INFO  org.apache.zookeeper.ClientCnxn:1400 -
>>>> Session establishment complete on server QA-E8WIN11/10.11.8.236:2181,
>>>> sessionid = 0x100024f09360080, negotiated timeout = 45000 (READ-ONLY mode)
>>>> 2016-07-11 16:28:22,061 INFO  org.apache.zookeeper.ClientCnxn:1297 -
>>>> Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 200
>>>> 2016-07-11 16:28:22,066 INFO
>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State
>>>> change: READ_ONLY
>>>> 2016-07-11 16:30:29,312 INFO  org.apache.zookeeper.ClientCnxn:1253 -
>>>> Unable to read additional data from server sessionid 0x100024f09360080,
>>>> likely server has closed socket, closing socket connection and attempting
>>>> reconnect
>>>> 2016-07-11 16:30:29,416 INFO
>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State
>>>> change: SUSPENDED
>>>> 2016-07-11 16:30:31,031 INFO  org.apache.zookeeper.ClientCnxn:1138 -
>>>> Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will
>>>> not attempt to authenticate using SASL (unknown error)
>>>> 2016-07-11 16:30:31,032 INFO  org.apache.zookeeper.ClientCnxn:980 -
>>>> Socket connection established, initiating session, client: /
>>>> 10.11.12.4:55156, server: QA-E8WIN11/10.11.8.236:2181
>>>> 2016-07-11 16:30:31,034 INFO  org.apache.zookeeper.ClientCnxn:1400 -
>>>> Session establishment complete on server QA-E8WIN11/10.11.8.236:2181,
>>>> sessionid = 0x100024f09360083, negotiated timeout = 45000 (READ-ONLY mode)
>>>> 2016-07-11 16:30:31,034 INFO  org.apache.zookeeper.ClientCnxn:1297 -
>>>> Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 400
>>>> 2016-07-11 16:30:31,035 INFO
>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State
>>>> change: READ_ONLY
>>>> 2016-07-11 16:32:38,335 INFO  org.apache.zookeeper.ClientCnxn:1253 -
>>>> Unable to read additional data from server sessionid 0x100024f09360083,
>>>> likely server has closed socket, closing socket connection and attempting
>>>> reconnect
>>>> 2016-07-11 16:32:38,441 INFO
>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State
>>>> change: SUSPENDED
>>>> 2016-07-11 16:32:40,425 INFO  org.apache.zookeeper.ClientCnxn:1138 -
>>>> Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will
>>>> not attempt to authenticate using SASL (unknown error)
>>>> 2016-07-11 16:32:40,426 INFO  org.apache.zookeeper.ClientCnxn:980 -
>>>> Socket connection established, initiating session, client: /
>>>> 10.11.12.4:55637, server: QA-E8WIN11/10.11.8.236:2181
>>>> 2016-07-11 16:32:40,428 INFO  org.apache.zookeeper.ClientCnxn:1400 -
>>>> Session establishment complete on server QA-E8WIN11/10.11.8.236:2181,
>>>> sessionid = 0x100024f09360087, negotiated timeout = 45000 (READ-ONLY mode)
>>>> 2016-07-11 16:32:40,428 INFO  org.apache.zookeeper.ClientCnxn:1297 -
>>>> Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 800
>>>> 2016-07-11 16:32:40,428 INFO
>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State
>>>> change: READ_ONLY
>>>> 2016-07-11 16:34:47,615 INFO  org.apache.zookeeper.ClientCnxn:1253 -
>>>> Unable to read additional data from server sessionid 0x100024f09360087,
>>>> likely server has closed socket, closing socket connection and attempting
>>>> reconnect
>>>> 2016-07-11 16:34:47,716 INFO
>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State
>>>> change: SUSPENDED
>>>> 2016-07-11 16:34:49,557 INFO  org.apache.zookeeper.ClientCnxn:1138 -
>>>> Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will
>>>> not attempt to authenticate using SASL (unknown error)
>>>> 2016-07-11 16:34:49,558 INFO  org.apache.zookeeper.ClientCnxn:980 -
>>>> Socket connection established, initiating session, client: /
>>>> 10.11.12.4:56094, server: QA-E8WIN11/10.11.8.236:2181
>>>> 2016-07-11 16:34:49,560 INFO  org.apache.zookeeper.ClientCnxn:1400 -
>>>> Session establishment complete on server QA-E8WIN11/10.11.8.236:2181,
>>>> sessionid = 0x100024f09360089, negotiated timeout = 45000 (READ-ONLY mode)
>>>> 2016-07-11 16:34:49,560 INFO  org.apache.zookeeper.ClientCnxn:1297 -
>>>> Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 1600
>>>> 2016-07-11 16:34:49,560 INFO
>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State
>>>> change: READ_ONLY
>>>> 2016-07-11 16:36:56,895 INFO  org.apache.zookeeper.ClientCnxn:1253 -
>>>> Unable to read additional data from server sessionid 0x100024f09360089,
>>>> likely server has closed socket, closing socket connection and attempting
>>>> reconnect
>>>> Exception in thread "main" 2016-07-11 16:36:56,996 INFO
>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State
>>>> change: SUSPENDED
>>>> org.apache.zookeeper.KeeperException$ConnectionLossException:
>>>> KeeperErrorCode = ConnectionLoss for /
>>>>     at
>>>> org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>>>>     at
>>>> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>>>>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1753)
>>>>     at
>>>> org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:310)
>>>>     at
>>>> org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:299)
>>>>     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108)
>>>>     at
>>>> org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:295)
>>>>     at
>>>> org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:287)
>>>>     at
>>>> org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:34)
>>>>     at TestCuratorReadOnly.main(TestCuratorReadOnly.java:23)
>>>>
>>>> On Mon, Jul 11, 2016 at 4:39 PM, Cameron McKenzie <
>>>> mckenzie.cam@gmail.com> wrote:
>>>>
>>>>> So, it appears that the initial connection occurs ok, but then
>>>>> something goes wrong
>>>>>
>>>>> 2016-07-11 15:49:15,712 INFO  org.apache.zookeeper.ClientCnxn:1253 -
>>>>> Unable to read additional data from server sessionid 0x100024f0936005b,
>>>>> likely server has closed socket, closing socket connection and attempting
>>>>> reconnect
>>>>>
>>>>> Then a reconnection occurs ok, and then the cycle starts again.
>>>>>
>>>>> Can you see where in the stack the getData() call is hanging?
>>>>>
>>>>> On Tue, Jul 12, 2016 at 9:35 AM, Benjamin Jaton <
>>>>> benjamin.jaton@gmail.com> wrote:
>>>>>
>>>>>> The blockUntilConnectedOrTimedOut() does return, we see the line:
>>>>>> "Successfully established the connection with ZooKeeper"
>>>>>> which happens right after.
>>>>>>
>>>>>> The getData() is where it hangs.
>>>>>>
>>>>>> We see this first:
>>>>>> 2016-07-11 15:47:08,420 INFO
>>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State
>>>>>> change: READ_ONLY
>>>>>>
>>>>>> and then we see it again 2 minutes later:
>>>>>> 2016-07-11 15:49:17,316 INFO
>>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State
>>>>>> change: READ_ONLY
>>>>>>
>>>>>> and then again 2 minutes after that:
>>>>>> 2016-07-11 15:51:26,470 INFO
>>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State
>>>>>> change: READ_ONLY
>>>>>>
>>>>>> Then I stopped it.
>>>>>>
>>>>>> On Mon, Jul 11, 2016 at 4:13 PM, Cameron McKenzie <
>>>>>> mckenzie.cam@gmail.com> wrote:
>>>>>>
>>>>>>> hey Ben,
>>>>>>> From the logs it looks like the connection is established at
the end?
>>>>>>>
>>>>>>> Are you saying that the blockUntilConnectedOrTimedOut() does
not
>>>>>>> return when this occurs?
>>>>>>> cheers
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jul 12, 2016 at 9:10 AM, Benjamin Jaton <
>>>>>>> bjaton@radiantlogic.com> wrote:
>>>>>>>
>>>>>>>> ZK is 3.5.1-alpha
>>>>>>>> Curator is 2.11.0
>>>>>>>>
>>>>>>>> On Mon, Jul 11, 2016 at 4:01 PM, Benjamin Jaton <
>>>>>>>> bjaton@radiantlogic.com> wrote:
>>>>>>>>
>>>>>>>>> I have a 3 nodes ZK cluster , readonly mode is enabled.
>>>>>>>>> 2 nodes are down, so the third one is in read-only (verified
by
>>>>>>>>> using the ZK API manually).
>>>>>>>>>
>>>>>>>>> I'm using this piece of code:
>>>>>>>>>
>>>>>>>>> Builder curatorClientBuilder = CuratorFrameworkFactory.builder()
>>>>>>>>> .connectString("QA-E8WIN11:2181,QA-E8WIN12:2181")
>>>>>>>>> .sessionTimeoutMs(45000).connectionTimeoutMs(15000)
>>>>>>>>> .retryPolicy(new RetryNTimes(3, 5000)).canBeReadOnly(true);
>>>>>>>>>
>>>>>>>>> CuratorFramework client = curatorClientBuilder.build();
>>>>>>>>> client.start();
>>>>>>>>> client.getZookeeperClient().blockUntilConnectedOrTimedOut();
>>>>>>>>> System.out.println("Successfully established the connection
with
>>>>>>>>> ZooKeeper");
>>>>>>>>> client.getData().forPath("/");
>>>>>>>>> System.out.println("Done.");
>>>>>>>>>
>>>>>>>>> When curator pick the host that is UP first, it goes
through very
>>>>>>>>> quickly. When it picks the host that is down first (QA-E8WIN12),
it never
>>>>>>>>> goes through.
>>>>>>>>>
>>>>>>>>> Am I missing something here?
>>>>>>>>>
>>>>>>>>> Below is the log:
>>>>>>>>>
>>>>>>>>> 2016-07-11 15:46:45,733 INFO
>>>>>>>>>  org.apache.curator.framework.imps.CuratorFrameworkImpl:235
- Starting
>>>>>>>>> 2016-07-11 15:46:45,743 INFO  org.apache.zookeeper.ZooKeeper:109
-
>>>>>>>>> Client environment:zookeeper.version=3.5.1-alpha-1748895,
built on
>>>>>>>>> 06/17/2016 22:24 GMT
>>>>>>>>> 2016-07-11 15:46:45,743 INFO  org.apache.zookeeper.ZooKeeper:109
-
>>>>>>>>> Client environment:host.name=w-rli09-ben
>>>>>>>>> 2016-07-11 15:46:45,744 INFO  org.apache.zookeeper.ZooKeeper:109
-
>>>>>>>>> Client environment:java.version=1.8.0_11
>>>>>>>>> 2016-07-11 15:46:45,744 INFO  org.apache.zookeeper.ZooKeeper:109
-
>>>>>>>>> Client environment:java.vendor=Oracle Corporation
>>>>>>>>> 2016-07-11 15:46:45,744 INFO  org.apache.zookeeper.ZooKeeper:109
-
>>>>>>>>> Client environment:java.home=/usr/local/apps/jdk1.8.0_11/jre
>>>>>>>>> 2016-07-11 15:46:45,744 INFO  org.apache.zookeeper.ZooKeeper:109
-
>>>>>>>>> Client environment:java.class.path=(..)
>>>>>>>>> 2016-07-11 15:46:45,746 INFO  org.apache.zookeeper.ZooKeeper:109
-
>>>>>>>>> Client environment:java.io.tmpdir=/tmp
>>>>>>>>> 2016-07-11 15:46:45,746 INFO  org.apache.zookeeper.ZooKeeper:109
-
>>>>>>>>> Client environment:java.compiler=<NA>
>>>>>>>>> 2016-07-11 15:46:45,747 INFO  org.apache.zookeeper.ZooKeeper:109
-
>>>>>>>>> Client environment:os.name=Linux
>>>>>>>>> 2016-07-11 15:46:45,747 INFO  org.apache.zookeeper.ZooKeeper:109
-
>>>>>>>>> Client environment:os.arch=amd64
>>>>>>>>> 2016-07-11 15:46:45,747 INFO  org.apache.zookeeper.ZooKeeper:109
-
>>>>>>>>> Client environment:os.version=3.8.0-44-generic
>>>>>>>>> 2016-07-11 15:46:45,747 INFO  org.apache.zookeeper.ZooKeeper:109
-
>>>>>>>>> Client environment:user.name=benji
>>>>>>>>> 2016-07-11 15:46:45,747 INFO  org.apache.zookeeper.ZooKeeper:109
-
>>>>>>>>> Client environment:user.home=(..)
>>>>>>>>> 2016-07-11 15:46:45,747 INFO  org.apache.zookeeper.ZooKeeper:109
-
>>>>>>>>> Client environment:user.dir=(..)
>>>>>>>>> 2016-07-11 15:46:45,747 INFO  org.apache.zookeeper.ZooKeeper:109
-
>>>>>>>>> Client environment:os.memory.free=385MB
>>>>>>>>> 2016-07-11 15:46:45,748 INFO  org.apache.zookeeper.ZooKeeper:109
-
>>>>>>>>> Client environment:os.memory.max=5358MB
>>>>>>>>> 2016-07-11 15:46:45,748 INFO  org.apache.zookeeper.ZooKeeper:109
-
>>>>>>>>> Client environment:os.memory.total=397MB
>>>>>>>>> 2016-07-11 15:46:45,750 INFO  org.apache.zookeeper.ZooKeeper:716
-
>>>>>>>>> Initiating client connection, connectString=QA-E8WIN11:2181,QA-E8WIN12:2181
>>>>>>>>> sessionTimeout=45000 watcher=org.apache.curator.ConnectionState@3e96bacf
>>>>>>>>> 2016-07-11 15:46:45,769 INFO  org.apache.zookeeper.ClientCnxn:1138
>>>>>>>>> - Opening socket connection to server QA-E8WIN12/10.11.8.232:2181.
>>>>>>>>> Will not attempt to authenticate using SASL (unknown
error)
>>>>>>>>> Successfully established the connection with ZooKeeper
>>>>>>>>> 2016-07-11 15:47:08,300 INFO  org.apache.zookeeper.ClientCnxn:1251
>>>>>>>>> - Client session timed out, have not heard from server
in 22534ms for
>>>>>>>>> sessionid 0x0, closing socket connection and attempting
reconnect
>>>>>>>>> 2016-07-11 15:47:08,403 INFO  org.apache.zookeeper.ClientCnxn:1138
>>>>>>>>> - Opening socket connection to server QA-E8WIN11/10.11.8.236:2181.
>>>>>>>>> Will not attempt to authenticate using SASL (unknown
error)
>>>>>>>>> 2016-07-11 15:47:08,406 INFO  org.apache.zookeeper.ClientCnxn:980
>>>>>>>>> - Socket connection established, initiating session,
client: /
>>>>>>>>> 10.11.12.4:47004, server: QA-E8WIN11/10.11.8.236:2181
>>>>>>>>> 2016-07-11 15:47:08,413 INFO  org.apache.zookeeper.ClientCnxn:1400
>>>>>>>>> - Session establishment complete on server QA-E8WIN11/
>>>>>>>>> 10.11.8.236:2181, sessionid = 0x100024f0936005b, negotiated
>>>>>>>>> timeout = 45000 (READ-ONLY mode)
>>>>>>>>> 2016-07-11 15:47:08,415 INFO  org.apache.zookeeper.ClientCnxn:1297
>>>>>>>>> - Checking server QA-E8WIN12/10.11.8.232:2181 for being
r/w.
>>>>>>>>> Timeout 200
>>>>>>>>> 2016-07-11 15:47:08,420 INFO
>>>>>>>>>  org.apache.curator.framework.state.ConnectionStateManager:228
- State
>>>>>>>>> change: READ_ONLY
>>>>>>>>> 2016-07-11 15:49:15,712 INFO  org.apache.zookeeper.ClientCnxn:1253
>>>>>>>>> - Unable to read additional data from server sessionid
0x100024f0936005b,
>>>>>>>>> likely server has closed socket, closing socket connection
and attempting
>>>>>>>>> reconnect
>>>>>>>>> 2016-07-11 15:49:15,815 INFO
>>>>>>>>>  org.apache.curator.framework.state.ConnectionStateManager:228
- State
>>>>>>>>> change: SUSPENDED
>>>>>>>>> 2016-07-11 15:49:17,313 INFO  org.apache.zookeeper.ClientCnxn:1138
>>>>>>>>> - Opening socket connection to server QA-E8WIN11/10.11.8.236:2181.
>>>>>>>>> Will not attempt to authenticate using SASL (unknown
error)
>>>>>>>>> 2016-07-11 15:49:17,314 INFO  org.apache.zookeeper.ClientCnxn:980
>>>>>>>>> - Socket connection established, initiating session,
client: /
>>>>>>>>> 10.11.12.4:47295, server: QA-E8WIN11/10.11.8.236:2181
>>>>>>>>> 2016-07-11 15:49:17,316 INFO  org.apache.zookeeper.ClientCnxn:1400
>>>>>>>>> - Session establishment complete on server QA-E8WIN11/
>>>>>>>>> 10.11.8.236:2181, sessionid = 0x100024f0936005f, negotiated
>>>>>>>>> timeout = 45000 (READ-ONLY mode)
>>>>>>>>> 2016-07-11 15:49:17,316 INFO  org.apache.zookeeper.ClientCnxn:1297
>>>>>>>>> - Checking server QA-E8WIN12/10.11.8.232:2181 for being
r/w.
>>>>>>>>> Timeout 400
>>>>>>>>> 2016-07-11 15:49:17,316 INFO
>>>>>>>>>  org.apache.curator.framework.state.ConnectionStateManager:228
- State
>>>>>>>>> change: READ_ONLY
>>>>>>>>> 2016-07-11 15:51:24,607 INFO  org.apache.zookeeper.ClientCnxn:1253
>>>>>>>>> - Unable to read additional data from server sessionid
0x100024f0936005f,
>>>>>>>>> likely server has closed socket, closing socket connection
and attempting
>>>>>>>>> reconnect
>>>>>>>>> 2016-07-11 15:51:24,713 INFO
>>>>>>>>>  org.apache.curator.framework.state.ConnectionStateManager:228
- State
>>>>>>>>> change: SUSPENDED
>>>>>>>>> 2016-07-11 15:51:26,467 INFO  org.apache.zookeeper.ClientCnxn:1138
>>>>>>>>> - Opening socket connection to server QA-E8WIN11/10.11.8.236:2181.
>>>>>>>>> Will not attempt to authenticate using SASL (unknown
error)
>>>>>>>>> 2016-07-11 15:51:26,468 INFO  org.apache.zookeeper.ClientCnxn:980
>>>>>>>>> - Socket connection established, initiating session,
client: /
>>>>>>>>> 10.11.12.4:47576, server: QA-E8WIN11/10.11.8.236:2181
>>>>>>>>> 2016-07-11 15:51:26,470 INFO  org.apache.zookeeper.ClientCnxn:1400
>>>>>>>>> - Session establishment complete on server QA-E8WIN11/
>>>>>>>>> 10.11.8.236:2181, sessionid = 0x100024f09360061, negotiated
>>>>>>>>> timeout = 45000 (READ-ONLY mode)
>>>>>>>>> 2016-07-11 15:51:26,470 INFO  org.apache.zookeeper.ClientCnxn:1297
>>>>>>>>> - Checking server QA-E8WIN12/10.11.8.232:2181 for being
r/w.
>>>>>>>>> Timeout 800
>>>>>>>>> 2016-07-11 15:51:26,470 INFO
>>>>>>>>>  org.apache.curator.framework.state.ConnectionStateManager:228
- State
>>>>>>>>> change: READ_ONLY
>>>>>>>>> (...)
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Ben
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Mime
View raw message