curator-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Cameron McKenzie <mckenzie....@gmail.com>
Subject Re: Curator never goes through when ZK is read-only
Date Mon, 11 Jul 2016 23:42:52 GMT
It looks like a connection issue based on the logs. When you confirmed that
the connection was ok via the zkCli, was that running on the same host as
the read only server? Can you run it from the same host that you're running
the Java client from?

On Tue, Jul 12, 2016 at 9:40 AM, Benjamin Jaton <benjamin.jaton@gmail.com>
wrote:

> Well I was trying to let it run and it eventually fails, with a stack:
>
> Successfully established the connection with ZooKeeper
> 2016-07-11 16:28:21,943 INFO  org.apache.zookeeper.ClientCnxn:1251 -
> Client session timed out, have not heard from server in 22519ms for
> sessionid 0x0, closing socket connection and attempting reconnect
> 2016-07-11 16:28:22,046 INFO  org.apache.zookeeper.ClientCnxn:1138 -
> Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will not
> attempt to authenticate using SASL (unknown error)
> 2016-07-11 16:28:22,050 INFO  org.apache.zookeeper.ClientCnxn:980 - Socket
> connection established, initiating session, client: /10.11.12.4:54702,
> server: QA-E8WIN11/10.11.8.236:2181
> 2016-07-11 16:28:22,058 INFO  org.apache.zookeeper.ClientCnxn:1400 -
> Session establishment complete on server QA-E8WIN11/10.11.8.236:2181,
> sessionid = 0x100024f09360080, negotiated timeout = 45000 (READ-ONLY mode)
> 2016-07-11 16:28:22,061 INFO  org.apache.zookeeper.ClientCnxn:1297 -
> Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 200
> 2016-07-11 16:28:22,066 INFO
> org.apache.curator.framework.state.ConnectionStateManager:228 - State
> change: READ_ONLY
> 2016-07-11 16:30:29,312 INFO  org.apache.zookeeper.ClientCnxn:1253 -
> Unable to read additional data from server sessionid 0x100024f09360080,
> likely server has closed socket, closing socket connection and attempting
> reconnect
> 2016-07-11 16:30:29,416 INFO
> org.apache.curator.framework.state.ConnectionStateManager:228 - State
> change: SUSPENDED
> 2016-07-11 16:30:31,031 INFO  org.apache.zookeeper.ClientCnxn:1138 -
> Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will not
> attempt to authenticate using SASL (unknown error)
> 2016-07-11 16:30:31,032 INFO  org.apache.zookeeper.ClientCnxn:980 - Socket
> connection established, initiating session, client: /10.11.12.4:55156,
> server: QA-E8WIN11/10.11.8.236:2181
> 2016-07-11 16:30:31,034 INFO  org.apache.zookeeper.ClientCnxn:1400 -
> Session establishment complete on server QA-E8WIN11/10.11.8.236:2181,
> sessionid = 0x100024f09360083, negotiated timeout = 45000 (READ-ONLY mode)
> 2016-07-11 16:30:31,034 INFO  org.apache.zookeeper.ClientCnxn:1297 -
> Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 400
> 2016-07-11 16:30:31,035 INFO
> org.apache.curator.framework.state.ConnectionStateManager:228 - State
> change: READ_ONLY
> 2016-07-11 16:32:38,335 INFO  org.apache.zookeeper.ClientCnxn:1253 -
> Unable to read additional data from server sessionid 0x100024f09360083,
> likely server has closed socket, closing socket connection and attempting
> reconnect
> 2016-07-11 16:32:38,441 INFO
> org.apache.curator.framework.state.ConnectionStateManager:228 - State
> change: SUSPENDED
> 2016-07-11 16:32:40,425 INFO  org.apache.zookeeper.ClientCnxn:1138 -
> Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will not
> attempt to authenticate using SASL (unknown error)
> 2016-07-11 16:32:40,426 INFO  org.apache.zookeeper.ClientCnxn:980 - Socket
> connection established, initiating session, client: /10.11.12.4:55637,
> server: QA-E8WIN11/10.11.8.236:2181
> 2016-07-11 16:32:40,428 INFO  org.apache.zookeeper.ClientCnxn:1400 -
> Session establishment complete on server QA-E8WIN11/10.11.8.236:2181,
> sessionid = 0x100024f09360087, negotiated timeout = 45000 (READ-ONLY mode)
> 2016-07-11 16:32:40,428 INFO  org.apache.zookeeper.ClientCnxn:1297 -
> Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 800
> 2016-07-11 16:32:40,428 INFO
> org.apache.curator.framework.state.ConnectionStateManager:228 - State
> change: READ_ONLY
> 2016-07-11 16:34:47,615 INFO  org.apache.zookeeper.ClientCnxn:1253 -
> Unable to read additional data from server sessionid 0x100024f09360087,
> likely server has closed socket, closing socket connection and attempting
> reconnect
> 2016-07-11 16:34:47,716 INFO
> org.apache.curator.framework.state.ConnectionStateManager:228 - State
> change: SUSPENDED
> 2016-07-11 16:34:49,557 INFO  org.apache.zookeeper.ClientCnxn:1138 -
> Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will not
> attempt to authenticate using SASL (unknown error)
> 2016-07-11 16:34:49,558 INFO  org.apache.zookeeper.ClientCnxn:980 - Socket
> connection established, initiating session, client: /10.11.12.4:56094,
> server: QA-E8WIN11/10.11.8.236:2181
> 2016-07-11 16:34:49,560 INFO  org.apache.zookeeper.ClientCnxn:1400 -
> Session establishment complete on server QA-E8WIN11/10.11.8.236:2181,
> sessionid = 0x100024f09360089, negotiated timeout = 45000 (READ-ONLY mode)
> 2016-07-11 16:34:49,560 INFO  org.apache.zookeeper.ClientCnxn:1297 -
> Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 1600
> 2016-07-11 16:34:49,560 INFO
> org.apache.curator.framework.state.ConnectionStateManager:228 - State
> change: READ_ONLY
> 2016-07-11 16:36:56,895 INFO  org.apache.zookeeper.ClientCnxn:1253 -
> Unable to read additional data from server sessionid 0x100024f09360089,
> likely server has closed socket, closing socket connection and attempting
> reconnect
> Exception in thread "main" 2016-07-11 16:36:56,996 INFO
> org.apache.curator.framework.state.ConnectionStateManager:228 - State
> change: SUSPENDED
> org.apache.zookeeper.KeeperException$ConnectionLossException:
> KeeperErrorCode = ConnectionLoss for /
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1753)
>     at
> org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:310)
>     at
> org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:299)
>     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108)
>     at
> org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:295)
>     at
> org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:287)
>     at
> org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:34)
>     at TestCuratorReadOnly.main(TestCuratorReadOnly.java:23)
>
> On Mon, Jul 11, 2016 at 4:39 PM, Cameron McKenzie <mckenzie.cam@gmail.com>
> wrote:
>
>> So, it appears that the initial connection occurs ok, but then something
>> goes wrong
>>
>> 2016-07-11 15:49:15,712 INFO  org.apache.zookeeper.ClientCnxn:1253 -
>> Unable to read additional data from server sessionid 0x100024f0936005b,
>> likely server has closed socket, closing socket connection and attempting
>> reconnect
>>
>> Then a reconnection occurs ok, and then the cycle starts again.
>>
>> Can you see where in the stack the getData() call is hanging?
>>
>> On Tue, Jul 12, 2016 at 9:35 AM, Benjamin Jaton <benjamin.jaton@gmail.com
>> > wrote:
>>
>>> The blockUntilConnectedOrTimedOut() does return, we see the line:
>>> "Successfully established the connection with ZooKeeper"
>>> which happens right after.
>>>
>>> The getData() is where it hangs.
>>>
>>> We see this first:
>>> 2016-07-11 15:47:08,420 INFO
>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State
>>> change: READ_ONLY
>>>
>>> and then we see it again 2 minutes later:
>>> 2016-07-11 15:49:17,316 INFO
>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State
>>> change: READ_ONLY
>>>
>>> and then again 2 minutes after that:
>>> 2016-07-11 15:51:26,470 INFO
>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State
>>> change: READ_ONLY
>>>
>>> Then I stopped it.
>>>
>>> On Mon, Jul 11, 2016 at 4:13 PM, Cameron McKenzie <
>>> mckenzie.cam@gmail.com> wrote:
>>>
>>>> hey Ben,
>>>> From the logs it looks like the connection is established at the end?
>>>>
>>>> Are you saying that the blockUntilConnectedOrTimedOut() does not return
>>>> when this occurs?
>>>> cheers
>>>>
>>>>
>>>> On Tue, Jul 12, 2016 at 9:10 AM, Benjamin Jaton <
>>>> bjaton@radiantlogic.com> wrote:
>>>>
>>>>> ZK is 3.5.1-alpha
>>>>> Curator is 2.11.0
>>>>>
>>>>> On Mon, Jul 11, 2016 at 4:01 PM, Benjamin Jaton <
>>>>> bjaton@radiantlogic.com> wrote:
>>>>>
>>>>>> I have a 3 nodes ZK cluster , readonly mode is enabled.
>>>>>> 2 nodes are down, so the third one is in read-only (verified by using
>>>>>> the ZK API manually).
>>>>>>
>>>>>> I'm using this piece of code:
>>>>>>
>>>>>> Builder curatorClientBuilder = CuratorFrameworkFactory.builder()
>>>>>> .connectString("QA-E8WIN11:2181,QA-E8WIN12:2181")
>>>>>> .sessionTimeoutMs(45000).connectionTimeoutMs(15000)
>>>>>> .retryPolicy(new RetryNTimes(3, 5000)).canBeReadOnly(true);
>>>>>>
>>>>>> CuratorFramework client = curatorClientBuilder.build();
>>>>>> client.start();
>>>>>> client.getZookeeperClient().blockUntilConnectedOrTimedOut();
>>>>>> System.out.println("Successfully established the connection with
>>>>>> ZooKeeper");
>>>>>> client.getData().forPath("/");
>>>>>> System.out.println("Done.");
>>>>>>
>>>>>> When curator pick the host that is UP first, it goes through very
>>>>>> quickly. When it picks the host that is down first (QA-E8WIN12),
it never
>>>>>> goes through.
>>>>>>
>>>>>> Am I missing something here?
>>>>>>
>>>>>> Below is the log:
>>>>>>
>>>>>> 2016-07-11 15:46:45,733 INFO
>>>>>>  org.apache.curator.framework.imps.CuratorFrameworkImpl:235 - Starting
>>>>>> 2016-07-11 15:46:45,743 INFO  org.apache.zookeeper.ZooKeeper:109
-
>>>>>> Client environment:zookeeper.version=3.5.1-alpha-1748895, built on
>>>>>> 06/17/2016 22:24 GMT
>>>>>> 2016-07-11 15:46:45,743 INFO  org.apache.zookeeper.ZooKeeper:109
-
>>>>>> Client environment:host.name=w-rli09-ben
>>>>>> 2016-07-11 15:46:45,744 INFO  org.apache.zookeeper.ZooKeeper:109
-
>>>>>> Client environment:java.version=1.8.0_11
>>>>>> 2016-07-11 15:46:45,744 INFO  org.apache.zookeeper.ZooKeeper:109
-
>>>>>> Client environment:java.vendor=Oracle Corporation
>>>>>> 2016-07-11 15:46:45,744 INFO  org.apache.zookeeper.ZooKeeper:109
-
>>>>>> Client environment:java.home=/usr/local/apps/jdk1.8.0_11/jre
>>>>>> 2016-07-11 15:46:45,744 INFO  org.apache.zookeeper.ZooKeeper:109
-
>>>>>> Client environment:java.class.path=(..)
>>>>>> 2016-07-11 15:46:45,746 INFO  org.apache.zookeeper.ZooKeeper:109
-
>>>>>> Client environment:java.io.tmpdir=/tmp
>>>>>> 2016-07-11 15:46:45,746 INFO  org.apache.zookeeper.ZooKeeper:109
-
>>>>>> Client environment:java.compiler=<NA>
>>>>>> 2016-07-11 15:46:45,747 INFO  org.apache.zookeeper.ZooKeeper:109
-
>>>>>> Client environment:os.name=Linux
>>>>>> 2016-07-11 15:46:45,747 INFO  org.apache.zookeeper.ZooKeeper:109
-
>>>>>> Client environment:os.arch=amd64
>>>>>> 2016-07-11 15:46:45,747 INFO  org.apache.zookeeper.ZooKeeper:109
-
>>>>>> Client environment:os.version=3.8.0-44-generic
>>>>>> 2016-07-11 15:46:45,747 INFO  org.apache.zookeeper.ZooKeeper:109
-
>>>>>> Client environment:user.name=benji
>>>>>> 2016-07-11 15:46:45,747 INFO  org.apache.zookeeper.ZooKeeper:109
-
>>>>>> Client environment:user.home=(..)
>>>>>> 2016-07-11 15:46:45,747 INFO  org.apache.zookeeper.ZooKeeper:109
-
>>>>>> Client environment:user.dir=(..)
>>>>>> 2016-07-11 15:46:45,747 INFO  org.apache.zookeeper.ZooKeeper:109
-
>>>>>> Client environment:os.memory.free=385MB
>>>>>> 2016-07-11 15:46:45,748 INFO  org.apache.zookeeper.ZooKeeper:109
-
>>>>>> Client environment:os.memory.max=5358MB
>>>>>> 2016-07-11 15:46:45,748 INFO  org.apache.zookeeper.ZooKeeper:109
-
>>>>>> Client environment:os.memory.total=397MB
>>>>>> 2016-07-11 15:46:45,750 INFO  org.apache.zookeeper.ZooKeeper:716
-
>>>>>> Initiating client connection, connectString=QA-E8WIN11:2181,QA-E8WIN12:2181
>>>>>> sessionTimeout=45000 watcher=org.apache.curator.ConnectionState@3e96bacf
>>>>>> 2016-07-11 15:46:45,769 INFO  org.apache.zookeeper.ClientCnxn:1138
-
>>>>>> Opening socket connection to server QA-E8WIN12/10.11.8.232:2181.
>>>>>> Will not attempt to authenticate using SASL (unknown error)
>>>>>> Successfully established the connection with ZooKeeper
>>>>>> 2016-07-11 15:47:08,300 INFO  org.apache.zookeeper.ClientCnxn:1251
-
>>>>>> Client session timed out, have not heard from server in 22534ms for
>>>>>> sessionid 0x0, closing socket connection and attempting reconnect
>>>>>> 2016-07-11 15:47:08,403 INFO  org.apache.zookeeper.ClientCnxn:1138
-
>>>>>> Opening socket connection to server QA-E8WIN11/10.11.8.236:2181.
>>>>>> Will not attempt to authenticate using SASL (unknown error)
>>>>>> 2016-07-11 15:47:08,406 INFO  org.apache.zookeeper.ClientCnxn:980
-
>>>>>> Socket connection established, initiating session, client: /
>>>>>> 10.11.12.4:47004, server: QA-E8WIN11/10.11.8.236:2181
>>>>>> 2016-07-11 15:47:08,413 INFO  org.apache.zookeeper.ClientCnxn:1400
-
>>>>>> Session establishment complete on server QA-E8WIN11/10.11.8.236:2181,
>>>>>> sessionid = 0x100024f0936005b, negotiated timeout = 45000 (READ-ONLY
mode)
>>>>>> 2016-07-11 15:47:08,415 INFO  org.apache.zookeeper.ClientCnxn:1297
-
>>>>>> Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout
>>>>>> 200
>>>>>> 2016-07-11 15:47:08,420 INFO
>>>>>>  org.apache.curator.framework.state.ConnectionStateManager:228 -
State
>>>>>> change: READ_ONLY
>>>>>> 2016-07-11 15:49:15,712 INFO  org.apache.zookeeper.ClientCnxn:1253
-
>>>>>> Unable to read additional data from server sessionid 0x100024f0936005b,
>>>>>> likely server has closed socket, closing socket connection and attempting
>>>>>> reconnect
>>>>>> 2016-07-11 15:49:15,815 INFO
>>>>>>  org.apache.curator.framework.state.ConnectionStateManager:228 -
State
>>>>>> change: SUSPENDED
>>>>>> 2016-07-11 15:49:17,313 INFO  org.apache.zookeeper.ClientCnxn:1138
-
>>>>>> Opening socket connection to server QA-E8WIN11/10.11.8.236:2181.
>>>>>> Will not attempt to authenticate using SASL (unknown error)
>>>>>> 2016-07-11 15:49:17,314 INFO  org.apache.zookeeper.ClientCnxn:980
-
>>>>>> Socket connection established, initiating session, client: /
>>>>>> 10.11.12.4:47295, server: QA-E8WIN11/10.11.8.236:2181
>>>>>> 2016-07-11 15:49:17,316 INFO  org.apache.zookeeper.ClientCnxn:1400
-
>>>>>> Session establishment complete on server QA-E8WIN11/10.11.8.236:2181,
>>>>>> sessionid = 0x100024f0936005f, negotiated timeout = 45000 (READ-ONLY
mode)
>>>>>> 2016-07-11 15:49:17,316 INFO  org.apache.zookeeper.ClientCnxn:1297
-
>>>>>> Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout
>>>>>> 400
>>>>>> 2016-07-11 15:49:17,316 INFO
>>>>>>  org.apache.curator.framework.state.ConnectionStateManager:228 -
State
>>>>>> change: READ_ONLY
>>>>>> 2016-07-11 15:51:24,607 INFO  org.apache.zookeeper.ClientCnxn:1253
-
>>>>>> Unable to read additional data from server sessionid 0x100024f0936005f,
>>>>>> likely server has closed socket, closing socket connection and attempting
>>>>>> reconnect
>>>>>> 2016-07-11 15:51:24,713 INFO
>>>>>>  org.apache.curator.framework.state.ConnectionStateManager:228 -
State
>>>>>> change: SUSPENDED
>>>>>> 2016-07-11 15:51:26,467 INFO  org.apache.zookeeper.ClientCnxn:1138
-
>>>>>> Opening socket connection to server QA-E8WIN11/10.11.8.236:2181.
>>>>>> Will not attempt to authenticate using SASL (unknown error)
>>>>>> 2016-07-11 15:51:26,468 INFO  org.apache.zookeeper.ClientCnxn:980
-
>>>>>> Socket connection established, initiating session, client: /
>>>>>> 10.11.12.4:47576, server: QA-E8WIN11/10.11.8.236:2181
>>>>>> 2016-07-11 15:51:26,470 INFO  org.apache.zookeeper.ClientCnxn:1400
-
>>>>>> Session establishment complete on server QA-E8WIN11/10.11.8.236:2181,
>>>>>> sessionid = 0x100024f09360061, negotiated timeout = 45000 (READ-ONLY
mode)
>>>>>> 2016-07-11 15:51:26,470 INFO  org.apache.zookeeper.ClientCnxn:1297
-
>>>>>> Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout
>>>>>> 800
>>>>>> 2016-07-11 15:51:26,470 INFO
>>>>>>  org.apache.curator.framework.state.ConnectionStateManager:228 -
State
>>>>>> change: READ_ONLY
>>>>>> (...)
>>>>>>
>>>>>> Thanks
>>>>>> Ben
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Mime
View raw message