curator-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Evaristo (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CURATOR-72) Background operations don't wait for connection timeout
Date Fri, 10 Jan 2014 13:38:12 GMT

    [ https://issues.apache.org/jira/browse/CURATOR-72?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867791#comment-13867791
] 

Evaristo commented on CURATOR-72:
---------------------------------

Hi there,

I am trying to produce a test that shows the problem but I am failing to isolate a test. Anyhow
I think I have some interesting information activating CURATOR DEBUG information just in case
that you could help me to provide some light about the problem.

As said previously with CURATOR-72 I see RECONNECTED-LOST (LOST should not happen), but I
see also some other strange things.

I have run my app tests with CURATOR 2.1.0 and CURATOR-72 branch code

My app is using several recipes (leaderelection, nodeCache, pathchildrenCache...), and the
app has several CF clients
I am using RetryOneTime(1000) retry policy for CF clients and STO 7500 msecs and CTO 1500
msecs 

Using CURATOR 2.1.0 my app works fine and listeners look ok. When I stop ZK servers the number
of background retries looks to follow the retry policy pattern.

Using CURATOR-72 branch with current code I see the following:
-	When I stop ZK servers the number  of background retries exceptions is huge in the log (for
sure it does not follow the retyr policy and CTO parameters), so I believe some recipe is
retrying like hell and not following the retry policy (in this case the process takes a much
higher CPU process that with CURATOR-2-1.0). There is a huge amount of messages with ConnectionLossException
and messages like:

[ConnectionState] 2014-01-10 14:12:19 [ERROR] - ConnectionState.java:194 | Connection timed
out for connection string (127.0.0.1:30101,127.0.0.1:30102,127.0.0.1:30103) and timeout (1500)
/ elapsed (6095)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
	at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:191)
	at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:86)
	at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:113)
	at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:743)
	at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:729)
	at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$2(CuratorFrameworkImpl.java:710)
	at org.apache.curator.framework.imps.CuratorFrameworkImpl$3.call(CuratorFrameworkImpl.java:244)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
	at java.lang.Thread.run(Thread.java:662)
[CuratorFrameworkImpl] 2014-01-10 14:12:19 [DEBUG] - CuratorFrameworkImpl.java:697 | Retry
policy did not allow retry
[CuratorFrameworkImpl] 2014-01-10 14:12:19 [ERROR] - CuratorFrameworkImpl.java:512 | Background
exception was not retry-able or retry gave up
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
	at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:191)
	at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:86)
	at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:113)
	at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:743)
	at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:729)
	at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$2(CuratorFrameworkImpl.java:710)
	at org.apache.curator.framework.imps.CuratorFrameworkImpl$3.call(CuratorFrameworkImpl.java:244)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
	at java.lang.Thread.run(Thread.java:662)
[ConnectionState] 2014-01-10 14:12:19 [ERROR] - ConnectionState.java:194 | Connection timed
out for connection string (127.0.0.1:30101,127.0.0.1:30102,127.0.0.1:30103) and timeout (1500)
/ elapsed (6097)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
	at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:191)
	at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:86)
	at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:113)
	at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:743)
	at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:729)
	at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$2(CuratorFrameworkImpl.java:710)
	at org.apache.curator.framework.imps.CuratorFrameworkImpl$3.call(CuratorFrameworkImpl.java:244)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
	at java.lang.Thread.run(Thread.java:662)
[CuratorFrameworkImpl] 2014-01-10 14:12:19 [ERROR] - CuratorFrameworkImpl.java:512 | Background
exception was not retry-able or retry gave up
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
	at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:191)
	at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:86)
	at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:113)
	at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:743)
	at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:729)
	at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$2(CuratorFrameworkImpl.java:710)
	at org.apache.curator.framework.imps.CuratorFrameworkImpl$3.call(CuratorFrameworkImpl.java:244)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
	at java.lang.Thread.run(Thread.java:662)
[ConnectionState] 2014-01-10 14:12:19 [ERROR] - ConnectionState.java:194 | Connection timed
out for connection string (127.0.0.1:30101,127.0.0.1:30102,127.0.0.1:30103) and timeout (1500)
/ elapsed (6098)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
	at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:191)
	at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:86)
	at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:113)
	at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:743)
	at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:729)
	at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$2(CuratorFrameworkImpl.java:710)
	at org.apache.curator.framework.imps.CuratorFrameworkImpl$3.call(CuratorFrameworkImpl.java:244)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
	at java.lang.Thread.run(Thread.java:662)

(Thousands of exceptions; one every 1 msec aprox).



At the end the number of ConnectionLoss Exceptions is huge and highly frequent.

When starting ZK server again, the RECONNECTED is fired but after that LOST is also fired,
so I guess that still there is race condition (although now there is an issue that triggers
so many ConnectionLoss Exceptions)

I hope it helps 

Regards,

Evaristo


> Background operations don't wait for connection timeout
> -------------------------------------------------------
>
>                 Key: CURATOR-72
>                 URL: https://issues.apache.org/jira/browse/CURATOR-72
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Framework
>    Affects Versions: 2.3.0
>            Reporter: Evaristo Camarero
>            Assignee: Jordan Zimmerman
>             Fix For: 2.4.0
>
>         Attachments: TestListener.java, TestListenerConnectedAtStart.java, TestListenerSequence.java,
test.java
>
>
> Background operations don't wait for the configured connection timeout before failing.
Attached test shows the problem.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

Mime
View raw message