curator-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Martin Serrano (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CURATOR-405) Connection loss not leading to ConnectionState.LOST and causing logs to fill
Date Wed, 27 Jun 2018 18:54:00 GMT

    [ https://issues.apache.org/jira/browse/CURATOR-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525459#comment-16525459
] 

Martin Serrano commented on CURATOR-405:
----------------------------------------

We have also seen this occur in our environment with curator 4.0.0.

The issue appears to have been caused by the OS swapping out the ZK client for some period
of time.   The server side may have also been swapped out during some of the periods.  
The server does recover, but the client never does.
{noformat}
2018-06-27 06:35:37,948 WARN ClientCnxn [main-SendThread(localhost:16980)] - Client session
timed out, have not heard from server in 885327ms for sessionid 0x164381924ca0039
2018-06-27 06:35:38,011 WARN ClientCnxn [main-SendThread(localhost:16980)] - Client session
timed out, have not heard from server in 876402ms for sessionid 0x16438192ca0021
2018-06-27 06:35:38,015 INFO ClientCnxn [main-SendThread(localhost:16980)] - Client session
timed out, have not heard from server in 876402ms for sessionid 0x164381924ca0021, closing
socket connection and attempting reconnect
2018-06-27 06:35:38,021 INFO ClientCnxn [main-SendThread(localhost:16980)] - Client session
timed out, have not heard from server in 885327ms for sessionid 0x164381924ca0039, closing
socket connection and attempting reconnect
2018-06-27 06:35:38,122 INFO ConnectionStateManager [main-EventThread] - State change: SUSPENDED
2018-06-27 06:35:38,123 INFO CuratorHelper [Curator-ConnectionStateManager-0] - Connection
state change: SUSPENDED
2018-06-27 06:39:27,109 WARN ConnectionStateManager [Curator-ConnectionStateManager-0] - Session
timeout has elapsed while SUSPENDED. Injecting a session expiration. Elapsed ms: 155192. Adjusted
session timeout ms: 60000
2018-06-27 06:39:27,120 INFO ClientCnxn [main-SendThread(localhost:16980)] - Opening socket
connection to server localhost/127.0.0.1:16980. Will not attempt to authenticate using SASL
(unknown error)
2018-06-27 06:39:27,120 INFO ClientCnxn [main-SendThread(localhost:16980)] - Opening socket
connection to server localhost/127.0.0.1:16980. Will not attempt to authenticate using SASL
(unknown error)
2018-06-27 06:39:27,299 INFO ClientCnxn [main-SendThread(localhost:16980)] - Socket connection
established to localhost/127.0.0.1:16980, initiating session
2018-06-27 06:39:27,324 INFO ClientCnxn [main-SendThread(localhost:16980)] - Socket connection
established to localhost/127.0.0.1:16980, initiating session
2018-06-27 06:39:27,326 WARN ClientCnxn [main-SendThread(localhost:16980)] - Unable to reconnect
to ZooKeeper service, session 0x164381924ca0039 has expired
2018-06-27 06:39:27,326 INFO ClientCnxn [main-EventThread] - EventThread shut down for session:
0x164381924ca0039
2018-06-27 06:39:27,330 WARN ClientCnxn [main-SendThread(localhost:16980)] - Unable to reconnect
to ZooKeeper service, session 0x164381924ca0021 has expired
2018-06-27 06:39:27,356 WARN ConnectionStateManager [Curator-ConnectionStateManager-0] - Session
timeout has elapsed while SUSPENDED. Injecting a session expiration. Elapsed ms: 229233. Adjusted
session timeout ms: 60000
2018-06-27 06:39:27,356 WARN ConnectionStateManager [Curator-ConnectionStateManager-0] - Session
timeout has elapsed while SUSPENDED. Injecting a session expiration. Elapsed ms: 229234. Adjusted
session timeout ms: 60000
2018-06-27 06:39:27,356 WARN ConnectionStateManager [Curator-ConnectionStateManager-0] - Session
timeout has elapsed while SUSPENDED. Injecting a session expiration. Elapsed ms: 229234. Adjusted
session timeout ms: 60000
2018-06-27 06:39:27,356 WARN ConnectionStateManager [Curator-ConnectionStateManager-0] - Session
timeout has elapsed while SUSPENDED. Injecting a session expiration. Elapsed ms: 229234. Adjusted
session timeout ms: 6000{noformat}
 

>From the server side during this time:
{noformat}
2018-06-26 12:42:50,572 INFO  ZooKeeperServer - Established session 0x164381924ca0039 with
negotiated timeout 60000 for client /127.0.0.1:51616
2018-06-27 06:35:38,042 INFO  ZooKeeperServer - Expiring session 0x164381924ca0039, timeout
of 60000ms exceeded
2018-06-27 06:35:38,060 INFO  PrepRequestProcessor - Processed session termination for sessionid:
0x164381924ca0039
EndOfStreamException: Unable to read additional data from client sessionid 0x164381924ca0039,
likely client has closed socket
2018-06-27 06:35:38,149 INFO  NIOServerCnxn - Closed socket connection for client /127.0.0.1:51616
which had sessionid 0x164381924ca0039
2018-06-27 06:39:27,326 INFO  ZooKeeperServer - Client attempting to renew session 0x164381924ca0039
at /127.0.0.1:32892
2018-06-27 06:39:27,326 INFO  ZooKeeperServer - Invalid session 0x164381924ca0039 for client
/127.0.0.1:32892, probably expired
2018-06-27 06:39:27,326 INFO  NIOServerCnxn - Closed socket connection for client /127.0.0.1:32892
which had sessionid 0x164381924ca0039{noformat}

> Connection loss not leading to ConnectionState.LOST and causing logs to fill
> ----------------------------------------------------------------------------
>
>                 Key: CURATOR-405
>                 URL: https://issues.apache.org/jira/browse/CURATOR-405
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Framework
>    Affects Versions: 3.3.0
>            Reporter: Jordan Zimmerman
>            Priority: Major
>
> It's unclear how it got into this state, but a client started endlessly logging "Session
timeout has elapsed while SUSPENDED". Curator 3.0 is supposed to go to LOST and inject a session
expiration but it's not working in this case.
> There are millions of these:
> {code}
> [2017-05-01 18:29:34,636][WARN ][org.apache.curator.framework.state.ConnectionStateManager]
Session timeout has elapsed while SUSPENDED. Injecting a session expiration. Elapsed ms: 250368206.
Adjusted session timeout ms: 15000 {}
> [2017-05-01 18:29:34,637][INFO ][XXXXX] Resolved connection string from [http://localhost:2180/zookeeper/clients/ensemble/connection-string?namespace=/v1]
to [localhost:22191,localhost:22194,localhost:22192,localhost:22193/v1] with local namespace
[/v1] {}
> [2017-05-01 18:29:34,637][INFO ][org.apache.zookeeper.ZooKeeperTestable] injectSessionExpiration()
called {}
> [2017-05-01 18:29:34,637][WARN ][org.apache.curator.framework.state.ConnectionStateManager]
Session timeout has elapsed while SUSPENDED. Injecting a session expiration. Elapsed ms: 250368207.
Adjusted session timeout ms: 15000 {}
> [2017-05-01 18:29:34,637][INFO ][XXXXX] Resolved connection string from [http://localhost:2180/zookeeper/clients/ensemble/connection-string?namespace=/v1]
to [localhost:22191,localhost:22194,localhost:22192,localhost:22193/v1] with local namespace
[/v1] {}
> [2017-05-01 18:29:34,637][INFO ][org.apache.zookeeper.ZooKeeperTestable] injectSessionExpiration()
called {}
> [2017-05-01 18:29:34,637][WARN ][org.apache.curator.framework.state.ConnectionStateManager]
Session timeout has elapsed while SUSPENDED. Injecting a session expiration. Elapsed ms: 250368207.
Adjusted session timeout ms: 15000 {}
> [2017-05-01 18:29:34,637][INFO ][XXXXX] Resolved connection string from [http://localhost:2180/zookeeper/clients/ensemble/connection-string?namespace=/v1]
to [localhost:22191,localhost:22194,localhost:22192,localhost:22193/v1] with local namespace
[/v1] {}
> [2017-05-01 18:29:34,637][INFO ][org.apache.zookeeper.ZooKeeperTestable] injectSessionExpiration()
called {}
> [2017-05-01 18:29:34,637][WARN ][org.apache.curator.framework.state.ConnectionStateManager]
Session timeout has elapsed while SUSPENDED. Injecting a session expiration. Elapsed ms: 250368207.
Adjusted session timeout ms: 15000 {}
> [2017-05-01 18:29:34,637][INFO ][XXXXX] Resolved connection string from [http://localhost:2180/zookeeper/clients/ensemble/connection-string?namespace=/v1]
to [localhost:22191,localhost:22194,localhost:22192,localhost:22193/v1] with local namespace
[/v1] {}
> [2017-05-01 18:29:34,637][INFO ][org.apache.zookeeper.ZooKeeperTestable] injectSessionExpiration()
called {}
> [2017-05-01 18:29:34,637][WARN ][org.apache.curator.framework.state.ConnectionStateManager]
Session timeout has elapsed while SUSPENDED. Injecting a session expiration. Elapsed ms: 250368207.
Adjusted session timeout ms: 15000 {}
> [2017-05-01 18:29:34,638][INFO ][XXXXX] Resolved connection string from [http://localhost:2180/zookeeper/clients/ensemble/connection-string?namespace=/v1]
to [localhost:22191,localhost:22194,localhost:22192,localhost:22193/v1] with local namespace
[/v1] {}
> [2017-05-01 18:29:34,638][INFO ][org.apache.zookeeper.ZooKeeperTestable] injectSessionExpiration()
called {}
> [2017-05-01 18:29:34,638][WARN ][org.apache.curator.framework.state.ConnectionStateManager]
Session timeout has elapsed while SUSPENDED. Injecting a session expiration. Elapsed ms: 250368208.
Adjusted session timeout ms: 15000 {}
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message