curator-dev mailing list archives

From "Wang XiaoTian (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Deleted] (CURATOR-293) Curator can NOT reconnect after connection lost and session expired when the connection come up while the DNS server is not ready yet.(zookeeper connection string using domain names)
Date Mon, 01 Feb 2016 07:17:39 GMT

     [ https://issues.apache.org/jira/browse/CURATOR-293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wang XiaoTian updated CURATOR-293:
----------------------------------
    Comment: was deleted

(was: We can work around the issue by calling "client.getZookeeperClient().getZooKeeper()"
periodically after receiving the "ConnectionState.LOST" event, and by using a handler thread pool
to process incoming state events concurrently so that event delivery is not blocked. Note that
client.getZookeeperClient().getZooKeeper() is a thread-safe API.
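
A minimal sketch of that workaround, using only the JDK so that it runs without Curator on the
classpath. The class name LostStateNudger and its methods are hypothetical and not part of
Curator; the probe stands in for the call to client.getZookeeperClient().getZooKeeper(), and the
retry period is an arbitrary assumption.

```java
import java.net.UnknownHostException;
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper (not part of Curator): re-runs a probe until it stops throwing. */
final class LostStateNudger implements AutoCloseable {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final Callable<?> probe;   // assumed to wrap client.getZookeeperClient().getZooKeeper()
    private final long periodMillis;   // assumed retry period
    private ScheduledFuture<?> task;   // guarded by "this"; null when no retry loop is active

    LostStateNudger(Callable<?> probe, long periodMillis) {
        this.probe = probe;
        this.periodMillis = periodMillis;
    }

    /** Call on ConnectionState.LOST. Starts the retry loop unless one is already running. */
    synchronized void onLost() {
        if (task == null) {
            task = scheduler.scheduleWithFixedDelay(this::tryOnce, 0, periodMillis, TimeUnit.MILLISECONDS);
        }
    }

    /** Call on ConnectionState.RECONNECTED. Stops the retry loop if it is still running. */
    synchronized void onReconnected() {
        if (task != null) {
            task.cancel(false);
            task = null;
        }
    }

    synchronized boolean isRetrying() {
        return task != null;
    }

    private void tryOnce() {
        try {
            probe.call();      // may throw, e.g. UnknownHostException while DNS is down
            onReconnected();   // the probe succeeded, so stop retrying
        } catch (Exception e) {
            // Swallow and let the scheduler run the probe again after periodMillis.
        }
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger calls = new AtomicInteger();
        // Simulated probe: fails twice as if DNS were unavailable, then succeeds.
        try (LostStateNudger nudger = new LostStateNudger(() -> {
            if (calls.incrementAndGet() < 3) throw new UnknownHostException("zk1.test.com");
            return "connected";
        }, 50)) {
            nudger.onLost();
            while (nudger.isRetrying()) Thread.sleep(10);
            System.out.println("probe succeeded after " + calls.get() + " attempts");
        }
    }
}
```

With Curator, the wiring would presumably be a ConnectionStateListener registered through
client.getConnectionStateListenable().addListener(listener, executor), where the executor is the
handler thread pool, calling onLost() for ConnectionState.LOST and onReconnected() for
ConnectionState.RECONNECTED.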

Actually, the framework could do the same thing itself for the sake of fault tolerance, rather
than forcing the user to handle it: catch the exception and handle it appropriately instead
of putting it in a background exception queue and ignoring it. By the way, I don't think
"client.getZookeeperClient().getZooKeeper()" is a user-friendly public API.

Another issue concerns StaticHostProvider.java. It is implemented on top of InetAddress.java,
which has an addressCache; see "https://github.com/openjdk-mirror/jdk7u-jdk/blob/master/src/share/classes/sun/net/InetAddressCachePolicy.java".
The addressCache caches resolved hostnames, and when an unresolved hostname is passed in,
InetAddress first tries to resolve it by querying the address cache. I don't know why the
previously resolved hostname is lost from the cache (perhaps because of the cache policy).
)
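
To make the caching point concrete, here is a small JDK-only sketch. It is an illustration under
assumptions, not the code path StaticHostProvider actually takes: the class DnsReResolve and its
methods are hypothetical. It prints the two standard security properties that govern the
InetAddress cache and shows that an InetSocketAddress created as unresolved stays unresolved
until a new instance is built from its host string.

```java
import java.net.InetSocketAddress;
import java.security.Security;

/** Hypothetical illustration (not ZooKeeper or Curator code). */
final class DnsReResolve {

    /** Builds a fresh address from the host string and port, which triggers a new lookup attempt. */
    static InetSocketAddress reResolve(InetSocketAddress address) {
        return new InetSocketAddress(address.getHostString(), address.getPort());
    }

    /** Reports the JVM-wide cache settings; a null value means the JVM default applies. */
    static String cachePolicy() {
        return "networkaddress.cache.ttl=" + Security.getProperty("networkaddress.cache.ttl")
            + ", networkaddress.cache.negative.ttl="
            + Security.getProperty("networkaddress.cache.negative.ttl");
    }

    public static void main(String[] args) {
        System.out.println(cachePolicy());

        // An IP literal is used so that the example does not depend on a DNS server.
        InetSocketAddress stale = InetSocketAddress.createUnresolved("127.0.0.1", 2181);
        System.out.println("stale unresolved: " + stale.isUnresolved());
        System.out.println("fresh unresolved: " + reResolve(stale).isUnresolved());
    }
}
```

The positive TTL controls how long successful lookups are kept and the negative TTL how long
failed lookups are kept, so a lookup that fails while the DNS server is down may keep failing
from the cache for a short while after the server comes back.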

> Curator can NOT reconnect after connection lost and session expired when the connection
> come up while the DNS server is not ready yet.(zookeeper connection string using domain names)
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CURATOR-293
>                 URL: https://issues.apache.org/jira/browse/CURATOR-293
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 2.9.1
>            Reporter: huanhuan li
>            Priority: Critical
>         Attachments: CuratorConnectionLostEventTest.java
>
>
> 1. Add the following lines to /etc/hosts:
> x.x.x.x zk1.test.com
> x.x.x.x zk2.test.com
> x.x.x.x zk3.test.com
> 2. Run the test program
> 3. Shut down the network connection to x.x.x.x
> 4. Wait until the session expires (for example, 10 min)
> 5. Remove the 3 added lines from /etc/hosts
> 6. Restore the network connection to x.x.x.x
> 7. Observe that Curator cannot reconnect
> 8. Add the 3 lines back to /etc/hosts
> 9. Observe that Curator still cannot reconnect
> The log may look like the following:
> [main-SendThread(172.24.2.35:2181)][INFO ]2016-01-26 11:07:45.005 [ClientCnxn.logStartConnect] - Opening socket connection to server 172.24.2.35/172.24.2.35:2181. Will not attempt to authenticate using SASL (unknown error)
> [main-SendThread(172.24.2.35:2181)][INFO ]2016-01-26 11:07:45.050 [ClientCnxn.primeConnection] - Socket connection established to 172.24.2.35/172.24.2.35:2181, initiating session
> [main-EventThread][WARN ]2016-01-26 11:07:45.093 [ConnectionState.handleExpiredSession] - Session expired event received
> [main-EventThread][DEBUG]2016-01-26 11:07:45.093 [ConnectionState.reset] - reset
> [main-SendThread(172.24.2.35:2181)][INFO ]2016-01-26 11:07:45.093 [ClientCnxn.run] - Unable to reconnect to ZooKeeper service, session 0x1525d9593a537af has expired, closing socket connection
> [main-EventThread][INFO ]2016-01-26 11:07:45.095 [ZooKeeper.<init>] - Initiating client connection, connectString=zk1.test.com:2181,zk2.test.com:2181,zk3.test.com:2181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@7e7d611f
> [main-EventThread][INFO ]2016-01-26 11:07:45.488 [ClientCnxn.run] - EventThread shut down
> [main-SendThread(111.206.227.147:2181)][INFO ]2016-01-26 11:07:45.615 [ClientCnxn.logStartConnect] - Opening socket connection to server 111.206.227.147/111.206.227.147:2181. Will not attempt to authenticate using SASL (unknown error)
> [Curator-ConnectionStateManager-0][DEBUG]2016-01-26 11:07:58.523 [CuratorZookeeperClient.blockUntilConnectedOrTimedOut] - blockUntilConnectedOrTimedOut() end. isConnected: false



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
