zookeeper-dev mailing list archives

From "Eron Wright (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-2982) Re-try DNS hostname -> IP resolution
Date Tue, 20 Feb 2018 23:21:00 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16370735#comment-16370735 ]

Eron Wright  commented on ZOOKEEPER-2982:
-----------------------------------------

Attached 'fixed.log', which demonstrates the behavior after the fix is applied. Let me know
if you also need to see the output from an unpatched cluster (I would prefer not to spend
the time to get that).

> Re-try DNS hostname -> IP resolution
> ------------------------------------
>
>                 Key: ZOOKEEPER-2982
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2982
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.5.0, 3.5.1, 3.5.3
>            Reporter: Eron Wright 
>            Priority: Blocker
>             Fix For: 3.5.4, 3.6.0
>
>         Attachments: fixed.log
>
>
> ZOOKEEPER-1506 fixed a DNS resolution issue in 3.4. Some portions of the fix haven't
> yet been ported to 3.5.
> To recap the outstanding problem in 3.5, if a given ZK server is started before all peer
> addresses are resolvable, that server may cache a negative lookup result and forever fail
> to resolve the address. For example, deploying ZK 3.5 to Kubernetes using a StatefulSet
> plus a Service (headless) may fail because the DNS records are created lazily.
> {code}
> 2018-02-18 09:11:22,583 [myid:0] - WARN  [QuorumPeer[myid=0](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Follower@95] - Exception when following the leader
> java.net.UnknownHostException: zk-2.zk.default.svc.cluster.local
>         at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
>         at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>         at java.net.Socket.connect(Socket.java:589)
>         at org.apache.zookeeper.server.quorum.Learner.sockConnect(Learner.java:227)
>         at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:256)
>         at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:76)
>         at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1133)
> {code}
> In the above example, the address `zk-2.zk.default.svc.cluster.local` was not resolvable
> when the server started, but became resolvable shortly thereafter. The server should eventually
> succeed but doesn't.
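
To illustrate the behavior the description asks for, here is a minimal sketch of re-resolving a peer hostname on every connection attempt instead of holding on to a failed lookup. It is not the patch attached to this issue: the class name `RetryingResolver`, the `Lookup` hook, and the port 2888 used in the usage note are all hypothetical and chosen only for illustration.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

// Illustrative sketch only; not taken from the ZooKeeper code base.
public class RetryingResolver {

    /** Hypothetical lookup hook so the DNS call can be replaced in tests. */
    public interface Lookup {
        InetAddress byName(String host) throws UnknownHostException;
    }

    private final String host;
    private final int port;
    private final Lookup lookup;

    public RetryingResolver(String host, int port, Lookup lookup) {
        this.host = host;
        this.port = port;
        this.lookup = lookup;
    }

    /**
     * Performs a fresh lookup on every call. A failed lookup yields an
     * unresolved address for this call only; nothing is stored, so a later
     * call can succeed once the DNS record exists.
     */
    public InetSocketAddress current() {
        try {
            return new InetSocketAddress(lookup.byName(host), port);
        } catch (UnknownHostException e) {
            return InetSocketAddress.createUnresolved(host, port);
        }
    }

    /**
     * Retries the lookup up to maxAttempts times, sleeping delayMillis
     * between attempts, and gives up with UnknownHostException.
     */
    public InetSocketAddress awaitResolved(int maxAttempts, long delayMillis)
            throws InterruptedException, UnknownHostException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            InetSocketAddress addr = current();
            if (!addr.isUnresolved()) {
                return addr;
            }
            if (attempt < maxAttempts) {
                Thread.sleep(delayMillis);
            }
        }
        throw new UnknownHostException(host);
    }
}
```

A caller might construct it as `new RetryingResolver("zk-2.zk.default.svc.cluster.local", 2888, InetAddress::getByName)` and call `current()` immediately before each connect. Note that the JVM keeps its own negative lookup cache (the `networkaddress.cache.negative.ttl` security property, 10 seconds by default), so a retry made within that window may still see the cached failure.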



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
