zookeeper-dev mailing list archives

From "Abraham Fine (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (ZOOKEEPER-2982) Re-try DNS hostname -> IP resolution
Date Tue, 20 Feb 2018 22:29:00 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16370631#comment-16370631 ]

Abraham Fine edited comment on ZOOKEEPER-2982 at 2/20/18 10:28 PM:
-------------------------------------------------------------------

I'm wondering if [~rthille] can chime in on this.

It looks like the change this JIRA is talking about is referenced by https://issues.apache.org/jira/browse/ZOOKEEPER-1506?focusedCommentId=14711955&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14711955

Is there a reason why this change was left out of branch-3.5 (and master)?

My guess is that in master and branch-3.5 we always call `recreateSocketAddresses` in `connectOne`,
which should be called during leader election if communication with another quorum member stops.
Again, it would be great to have [~rthille] confirm this (or tell me how wrong I am).


> Re-try DNS hostname -> IP resolution
> ------------------------------------
>
>                 Key: ZOOKEEPER-2982
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2982
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.5.0, 3.5.1, 3.5.3
>            Reporter: Eron Wright 
>            Priority: Blocker
>             Fix For: 3.5.4, 3.6.0
>
>
> ZOOKEEPER-1506 fixed a DNS resolution issue in 3.4. Some portions of the fix haven't yet been ported to 3.5.
> To recap the outstanding problem in 3.5, if a given ZK server is started before all peer addresses are resolvable, that server may cache a negative lookup result and forever fail to resolve the address. For example, deploying ZK 3.5 to Kubernetes using a StatefulSet plus a Service (headless) may fail because the DNS records are created lazily.
> {code}
> 2018-02-18 09:11:22,583 [myid:0] - WARN  [QuorumPeer[myid=0](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Follower@95] - Exception when following the leader
> java.net.UnknownHostException: zk-2.zk.default.svc.cluster.local
>         at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
>         at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>         at java.net.Socket.connect(Socket.java:589)
>         at org.apache.zookeeper.server.quorum.Learner.sockConnect(Learner.java:227)
>         at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:256)
>         at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:76)
>         at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1133)
> {code}
> In the above example, the address `zk-2.zk.default.svc.cluster.local` was not resolvable when the server started, but became resolvable shortly thereafter. The server should eventually succeed but doesn't.
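
For illustration only, here is a minimal sketch of why a one-time lookup can leave a peer unreachable. It assumes the peer address is held as a `java.net.InetSocketAddress`, which attempts resolution once at construction and keeps that result. The class `ReResolveSketch` and the method `reResolve` are hypothetical names and are not taken from the ZooKeeper code base; the sketch only shows the general idea of building a fresh address before each connection attempt.

```java
import java.net.InetSocketAddress;

public class ReResolveSketch {

    /**
     * Hypothetical helper: if the address is still unresolved, build a new
     * InetSocketAddress from the same host name and port. The constructor
     * performs a new lookup attempt; the old object never updates itself.
     */
    static InetSocketAddress reResolve(InetSocketAddress addr) {
        if (!addr.isUnresolved()) {
            return addr; // already resolved, nothing to do
        }
        return new InetSocketAddress(addr.getHostString(), addr.getPort());
    }

    public static void main(String[] args) {
        // Simulate an address whose lookup failed at startup.
        InetSocketAddress stale = InetSocketAddress.createUnresolved("localhost", 2888);

        // Retrying before a connection attempt gives DNS another chance.
        InetSocketAddress fresh = reResolve(stale);

        System.out.println("stale unresolved: " + stale.isUnresolved());
        System.out.println("fresh unresolved: " + fresh.isUnresolved());
    }
}
```

Whether the retry succeeds also depends on the JVM's own lookup caching (the `networkaddress.cache.negative.ttl` security property), so a retry made immediately after a failed lookup may still see the cached failure.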


