zookeeper-dev mailing list archives

From "Michi Mutsuzaki (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
Date Sat, 16 May 2015 05:15:03 GMT

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michi Mutsuzaki updated ZOOKEEPER-1506:
---------------------------------------
    Description: 
   In our zoo.cfg we use hostnames to identify the ZK servers that are part of an ensemble.
These hostnames are configured with a low (<= 60s) TTL, and the IP address they map to can
and does change. Our procedure for replacing/upgrading a ZK node is to boot an entirely new
instance and remap the hostname to the new instance's IP address. Our expectation is that
when the original ZK node is terminated or shut down, the remaining nodes in the ensemble
would reconnect to the new instance.

However, what we are noticing is that the remaining ZK nodes do not attempt to re-resolve
the hostname->IP mapping for the new server. Once the original ZK node is terminated, the
existing servers keep trying to contact it at the old IP address. It would be great if the
ZK servers re-resolved the hostname when attempting to connect to a lost ZK server, instead
of caching the lookup indefinitely. Currently we must do a rolling restart of the ZK ensemble
after swapping a node, which with a three-node ensemble means we periodically lose quorum.
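The requested behavior (re-resolve the hostname on each reconnect instead of reusing a lookup made at startup) can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not ZooKeeper's code: ReResolvingConnector is a hypothetical class, and it simply keeps the configured hostname and builds a new InetSocketAddress (whose constructor performs a name lookup) before every connection attempt. A remapped record is therefore seen as soon as the JVM's own DNS cache entry expires. Backoff between attempts is omitted for brevity.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/**
 * Hypothetical helper (not part of ZooKeeper): stores the configured hostname
 * and resolves it again before every connection attempt.
 */
public class ReResolvingConnector {
    private final String host;    // hostname from the config, never a cached IP
    private final int port;
    private final int timeoutMs;  // per-attempt connect timeout

    public ReResolvingConnector(String host, int port, int timeoutMs) {
        this.host = host;
        this.port = port;
        this.timeoutMs = timeoutMs;
    }

    /** Builds a fresh address; this constructor triggers a name lookup. */
    InetSocketAddress currentAddress() {
        return new InetSocketAddress(host, port);
    }

    /** Tries up to maxAttempts times, re-resolving the hostname each time. */
    public Socket connect(int maxAttempts) throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            InetSocketAddress addr = currentAddress();
            if (addr.isUnresolved()) {
                // Lookup failed this time; a later attempt may succeed.
                last = new IOException("could not resolve " + host);
                continue;
            }
            Socket socket = new Socket();
            try {
                socket.connect(addr, timeoutMs);
                return socket;
            } catch (IOException e) {
                socket.close();  // release the unconnected socket before retrying
                last = e;
            }
        }
        throw last;  // non-null: every failed iteration recorded an exception
    }
}
```

For example, new ReResolvingConnector("zk2.example.com", 2888, 5000).connect(3) would look up zk2.example.com again before each of up to three attempts; the hostname, port and timeout there are placeholders.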

The exact method we are following is to boot new instances in EC2 and attach one of a set
of three Elastic IP addresses. Outside EC2 this IP address remains the same and maps to
whatever instance it is attached to. Inside EC2, the Elastic IP hostname has a TTL of
about 45-60 seconds and is remapped to the internal (10.x.y.z) address of the instance it
is attached to. Therefore, in our case we would like ZK to pick up the new 10.x.y.z address
that the Elastic IP hostname gets mapped to and reconnect accordingly.
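Re-resolving in the application only helps if the JVM's own lookup cache also expires in time. That cache is governed by the networkaddress.cache.ttl and networkaddress.cache.negative.ttl security properties. A minimal sketch of setting them programmatically, assuming values of 30 s (successful lookups) and 10 s (failed lookups) suit records with a 45-60 second TTL; these numbers and the DnsCacheTtl class are illustrative choices, not ZooKeeper settings:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.security.Security;

/**
 * Sketch: bound how long this JVM caches DNS answers so a remapped hostname
 * is picked up within roughly the record's own TTL. Values are assumptions.
 */
public class DnsCacheTtl {
    // Assumed: kept below the ~45-60 s TTL of the Elastic IP hostname.
    static final String POSITIVE_TTL_SECONDS = "30";
    // Assumed: how long failed lookups are remembered.
    static final String NEGATIVE_TTL_SECONDS = "10";

    /** Call early, before the first hostname lookup in the JVM. */
    static void configure() {
        Security.setProperty("networkaddress.cache.ttl", POSITIVE_TTL_SECONDS);
        Security.setProperty("networkaddress.cache.negative.ttl", NEGATIVE_TTL_SECONDS);
    }

    public static void main(String[] args) throws UnknownHostException {
        configure();
        String host = args.length > 0 ? args[0] : "localhost";
        // Answers come from the JVM cache until the TTL configured above expires.
        InetAddress addr = InetAddress.getByName(host);
        System.out.println(host + " -> " + addr.getHostAddress());
    }
}
```

The same properties can instead be set in the JDK's java.security file. Either way this only bounds the JVM-level cache; it does not help if the application itself holds on to an address resolved at startup, which is the behavior this issue describes.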


> Re-try DNS hostname -> IP resolution if node connection fails
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1506
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: server
>    Affects Versions: 3.4.5
>         Environment: Ubuntu 11.04 64-bit
>            Reporter: Mike Heffner
>            Assignee: Michi Mutsuzaki
>            Priority: Critical
>              Labels: patch
>             Fix For: 3.4.7, 3.5.1, 3.6.0
>
>         Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch,
ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch,
zk-dns-caching-refresh.patch
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
