zookeeper-user mailing list archives

From Michael Han <h...@cloudera.com>
Subject Re: one bad ip behind DNS causing zk client failure
Date Fri, 22 Jul 2016 00:15:09 GMT
>> If the client fails to connect on the first, it tries the second, etc.
>> until there are no more (unless retries are specified).

The ZK client does the same by transparently handling reconnection for users.
What I meant is that it does not sound like a bug for the ZK client to fail
because of the faulty process hiding behind the DNS name. It would be a bug if
the ZK client could not recover by retrying the connection to good processes
(under the same DNS name, assuming a quorum can still be formed). It was unclear
to me from the original post whether the ZK client finally recovered.
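
For illustration, the "try the first address, then the second" behaviour discussed here could be sketched with plain JDK sockets roughly as follows. This is not ZooKeeper client code and not how the ZK client is implemented; the class and method names (MultiAddressConnect, resolveAll, connectFirstReachable) are made up for the example, and the 2000 ms timeout and the default host and port are arbitrary assumptions.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, for illustration only.
class MultiAddressConnect {

    /** Resolves every address behind one DNS name and pairs each with the port. */
    static List<InetSocketAddress> resolveAll(String host, int port) throws IOException {
        List<InetSocketAddress> result = new ArrayList<>();
        for (InetAddress address : InetAddress.getAllByName(host)) {
            result.add(new InetSocketAddress(address, port));
        }
        // Shuffle so that clients do not all start with the same address.
        Collections.shuffle(result);
        return result;
    }

    /** Tries each candidate in order and returns the first socket that connects. */
    static Socket connectFirstReachable(List<InetSocketAddress> candidates, int timeoutMs)
            throws IOException {
        IOException last = null;
        for (InetSocketAddress candidate : candidates) {
            Socket socket = new Socket();
            try {
                socket.connect(candidate, timeoutMs);
                return socket;
            } catch (IOException e) {
                // Remember the failure and move on to the next address.
                last = e;
                socket.close();
            }
        }
        throw new IOException("no reachable address among " + candidates, last);
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 2181;
        try (Socket socket = connectFirstReachable(resolveAll(host, port), 2000)) {
            System.out.println("connected to " + socket.getRemoteSocketAddress());
        } catch (IOException e) {
            System.out.println("all addresses failed: " + e.getMessage());
        }
    }
}
```

With one dead and one live address behind the same name, a loop like this ends up on the live one after a single refused connection. Whether an operation issued during that first failed attempt is also retried is a separate question.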

On Thu, Jul 21, 2016 at 4:22 PM, David Brower <david.brower@oracle.com> wrote:

> This may not be a bug in the ZK /server/, but it does seem like a problem
> case for client-side software.
>
> If we were able to guarantee the server process was always running, then
> we wouldn't ever need more than a one-node ensemble. Suggesting that
> clients extract names from zoo.cfg or use numeric addresses makes things
> worse rather than better.
>
> I suspect the issue is more that connect strings with multiple names or
> addresses are handled differently by /clients/ than a name that resolves to
> multiple addresses.
>
> In the Oracle client software, we had to correct such an oversight when we
> introduced the "single client access name" (SCAN) to the RAC database.
> The SCAN is a DNS name that expands to multiple addresses, normally on
> different hosts. The client is expected to get all of the addresses back
> when it resolves the name, typically in a pseudo-random order. If the
> client fails to connect on the first, it tries the second, etc. until there
> are no more (unless retries are specified).
>
> It is very convenient to not have to configure clients with explicit names
> for the server addresses, using a single name to represent the entire
> collection. It also makes it possible to add and delete servers from the
> group transparently to the clients by manipulating the DNS entry for the
> group.
> -dB,
> Oracle RAC Database and Cluster Infrastructure Architect
> On 7/21/2016 1:16 PM, Michael Han wrote:
>> This does not sound like a ZK bug. The contract with ZooKeeper is that
>> every IP address resolved from the DNS host name in the connection
>> string should have a ZK server process running, so in this case either the
>> 'bad' IP should be removed from the DNS record or you can use the IP address
>> instead of the DNS name in zoo.cfg for the connection string.
>> On Wed, Jul 20, 2016 at 6:24 PM, 蒋丽诗 <lizissleepy@gmail.com> wrote:
>>> Hi,
>>> I am using ZooKeeper 3.4.6.
>>> I have created A records for "test-zookeeper.domain.name" with 2 IPs
>>> behind it.
>>> One has ZooKeeper running, the other does not.
>>> 21 Jul 2016 01:12:24,616 [WARN]  (main-SendThread) org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
>>> java.net.ConnectException: Connection refused
>>>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>>>     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
>>> 21 Jul 2016 01:12:24,724 [ERROR]  (main) KafkaProducerConfig: Failed to get data from zookeeper, as KeeperErrorCode = ConnectionLoss for /brokers/ids
>>> My code:
>>> ZooKeeper zk = new ZooKeeper("test-zookeeper.domain.name:2181", 60000, null); // ZooKeeper will close the session after 60s
>>> List<String> ids = zk.getChildren("/brokers/ids", false);
>>> ===== Some debugging I have already done =====
>>> ConnectStringParser connectStringParser = new ConnectStringParser("test-zookeeper.domain.name:2181");
>>> Collection<InetSocketAddress> serverAddresses = connectStringParser.getServerAddresses();
>>> StaticHostProvider test = new StaticHostProvider(serverAddresses);
>>> LOG.info(test.size()); // the result is 2
>>> --
>>> Thanks,
>>> Lishi
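
One way to check whether the client recovers, as asked at the top of this message, would be to retry the failed read a few times instead of giving up on the first ConnectionLoss. The helper below is only a generic sketch using the JDK; the class name RetryOnTransientFailure, the method callWithRetry, and the attempt count and delay are all hypothetical, and the flaky operation in main merely simulates a refused connection.

```java
import java.net.ConnectException;
import java.util.concurrent.Callable;

// Hypothetical helper, for illustration only.
class RetryOnTransientFailure {

    /**
     * Runs op up to maxAttempts times. Exceptions of transientType trigger a
     * retry after delayMs; any other exception is rethrown immediately.
     */
    static <T> T callWithRetry(Callable<T> op,
                               Class<? extends Exception> transientType,
                               int maxAttempts,
                               long delayMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                if (!transientType.isInstance(e)) {
                    throw e; // not a transient failure, do not retry
                }
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMs); // give the client time to reconnect
                }
            }
        }
        throw last; // every attempt failed with the transient exception
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Simulated operation: refused twice, then succeeds.
        String result = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new ConnectException("Connection refused");
            }
            return "succeeded on call " + calls[0];
        }, ConnectException.class, 5, 100L);
        System.out.println(result);
    }
}
```

With the code from the original post, the operation would presumably be `() -> zk.getChildren("/brokers/ids", false)` and the transient type `KeeperException.ConnectionLossException.class`; that variant needs the ZooKeeper jar and has not been compiled here.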

