Date: Sun, 1 Sep 2013 17:07:51 +0000 (UTC)
From: "Edward Ribeiro (JIRA)"
To: dev@zookeeper.apache.org
Reply-To: dev@zookeeper.apache.org
Subject: [jira] [Updated] (ZOOKEEPER-1576) Zookeeper cluster - failed to connect to cluster if one of the provided IPs causes java.net.UnknownHostException

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Ribeiro updated ZOOKEEPER-1576:
--------------------------------------

    Attachment: ZOOKEEPER-1576.2.patch

> Zookeeper cluster - failed to connect to cluster if one of the provided IPs causes java.net.UnknownHostException
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1576
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1576
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.5.0
>         Environment: Three 3.4.3 zookeeper servers in cluster, linux.
>            Reporter: Tally Tsabary
>         Attachments: ZOOKEEPER-1576.2.patch, ZOOKEEPER-1576.patch
>
>
> Using a cluster of three 3.4.3 zookeeper servers.
> All the servers are up, but on the client machine, the firewall is blocking one of the servers.
> The following exception is thrown, and the client does not connect to any of the other cluster members.
> The exception:
> Nov 02, 2012 9:54:32 PM com.netflix.curator.framework.imps.CuratorFrameworkImpl logError
> SEVERE: Background exception was not retry-able or retry gave up
> java.net.UnknownHostException: scnrmq003.myworkday.com
> 	at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
> 	at java.net.InetAddress$1.lookupAllHostAddr(Unknown Source)
> 	at java.net.InetAddress.getAddressesFromNameService(Unknown Source)
> 	at java.net.InetAddress.getAllByName0(Unknown Source)
> 	at java.net.InetAddress.getAllByName(Unknown Source)
> 	at java.net.InetAddress.getAllByName(Unknown Source)
> 	at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:60)
> 	at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:440)
> 	at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:375)
> The code at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:60) is:
>     public StaticHostProvider(Collection<InetSocketAddress> serverAddresses) throws UnknownHostException {
>         for (InetSocketAddress address : serverAddresses) {
>             InetAddress resolvedAddresses[] = InetAddress.getAllByName(address
>                     .getHostName());
>             for (InetAddress resolvedAddress : resolvedAddresses) {
>                 this.serverAddresses.add(new InetSocketAddress(resolvedAddress
>                         .getHostAddress(), address.getPort()));
>             }
>         }
>         ......
> If InetAddress.getAllByName(address.getHostName()) throws an UnknownHostException, the for-loop does not try to resolve the rest of the servers on the list,
> and the client connection creation fails.
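
To make the behaviour asked for above concrete, here is a minimal sketch of a resolution loop that skips hosts it cannot resolve and only fails when nothing resolves at all. It is an illustration under stated assumptions, not the contents of the attached patches: the names TolerantResolver, Lookup and resolveAll are hypothetical, and the Lookup interface exists only so the name lookup can be stubbed without touching DNS.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical helper, not part of ZooKeeper.
public class TolerantResolver {

    /** Hypothetical seam around the name lookup so it can be replaced in tests. */
    public interface Lookup {
        InetAddress[] getAllByName(String host) throws UnknownHostException;
    }

    /**
     * Resolves every server it can. A host that fails to resolve is skipped
     * instead of aborting the whole loop; the exception is rethrown only if
     * no server at all could be resolved.
     */
    public static List<InetSocketAddress> resolveAll(
            Collection<InetSocketAddress> servers, Lookup lookup)
            throws UnknownHostException {
        List<InetSocketAddress> resolved = new ArrayList<>();
        UnknownHostException lastFailure = null;
        for (InetSocketAddress address : servers) {
            try {
                // getHostString() returns the configured name without a reverse lookup.
                for (InetAddress ip : lookup.getAllByName(address.getHostString())) {
                    resolved.add(new InetSocketAddress(ip, address.getPort()));
                }
            } catch (UnknownHostException e) {
                lastFailure = e; // remember the failure, keep going with the next host
            }
        }
        if (resolved.isEmpty()) {
            throw lastFailure != null
                    ? lastFailure
                    : new UnknownHostException("no server addresses given");
        }
        return resolved;
    }
}
```

Passing InetAddress::getAllByName as the lookup would use the real resolver. The sketch does not address the blocking-lookup or periodic-retry points raised in the report.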
> I expected the connection to be created with the other members of the cluster.
> Also, the InetAddress lookup is a blocking call, and if it takes a very long time (longer than the defined timeout), that should also allow us to continue and try to connect to the other servers on the list.
> Assuming this is fixed and we get a connection to the currently available servers, I think ZooKeeper should keep retrying the server of the cluster it could not connect to, so that it can be used later when it is back.
> If one of the servers on the list is not available during the connection creation, then it should be retried every x time despite the fact that we

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira