curator-user mailing list archives

From Andy Grove <andy.gr...@codefutures.com>
Subject Re: Curator client fails to connect if any one of my zookeeper instances is down (UnknownHostException)
Date Tue, 25 Jun 2013 21:08:52 GMT
I went ahead and filed a JIRA ticket for this issue with source code to reproduce: 

https://issues.apache.org/jira/browse/CURATOR-40
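
The trace quoted below shows the UnknownHostException coming out of InetAddress.getAllByName inside the StaticHostProvider constructor, so a single unresolvable name is enough to stop the ZooKeeper handle from being created. One possible stopgap, sketched here under assumptions and not taken from the ticket, is to drop unresolvable hosts from the connect string before handing it to CuratorFrameworkFactory.newClient. ConnectStringFilter and its methods are hypothetical names, not part of Curator or ZooKeeper. The sketch assumes Java 8 or later and plain host:port entries (no chroot suffix, no IPv6 literals).

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical helper; not part of Curator or ZooKeeper. */
final class ConnectStringFilter {

    /** Returns true if the host name or IP literal resolves through InetAddress. */
    public static boolean resolves(String host) {
        try {
            InetAddress.getAllByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    /**
     * Keeps only the host:port entries whose host part passes the given check.
     * Assumes comma-separated host:port entries with no chroot suffix.
     */
    public static String filter(String connectString, Predicate<String> resolvable) {
        List<String> kept = new ArrayList<>();
        for (String entry : connectString.split(",")) {
            String hostPort = entry.trim();
            if (hostPort.isEmpty()) {
                continue; // skip empty entries such as "a:1,,b:2"
            }
            int colon = hostPort.lastIndexOf(':');
            String host = colon >= 0 ? hostPort.substring(0, colon) : hostPort;
            if (resolvable.test(host)) {
                kept.add(hostPort);
            }
        }
        return String.join(",", kept);
    }

    public static void main(String[] args) {
        if (args.length > 0) {
            // Real DNS lookups against the connect string given on the command line.
            System.out.println(filter(args[0], ConnectStringFilter::resolves));
        } else {
            // No network access: simulate the failing EC2 host name from the log.
            String raw = "10.96.214.121:8090,"
                    + "ec2-107-21-126-93.compute-1.amazonaws.com:8090,"
                    + "10.112.81.128:8090";
            System.out.println(filter(raw, host -> !host.startsWith("ec2-")));
        }
    }
}
```

The filtered string would then be passed as connectString in place of the original. The trade-off is that a host dropped at startup is never tried again until the client is rebuilt, and an empty result means no host resolved at all, which the application should treat as a startup failure.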

Thanks,

Andy.

--
Andy Grove
VP, R&D
CodeFutures Corporation

Share Nothing, Shard Everything!
http://www.dbshards.com




On Jun 25, 2013, at 2:07 PM, Andy Grove <andy.grove@codefutures.com> wrote:

> After failing to reproduce this issue locally, I enabled some trace logging and re-tested on Amazon EC2 and have further information on this now.
> 
> The issue seems to be specific to java.net.UnknownHostException.
> 
> The first error I see is:
> 
>      [java]  2013-06-25 19:59:47,883 ERROR c.n.c.f.i.CuratorFrameworkImpl [main] Background exception was not retry-able or retry gave up
>      [java]  java.net.UnknownHostException: ec2-107-21-126-93.compute-1.amazonaws.com
>      [java] 	at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
>      [java] 	at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:850)
>      [java] 	at java.net.InetAddress.getAddressFromNameService(InetAddress.java:1201)
>      [java] 	at java.net.InetAddress.getAllByName0(InetAddress.java:1154)
>      [java] 	at java.net.InetAddress.getAllByName(InetAddress.java:1084)
>      [java] 	at java.net.InetAddress.getAllByName(InetAddress.java:1020)
>      [java] 	at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:60)
>      [java] 	at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)
>      [java] 	at com.netflix.curator.utils.DefaultZookeeperFactory.newZooKeeper(DefaultZookeeperFactory.java:27)
> 
> This error is not escalated to the application code, so when the application tries to perform an operation on the Curator client, I get the following logging in a loop:
> 
> BEGIN LOOP
> 
> [java]  2013-06-25 20:00:03,131 ERROR c.n.c.ConnectionState [main] Connection timed out for connection string (10.96.214.121:8090,ec2-107-21-126-93.compute-1.amazonaws.com:8090,10.112.81.128:8090) and timeout (15000) / elapsed (15272)
> [java]  org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
> 
> ...
> 
> [java] 2013-06-25 20:00:03,134 TRACE c.d.n.ZKClient$1 [main] addCount() connections-timed-out
> [java]  2013-06-25 20:00:03,135 DEBUG c.n.c.RetryLoop [main] Retry-able exception received
> [java]  org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
> 
> ....
> 
> [java] 2013-06-25 20:04:47,518 TRACE c.d.n.ZKClient$1 [main] addCount() retries-allowed
> [java]  2013-06-25 20:04:47,519 DEBUG c.n.c.RetryLoop [main] Retrying operation
> 
> END LOOP
> 
> On Jun 25, 2013, at 11:16 AM, Andy Grove <andy.grove@codefutures.com> wrote:
> 
>> Hi,
>> 
>> I'm using the following code to connect to my zookeeper instances:
>> 
>>             client = CuratorFrameworkFactory.newClient(connectString, sessionTimeout, connectTimeout, new ExponentialBackoffRetry(1000, 3));
>> 
>> I have three hosts; let's call them host1, host2 and host3. If all hosts are running then everything works as expected.
>> 
>> If host1 is down (server shut down) then all operations on the curator client fail and I see errors like this:
>> 
>> ERROR com.netflix.curator.ConnectionState - Connection timed out for connection string (host1:8090,host2:8090,host3:8090) and timeout (15000) / elapsed (15310)
>> 
>> It doesn't matter what order I specify the hosts in; I always get these errors and my operation eventually fails with:
>> 
>>      [java] Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
>>      [java] 	at com.netflix.curator.ConnectionState.getZooKeeper(ConnectionState.java:101)
>>      [java] 	at com.netflix.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:107)
>>      [java] 	at com.netflix.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:445)
>>      [java] 	at com.netflix.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:171)
>>      [java] 	at com.netflix.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:160)
>>      [java] 	at com.netflix.curator.RetryLoop.callWithRetry(RetryLoop.java:106)
>>      [java] 	at com.netflix.curator.framework.imps.ExistsBuilderImpl.pathInForeground(ExistsBuilderImpl.java:156)
>>      [java] 	at com.netflix.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:147)
>>      [java] 	at com.netflix.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:35)
>>      [java] 	at com.dbshards.nameserver.ZKClient.createPath(ZKClient.java:406)
>> 
>> I would expect Curator/ZooKeeper to try this operation with host2 or host3 after an error connecting to host1, but this is not the case. I even have a retry loop in my code that tries the operation 10 times, and it fails every time if host1 is in the connect string.
>> 
>> I'm hoping I'm missing something obvious here. Any help would be appreciated.
>> 
>> Thanks,
>> 
>> Andy.
>> 
>> --
>> Andy Grove
>> VP, R&D
>> CodeFutures Corporation
>> 
>> Share Nothing, Shard Everything!
>> http://www.dbshards.com
>> 
>> 
>> 
>> 
> 

