drill-user mailing list archives

From: Timothy Chen <tnac...@gmail.com>
Subject: Re: Distributed mode troubles: ZK/Curator connection time out
Date: Sun, 27 Oct 2013 21:50:20 GMT

I haven't seen this problem in my AWS deployment so far, but as Steven said, I override the
Drill conf to point to the ZK host and port.
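
The error quoted below reports the connection string as localhost:2181, which suggests the Drillbit was still using a local default rather than an overridden zk.connect. Before changing anything in Drill, it can help to confirm that the ZK host:port you intend to use is reachable from the Drillbit machine. Here is a minimal sketch in Python, not part of Drill: the function names parse_connect_string and can_connect are hypothetical, the default port 2181 is ZooKeeper's usual client port, and a chroot suffix such as "/drill" in the connect string is not handled.

```python
import socket


def parse_connect_string(connect):
    """Split a 'host1:port1,host2:port2' string into (host, port) pairs.

    Hypothetical helper for illustration; entries without a port are
    assumed to use ZooKeeper's usual client port, 2181.
    """
    pairs = []
    for entry in connect.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, _, port = entry.rpartition(":")
        if not host:  # no ':' in the entry, so no explicit port was given
            host, port = entry, "2181"
        pairs.append((host, int(port)))
    return pairs


def can_connect(host, port, timeout=5.0):
    """Return True if a plain TCP connection to host:port succeeds.

    This only shows that something is listening on the port; it does not
    prove that the listener is a healthy ZooKeeper server.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, calling can_connect(host, port) for each pair returned by parse_connect_string("localhost:2181") on the Drillbit machine shows whether the address from the log is reachable at all. If it is not, the timeout is more likely a configuration or network problem than a Curator bug.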

Tim

> On Oct 27, 2013, at 2:32 PM, Steven Phillips <sphillips@maprtech.com> wrote:
> 
> One thing to add to the diagram is that all of the drill java processes
> will look at what is in drill-override.conf. You must set zk.connect to the
> correct zk host:port.
> 
> 
> On Sun, Oct 27, 2013 at 2:00 PM, Michael Hausenblas <
> michael.hausenblas@gmail.com> wrote:
> 
>> 
>> Folks,
>> 
>> I’m trying to set up Drill in distributed mode. Here’s what I have so far:
>> when I launch the first Drillbit with bin/drillbit.sh I get the following
>> in log/drillbit.out:
>> 
>> [[
>> 20:47:20.963 [main] ERROR com.netflix.curator.ConnectionState - Connection
>> timed out for connection string (localhost:2181) and timeout (5000) /
>> elapsed (5045)
>> org.apache.zookeeper.KeeperException$ConnectionLossException:
>> KeeperErrorCode = ConnectionLoss
>>        at
>> com.netflix.curator.ConnectionState.getZooKeeper(ConnectionState.java:94)
>> ~[curator-client-1.1.9.jar:na]
>>        at
>> com.netflix.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:106)
>> [curator-client-1.1.9.jar:na]
>>        at
>> com.netflix.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:393)
>> [curator-framework-1.1.9.jar:na]
>>        at
>> com.netflix.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:184)
>> [curator-framework-1.1.9.jar:na]
>>        at
>> com.netflix.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:173)
>> [curator-framework-1.1.9.jar:na]
>>        at com.netflix.curator.RetryLoop.callWithRetry(RetryLoop.java:85)
>> [curator-client-1.1.9.jar:na]
>>        at
>> com.netflix.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:169)
>> [curator-framework-1.1.9.jar:na]
>>        at
>> com.netflix.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:161)
>> [curator-framework-1.1.9.jar:na]
>>        at
>> com.netflix.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:36)
>> [curator-framework-1.1.9.jar:na]
>>        at
>> com.netflix.curator.x.discovery.details.ServiceDiscoveryImpl.getChildrenWatched(ServiceDiscoveryImpl.java:306)
>> [curator-x-discovery-1.1.9.jar:na]
>>        at
>> com.netflix.curator.x.discovery.details.ServiceDiscoveryImpl.queryForInstances(ServiceDiscoveryImpl.java:276)
>> [curator-x-discovery-1.1.9.jar:na]
>>        at
>> com.netflix.curator.x.discovery.details.ServiceCache.refresh(ServiceCache.java:193)
>> [curator-x-discovery-1.1.9.jar:na]
>>        at
>> com.netflix.curator.x.discovery.details.ServiceCache.start(ServiceCache.java:116)
>> [curator-x-discovery-1.1.9.jar:na]
>>        at
>> org.apache.drill.exec.coord.ZKClusterCoordinator.start(ZKClusterCoordinator.java:89)
>> [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
>>        at org.apache.drill.exec.server.Drillbit.run(Drillbit.java:94)
>> [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
>>        at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:56)
>> [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
>>        at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:43)
>> [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
>>        at org.apache.drill.exec.server.Drillbit.main(Drillbit.java:65)
>> [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
>> ]]
>> 
>> This seems to be a known issue? See
>> http://stackoverflow.com/questions/16056751/curator-zookeeper-client-keeps-throw-out-connectionlossexception-per-connection
>> 
>> Any ideas? Has anyone actually run Drill in distributed mode already and,
>> if so, how did you overcome the above issue?
>> 
>> What is next? How do I make other Drillbits point to the same ZK cluster?
>> And does anyone have an example of the call parameters for
>> bin/submit_plan as well?
>> 
>> 
>> BTW, while trying to figure out what’s going on behind the scenes, I
>> traced the startup call dependencies (scripts), available via:
>> 
>> 
>> https://docs.google.com/drawings/d/1-ADIGJ-lBr-dOrOjMpQlProiZjYjjuM0kR6A81BYwKA/edit?usp=sharing
>> 
>> which we could then also use for documentation purposes.
>> 
>> 
>> Cheers,
>>                Michael
>> 
>> --
>> Michael Hausenblas
>> Ireland, Europe
>> http://mhausenblas.info/
>> 
>> 
