incubator-drill-user mailing list archives

From Michael Hausenblas <michael.hausenb...@gmail.com>
Subject Distributed mode troubles: ZK/Curator connection time out
Date Sun, 27 Oct 2013 21:00:18 GMT

Folks,

I’m trying to set up Drill in distributed mode. Here’s what I have so far: when I launch
the first Drillbit with bin/drillbit.sh I get the following in log/drillbit.out:

[[
20:47:20.963 [main] ERROR com.netflix.curator.ConnectionState - Connection timed out for connection string (localhost:2181) and timeout (5000) / elapsed (5045)
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
	at com.netflix.curator.ConnectionState.getZooKeeper(ConnectionState.java:94) ~[curator-client-1.1.9.jar:na]
	at com.netflix.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:106) [curator-client-1.1.9.jar:na]
	at com.netflix.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:393) [curator-framework-1.1.9.jar:na]
	at com.netflix.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:184) [curator-framework-1.1.9.jar:na]
	at com.netflix.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:173) [curator-framework-1.1.9.jar:na]
	at com.netflix.curator.RetryLoop.callWithRetry(RetryLoop.java:85) [curator-client-1.1.9.jar:na]
	at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:169) [curator-framework-1.1.9.jar:na]
	at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:161) [curator-framework-1.1.9.jar:na]
	at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:36) [curator-framework-1.1.9.jar:na]
	at com.netflix.curator.x.discovery.details.ServiceDiscoveryImpl.getChildrenWatched(ServiceDiscoveryImpl.java:306) [curator-x-discovery-1.1.9.jar:na]
	at com.netflix.curator.x.discovery.details.ServiceDiscoveryImpl.queryForInstances(ServiceDiscoveryImpl.java:276) [curator-x-discovery-1.1.9.jar:na]
	at com.netflix.curator.x.discovery.details.ServiceCache.refresh(ServiceCache.java:193) [curator-x-discovery-1.1.9.jar:na]
	at com.netflix.curator.x.discovery.details.ServiceCache.start(ServiceCache.java:116) [curator-x-discovery-1.1.9.jar:na]
	at org.apache.drill.exec.coord.ZKClusterCoordinator.start(ZKClusterCoordinator.java:89) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
	at org.apache.drill.exec.server.Drillbit.run(Drillbit.java:94) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
	at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:56) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
	at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:43) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
	at org.apache.drill.exec.server.Drillbit.main(Drillbit.java:65) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
]]
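
The log shows Curator timing out after 5000 ms against localhost:2181, so one thing worth ruling out is whether a ZooKeeper server is listening there at all. Below is a minimal sketch of such a check. It is not part of Drill or Curator; the function name zk_ruok is made up for illustration, and the host, port and timeout defaults are simply taken from the log line. It relies on ZooKeeper's "ruok" four-letter command, which a healthy server answers with "imok" (newer ZooKeeper releases may need that command to be whitelisted, in which case a connection that succeeds but returns an empty reply still shows that something is listening).

```python
import socket


def zk_ruok(host="localhost", port=2181, timeout=5.0):
    """Send ZooKeeper's 'ruok' command and return the raw reply.

    Returns None if no TCP connection could be made (nothing listening,
    connection refused, or timed out). Returns the bytes the server sent
    back otherwise; a healthy ZooKeeper is expected to answer b"imok".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            reply = b""
            # The server closes the connection after replying, so read to EOF.
            while True:
                chunk = sock.recv(16)
                if not chunk:
                    break
                reply += chunk
    except OSError:
        # Covers refused connections and socket timeouts alike.
        return None
    return reply
```

Calling zk_ruok() with no arguments probes the same localhost:2181 endpoint as the connection string in the log. A result of None would suggest that the Drillbit cannot reach ZooKeeper simply because nothing is running on that port.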

Is this a known issue? See http://stackoverflow.com/questions/16056751/curator-zookeeper-client-keeps-throw-out-connectionlossexception-per-connection

Any ideas? Has anyone actually run Drill in distributed mode already and, if so, how did you
overcome the above issue?

What comes next? How do I make other Drillbits point to the same ZK cluster? And does anyone
have an example of the call parameters for bin/submit_plan as well?


BTW, in the process of trying to figure out what’s going on behind the scenes I traced the
startup call dependencies (scripts), available via:

  https://docs.google.com/drawings/d/1-ADIGJ-lBr-dOrOjMpQlProiZjYjjuM0kR6A81BYwKA/edit?usp=sharing

which we could then also use for documentation purposes.


Cheers,
		Michael

--
Michael Hausenblas
Ireland, Europe
http://mhausenblas.info/

