drill-user mailing list archives

From Michael Hausenblas <michael.hausenb...@gmail.com>
Subject Re: Distributed mode troubles: ZK/Curator connection time out
Date Sun, 27 Oct 2013 22:39:52 GMT

Maybe I'm dense but I thought Drill starts a ZK? Or do I have to install and launch ZK separately?

I'm using the binary version of M1 and running everything locally on my laptop ...

Cheers,
             Michael

Sent from my iPad

--
Michael Hausenblas, http://mhausenblas.info

> On 27 Oct 2013, at 22:17, Steven Phillips <sphillips@maprtech.com> wrote:
> 
> You need to replace localhost with the hostname of the node running
> zookeeper. If that zookeeper is configured to use a port different than
> 2181, then that needs to be set as well. If you have multiple zookeepers in
> the quorum, then zk.connect should be a comma separated list of the
> host:port of each node.
> 
> The default, localhost setting will only work when a drillbit is running on
> the same node as the zookeeper.
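
[Editorial sketch, not part of the original message.] To make the zk.connect format described above concrete, here is a minimal Python sketch that splits a comma separated host:port list into its endpoints. The function name parse_zk_connect and the example.com hostnames are hypothetical. Falling back to port 2181 when an entry has no port is an assumption of this sketch, and ZooKeeper chroot suffixes and IPv6 literals are not handled.

```python
from typing import List, Tuple

# Assumption: fall back to 2181, the port used in the thread, when an entry has no port.
DEFAULT_ZK_PORT = 2181


def parse_zk_connect(connect: str) -> List[Tuple[str, int]]:
    """Split a zk.connect-style string into (host, port) pairs."""
    endpoints = []
    for entry in connect.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray commas or surrounding whitespace
        host, sep, port = entry.partition(":")
        endpoints.append((host, int(port) if sep else DEFAULT_ZK_PORT))
    return endpoints


if __name__ == "__main__":
    # Hypothetical three-node quorum; replace with the real ZooKeeper hosts.
    quorum = "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2182"
    for host, port in parse_zk_connect(quorum):
        print(host, port)
```

Every drillbit would need the same list, since each one reads zk.connect from its own drill-override.conf.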
> 
> 
> On Sun, Oct 27, 2013 at 2:57 PM, Michael Hausenblas <
> michael.hausenblas@gmail.com> wrote:
> 
>> 
>>> One thing to add to the diagram is that all of the drill java processes
>>> will look at what is in drill-override.conf.
>> 
>> Thanks, done.
>> 
>> 
>>> You must set zk.connect to the correct zk host:port.
>> 
>> 
>> Can you be a tad more explicit, please? In drill-override.conf I have
>> 
>> [[
>> …
>> zk: {
>>        connect: "localhost:2181",
>> …
>> ]]
>> 
>> 
>> What am I overlooking?
>> 
>> Also, any directions re the rest of my questions (re bin/submit_plan etc.)?
>> 
>> 
>> With a little help from here, I’m happy to put together a description of
>> how to set this up in the Wiki, also to address a query that has now been
>> lying around for more than three weeks, by Steve McPherson – see
>> http://mail-archives.apache.org/mod_mbox/incubator-drill-user/201310.mbox/%3CCE71A20F.14F5B%25stevemp%40amazon.com%3E –
>> the fact that it attracted 0 responses I find slightly embarrassing, and
>> if I were Steve, I’d prolly not touch Drill anymore, but let’s hope for the
>> best …
>> 
>> 
>> Cheers,
>>                Michael
>> 
>> --
>> Michael Hausenblas
>> Ireland, Europe
>> http://mhausenblas.info/
>> 
>>> On 27 Oct 2013, at 21:32, Steven Phillips <sphillips@maprtech.com> wrote:
>>> 
>>> One thing to add to the diagram is that all of the drill java processes
>>> will look at what is in drill-override.conf. You must set zk.connect to the
>>> correct zk host:port.
>>> 
>>> 
>>> On Sun, Oct 27, 2013 at 2:00 PM, Michael Hausenblas <
>>> michael.hausenblas@gmail.com> wrote:
>>> 
>>>> 
>>>> Folks,
>>>> 
>>>> I’m trying to set up Drill in distributed mode. Here’s what I have so far:
>>>> when I launch the first Drillbit with bin/drillbit.sh I get the following
>>>> in log/drillbit.out:
>>>> 
>>>> [[
>>>> 20:47:20.963 [main] ERROR com.netflix.curator.ConnectionState - Connection timed out for connection string (localhost:2181) and timeout (5000) / elapsed (5045)
>>>> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
>>>>       at com.netflix.curator.ConnectionState.getZooKeeper(ConnectionState.java:94) ~[curator-client-1.1.9.jar:na]
>>>>       at com.netflix.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:106) [curator-client-1.1.9.jar:na]
>>>>       at com.netflix.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:393) [curator-framework-1.1.9.jar:na]
>>>>       at com.netflix.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:184) [curator-framework-1.1.9.jar:na]
>>>>       at com.netflix.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:173) [curator-framework-1.1.9.jar:na]
>>>>       at com.netflix.curator.RetryLoop.callWithRetry(RetryLoop.java:85) [curator-client-1.1.9.jar:na]
>>>>       at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:169) [curator-framework-1.1.9.jar:na]
>>>>       at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:161) [curator-framework-1.1.9.jar:na]
>>>>       at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:36) [curator-framework-1.1.9.jar:na]
>>>>       at com.netflix.curator.x.discovery.details.ServiceDiscoveryImpl.getChildrenWatched(ServiceDiscoveryImpl.java:306) [curator-x-discovery-1.1.9.jar:na]
>>>>       at com.netflix.curator.x.discovery.details.ServiceDiscoveryImpl.queryForInstances(ServiceDiscoveryImpl.java:276) [curator-x-discovery-1.1.9.jar:na]
>>>>       at com.netflix.curator.x.discovery.details.ServiceCache.refresh(ServiceCache.java:193) [curator-x-discovery-1.1.9.jar:na]
>>>>       at com.netflix.curator.x.discovery.details.ServiceCache.start(ServiceCache.java:116) [curator-x-discovery-1.1.9.jar:na]
>>>>       at org.apache.drill.exec.coord.ZKClusterCoordinator.start(ZKClusterCoordinator.java:89) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
>>>>       at org.apache.drill.exec.server.Drillbit.run(Drillbit.java:94) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
>>>>       at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:56) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
>>>>       at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:43) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
>>>>       at org.apache.drill.exec.server.Drillbit.main(Drillbit.java:65) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
>>>> ]]
>>>> 
>>>> This seems to be a known issue? See
>>>> http://stackoverflow.com/questions/16056751/curator-zookeeper-client-keeps-throw-out-connectionlossexception-per-connection
>>>> 
>>>> Any ideas? Did anyone actually run Drill in distributed mode already and
>>>> if so, how did you overcome the above issue?
>>>> 
>>>> What is next? How do I make other Drillbits point to the same ZK cluster?
>>>> And has anyone an example of the call parameters for bin/submit_plan maybe
>>>> as well?
>>>> 
>>>> 
>>>> BTW, in the process of trying to figure what’s going on behind the scene I
>>>> traced down the startup call dependencies (scripts), available via:
>>>>
>>>> https://docs.google.com/drawings/d/1-ADIGJ-lBr-dOrOjMpQlProiZjYjjuM0kR6A81BYwKA/edit?usp=sharing
>>>> 
>>>> which we could then also use for documentation purposes.
>>>> 
>>>> 
>>>> Cheers,
>>>>               Michael
>>>> 
>>>> --
>>>> Michael Hausenblas
>>>> Ireland, Europe
>>>> http://mhausenblas.info/
>>>> 
>>>> 
>> 
>> 
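
[Editorial sketch, not part of the original thread.] The Curator "Connection timed out" error quoted in the thread means nothing answered at localhost:2181 within the timeout. Before starting a drillbit, a quick way to see whether anything is listening at the configured address is a plain TCP probe, sketched below in Python. The function name zk_port_open is hypothetical. A successful connection only shows that some process accepts connections on that port, not that it is a healthy ZooKeeper.

```python
import socket


def zk_port_open(host: str = "localhost", port: int = 2181, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    if zk_port_open():
        print("Something is listening on localhost:2181")
    else:
        print("Nothing reachable on localhost:2181; is ZooKeeper running?")
```

If the probe fails for the address in zk.connect, the drillbit can be expected to fail in the same way, so the ZooKeeper side is the place to look first.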
