incubator-drill-user mailing list archives

From Michael Hausenblas <michael.hausenb...@gmail.com>
Subject Re: Distributed mode troubles: ZK/Curator connection time out
Date Sun, 27 Oct 2013 22:42:45 GMT
Perfect, thank you very much Steven! Will give it a try first thing tomorrow morning.

Cheers,
             Michael

Sent from my iPad

--
Michael Hausenblas, http://mhausenblas.info

> On 27 Oct 2013, at 22:35, Steven Phillips <sphillips@maprtech.com> wrote:
> 
> As for the submit_plan tool, unfortunately at the time of the M1 release,
> the tool was not fully baked. So it does not have very good parsing
> capabilities. But as you wrote in your diagram, it takes three parameters,
> a file, a type, and a zk connect string. The file is a path on the local
> filesystem to a file that contains either a logical or a physical plan, in
> json format. The type should be either physical or logical, and this
> corresponds to the type of plan that the file parameter points to. And the
> zk connect string is the zk connect string for the cluster you want to
> submit to.
> 
> For example,
> 
>   submit_plan /plans/plan1.json physical 1.1.1.1:5181,2.2.2.2:5181
> 
> will use the zk quorum of servers running on 1.1.1.1 and 2.2.2.2 to find and
> connect to a drillbit, and then submit the physical plan contained in
> /plans/plan1.json to the drillbit. It will then print results to the screen
> as they come in.
> 
> If you build drill off of master, the submit_plan tool has been fixed
> somewhat:
> 
> $ bin/submit_plan -h
> 
> Usage: ./submit_plan [options]
>  Options:
>    -h, -help, --help
>       show usage
>       Default: false
>    -bits
>       number of drillbits to run. local mode only
>       Default: 1
>  * -f
>       file containing plan
>    -local
>       run query in local mode
>       Default: false
>  * -t
>       type of plan, logical/physical
>    -zk
>       zookeeper connect string.
>       Default: localhost:2181
> 
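The master-branch invocation described by the usage text above could be assembled along these lines. This is only a sketch: the helper name `build_submit_plan_args` is hypothetical, and the only things taken from the thread are the `-f`, `-t` and `-zk` options, the two plan types, and the default connect string.

```python
# Hypothetical helper (not part of Drill): builds the argument list for
# bin/submit_plan from the options shown in the usage text quoted above.
def build_submit_plan_args(plan_file, plan_type, zk_connect="localhost:2181"):
    """Return the command line for bin/submit_plan as a list of strings."""
    # The usage text lists exactly two plan types.
    if plan_type not in ("logical", "physical"):
        raise ValueError("plan type must be 'logical' or 'physical'")
    return ["bin/submit_plan", "-f", plan_file, "-t", plan_type, "-zk", zk_connect]
```

Passing the result to `subprocess.run` from the Drill install directory would then launch the tool; for the earlier example the list joins to `bin/submit_plan -f /plans/plan1.json -t physical -zk 1.1.1.1:5181,2.2.2.2:5181`.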
> 
> 
> On Sun, Oct 27, 2013 at 3:17 PM, Steven Phillips <sphillips@maprtech.com> wrote:
> 
>> You need to replace localhost with the hostname of the node running
>> zookeeper. If that zookeeper is configured to use a port other than
>> 2181, then that needs to be set as well. If you have multiple zookeepers in
>> the quorum, then zk.connect should be a comma-separated list of the
>> host:port of each node.
>> 
>> The default localhost setting will only work when a drillbit is running
>> on the same node as the zookeeper.
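To make the host:port list format concrete, here is a small sketch that splits a connect string into (host, port) pairs and joins pairs back into a string. The function names are made up for illustration, the fallback to port 2181 for an entry without a port is an assumption based on the default mentioned above, and IPv6 addresses are not handled.

```python
DEFAULT_ZK_PORT = 2181  # the default ZooKeeper port mentioned in the thread

def parse_zk_connect(connect):
    """Split 'host1:port1,host2:port2' into a list of (host, port) tuples."""
    nodes = []
    for entry in connect.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray commas
        host, sep, port = entry.partition(":")
        # Assumption: an entry without an explicit port uses the default port.
        nodes.append((host, int(port) if sep else DEFAULT_ZK_PORT))
    return nodes

def build_zk_connect(nodes):
    """Join (host, port) tuples into a comma-separated connect string."""
    return ",".join(f"{host}:{port}" for host, port in nodes)
```

For instance, `build_zk_connect([("zk1", 2181), ("zk2", 2181)])` gives `zk1:2181,zk2:2181`, which is the shape the zk.connect value should have for a two-node quorum (the host names here are placeholders).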
>> 
>> 
>> On Sun, Oct 27, 2013 at 2:57 PM, Michael Hausenblas <
>> michael.hausenblas@gmail.com> wrote:
>> 
>>> 
>>>> One thing to add to the diagram is that all of the drill java processes
>>>> will look at what is in drill-override.conf.
>>> 
>>> Thanks, done.
>>> 
>>> 
>>>> You must set zk.connect to the correct zk host:port.
>>> 
>>> 
>>> Can you be a tad more explicit, please? In drill-override.conf I have
>>> 
>>> [[
>>> …
>>> zk: {
>>>        connect: "localhost:2181",
>>> …
>>> ]]
>>> 
>>> 
>>> What am I overlooking?
>>> 
>>> Also, any directions re the rest of my questions (re bin/submit_plan
>>> etc.)?
>>> 
>>> 
>>> With a little help from here, I’m happy to put together a description of
>>> how to set this up in the Wiki, also to address a query by Steve McPherson
>>> that has now been lying around for more than three weeks, see
>>> http://mail-archives.apache.org/mod_mbox/incubator-drill-user/201310.mbox/%3CCE71A20F.14F5B%25stevemp%40amazon.com%3E
>>> The fact that it attracted 0 responses I find slightly embarrassing, and
>>> if I were Steve, I’d prolly not touch Drill anymore, but let’s hope for the
>>> best …
>>> 
>>> 
>>> Cheers,
>>>                Michael
>>> 
>>> --
>>> Michael Hausenblas
>>> Ireland, Europe
>>> http://mhausenblas.info/
>>> 
>>>> On 27 Oct 2013, at 21:32, Steven Phillips <sphillips@maprtech.com> wrote:
>>>> 
>>>> One thing to add to the diagram is that all of the drill java processes
>>>> will look at what is in drill-override.conf. You must set zk.connect to the
>>>> correct zk host:port.
>>>> 
>>>> 
>>>> On Sun, Oct 27, 2013 at 2:00 PM, Michael Hausenblas <
>>>> michael.hausenblas@gmail.com> wrote:
>>>> 
>>>>> 
>>>>> Folks,
>>>>> 
>>>>> I’m trying to set up Drill in distributed mode. Here’s what I have so far:
>>>>> when I launch the first Drillbit with bin/drillbit.sh I get the following
>>>>> in log/drillbit.out:
>>>>> 
>>>>> [[
>>>>> 20:47:20.963 [main] ERROR com.netflix.curator.ConnectionState - Connection timed out for connection string (localhost:2181) and timeout (5000) / elapsed (5045)
>>>>> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
>>>>>       at com.netflix.curator.ConnectionState.getZooKeeper(ConnectionState.java:94) ~[curator-client-1.1.9.jar:na]
>>>>>       at com.netflix.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:106) [curator-client-1.1.9.jar:na]
>>>>>       at com.netflix.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:393) [curator-framework-1.1.9.jar:na]
>>>>>       at com.netflix.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:184) [curator-framework-1.1.9.jar:na]
>>>>>       at com.netflix.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:173) [curator-framework-1.1.9.jar:na]
>>>>>       at com.netflix.curator.RetryLoop.callWithRetry(RetryLoop.java:85) [curator-client-1.1.9.jar:na]
>>>>>       at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:169) [curator-framework-1.1.9.jar:na]
>>>>>       at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:161) [curator-framework-1.1.9.jar:na]
>>>>>       at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:36) [curator-framework-1.1.9.jar:na]
>>>>>       at com.netflix.curator.x.discovery.details.ServiceDiscoveryImpl.getChildrenWatched(ServiceDiscoveryImpl.java:306) [curator-x-discovery-1.1.9.jar:na]
>>>>>       at com.netflix.curator.x.discovery.details.ServiceDiscoveryImpl.queryForInstances(ServiceDiscoveryImpl.java:276) [curator-x-discovery-1.1.9.jar:na]
>>>>>       at com.netflix.curator.x.discovery.details.ServiceCache.refresh(ServiceCache.java:193) [curator-x-discovery-1.1.9.jar:na]
>>>>>       at com.netflix.curator.x.discovery.details.ServiceCache.start(ServiceCache.java:116) [curator-x-discovery-1.1.9.jar:na]
>>>>>       at org.apache.drill.exec.coord.ZKClusterCoordinator.start(ZKClusterCoordinator.java:89) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
>>>>>       at org.apache.drill.exec.server.Drillbit.run(Drillbit.java:94) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
>>>>>       at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:56) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
>>>>>       at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:43) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
>>>>>       at org.apache.drill.exec.server.Drillbit.main(Drillbit.java:65) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
>>>>> ]]
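A timeout like the one in this log can be narrowed down by checking whether anything accepts TCP connections on the configured host and port at all. The sketch below is an assumption-laden illustration, not something Drill or Curator provides: the function name is hypothetical, the 5-second default simply mirrors the 5000 in the log line (assumed to be milliseconds), and a successful TCP connect only shows that the port is open, not that ZooKeeper itself is healthy.

```python
import socket

def zk_port_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Running `zk_port_reachable("localhost", 2181)` on the drillbit node would return False if no ZooKeeper is listening there, which matches the situation in which the default localhost setting cannot work.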
>>>>> 
>>>>> This seems to be a known issue? See
>>>>> http://stackoverflow.com/questions/16056751/curator-zookeeper-client-keeps-throw-out-connectionlossexception-per-connection
>>>>> 
>>>>> Any ideas? Has anyone actually run Drill in distributed mode already and,
>>>>> if so, how did you overcome the above issue?
>>>>> 
>>>>> What is next? How do I make other Drillbits point to the same ZK cluster?
>>>>> And does anyone have an example of the call parameters for bin/submit_plan
>>>>> as well?
>>>>> 
>>>>> 
>>>>> BTW, in the process of trying to figure out what’s going on behind the
>>>>> scenes, I traced the startup call dependencies (scripts), available via:
>>>>> https://docs.google.com/drawings/d/1-ADIGJ-lBr-dOrOjMpQlProiZjYjjuM0kR6A81BYwKA/edit?usp=sharing
>>>>> 
>>>>> which we could then also use for documentation purposes.
>>>>> 
>>>>> 
>>>>> Cheers,
>>>>>               Michael
>>>>> 
>>>>> --
>>>>> Michael Hausenblas
>>>>> Ireland, Europe
>>>>> http://mhausenblas.info/
>> 
