drill-user mailing list archives

From Steven Phillips <sphill...@maprtech.com>
Subject Re: Distributed mode troubles: ZK/Curator connection time out
Date Sun, 27 Oct 2013 22:48:40 GMT
Actually, I was wrong: Drill does not start a zookeeper when running in
local mode. The LocalClusterCoordinator does not use zookeeper at all.


On Sun, Oct 27, 2013 at 3:44 PM, Steven Phillips <sphillips@maprtech.com> wrote:

> Drill will start a zookeeper only in embedded mode. For example, running
> sqlline using parquet-local will launch a drillbit and zk all within one
> JVM.
>
> But to run a standalone drillbit requires an external zookeeper.
>
>
> On Sun, Oct 27, 2013 at 3:39 PM, Michael Hausenblas <
> michael.hausenblas@gmail.com> wrote:
>
>>
>> Maybe I'm dense but I thought Drill starts a ZK? Or do I have to install
>> and launch ZK separately?
>>
>> I'm using the binary version of M1. Run all things local only on my
>> laptop ...
>>
>> Cheers,
>>              Michael
>>
>> Sent from my iPad
>>
>> --
>> Michael Hausenblas, http://mhausenblas.info
>>
>> > On 27 Oct 2013, at 22:17, Steven Phillips <sphillips@maprtech.com> wrote:
>> >
>> > You need to replace localhost with the hostname of the node running
>> > zookeeper. If that zookeeper is configured to use a port other than
>> > 2181, then that needs to be set as well. If you have multiple
>> > zookeepers in the quorum, then zk.connect should be a comma-separated
>> > list of the host:port of each node.
>> >
>> > The default localhost setting will only work when a drillbit is
>> > running on the same node as the zookeeper.
>> >
>> >
>> > On Sun, Oct 27, 2013 at 2:57 PM, Michael Hausenblas <
>> > michael.hausenblas@gmail.com> wrote:
>> >
>> >>
>> >>> One thing to add to the diagram is that all of the drill java
>> >>> processes will look at what is in drill-override.conf.
>> >>
>> >> Thanks, done.
>> >>
>> >>
>> >>> You must set zk.connect to the correct zk host:port.
>> >>
>> >>
>> >> Can you be a tad more explicit, please? In drill-override.conf I have
>> >>
>> >> [[
>> >> …
>> >> zk: {
>> >>        connect: "localhost:2181",
>> >> …
>> >> ]]
>> >>
>> >>
>> >> What am I overlooking?
>> >>
>> >> Also, any directions re the rest of my questions (re bin/submit_plan
>> >> etc.)?
>> >>
>> >>
>> >> With a little help from here, I’m happy to put together a description
>> >> of how to set this up in the Wiki, also to address a query by Steve
>> >> McPherson that has now been lying around for more than three weeks, see
>> >> http://mail-archives.apache.org/mod_mbox/incubator-drill-user/201310.mbox/%3CCE71A20F.14F5B%25stevemp%40amazon.com%3E
>> >>
>> >> The fact that it attracted 0 responses I find slightly embarrassing, and
>> >> if I were Steve, I’d prolly not touch Drill anymore, but let’s hope for
>> >> the best …
>> >>
>> >>
>> >> Cheers,
>> >>                Michael
>> >>
>> >> --
>> >> Michael Hausenblas
>> >> Ireland, Europe
>> >> http://mhausenblas.info/
>> >>
>> >>> On 27 Oct 2013, at 21:32, Steven Phillips <sphillips@maprtech.com> wrote:
>> >>>
>> >>> One thing to add to the diagram is that all of the drill java
>> >>> processes will look at what is in drill-override.conf. You must set
>> >>> zk.connect to the correct zk host:port.
>> >>>
>> >>>
>> >>> On Sun, Oct 27, 2013 at 2:00 PM, Michael Hausenblas <
>> >>> michael.hausenblas@gmail.com> wrote:
>> >>>
>> >>>>
>> >>>> Folks,
>> >>>>
>> >>>> I’m trying to set up Drill in distributed mode. Here’s what I have so
>> >>>> far: when I launch the first Drillbit with bin/drillbit.sh I get the
>> >>>> following in log/drillbit.out:
>> >>>>
>> >>>> [[
>> >>>> 20:47:20.963 [main] ERROR com.netflix.curator.ConnectionState - Connection timed out for connection string (localhost:2181) and timeout (5000) / elapsed (5045)
>> >>>> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
>> >>>>       at com.netflix.curator.ConnectionState.getZooKeeper(ConnectionState.java:94) ~[curator-client-1.1.9.jar:na]
>> >>>>       at com.netflix.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:106) [curator-client-1.1.9.jar:na]
>> >>>>       at com.netflix.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:393) [curator-framework-1.1.9.jar:na]
>> >>>>       at com.netflix.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:184) [curator-framework-1.1.9.jar:na]
>> >>>>       at com.netflix.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:173) [curator-framework-1.1.9.jar:na]
>> >>>>       at com.netflix.curator.RetryLoop.callWithRetry(RetryLoop.java:85) [curator-client-1.1.9.jar:na]
>> >>>>       at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:169) [curator-framework-1.1.9.jar:na]
>> >>>>       at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:161) [curator-framework-1.1.9.jar:na]
>> >>>>       at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:36) [curator-framework-1.1.9.jar:na]
>> >>>>       at com.netflix.curator.x.discovery.details.ServiceDiscoveryImpl.getChildrenWatched(ServiceDiscoveryImpl.java:306) [curator-x-discovery-1.1.9.jar:na]
>> >>>>       at com.netflix.curator.x.discovery.details.ServiceDiscoveryImpl.queryForInstances(ServiceDiscoveryImpl.java:276) [curator-x-discovery-1.1.9.jar:na]
>> >>>>       at com.netflix.curator.x.discovery.details.ServiceCache.refresh(ServiceCache.java:193) [curator-x-discovery-1.1.9.jar:na]
>> >>>>       at com.netflix.curator.x.discovery.details.ServiceCache.start(ServiceCache.java:116) [curator-x-discovery-1.1.9.jar:na]
>> >>>>       at org.apache.drill.exec.coord.ZKClusterCoordinator.start(ZKClusterCoordinator.java:89) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
>> >>>>       at org.apache.drill.exec.server.Drillbit.run(Drillbit.java:94) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
>> >>>>       at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:56) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
>> >>>>       at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:43) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
>> >>>>       at org.apache.drill.exec.server.Drillbit.main(Drillbit.java:65) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
>> >>>> ]]
>> >>>>
>> >>>> This seems to be a known issue? See
>> >>>> http://stackoverflow.com/questions/16056751/curator-zookeeper-client-keeps-throw-out-connectionlossexception-per-connection
>> >>>>
>> >>>> Any ideas? Did anyone actually run Drill in distributed mode already,
>> >>>> and if so, how did you overcome the above issue?
>> >>>>
>> >>>> What is next? How do I make other Drillbits point to the same ZK
>> >>>> cluster? And does anyone have an example of the call parameters for
>> >>>> bin/submit_plan as well?
>> >>>>
>> >>>>
>> >>>> BTW, in the process of trying to figure out what’s going on behind the
>> >>>> scenes I traced the startup call dependencies (scripts), available via:
>> >>>>
>> >>>> https://docs.google.com/drawings/d/1-ADIGJ-lBr-dOrOjMpQlProiZjYjjuM0kR6A81BYwKA/edit?usp=sharing
>> >>>>
>> >>>> which we could then also use for documentation purposes.
>> >>>>
>> >>>>
>> >>>> Cheers,
>> >>>>               Michael
>> >>>>
>> >>>> --
>> >>>> Michael Hausenblas
>> >>>> Ireland, Europe
>> >>>> http://mhausenblas.info/
>> >>>>
>> >>>>
>> >>
>> >>
>>
>
>
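
The zk.connect advice quoted in this thread (a comma-separated list of host:port entries, with 2181 as the default port) can be sketched in Python as below. This is only an illustration and not part of Drill: the function names are hypothetical, the parser assumes plain hostnames or IPv4 addresses, and the reachability probe is a plain TCP connect, which shows that something is listening on the port but not that it is a healthy ZooKeeper.

```python
import socket

# ZooKeeper's default client port, as mentioned in the thread.
DEFAULT_ZK_PORT = 2181


def parse_zk_connect(connect):
    """Split a zk.connect value such as "zk1:2181,zk2:2181" into (host, port) pairs.

    Entries without an explicit port fall back to DEFAULT_ZK_PORT.
    Assumes hostnames or IPv4 addresses (no IPv6 literals).
    """
    endpoints = []
    for entry in connect.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray commas
        host, sep, port = entry.partition(":")
        endpoints.append((host, int(port) if sep else DEFAULT_ZK_PORT))
    return endpoints


def build_zk_connect(endpoints):
    """Join (host, port) pairs into the comma-separated form described in the thread."""
    return ",".join(f"{host}:{port}" for host, port in endpoints)


def is_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def unreachable_endpoints(connect, timeout=5.0):
    """List the (host, port) entries of a zk.connect string that do not accept connections."""
    return [
        (host, port)
        for host, port in parse_zk_connect(connect)
        if not is_reachable(host, port, timeout)
    ]
```

For instance, calling `unreachable_endpoints("localhost:2181")` on the machine that runs the drillbit returns a non-empty list when nothing is listening on that port, which would be consistent with the Curator connection timeout reported at the start of the thread.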
