drill-user mailing list archives

From Michael Hausenblas <mhausenb...@maprtech.com>
Subject Re: Distributed mode troubles: ZK/Curator connection time out
Date Mon, 28 Oct 2013 09:26:46 GMT

OK, thanks again re the hints for ZK and how to launch submit_plan. Now I’ve got a
'java.net.SocketException: Network is unreachable'.

Background: I have three Drillbits running, all connected to ZK:


[zk: 127.0.0.1:2181(CONNECTED) 4] ls /drill/drillbits1
[d2e9c990-1607-48f8-8d99-4a209b312a43, 17bf46c9-23f2-42cc-8d25-cc42b7a599f0, 146c8df4-a62c-41b8-af1f-0f7551867d84]


When I then submit a physical plan:

$ bin/submit_plan -f sample-data/physical_json_scan_test1.json -t physical -zk 127.0.0.1:2181

I get:

[[
Exception in thread "main" org.apache.drill.exec.rpc.RpcException: Failure connecting to server.
Failure of type CONNECTION.
	at org.apache.drill.exec.client.DrillClient$FutureHandler.connectionFailed(DrillClient.java:246)
	at org.apache.drill.exec.rpc.BasicClient$ConnectionMultiListener$ConnectionHandler.operationComplete(BasicClient.java:155)
	at org.apache.drill.exec.rpc.BasicClient$ConnectionMultiListener$ConnectionHandler.operationComplete(BasicClient.java:141)
	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:621)
	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:548)
	at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:407)
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.connect(AbstractNioChannel.java:217)
	at io.netty.channel.DefaultChannelPipeline$HeadHandler.connect(DefaultChannelPipeline.java:1008)
	at io.netty.channel.DefaultChannelHandlerContext.invokeConnect(DefaultChannelHandlerContext.java:491)
	at io.netty.channel.DefaultChannelHandlerContext.connect(DefaultChannelHandlerContext.java:476)
	at io.netty.channel.ChannelOutboundHandlerAdapter.connect(ChannelOutboundHandlerAdapter.java:47)
	at io.netty.channel.DefaultChannelHandlerContext.invokeConnect(DefaultChannelHandlerContext.java:491)
	at io.netty.channel.DefaultChannelHandlerContext.connect(DefaultChannelHandlerContext.java:476)
	at io.netty.channel.DefaultChannelHandlerContext.connect(DefaultChannelHandlerContext.java:461)
	at io.netty.channel.DefaultChannelPipeline.connect(DefaultChannelPipeline.java:847)
	at io.netty.channel.AbstractChannel.connect(AbstractChannel.java:198)
	at io.netty.bootstrap.Bootstrap$2.run(Bootstrap.java:165)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:354)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:366)
	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:101)
	at java.lang.Thread.run(Thread.java:722)
Caused by: java.util.concurrent.ExecutionException: java.net.SocketException: Network is unreachable
	at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37)
	at org.apache.drill.exec.rpc.BasicClient$ConnectionMultiListener$ConnectionHandler.operationComplete(BasicClient.java:147)
	... 19 more
Caused by: java.net.SocketException: Network is unreachable
	at sun.nio.ch.Net.connect0(Native Method)
	at sun.nio.ch.Net.connect(Net.java:364)
	at sun.nio.ch.Net.connect(Net.java:356)
	at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:623)
	at io.netty.channel.socket.nio.NioSocketChannel.doConnect(NioSocketChannel.java:195)
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.connect(AbstractNioChannel.java:172)
	... 14 more

]]

Thoughts?


Cheers,
		Michael

--
Michael Hausenblas
Ireland, Europe
http://mhausenblas.info/
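
The zk.connect format discussed in the quoted replies below (one host:port entry, or a comma-separated list of them for a quorum) can be checked independently of Drill. What follows is a minimal Python sketch, not part of Drill: the helper names parse_zk_connect and is_reachable are hypothetical, the fallback to port 2181 is an assumption based on the default port mentioned in the thread, and IPv6 literals are not handled. It only tests whether a TCP connection can be opened from the machine it runs on; it says nothing about the health of the ZooKeeper service itself.

```python
import socket


def parse_zk_connect(connect):
    """Split a zk.connect value into (host, port) pairs.

    Entries without an explicit port fall back to 2181 (an assumption:
    the default client port referred to in this thread).
    """
    endpoints = []
    for entry in connect.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate empty entries such as a trailing comma
        host, sep, port = entry.rpartition(":")
        if not sep:
            # No colon present: the whole entry is the host name.
            host, port = entry, "2181"
        endpoints.append((host, int(port)))
    return endpoints


def is_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, unreachable networks
        # and name-resolution failures.
        return False
```

Calling is_reachable for each pair returned by parse_zk_connect("localhost:2181"), once on every Drillbit host and once on the machine that runs bin/submit_plan, shows whether that address is reachable from each of them. As noted in the replies below, localhost only works on the node that actually runs ZooKeeper.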

On 27 Oct 2013, at 22:48, Steven Phillips <sphillips@maprtech.com> wrote:

> Actually, I am wrong, Drill does not start a zookeeper when running in
> local mode. The LocalClusterCoordinator does not use zookeeper at all.
> 
> 
> On Sun, Oct 27, 2013 at 3:44 PM, Steven Phillips <sphillips@maprtech.com> wrote:
> 
>> Drill will start a zookeeper only in embedded mode. For example, running
>> sqlline using parquet-local will launch a drillbit and zk all within one
>> JVM.
>> 
>> But to run a standalone drillbit requires an external zookeeper.
>> 
>> 
>> On Sun, Oct 27, 2013 at 3:39 PM, Michael Hausenblas <
>> michael.hausenblas@gmail.com> wrote:
>> 
>>> 
>>> Maybe I'm dense but I thought Drill starts a ZK? Or do I have to install
>>> and launch ZK separately?
>>> 
>>> I'm using the binary version of M1. Run all things local only on my
>>> laptop ...
>>> 
>>> Cheers,
>>>           Michael
>>> 
>>> Sent from my iPad
>>> 
>>> --
>>> Michael Hausenblas, http://mhausenblas.info
>>> 
>>>> On 27 Oct 2013, at 22:17, Steven Phillips <sphillips@maprtech.com> wrote:
>>>> 
>>>> You need to replace localhost with the hostname of the node running
>>>> zookeeper. If that zookeeper is configured to use a port other than
>>>> 2181, then that needs to be set as well. If you have multiple zookeepers in
>>>> the quorum, then zk.connect should be a comma-separated list of the
>>>> host:port of each node.
>>>> 
>>>> The default localhost setting will only work when a drillbit is running on
>>>> the same node as the zookeeper.
>>>> 
>>>> 
>>>> On Sun, Oct 27, 2013 at 2:57 PM, Michael Hausenblas <
>>>> michael.hausenblas@gmail.com> wrote:
>>>> 
>>>>> 
>>>>>> One thing to add to the diagram is that all of the drill java processes
>>>>>> will look at what is in drill-override.conf.
>>>>> 
>>>>> Thanks, done.
>>>>> 
>>>>> 
>>>>>> You must set zk.connect to the correct zk host:port.
>>>>> 
>>>>> 
>>>>> Can you be a tad more explicit, please? In drill-override.conf I have
>>>>> 
>>>>> [[
>>>>> …
>>>>> zk: {
>>>>>     connect: "localhost:2181",
>>>>> …
>>>>> ]]
>>>>> 
>>>>> 
>>>>> What am I overlooking?
>>>>> 
>>>>> Also, any directions re the rest of my questions (re bin/submit_plan etc.)?
>>>>> 
>>>>> 
>>>>> With a little help from here, I’m happy to put together a description of
>>>>> how to set this up in the Wiki, also to address a query by Steve McPherson
>>>>> that we’ve now had lying around for more than three weeks – see
>>>>> http://mail-archives.apache.org/mod_mbox/incubator-drill-user/201310.mbox/%3CCE71A20F.14F5B%25stevemp%40amazon.com%3E
>>>>> – the fact that it attracted 0 responses I find slightly embarrassing, and
>>>>> if I were Steve, I’d prolly not touch Drill anymore, but let’s hope for the
>>>>> best …
>>>>> 
>>>>> 
>>>>> Cheers,
>>>>>             Michael
>>>>> 
>>>>> --
>>>>> Michael Hausenblas
>>>>> Ireland, Europe
>>>>> http://mhausenblas.info/
>>>>> 
>>>>>> On 27 Oct 2013, at 21:32, Steven Phillips <sphillips@maprtech.com> wrote:
>>>>>> 
>>>>>> One thing to add to the diagram is that all of the drill java processes
>>>>>> will look at what is in drill-override.conf. You must set zk.connect to the
>>>>>> correct zk host:port.
>>>>>> 
>>>>>> 
>>>>>> On Sun, Oct 27, 2013 at 2:00 PM, Michael Hausenblas <
>>>>>> michael.hausenblas@gmail.com> wrote:
>>>>>> 
>>>>>>> 
>>>>>>> Folks,
>>>>>>> 
>>>>>>> I’m trying to set up Drill in distributed mode. Here’s what I have so far:
>>>>>>> when I launch the first Drillbit with bin/drillbit.sh I get the following
>>>>>>> in log/drillbit.out:
>>>>>>> 
>>>>>>> [[
>>>>>>> 20:47:20.963 [main] ERROR com.netflix.curator.ConnectionState - Connection timed out for connection string (localhost:2181) and timeout (5000) / elapsed (5045)
>>>>>>> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
>>>>>>>    at com.netflix.curator.ConnectionState.getZooKeeper(ConnectionState.java:94) ~[curator-client-1.1.9.jar:na]
>>>>>>>    at com.netflix.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:106) [curator-client-1.1.9.jar:na]
>>>>>>>    at com.netflix.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:393) [curator-framework-1.1.9.jar:na]
>>>>>>>    at com.netflix.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:184) [curator-framework-1.1.9.jar:na]
>>>>>>>    at com.netflix.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:173) [curator-framework-1.1.9.jar:na]
>>>>>>>    at com.netflix.curator.RetryLoop.callWithRetry(RetryLoop.java:85) [curator-client-1.1.9.jar:na]
>>>>>>>    at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:169) [curator-framework-1.1.9.jar:na]
>>>>>>>    at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:161) [curator-framework-1.1.9.jar:na]
>>>>>>>    at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:36) [curator-framework-1.1.9.jar:na]
>>>>>>>    at com.netflix.curator.x.discovery.details.ServiceDiscoveryImpl.getChildrenWatched(ServiceDiscoveryImpl.java:306) [curator-x-discovery-1.1.9.jar:na]
>>>>>>>    at com.netflix.curator.x.discovery.details.ServiceDiscoveryImpl.queryForInstances(ServiceDiscoveryImpl.java:276) [curator-x-discovery-1.1.9.jar:na]
>>>>>>>    at com.netflix.curator.x.discovery.details.ServiceCache.refresh(ServiceCache.java:193) [curator-x-discovery-1.1.9.jar:na]
>>>>>>>    at com.netflix.curator.x.discovery.details.ServiceCache.start(ServiceCache.java:116) [curator-x-discovery-1.1.9.jar:na]
>>>>>>>    at org.apache.drill.exec.coord.ZKClusterCoordinator.start(ZKClusterCoordinator.java:89) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
>>>>>>>    at org.apache.drill.exec.server.Drillbit.run(Drillbit.java:94) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
>>>>>>>    at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:56) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
>>>>>>>    at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:43) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
>>>>>>>    at org.apache.drill.exec.server.Drillbit.main(Drillbit.java:65) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
>>>>>>> ]]
>>>>>>> 
>>>>>>> This seems to be a known issue? See
>>>>>>> http://stackoverflow.com/questions/16056751/curator-zookeeper-client-keeps-throw-out-connectionlossexception-per-connection
>>>>>>> 
>>>>>>> Any ideas? Did anyone actually run Drill in distributed mode already and
>>>>>>> if so, how did you overcome the above issue?
>>>>>>> 
>>>>>>> What is next? How do I make other Drillbits point to the same ZK cluster?
>>>>>>> And does anyone have an example of the call parameters for bin/submit_plan
>>>>>>> as well?
>>>>>>> 
>>>>>>> 
>>>>>>> BTW, in the process of trying to figure out what’s going on behind the
>>>>>>> scenes, I traced down the startup call dependencies (scripts), available via:
>>>>>>> 
>>>>>>> https://docs.google.com/drawings/d/1-ADIGJ-lBr-dOrOjMpQlProiZjYjjuM0kR6A81BYwKA/edit?usp=sharing
>>>>>>> 
>>>>>>> which we could then also use for documentation purposes.
>>>>>>> 
>>>>>>> 
>>>>>>> Cheers,
>>>>>>>            Michael
>>>>>>> 
>>>>>>> --
>>>>>>> Michael Hausenblas
>>>>>>> Ireland, Europe
>>>>>>> http://mhausenblas.info/
>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> 
>>> 
>> 
>> 

