drill-user mailing list archives

From Jacques Nadeau <jacq...@apache.org>
Subject Re: Distributed mode troubles: ZK/Curator connection time out
Date Mon, 28 Oct 2013 20:15:34 GMT
I seem to recall that there are currently issues if you're not connected to
the internet. Can you do some testing to see whether that was the problem
you were having?

Thanks,
Jacques
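
One way to test the offline hypothesis is to check, with networking turned off, whether the machine can still resolve its own hostname. The sketch below rests on an assumption that nothing in this thread confirms: that local hostname lookup is where the dependency on a network connection comes from. `resolve_local_hostname` is a hypothetical helper name, not part of Drill or Curator.

```python
import socket


def resolve_local_hostname():
    """Return (hostname, ip) for this machine, with ip=None if the lookup fails.

    Assumption: a failing lookup of the local hostname while offline is one
    plausible reason a locally running service looks unreachable.
    """
    hostname = socket.gethostname()
    try:
        ip = socket.gethostbyname(hostname)
    except OSError:
        # socket.gaierror and socket.herror are subclasses of OSError.
        ip = None
    return hostname, ip


if __name__ == "__main__":
    name, ip = resolve_local_hostname()
    print(f"{name} -> {ip if ip else 'unresolved'}")
```

Running this once online and once offline would show whether the lookup behaves differently; if it does not, the cause is probably elsewhere.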


On Mon, Oct 28, 2013 at 2:42 AM, Michael Hausenblas <mhausenblas@maprtech.com> wrote:

>
> Interestingly enough, it works now. Could it be that, for whatever reason,
> an Internet connection must be available? BTW, I’m doing this on
> MacOS 10.9.
>
> $ bin/submit_plan -f sample-data/physical_json_scan_test1.json -t physical
> -zk 127.0.0.1:2181
>
>
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> | id             | type           | name           | ppu            | sales          | batters.batter.id| batters.batter.type| topping.id     | topping.type   | filling.id     | filling.type   |
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> | 0001           | donut          | Cake           | 0.55           | 35
>
>
> Still, strangely enough, there are errors in submitter.log (they do not
> affect the result, but I would love to understand what’s going on here):
>
> [[
>
> 09:37:20.632 [Client-1] DEBUG o.a.d.e.rpc.user.QueryResultHandler - Received QueryId part1: 3952191315122866480
> part2: -6119095990164217550
>  succesfully.  Adding listener org.apache.drill.exec.client.QuerySubmitter$QueryResultsListener@1d007a1a
> 09:37:27.005 [Client-1] ERROR o.a.d.exec.rpc.RpcExceptionHandler - Exception in pipeline.  Closing channel between local /10.109.7.56:63536 and remote /10.109.7.56:31012
> io.netty.handler.codec.DecoderException: java.lang.IndexOutOfBoundsException
>         at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:99) [netty-codec-4.0.7.Final.jar:na]
>         at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:334) [netty-transport-4.0.7.Final.jar:na]
>         at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:320) [netty-transport-4.0.7.Final.jar:na]
>         at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102) [netty-codec-4.0.7.Final.jar:na]
>         at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:334) [netty-transport-4.0.7.Final.jar:na]
>         at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:320) [netty-transport-4.0.7.Final.jar:na]
>         at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:173) [netty-codec-4.0.7.Final.jar:na]
>         at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:334) [netty-transport-4.0.7.Final.jar:na]
>         at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:320) [netty-transport-4.0.7.Final.jar:na]
>         at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:785) [netty-transport-4.0.7.Final.jar:na]
>         at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:100) [netty-transport-4.0.7.Final.jar:na]
>         at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:497) [netty-transport-4.0.7.Final.jar:na]
>         at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:465) [netty-transport-4.0.7.Final.jar:na]
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:359) [netty-transport-4.0.7.Final.jar:na]
>         at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:101) [netty-common-4.0.7.Final.jar:na]
>         at java.lang.Thread.run(Thread.java:722) [na:1.7.0_11]
> Caused by: java.lang.IndexOutOfBoundsException: null
>         at io.netty.buffer.EmptyByteBuf.checkIndex(EmptyByteBuf.java:857) ~[netty-buffer-4.0.7.Final.jar:na]
>         at io.netty.buffer.EmptyByteBuf.getBytes(EmptyByteBuf.java:321) ~[netty-buffer-4.0.7.Final.jar:na]
>         at org.apache.drill.exec.vector.VarCharVector$Accessor.get(VarCharVector.java:240) ~[java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
>         at org.apache.drill.exec.vector.VarCharVector$Accessor.getObject(VarCharVector.java:257) ~[java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
>         at org.apache.drill.exec.vector.NullableVarCharVector$Accessor.getObject(NullableVarCharVector.java:244) ~[java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
>         at org.apache.drill.exec.client.QuerySubmitter$QueryResultsListener.resultArrived(QuerySubmitter.java:103) ~[java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
>         at org.apache.drill.exec.rpc.user.QueryResultHandler.batchArrived(QueryResultHandler.java:75) ~[java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
>         at org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:79) ~[java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
>         at org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:48) ~[java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
>         at org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:33) ~[java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
>         at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:142) ~[java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
>         at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:127) ~[java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
>         at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89) [netty-codec-4.0.7.Final.jar:na]
>         ... 15 common frames omitted
>
> ]]
>
>
> Cheers,
>                 Michael
>
> --
> Michael Hausenblas
> Ireland, Europe
> http://mhausenblas.info/
>
> On 27 Oct 2013, at 22:48, Steven Phillips <sphillips@maprtech.com> wrote:
>
> > Actually, I am wrong, Drill does not start a zookeeper when running in
> > local mode. The LocalClusterCoordinator does not use zookeeper at all.
> >
> >
> > On Sun, Oct 27, 2013 at 3:44 PM, Steven Phillips <sphillips@maprtech.com> wrote:
> >
> >> Drill will start a zookeeper only in embedded mode. For example, running
> >> sqlline using parquet-local will launch a drillbit and zk all within one
> >> JVM.
> >>
> >> But to run a standalone drillbit requires an external zookeeper.
> >>
> >>
> >> On Sun, Oct 27, 2013 at 3:39 PM, Michael Hausenblas <
> >> michael.hausenblas@gmail.com> wrote:
> >>
> >>>
> >>> Maybe I'm dense but I thought Drill starts a ZK? Or do I have to install
> >>> and launch ZK separately?
> >>>
> >>> I'm using the binary version of M1. Run all things local only on my
> >>> laptop ...
> >>>
> >>> Cheers,
> >>>             Michael
> >>>
> >>> Sent from my iPad
> >>>
> >>> --
> >>> Michael Hausenblas, http://mhausenblas.info
> >>>
> >>>> On 27 Oct 2013, at 22:17, Steven Phillips <sphillips@maprtech.com> wrote:
> >>>>
> >>>> You need to replace localhost with the hostname of the node running
> >>>> zookeeper. If that zookeeper is configured to use a port other than
> >>>> 2181, then that needs to be set as well. If you have multiple zookeepers
> >>>> in the quorum, then zk.connect should be a comma-separated list of the
> >>>> host:port of each node.
> >>>>
> >>>> The default localhost setting will only work when a drillbit is running
> >>>> on the same node as the zookeeper.
> >>>>
> >>>>
> >>>> On Sun, Oct 27, 2013 at 2:57 PM, Michael Hausenblas <
> >>>> michael.hausenblas@gmail.com> wrote:
> >>>>
> >>>>>
> >>>>>> One thing to add to the diagram is that all of the drill java processes
> >>>>>> will look at what is in drill-override.conf.
> >>>>>
> >>>>> Thanks, done.
> >>>>>
> >>>>>
> >>>>>> You must set zk.connect to the correct zk host:port.
> >>>>>
> >>>>>
> >>>>> Can you be a tad more explicit, please? In drill-override.conf I have
> >>>>>
> >>>>> [[
> >>>>> …
> >>>>> zk: {
> >>>>>       connect: "localhost:2181",
> >>>>> …
> >>>>> ]]
> >>>>>
> >>>>>
> >>>>> What am I overlooking?
> >>>>>
> >>>>> Also, any directions re the rest of my questions (re bin/submit_plan etc.)?
> >>>>>
> >>>>>
> >>>>> With a little help from here, I’m happy to put together the description
> >>>>> of how to set this up in the Wiki, also to address a query we’ve now had
> >>>>> lying around for more than three weeks, by Steve McPherson – see
> >>>>> http://mail-archives.apache.org/mod_mbox/incubator-drill-user/201310.mbox/%3CCE71A20F.14F5B%25stevemp%40amazon.com%3E
> >>>>> – the fact that it attracted 0 responses I find slightly embarrassing, and
> >>>>> if I were Steve, I’d prolly not touch Drill anymore, but let’s hope for
> >>>>> the best …
> >>>>>
> >>>>>
> >>>>> Cheers,
> >>>>>               Michael
> >>>>>
> >>>>> --
> >>>>> Michael Hausenblas
> >>>>> Ireland, Europe
> >>>>> http://mhausenblas.info/
> >>>>>
> >>>>>> On 27 Oct 2013, at 21:32, Steven Phillips <sphillips@maprtech.com> wrote:
> >>>>>>
> >>>>>> One thing to add to the diagram is that all of the drill java processes
> >>>>>> will look at what is in drill-override.conf. You must set zk.connect to
> >>>>>> the correct zk host:port.
> >>>>>>
> >>>>>>
> >>>>>> On Sun, Oct 27, 2013 at 2:00 PM, Michael Hausenblas <
> >>>>>> michael.hausenblas@gmail.com> wrote:
> >>>>>>
> >>>>>>>
> >>>>>>> Folks,
> >>>>>>>
> >>>>>>> I’m trying to set up Drill in distributed mode. Here’s what I have so far:
> >>>>>>> when I launch the first Drillbit with bin/drillbit.sh I get the following
> >>>>>>> in log/drillbit.out:
> >>>>>>>
> >>>>>>> [[
> >>>>>>> 20:47:20.963 [main] ERROR com.netflix.curator.ConnectionState - Connection timed out for connection string (localhost:2181) and timeout (5000) / elapsed (5045)
> >>>>>>> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
> >>>>>>>      at com.netflix.curator.ConnectionState.getZooKeeper(ConnectionState.java:94) ~[curator-client-1.1.9.jar:na]
> >>>>>>>      at com.netflix.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:106) [curator-client-1.1.9.jar:na]
> >>>>>>>      at com.netflix.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:393) [curator-framework-1.1.9.jar:na]
> >>>>>>>      at com.netflix.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:184) [curator-framework-1.1.9.jar:na]
> >>>>>>>      at com.netflix.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:173) [curator-framework-1.1.9.jar:na]
> >>>>>>>      at com.netflix.curator.RetryLoop.callWithRetry(RetryLoop.java:85) [curator-client-1.1.9.jar:na]
> >>>>>>>      at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:169) [curator-framework-1.1.9.jar:na]
> >>>>>>>      at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:161) [curator-framework-1.1.9.jar:na]
> >>>>>>>      at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:36) [curator-framework-1.1.9.jar:na]
> >>>>>>>      at com.netflix.curator.x.discovery.details.ServiceDiscoveryImpl.getChildrenWatched(ServiceDiscoveryImpl.java:306) [curator-x-discovery-1.1.9.jar:na]
> >>>>>>>      at com.netflix.curator.x.discovery.details.ServiceDiscoveryImpl.queryForInstances(ServiceDiscoveryImpl.java:276) [curator-x-discovery-1.1.9.jar:na]
> >>>>>>>      at com.netflix.curator.x.discovery.details.ServiceCache.refresh(ServiceCache.java:193) [curator-x-discovery-1.1.9.jar:na]
> >>>>>>>      at com.netflix.curator.x.discovery.details.ServiceCache.start(ServiceCache.java:116) [curator-x-discovery-1.1.9.jar:na]
> >>>>>>>      at org.apache.drill.exec.coord.ZKClusterCoordinator.start(ZKClusterCoordinator.java:89) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
> >>>>>>>      at org.apache.drill.exec.server.Drillbit.run(Drillbit.java:94) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
> >>>>>>>      at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:56) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
> >>>>>>>      at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:43) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
> >>>>>>>      at org.apache.drill.exec.server.Drillbit.main(Drillbit.java:65) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
> >>>>>>> ]]
> >>>>>>>
> >>>>>>> This seems to be a known issue? See
> >>>>>>> http://stackoverflow.com/questions/16056751/curator-zookeeper-client-keeps-throw-out-connectionlossexception-per-connection
> >>>>>>>
> >>>>>>> Any ideas? Did anyone actually run Drill in distributed mode already and
> >>>>>>> if so, how did you overcome the above issue?
> >>>>>>>
> >>>>>>> What is next? How do I make other Drillbits point to the same ZK cluster?
> >>>>>>> And has anyone an example of the call parameters for bin/submit_plan maybe
> >>>>>>> as well?
> >>>>>>>
> >>>>>>>
> >>>>>>> BTW, in the process of trying to figure what’s going on behind the scene I
> >>>>>>> traced down the startup call dependencies (scripts), available via:
> >>>>>>>
> >>>>>>> https://docs.google.com/drawings/d/1-ADIGJ-lBr-dOrOjMpQlProiZjYjjuM0kR6A81BYwKA/edit?usp=sharing
> >>>>>>>
> >>>>>>> which we could then also use for documentation purposes.
> >>>>>>>
> >>>>>>>
> >>>>>>> Cheers,
> >>>>>>>              Michael
> >>>>>>>
> >>>>>>> --
> >>>>>>> Michael Hausenblas
> >>>>>>> Ireland, Europe
> >>>>>>> http://mhausenblas.info/
> >>>>>>>
> >>>>>>>
> >>>>>
> >>>>>
> >>>
> >>
> >>
>
>
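
The zk.connect advice quoted above (one host:port per ZooKeeper node, comma separated, port 2181 unless configured otherwise) can be checked before starting a Drillbit. What follows is a minimal sketch and not part of Drill: `parse_zk_connect` and `reachable` are hypothetical helper names, the 5-second default timeout simply mirrors the 5000 ms Curator timeout in the log, and a successful TCP connect only shows that something is listening on the port, not that it is a healthy ZooKeeper.

```python
import socket

DEFAULT_ZK_PORT = 2181  # port used in the thread's examples


def parse_zk_connect(connect):
    """Split a zk.connect value such as "zk1:2181,zk2:2181" into (host, port) pairs."""
    pairs = []
    for entry in connect.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, sep, port = entry.rpartition(":")
        if not sep:
            # No port given: fall back to the default.
            host, port = entry, DEFAULT_ZK_PORT
        pairs.append((host, int(port)))
    return pairs


def reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Replace with the value from your own drill-override.conf.
    for host, port in parse_zk_connect("localhost:2181"):
        status = "ok" if reachable(host, port) else "unreachable"
        print(f"{host}:{port} {status}")
```

If every entry reports "unreachable", the Curator ConnectionLoss error is most likely a plain connectivity or configuration problem rather than anything specific to Drill.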
