Date: Sun, 27 Oct 2013 14:32:38 -0700
Subject: Re: Distributed mode troubles: ZK/Curator connection time out
From: Steven Phillips
To: drill-dev@incubator.apache.org
Cc: Apache Drill User

One thing to add to the diagram is that all of the Drill Java processes read their configuration from drill-override.conf. You must set zk.connect there to the correct ZooKeeper host:port.

On Sun, Oct 27, 2013 at 2:00 PM, Michael Hausenblas <michael.hausenblas@gmail.com> wrote:

> Folks,
>
> I'm trying to set up Drill in distributed mode. Here's what I have so far:
> when I launch the first Drillbit with bin/drillbit.sh I get the following
> in log/drillbit.out:
>
> [[
> 20:47:20.963 [main] ERROR com.netflix.curator.ConnectionState - Connection timed out for connection string (localhost:2181) and timeout (5000) / elapsed (5045)
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
>     at com.netflix.curator.ConnectionState.getZooKeeper(ConnectionState.java:94) ~[curator-client-1.1.9.jar:na]
>     at com.netflix.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:106) [curator-client-1.1.9.jar:na]
>     at com.netflix.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:393) [curator-framework-1.1.9.jar:na]
>     at com.netflix.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:184) [curator-framework-1.1.9.jar:na]
>     at com.netflix.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:173) [curator-framework-1.1.9.jar:na]
>     at com.netflix.curator.RetryLoop.callWithRetry(RetryLoop.java:85) [curator-client-1.1.9.jar:na]
>     at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:169) [curator-framework-1.1.9.jar:na]
>     at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:161) [curator-framework-1.1.9.jar:na]
>     at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:36) [curator-framework-1.1.9.jar:na]
>     at com.netflix.curator.x.discovery.details.ServiceDiscoveryImpl.getChildrenWatched(ServiceDiscoveryImpl.java:306) [curator-x-discovery-1.1.9.jar:na]
>     at com.netflix.curator.x.discovery.details.ServiceDiscoveryImpl.queryForInstances(ServiceDiscoveryImpl.java:276) [curator-x-discovery-1.1.9.jar:na]
>     at com.netflix.curator.x.discovery.details.ServiceCache.refresh(ServiceCache.java:193) [curator-x-discovery-1.1.9.jar:na]
>     at com.netflix.curator.x.discovery.details.ServiceCache.start(ServiceCache.java:116) [curator-x-discovery-1.1.9.jar:na]
>     at org.apache.drill.exec.coord.ZKClusterCoordinator.start(ZKClusterCoordinator.java:89) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
>     at org.apache.drill.exec.server.Drillbit.run(Drillbit.java:94) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
>     at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:56) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
>     at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:43) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
>     at org.apache.drill.exec.server.Drillbit.main(Drillbit.java:65) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
> ]]
>
> Is this a known issue? See
> http://stackoverflow.com/questions/16056751/curator-zookeeper-client-keeps-throw-out-connectionlossexception-per-connection
>
> Any ideas? Has anyone actually run Drill in distributed mode already, and
> if so, how did you overcome the above issue?
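The log line says Curator gave up on the connection string localhost:2181 after its 5000 ms connection timeout, which suggests no ZooKeeper server was reachable at that address from this Drillbit. A minimal sketch for checking this outside Drill, using only the JDK (the class name ZkConnectProbe is hypothetical and not part of Drill or Curator): it splits a zk.connect-style "host:port,host:port" string and tries a plain TCP connection to each entry with the same 5000 ms timeout. A successful connect only shows that something is listening on that port, not that it is a healthy ZooKeeper server.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: checks whether each host:port in a ZooKeeper
 * connection string ("host1:port1,host2:port2") accepts TCP connections.
 */
public class ZkConnectProbe {

    /** Splits the connection string into (unresolved) host/port pairs. */
    static List<InetSocketAddress> parse(String connectString) {
        List<InetSocketAddress> result = new ArrayList<>();
        for (String part : connectString.split(",")) {
            String trimmed = part.trim();
            int colon = trimmed.lastIndexOf(':');
            if (colon <= 0 || colon == trimmed.length() - 1) {
                throw new IllegalArgumentException("Expected host:port but got '" + trimmed + "'");
            }
            String host = trimmed.substring(0, colon);
            int port = Integer.parseInt(trimmed.substring(colon + 1));
            result.add(InetSocketAddress.createUnresolved(host, port));
        }
        return result;
    }

    /** Returns true if a TCP connection to the address succeeds within timeoutMs. */
    static boolean isReachable(InetSocketAddress address, int timeoutMs) {
        try (Socket socket = new Socket()) {
            // Resolve the host here; an unknown host surfaces as an IOException (false).
            socket.connect(new InetSocketAddress(address.getHostString(), address.getPort()), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults mirror the connection string and timeout from the log above.
        String connect = args.length > 0 ? args[0] : "localhost:2181";
        for (InetSocketAddress addr : parse(connect)) {
            boolean ok = isReachable(addr, 5000);
            System.out.println(addr.getHostString() + ":" + addr.getPort()
                    + (ok ? " reachable" : " NOT reachable"));
        }
    }
}
```

Run it on the Drillbit's machine with the value you intend to put into zk.connect, e.g. `java ZkConnectProbe zk1:2181,zk2:2181` (zk1/zk2 are placeholder host names). If the entry reports NOT reachable, Curator will most likely keep timing out the same way.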
>
> What is next? How do I make other Drillbits point to the same ZK cluster?
> And does anyone have an example of the call parameters for bin/submit_plan
> as well?
>
> BTW, while trying to figure out what's going on behind the scenes, I
> traced the startup call dependencies (scripts), available via:
>
> https://docs.google.com/drawings/d/1-ADIGJ-lBr-dOrOjMpQlProiZjYjjuM0kR6A81BYwKA/edit?usp=sharing
>
> which we could then also use for documentation purposes.
>
> Cheers,
> Michael
>
> --
> Michael Hausenblas
> Ireland, Europe
> http://mhausenblas.info/
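On pointing the other Drillbits at the same ZK cluster: since every Drill Java process reads its own drill-override.conf, the zk.connect value must name the same ZooKeeper ensemble on every node. A minimal, hypothetical pre-flight check in plain Java (the class name, node names and host names are illustrative assumptions, and how the per-node values are collected is left open): it normalizes each node's zk.connect value so host order and spacing do not matter, then groups nodes by ensemble. More than one group means some Drillbits would register with a different ZooKeeper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical check that all nodes use the same zk.connect ensemble. */
public class ZkConnectConsistency {

    /** Normalizes "b:2181, a:2181" to "a:2181,b:2181" so host order is irrelevant. */
    static String normalize(String connectString) {
        String[] parts = connectString.split(",");
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim();
        }
        Arrays.sort(parts);
        return String.join(",", parts);
    }

    /** Groups node names by their normalized zk.connect value (sorted by ensemble). */
    static Map<String, List<String>> groupByEnsemble(Map<String, String> zkConnectByNode) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (Map.Entry<String, String> e : zkConnectByNode.entrySet()) {
            groups.computeIfAbsent(normalize(e.getValue()), k -> new ArrayList<>()).add(e.getKey());
        }
        return groups;
    }

    public static void main(String[] args) {
        // Placeholder values; in practice, read zk.connect from each node's drill-override.conf.
        Map<String, String> zkConnectByNode = new LinkedHashMap<>();
        zkConnectByNode.put("node1", "zk1:2181,zk2:2181");
        zkConnectByNode.put("node2", "zk2:2181, zk1:2181");
        zkConnectByNode.put("node3", "localhost:2181"); // e.g. a node still using the value from the log above

        Map<String, List<String>> groups = groupByEnsemble(zkConnectByNode);
        if (groups.size() == 1) {
            System.out.println("All Drillbits use the same ZooKeeper ensemble.");
        } else {
            groups.forEach((ensemble, nodes) -> System.out.println(ensemble + " <- " + nodes));
        }
    }
}
```

Sorting the host list is a deliberate simplification: a ZooKeeper client treats the connection string as a set of servers, so two strings listing the same hosts in a different order point at the same ensemble.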