From: Chris Drawater
To: "drill-user@incubator.apache.org"
Subject: RE: Distributed SQL on Drill question
Date: Wed, 22 Oct 2014 16:04:42 +0000

We have now resolved our JDBC and ODBC connection issues.

It turns out that although we used IP addresses throughout all our configurations, somewhere down the line the host name of the drillbit started to be used (see the "Connecting to server drill1:31010" line in the log below).

 

16:42:21.834 [main-SendThread(10.44.18.101:2181)] DEBUG org.apache.zookeeper.ClientCnxn - Reading reply sessionid:0x1493348d1c30005, packet:: clientPath:null serverPath:null finished:false header:: 9,4 replyHeader:: 9,316,0 request:: '/drill/drillbits1/1b8d393d-9802-4e0b-9115-b60e3154829a,F response:: #a2431623864333933642d393830322d346530622d393131352d62363065333135343832396110fffffff1fffffff0ffffffa4ffffff9affffff93291a14a66472696c6c3110ffffffa2fffffff2118ffffffa3fffffff2120ffffffa4fffffff21,s{308,308,1413904677021,1413904677021,0,0,0,92661655187685376,67,0,308}
16:42:22.006 [main] DEBUG i.n.c.MultithreadEventLoopGroup - -Dio.netty.eventLoopThreads: 8
16:42:22.021 [main] DEBUG io.netty.channel.nio.NioEventLoop - -Dio.netty.noKeySetOptimization: false
16:42:22.021 [main] DEBUG io.netty.channel.nio.NioEventLoop - -Dio.netty.selectorAutoRebuildThreshold: 512
16:42:22.053 [main] DEBUG o.a.drill.exec.client.DrillClient - Connecting to server drill1:31010
16:42:24.767 [main] DEBUG i.n.util.internal.ThreadLocalRandom - -Dio.netty.initialSeedUniquifier: 0x2430146dd102a22c
16:42:24.783 [main] DEBUG i.n.channel.ChannelOutboundBuffer - -Dio.netty.threadLocalDirectBufferSize: 65536
16:42:24.783 [main] DEBUG io.netty.util.Recycler - -Dio.netty.recycler.maxCapacity.default: 262144
16:42:24.798 [main] DEBUG io.netty.buffer.ByteBufUtil - -Dio.netty.allocator.type: unpooled
16:42:24.814 [Client-1] DEBUG i.n.u.i.JavassistTypeParameterMatcherGenerator - Generated: io.netty.util.internal.__matchers__.io.netty.buffer.ByteBufMatcher
16:42:24.814 [Client-1] DEBUG i.n.u.i.JavassistTypeParameterMatcherGenerator - Generated: io.netty.util.internal.__matchers__.org.apache.drill.exec.rpc.OutboundRpcMessageMatcher
16:42:24.829 [Client-1] DEBUG i.n.u.i.JavassistTypeParameterMatcherGenerator - Generated: io.netty.util.internal.__matchers__.org.apache.drill.exec.rpc.InboundRpcMessageMatcher
16:42:24.845 [Client-1] INFO  o.a.drill.exec.rpc.user.UserClient - Channel closed between local null and remote null
SQLException : SQL state: null java.sql.SQLException: Failure while attempting to connect to Drill. ErrorCode: 0

 

The resolution (for our test purposes) was simply to pop an entry in C:\Windows\System32\drivers\etc\hosts.
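
For anyone hitting the same symptom, here is a minimal sketch of a client-side check. It only tests whether the client machine can resolve the name that the log shows the client connecting to (drill1). The class name HostCheck and the helper canResolve are illustrative and are not part of Drill.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, not part of Drill: checks name resolution on the client.
final class HostCheck {

    /** Returns true if this machine can resolve the given host name or IP literal. */
    static boolean canResolve(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "drill1" is the name seen in the client log; other names can be passed as arguments.
        String[] hosts = args.length > 0 ? args : new String[] {"drill1"};
        for (String host : hosts) {
            System.out.println(host + " resolvable: " + canResolve(host));
        }
    }
}
```

If the name does not resolve, a hosts entry like the one described above, or a DNS record for the drillbit host, should let the client reach the address that ZooKeeper hands back.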

 

But I'd still appreciate answers to my questions:

 

· Would we expect to be able to run a distributed SQL query by connecting via JDBC directly to a specific drillbit? It seems not.
· Would we expect to be able to run a distributed SQL query by connecting via ODBC directly to a specific drillbit?
· Would we expect to be able to run a distributed SQL query by connecting via JDBC through a ZooKeeper quorum connection?
· Would we expect to be able to run a distributed SQL query by connecting via ODBC through a ZooKeeper quorum connection?
· How can we identify the use of multiple nodes by a distributed SQL query from the explain plan output or the JSON QEP?
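
As a point of reference for the JDBC questions, the sketch below builds the two URL styles. The jdbc:drill:zk= and jdbc:drill:drillbit= forms are taken from the Drill JDBC documentation for later releases; whether the 0.5 release or the 0.6 build accepts both is an assumption that has not been verified here. The ZooKeeper address, root (drill) and cluster id (drillbits1) come from the log above; the class and method names are hypothetical.

```java
// Hypothetical helper, not part of Drill: builds the two JDBC URL styles.
final class DrillJdbcUrls {

    /** URL that lets the client discover drillbits through ZooKeeper. */
    static String viaZooKeeper(String zkQuorum, String zkRoot, String clusterId) {
        // zkQuorum is a comma-separated list of host:port pairs.
        return "jdbc:drill:zk=" + zkQuorum + "/" + zkRoot + "/" + clusterId;
    }

    /** URL that connects straight to one drillbit, bypassing ZooKeeper discovery. */
    static String viaDrillbit(String host, int port) {
        return "jdbc:drill:drillbit=" + host + ":" + port;
    }

    public static void main(String[] args) {
        // Values taken from the log: ZooKeeper at 10.44.18.101:2181,
        // znode path /drill/drillbits1, drillbit user port 31010.
        System.out.println(viaZooKeeper("10.44.18.101:2181", "drill", "drillbits1"));
        System.out.println(viaDrillbit("drill1", 31010));
    }
}
```

Either string would be passed to java.sql.DriverManager.getConnection with the Drill JDBC driver on the classpath. With the ZooKeeper form, the client connects to whatever address the drillbit registered, which is why the host name has to be resolvable from the client.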

 

Thanks,

    Chris

 

 

From: Chris Drawater
Sent: 22 October 2014 10:39
To: 'drill-user@incubator.apache.org'
Subject: Distributed SQL on Drill question

 

 

Hi,

 

We have started to evaluate Apache Drill.

 

Using multiple nodes/VMs, we wish to stream JSON data (via Apache Storm) to a persistent local filesystem (with a consistent directory structure so we can define various storage plugins) and then use Drill to run distributed SQL queries across these JSON files.

If at all possible, we don't wish to install Hadoop HDFS/HBase/Hive.

 

 

SQL execution would hopefully be via ODBC and/or JDBC.

So far, embedded Drill experiments work just great, but the data processed is local to the sqlline.

Using SQL against JSON works really well.

 

Unfortunately, experiments to run distributed SQL queries have (so far) not been successful.

We have tried

· a 3-node VM-based system with a 3-node ZooKeeper quorum + 1 drillbit per node
· a 3-node VM-based system with a single ZooKeeper instance (covering all 3 nodes) + again 1 drillbit per node

and 'select * from sys.drillbits' from sqlline outputs all 3 drillbits in both cases. Likewise, ZooKeeper confirms the existence of the Drill cluster.

But we've not managed to run a distributed query.

 

We have tested against both the 0.5 release and a 0.6 build on 64-bit Ubuntu 14.04.

 

So far, we can only connect via ODBC to a specific drillbit.

We cannot get the JDBC driver (using SQuirreL) to work.

Also, ODBC to a ZooKeeper quorum doesn't appear to work. But we have tested client access using telnet IP_ADDR 2181 and that's OK.

 

So my questions are:

· Would we expect to be able to run a distributed SQL query by connecting via JDBC directly to a specific drillbit?
· Would we expect to be able to run a distributed SQL query by connecting via ODBC directly to a specific drillbit?
· Would we expect to be able to run a distributed SQL query by connecting via JDBC through a ZooKeeper quorum connection?
· Would we expect to be able to run a distributed SQL query by connecting via ODBC through a ZooKeeper quorum connection?
· How can we identify the use of multiple nodes by a distributed SQL query from the explain plan output or the JSON QEP?
· Any ideas why we can't connect via a ZooKeeper quorum connection?

 

Any help or insights you can give would be most appreciated.

 

Thanks.

   Chris

 

Chris Drawater

Database Architect

Office +44 1635 232470 | Fax +44 1635 232471

Email chris.drawater@jdsu.com | Web www.arieso.com

 
