Date: Thu, 16 Feb 2017 05:16:42 +0000 (UTC)
From: "Khurram Faraaz (JIRA)"
To: issues@drill.apache.org
Reply-To: dev@drill.apache.org
Subject: [jira] [Closed] (DRILL-3751) Query hang when zookeeper is stopped

[ https://issues.apache.org/jira/browse/DRILL-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Khurram Faraaz closed DRILL-3751.
---------------------------------

> Query hang when zookeeper is stopped
> ------------------------------------
>
> Key: DRILL-3751
> URL: https://issues.apache.org/jira/browse/DRILL-3751
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Flow
> Affects Versions: 1.2.0
> Environment: 4 node cluster on CentOS
> Reporter: Khurram Faraaz
> Priority: Critical
> Fix For: Future
>
>
> I see an indefinite hang at the sqlline prompt when I issue a long-running query and then stop the ZooKeeper process while the query is still executing. The sqlline prompt never returns; it hangs, showing the stack trace below. I am on master.
> Steps to reproduce the problem:
> clush -g khurram service mapr-warden stop
> clush -g khurram service mapr-warden start
> Issue a long-running query from sqlline.
> While the query is running, stop ZooKeeper using the script.
> To stop ZooKeeper:
> {code}
> [root@centos-01 bin]# ./zkServer.sh stop
> JMX enabled by default
> Using config: /opt/mapr/zookeeper/zookeeper-3.4.5/bin/../conf/zoo.cfg
> Stopping zookeeper ... STOPPED
> {code}
> Issue the long-running query below from sqlline:
> {code}
> ./sqlline -u "jdbc:drill:schema=dfs.tmp"
> 0: jdbc:drill:schema=dfs.tmp> select * from `twoKeyJsn.json` limit 8000000;
> ...
> | 7.40907649723E8 | g |
> | 1.12378007695E9 | d |
> 03:03:28.482 [CuratorFramework-0] ERROR org.apache.curator.ConnectionState - Connection timed out for connection string (10.10.100.201:5181) and timeout (5000) / elapsed (5013)
> org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
>     at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:198) [curator-client-2.5.0.jar:na]
>     at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88) [curator-client-2.5.0.jar:na]
>     at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115) [curator-client-2.5.0.jar:na]
>     at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:807) [curator-framework-2.5.0.jar:na]
>     at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:793) [curator-framework-2.5.0.jar:na]
>     at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$400(CuratorFrameworkImpl.java:57) [curator-framework-2.5.0.jar:na]
>     at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:275) [curator-framework-2.5.0.jar:na]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_45]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_45]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_45]
>     at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
> {code}
> Here is the stack for the sqlline process:
> {code}
> [root@centos-01 bin]# /usr/java/jdk1.7.0_45/bin/jstack 32136
> 2015-09-05 03:21:52
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.45-b08 mixed mode):
> "Attach Listener" daemon prio=10 tid=0x00007f8328003800 nid=0x27f1 waiting on condition [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "CuratorFramework-0-EventThread" daemon prio=10 tid=0x00000000012fd800 nid=0x26e1 waiting on condition [0x00007f8317c2e000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0x00000007e2117798> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>     at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:491)
> "CuratorFramework-0-SendThread(centos-01.qa.lab:5181)" daemon prio=10 tid=0x0000000001109800 nid=0x26e0 waiting on condition [0x00007f8317b2d000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>     at java.lang.Thread.sleep(Native Method)
>     at org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:86)
>     at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:937)
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:995)
> "threadDeathWatcher-2-1" daemon prio=10 tid=0x00007f833043b800 nid=0x7e16 waiting on condition [0x00007f831751f000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>     at java.lang.Thread.sleep(Native Method)
>     at io.netty.util.ThreadDeathWatcher$Watcher.run(ThreadDeathWatcher.java:137)
>     at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
>     at java.lang.Thread.run(Thread.java:744)
> "Client-1" daemon prio=10 tid=0x00007f8378df7000 nid=0x7e15 runnable [0x00007f8317620000]
>    java.lang.Thread.State: RUNNABLE
>     at io.netty.channel.epoll.Native.epollWait0(Native Method)
>     at io.netty.channel.epoll.Native.epollWait(Native.java:148)
>     at io.netty.channel.epoll.EpollEventLoop.epollWait(EpollEventLoop.java:180)
>     at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:205)
>     at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>     at java.lang.Thread.run(Thread.java:744)
> "ServiceCache-0" daemon prio=10 tid=0x00007f8378d22000 nid=0x7e13 waiting on condition [0x00007f831792b000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0x00000006fff9c658> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>     at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>     at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:744)
> "CuratorFramework-0" daemon prio=10 tid=0x00007f8378c95800 nid=0x7e12 waiting on condition [0x00007f8317a2c000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0x00000006fff9ebd0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>     at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
>     at java.util.concurrent.DelayQueue.take(DelayQueue.java:220)
>     at java.util.concurrent.DelayQueue.take(DelayQueue.java:68)
>     at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:781)
>     at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$400(CuratorFrameworkImpl.java:57)
>     at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:275)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:744)
> "ConnectionStateManager-0" daemon prio=10 tid=0x00007f8378c60800 nid=0x7e0f waiting on condition [0x00007f8317d2f000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0x00000006fffb2288> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>     at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:374)
>     at org.apache.curator.framework.state.ConnectionStateManager.processEvents(ConnectionStateManager.java:208)
>     at org.apache.curator.framework.state.ConnectionStateManager.access$000(ConnectionStateManager.java:42)
>     at org.apache.curator.framework.state.ConnectionStateManager$1.call(ConnectionStateManager.java:110)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:744)
> "NonBlockingInputStreamThread" daemon prio=10 tid=0x00007f8378836000 nid=0x7de0 in Object.wait() [0x00007f83186ab000]
>    java.lang.Thread.State: WAITING (on object monitor)
>     at java.lang.Object.wait(Native Method)
>     - waiting on <0x00000006fffb2438> (a jline.internal.NonBlockingInputStream)
>     at jline.internal.NonBlockingInputStream.run(NonBlockingInputStream.java:278)
>     - locked <0x00000006fffb2438> (a jline.internal.NonBlockingInputStream)
>     at java.lang.Thread.run(Thread.java:744)
> "Service Thread" daemon prio=10 tid=0x00007f83780c1000 nid=0x7dcd runnable [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "C2 CompilerThread1" daemon prio=10 tid=0x00007f83780be800 nid=0x7dcc waiting on condition [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "C2 CompilerThread0" daemon prio=10 tid=0x00007f83780bb800 nid=0x7dcb waiting on condition [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "Signal Dispatcher" daemon prio=10 tid=0x00007f83780b1800 nid=0x7dca runnable [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "Finalizer" daemon prio=10 tid=0x00007f837809a800 nid=0x7dc9 in Object.wait() [0x00007f832c574000]
>    java.lang.Thread.State: WAITING (on object monitor)
>     at java.lang.Object.wait(Native Method)
>     - waiting on <0x00000006fffb2668> (a java.lang.ref.ReferenceQueue$Lock)
>     at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
>     - locked <0x00000006fffb2668> (a java.lang.ref.ReferenceQueue$Lock)
>     at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
>     at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:189)
> "Reference Handler" daemon prio=10 tid=0x00007f8378091000 nid=0x7dc8 in Object.wait() [0x00007f832c675000]
>    java.lang.Thread.State: WAITING (on object monitor)
>     at java.lang.Object.wait(Native Method)
>     - waiting on <0x00000006fffb2700> (a java.lang.ref.Reference$Lock)
>     at java.lang.Object.wait(Object.java:503)
>     at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
>     - locked <0x00000006fffb2700> (a java.lang.ref.Reference$Lock)
> "main" prio=10 tid=0x00007f8378011000 nid=0x7db4 waiting on condition [0x00007f837cac2000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0x0000000700d3a210> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>     at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
>     at java.util.concurrent.LinkedBlockingDeque.pollFirst(LinkedBlockingDeque.java:519)
>     at java.util.concurrent.LinkedBlockingDeque.poll(LinkedBlockingDeque.java:682)
>     at org.apache.drill.jdbc.impl.DrillResultSetImpl$ResultsListener.getNext(DrillResultSetImpl.java:1536)
>     at org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:175)
>     at org.apache.drill.jdbc.impl.DrillCursor.next(DrillCursor.java:320)
>     at net.hydromatic.avatica.AvaticaResultSet.next(AvaticaResultSet.java:187)
>     at org.apache.drill.jdbc.impl.DrillResultSetImpl.next(DrillResultSetImpl.java:161)
>     at sqlline.IncrementalRows.hasNext(IncrementalRows.java:62)
>     at sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
>     at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
>     at sqlline.SqlLine.print(SqlLine.java:1583)
>     at sqlline.Commands.execute(Commands.java:852)
>     at sqlline.Commands.sql(Commands.java:751)
>     at sqlline.SqlLine.dispatch(SqlLine.java:738)
>     at sqlline.SqlLine.begin(SqlLine.java:612)
>     at sqlline.SqlLine.start(SqlLine.java:366)
>     at sqlline.SqlLine.main(SqlLine.java:259)
> {code}

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
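
Editorial note on the dump: the "main" thread is in TIMED_WAITING inside a timed `LinkedBlockingDeque.poll` called from `DrillResultSetImpl$ResultsListener.getNext`. A timed poll that never gives the prompt back is consistent with a retry loop that has no exit condition for a lost connection, though the report does not confirm this. The sketch below is not Drill's code. It only illustrates, using the JDK alone, how such a wait can be bounded. `BoundedResultPoll`, `nextBatch`, `ConnectionLostException`, the `connectionLost` flag, and the slice/deadline parameters are all hypothetical names chosen for illustration.

```java
import java.util.concurrent.BlockingDeque;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch: none of these names come from Drill or Curator. */
final class BoundedResultPoll {

    /** Signals that waiting was abandoned instead of continuing forever. */
    static final class ConnectionLostException extends Exception {
        ConnectionLostException(String message) {
            super(message);
        }
    }

    /**
     * Waits for the next batch in short slices (a timed poll, like the frame
     * in the "main" thread), but between slices re-checks a connection-lost
     * flag and an overall deadline so the caller cannot block indefinitely.
     */
    static <T> T nextBatch(BlockingDeque<T> queue,
                           AtomicBoolean connectionLost,
                           long sliceMillis,
                           long deadlineMillis)
            throws InterruptedException, ConnectionLostException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(deadlineMillis);
        while (true) {
            T batch = queue.poll(sliceMillis, TimeUnit.MILLISECONDS);
            if (batch != null) {
                return batch;
            }
            // Assumption: some listener sets this flag when the connection drops.
            if (connectionLost.get()) {
                throw new ConnectionLostException("connection lost while waiting for results");
            }
            if (System.nanoTime() - deadline >= 0) {
                throw new ConnectionLostException("no results within " + deadlineMillis + " ms");
            }
        }
    }

    public static void main(String[] args) throws Exception {
        BlockingDeque<String> queue = new LinkedBlockingDeque<>();
        AtomicBoolean lost = new AtomicBoolean(false);

        queue.add("batch-1");
        System.out.println(nextBatch(queue, lost, 50, 1000)); // prints "batch-1"

        lost.set(true); // simulate the connection going away mid-query
        try {
            nextBatch(queue, lost, 50, 1000);
        } catch (ConnectionLostException e) {
            System.out.println("aborted: " + e.getMessage());
        }
    }
}
```

With a guard of this kind, the second call returns control to the caller after one 50 ms slice rather than parking forever. Whether a fix for DRILL-3751 should take this shape is a design question the issue itself leaves open.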