drill-issues mailing list archives

From "Jacques Nadeau (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-1804) random failures while running large number of queries
Date Mon, 29 Dec 2014 17:52:13 GMT

    [ https://issues.apache.org/jira/browse/DRILL-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14260265#comment-14260265 ]

Jacques Nadeau commented on DRILL-1804:
---------------------------------------

I have seen this behavior once before.  I believe we are hitting a condition where we try
to create the ZooKeeper node twice for the same query ID.  My guess is that it is some kind
of failure-state issue in Foreman.
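
To illustrate the suspected condition, here is a minimal stand-alone sketch. It is not Drill's code: InMemoryStore, its create/setData methods, and the local NodeExistsException are hypothetical stand-ins for the ZooKeeper-backed store, written with only the JDK so it runs on its own. It shows how a second create for the same path fails, and one possible way a put could tolerate that by falling back to an update. Whether overwriting is the right behavior depends on why the second create happens in the first place.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class IdempotentPutSketch {

    /** Hypothetical stand-in for ZooKeeper's NodeExistsException. */
    static class NodeExistsException extends Exception {
        NodeExistsException(String path) {
            super("NodeExists for " + path);
        }
    }

    /** Hypothetical in-memory store that mimics create/setData semantics. */
    static class InMemoryStore {
        private final ConcurrentMap<String, String> nodes = new ConcurrentHashMap<>();

        // Like a ZooKeeper create: fails if the path is already present.
        void create(String path, String data) throws NodeExistsException {
            if (nodes.putIfAbsent(path, data) != null) {
                throw new NodeExistsException(path);
            }
        }

        // Simplified setData: overwrites the stored value. (Real ZooKeeper
        // would additionally require the node to exist.)
        void setData(String path, String data) {
            nodes.put(path, data);
        }

        String get(String path) {
            return nodes.get(path);
        }
    }

    // Create the node, or fall back to an update if it already exists.
    static void put(InMemoryStore store, String path, String data) {
        try {
            store.create(path, data);
        } catch (NodeExistsException e) {
            store.setData(path, data);
        }
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        String path = "/drill/running/2b8193d3-f0ca-aa7c-094a-d8234d76d068";
        put(store, path, "PENDING");
        put(store, path, "RUNNING"); // second write for the same query id
        System.out.println(store.get(path)); // prints "RUNNING"
    }
}
```

With a plain create in place of put, the second call would throw, which matches the NodeExists error in the quoted trace.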

> random failures while running large number of queries
> -----------------------------------------------------
>
>                 Key: DRILL-1804
>                 URL: https://issues.apache.org/jira/browse/DRILL-1804
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 0.7.0
>            Reporter: Chun Chang
>            Assignee: Chris Westin
>             Fix For: 0.8.0
>
>
> #Tue Dec 02 14:38:34 EST 2014
> git.commit.id.abbrev=757e9a2
> When running the Mondrian regression tests, one or two out of more than 6000 queries sometimes fail at random. Here is the stack trace when it happens:
> 2014-12-02 17:49:32,271 [2b8193d3-f0ca-aa7c-094a-d8234d76d068:foreman] ERROR o.a.drill.exec.work.foreman.Foreman - Error aeae057b-ed0a-43aa-902d-fe3a41531511: Query failed: Unexpected exception during fragment initialization.
> org.apache.drill.exec.work.foreman.ForemanException: Unexpected exception during fragment initialization.
>   at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:194) [drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT]
>   at org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:254) [drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT]
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_45]
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_45]
>   at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
> Caused by: java.lang.RuntimeException: Failure while accessing Zookeeper. Failure while accessing Zookeeper
>   at org.apache.drill.exec.store.sys.zk.ZkAbstractStore.put(ZkAbstractStore.java:111) ~[drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT]
>   at org.apache.drill.exec.work.foreman.QueryStatus.updateQueryStateInStore(QueryStatus.java:132) ~[drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT]
>   at org.apache.drill.exec.work.foreman.Foreman.recordNewState(Foreman.java:502) [drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT]
>   at org.apache.drill.exec.work.foreman.Foreman.moveToState(Foreman.java:396) [drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT]
>   at org.apache.drill.exec.work.foreman.Foreman.runPhysicalPlan(Foreman.java:311) [drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT]
>   at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:510) [drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT]
>   at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:185) [drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT]
>   ... 4 common frames omitted
> Caused by: java.lang.RuntimeException: Failure while accessing Zookeeper
>   at org.apache.drill.exec.store.sys.zk.ZkEStore.createNodeInZK(ZkEStore.java:53) ~[drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT]
>   at org.apache.drill.exec.store.sys.zk.ZkAbstractStore.put(ZkAbstractStore.java:106) ~[drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT]
>   ... 10 common frames omitted
> Caused by: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /drill/running/2b8193d3-f0ca-aa7c-094a-d8234d76d068
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:119) ~[zookeeper-3.4.5-mapr-1406.jar:3.4.5-mapr-1406--1]
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) ~[zookeeper-3.4.5-mapr-1406.jar:3.4.5-mapr-1406--1]
>   at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783) ~[zookeeper-3.4.5-mapr-1406.jar:3.4.5-mapr-1406--1]
>   at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:676) ~[curator-framework-2.5.0.jar:na]
>   at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:660) ~[curator-framework-2.5.0.jar:na]
>   at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107) ~[curator-client-2.5.0.jar:na]
>   at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:656) ~[curator-framework-2.5.0.jar:na]
>   at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:441) ~[curator-framework-2.5.0.jar:na]
>   at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:431) ~[curator-framework-2.5.0.jar:na]
>   at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:44) ~[curator-framework-2.5.0.jar:na]
>   at org.apache.drill.exec.store.sys.zk.ZkEStore.createNodeInZK(ZkEStore.java:51) ~[drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT]
>   ... 11 common frames omitted
> 2014-12-02 17:49:32,287 [2b8193d3-f0ca-aa7c-094a-d8234d76d068:frag:0:0] WARN  o.a.d.e.p.impl.SendingAccountor - Failure while waiting for send complete.
> java.lang.InterruptedException: null
>   at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1301) ~[na:1.7.0_45]
>   at java.util.concurrent.Semaphore.acquire(Semaphore.java:472) ~[na:1.7.0_45]
>   at org.apache.drill.exec.physical.impl.SendingAccountor.waitForSendComplete(SendingAccountor.java:44) ~[drill-java-exec-0.7.0-SNAPSHOT-rebuffed.jar:0.7.0-SNAPSHOT]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
