drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Hanifi Gunes <hgu...@maprtech.com>
Subject Re: Threads left after Drillbit shutdown (in dev./unit tests)
Date Sat, 11 Jul 2015 00:56:23 GMT
Is there any way to re-produce this at a smaller scale? Have you tried
failing a couple of tests and dumping threads?

-Hanifi
Thanks

On Fri, Jul 10, 2015 at 1:10 PM, Daniel Barclay <dbarclay@maprtech.com>
wrote:

> Is Drill terminating threads correctly?
>
> In running jstack on a JVM running a dev. test run that ended up hung
> after getting about three test timeout errors, I see that there are
> 409 threads.
>
> Although 138 of those are not-unexpected ShutdownHook threads (since
> many tests are run in one VM), there are:
> - 138 "WorkManager.StatusThread" threads (hmm.... 138 again)
> -   7 "Client-1" threads
> -   4 "UserServer-1" threads
> -  21 "BitClient-1" threads
> -   4 "BitClient-2" threads
> -   3 "BitClient-3" threads
> -   8 "BitServer-1" threads
> -   8 "BitServer-2" threads
> -   7 "BitServer-3" threads
> -   7 "BitServer-4" threads
> -   7 "BitServer-5" threads
> -   6 "BitServer-6" threads
> -   6 "BitServer-7" threads
> -   6 "BitServer-8" threads
> -   5 "BitServer-9" threads
> -   5 "BitServer-10" threads
> (Other thread names have only 1 or 2 occurrences.)
>
> Regarding the 4 for the number of "UserServer-1" threads:  Three test
> methods had timeout failures plus one got hung.
>
>
> Here's the tail end of the output from the test running, including
> all the timeout errors and including the hang (except for repeated
> query-results data lines).
>
>
>
> dbarclay@dev-linux2 ~/work/git/incubator-drill $ time mvn install
>
> <TRIMMED>
>
> Running org.apache.drill.exec.physical.impl.TestDistributedFragmentRun
> Running
> org.apache.drill.exec.physical.impl.TestDistributedFragmentRun#oneBitOneExchangeOneEntryRun
> Running
> org.apache.drill.exec.physical.impl.TestDistributedFragmentRun#twoBitOneExchangeTwoEntryRun
> Running
> org.apache.drill.exec.physical.impl.TestDistributedFragmentRun#oneBitOneExchangeTwoEntryRun
> Running
> org.apache.drill.exec.physical.impl.TestDistributedFragmentRun#oneBitOneExchangeTwoEntryRunLogical
> Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 48.117 sec
> - in org.apache.drill.exec.physical.impl.TestDistributedFragmentRun
> Running org.apache.drill.exec.physical.impl.TestBroadcastExchange
> Running
> org.apache.drill.exec.physical.impl.TestBroadcastExchange#TestSingleBroadcastExchangeWithTwoScans
> 00:44:34.017 [globalEventExecutor-1-523] ERROR
> o.a.z.server.NIOServerCnxnFactory - Thread
> Thread[globalEventExecutor-1-523,5,main] died
> java.lang.AssertionError: null
>         at
> io.netty.util.concurrent.AbstractScheduledEventExecutor.pollScheduledTask(AbstractScheduledEventExecutor.java:83)
> ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
>         at
> io.netty.util.concurrent.GlobalEventExecutor.fetchFromScheduledTaskQueue(GlobalEventExecutor.java:110)
> ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
>         at
> io.netty.util.concurrent.GlobalEventExecutor.takeTask(GlobalEventExecutor.java:95)
> ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
>         at
> io.netty.util.concurrent.GlobalEventExecutor$TaskRunner.run(GlobalEventExecutor.java:226)
> ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
>         at
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
> ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
>         at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_72]
> Running
> org.apache.drill.exec.physical.impl.TestBroadcastExchange#TestMultipleSendLocationBroadcastExchange
> 10000
> Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 111.599
> sec <<< FAILURE! - in
> org.apache.drill.exec.physical.impl.TestBroadcastExchange
> TestSingleBroadcastExchangeWithTwoScans(org.apache.drill.exec.physical.impl.TestBroadcastExchange)
> Time elapsed: 50.063 sec  <<< ERROR!
> java.lang.Exception: test timed out after 50000 milliseconds
>         at java.lang.Object.wait(Native Method)
>         at java.lang.Object.wait(Object.java:503)
>         at
> io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:254)
>         at
> io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:32)
>         at
> io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:31)
>         at
> org.apache.drill.exec.rpc.BasicServer.close(BasicServer.java:218)
>         at com.google.common.io.Closeables.close(Closeables.java:77)
>         at com.google.common.io
> .Closeables.closeQuietly(Closeables.java:108)
>         at
> org.apache.drill.exec.rpc.data.DataConnectionCreator.close(DataConnectionCreator.java:70)
>         at com.google.common.io.Closeables.close(Closeables.java:77)
>         at com.google.common.io
> .Closeables.closeQuietly(Closeables.java:108)
>         at
> org.apache.drill.exec.service.ServiceEngine.close(ServiceEngine.java:88)
>         at com.google.common.io.Closeables.close(Closeables.java:77)
>         at com.google.common.io
> .Closeables.closeQuietly(Closeables.java:108)
>         at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:288)
>         at
> org.apache.drill.exec.physical.impl.TestBroadcastExchange.TestSingleBroadcastExchangeWithTwoScans(TestBroadcastExchange.java:62)
>
> TestMultipleSendLocationBroadcastExchange(org.apache.drill.exec.physical.impl.TestBroadcastExchange)
> Time elapsed: 50.014 sec  <<< ERROR!
> java.lang.Exception: test timed out after 50000 milliseconds
>         at java.lang.Object.wait(Native Method)
>         at java.lang.Object.wait(Object.java:503)
>         at
> io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:254)
>         at
> io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:32)
>         at
> io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:31)
>         at
> org.apache.drill.exec.rpc.BasicServer.close(BasicServer.java:218)
>         at
> org.apache.drill.exec.rpc.user.UserServer.close(UserServer.java:283)
>         at com.google.common.io.Closeables.close(Closeables.java:77)
>         at com.google.common.io
> .Closeables.closeQuietly(Closeables.java:108)
>         at
> org.apache.drill.exec.service.ServiceEngine.close(ServiceEngine.java:87)
>         at com.google.common.io.Closeables.close(Closeables.java:77)
>         at com.google.common.io
> .Closeables.closeQuietly(Closeables.java:108)
>         at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:288)
>         at
> org.apache.drill.exec.physical.impl.TestBroadcastExchange.TestMultipleSendLocationBroadcastExchange(TestBroadcastExchange.java:88)
>
> Running
> org.apache.drill.exec.physical.impl.partitionsender.TestPartitionSender
> Running
> org.apache.drill.exec.physical.impl.partitionsender.TestPartitionSender#testPartitionSenderCostToThreads
> Running
> org.apache.drill.exec.physical.impl.partitionsender.TestPartitionSender#testAlgorithm
> ok      summary
> true    planner.slice_target updated.
> Total rows returned : 1.  Returned in 38ms.
> Jul 10, 2015 12:47:20 AM
> org.apache.calcite.sql.validate.SqlValidatorException <init>
> SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: Table
> 'dfs./home/dbarclay/work/git/incubator-drill/exec/java-exec/target/junit5218680774082947123/junit6147831434075535799'
> not found
> Jul 10, 2015 12:47:20 AM org.apache.calcite.runtime.CalciteException <init>
> SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1,
> column 64 to line 1, column 66: Table
> 'dfs./home/dbarclay/work/git/incubator-drill/exec/java-exec/target/junit5218680774082947123/junit6147831434075535799'
> not found
> Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 63.904 sec
> <<< FAILURE! - in
> org.apache.drill.exec.physical.impl.partitionsender.TestPartitionSender
> testPartitionSenderCostToThreads(org.apache.drill.exec.physical.impl.partitionsender.TestPartitionSender)
> Time elapsed: 50.023 sec  <<< ERROR!
> java.lang.Exception: test timed out after 50000 milliseconds
>         at java.lang.Object.wait(Native Method)
>         at java.lang.Object.wait(Object.java:503)
>         at
> io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:254)
>         at
> io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:32)
>         at
> io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:31)
>         at
> org.apache.drill.exec.rpc.BasicServer.close(BasicServer.java:218)
>         at
> org.apache.drill.exec.rpc.user.UserServer.close(UserServer.java:283)
>         at com.google.common.io.Closeables.close(Closeables.java:77)
>         at com.google.common.io
> .Closeables.closeQuietly(Closeables.java:108)
>         at
> org.apache.drill.exec.service.ServiceEngine.close(ServiceEngine.java:87)
>         at com.google.common.io.Closeables.close(Closeables.java:77)
>         at com.google.common.io
> .Closeables.closeQuietly(Closeables.java:108)
>         at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:288)
>         at
> org.apache.drill.BaseTestQuery.closeClient(BaseTestQuery.java:239)
>         at
> org.apache.drill.BaseTestQuery.updateTestCluster(BaseTestQuery.java:126)
>         at
> org.apache.drill.exec.physical.impl.partitionsender.TestPartitionSender.testPartitionSenderCostToThreads(TestPartitionSender.java:154)
>
> Running org.apache.drill.exec.physical.impl.xsort.TestSimpleExternalSort
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0 sec - in
> org.apache.drill.exec.physical.impl.xsort.TestSimpleExternalSort
> Running org.apache.drill.exec.physical.impl.svremover.TestSVRemover
> Running
> org.apache.drill.exec.physical.impl.svremover.TestSVRemover#testSelectionVectorRemoval
> blue    red     green
> 2147483647        9223372036854775807     2147483647
> 2147483647        9223372036854775807     2147483647
>
> <TRIMMED>
>
> 2147483647        9223372036854775807     2147483647
> Total rows returned : 50.  Returned in 181ms.
> Running
> org.apache.drill.exec.physical.impl.svremover.TestSVRemover#testSVRWithNoFilter
> blue    red     green
> true    -9223372036854775808    -2147483648
> false   9223372036854775807     null
>
> <TRIMMED>
>
> true    -9223372036854775808    -2147483648
> false   9223372036854775807     null
> Total rows returned : 100.  Returned in 54ms.
>   C-c C-c
> real    708m38.962s
> user    11m3.332s
> sys     1m8.068s
> [130]dbarclay@dev-linux2 ~/work/git/incubator-drill $
>
>
>
>
>
> Daniel
> --
> Daniel Barclay
> MapR Technologies
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message