reef-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sergiy Matusevych (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (REEF-1753) Intermittent failures of CloseEvaluatorTest in local runtime
Date Wed, 22 Mar 2017 00:17:42 GMT

    [ https://issues.apache.org/jira/browse/REEF-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15935571#comment-15935571
] 

Sergiy Matusevych commented on REEF-1753:
-----------------------------------------

I also see lots of threads still running at the end of the Driver process:
{code:none|title=driver.stderr}
2017-03-21 16:52:07,711 FINEST reef.util.ThreadLogger.logThreads main | Threads running after
REEFEnvironment.close():
	TG main THREAD EvaluatorMessageDispatcher:Node-1-1490140314328:pool-09:thread-001 :: WAITING,
Alive, NOT interrupted, NOT daemon
		sun.misc.Unsafe.park(Native Method)
		java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
		java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
		java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
		java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
		java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
		java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
		java.lang.Thread.run(Thread.java:745)
	TG main THREAD EvaluatorMessageDispatcher:Node-1-1490140314612:pool-13:thread-001 :: WAITING,
Alive, NOT interrupted, NOT daemon
		sun.misc.Unsafe.park(Native Method)
		java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
		java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
		java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
		java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
		java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
		java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
		java.lang.Thread.run(Thread.java:745)
	TG main THREAD EvaluatorMessageDispatcher:Node-1-1490140314870:pool-17:thread-001 :: WAITING,
Alive, NOT interrupted, NOT daemon
		sun.misc.Unsafe.park(Native Method)
		java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
		java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
		java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
		java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
		java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
		java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
		java.lang.Thread.run(Thread.java:745)
	TG main THREAD EvaluatorMessageDispatcher:Node-1-1490140315108:pool-21:thread-001 :: WAITING,
Alive, NOT interrupted, NOT daemon
		sun.misc.Unsafe.park(Native Method)
		java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
		java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
		java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
		java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
		java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
		java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
		java.lang.Thread.run(Thread.java:745)
	TG main THREAD EvaluatorMessageDispatcher:Node-2-1490140314159:pool-08:thread-001 :: WAITING,
Alive, NOT interrupted, NOT daemon
		sun.misc.Unsafe.park(Native Method)
		java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
		java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
		java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
		java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
		java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
		java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
		java.lang.Thread.run(Thread.java:745)
	TG main THREAD EvaluatorMessageDispatcher:Node-2-1490140314512:pool-12:thread-001 :: WAITING,
Alive, NOT interrupted, NOT daemon
		sun.misc.Unsafe.park(Native Method)
		java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
		java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
		java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
		java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
		java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
		java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
		java.lang.Thread.run(Thread.java:745)
	TG main THREAD EvaluatorMessageDispatcher:Node-2-1490140314783:pool-16:thread-001 :: WAITING,
Alive, NOT interrupted, NOT daemon
		sun.misc.Unsafe.park(Native Method)
		java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
		java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
		java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
		java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
		java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
		java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
		java.lang.Thread.run(Thread.java:745)
	TG main THREAD EvaluatorMessageDispatcher:Node-2-1490140315059:pool-20:thread-001 :: WAITING,
Alive, NOT interrupted, NOT daemon
		sun.misc.Unsafe.park(Native Method)
		java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
		java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
		java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
		java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
		java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
		java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
		java.lang.Thread.run(Thread.java:745)
	TG main THREAD EvaluatorMessageDispatcher:Node-3-1490140314461:pool-11:thread-001 :: WAITING,
Alive, NOT interrupted, NOT daemon
		sun.misc.Unsafe.park(Native Method)
		java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
		java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
		java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
		java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
		java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
		java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
		java.lang.Thread.run(Thread.java:745)
	TG main THREAD EvaluatorMessageDispatcher:Node-3-1490140314735:pool-15:thread-001 :: WAITING,
Alive, NOT interrupted, NOT daemon
		sun.misc.Unsafe.park(Native Method)
		java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
		java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
		java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
		java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
		java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
		java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
		java.lang.Thread.run(Thread.java:745)
	TG main THREAD EvaluatorMessageDispatcher:Node-3-1490140314976:pool-19:thread-001 :: WAITING,
Alive, NOT interrupted, NOT daemon
		sun.misc.Unsafe.park(Native Method)
		java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
		java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
		java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
		java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
		java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
		java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
		java.lang.Thread.run(Thread.java:745)
	TG main THREAD EvaluatorMessageDispatcher:Node-3-1490140315195:pool-23:thread-001 :: WAITING,
Alive, NOT interrupted, NOT daemon
		sun.misc.Unsafe.park(Native Method)
		java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
		java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
		java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
		java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
		java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
		java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
		java.lang.Thread.run(Thread.java:745)
	TG main THREAD EvaluatorMessageDispatcher:Node-4-1490140314399:pool-10:thread-001 :: WAITING,
Alive, NOT interrupted, NOT daemon
		sun.misc.Unsafe.park(Native Method)
		java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
		java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
		java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
		java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
		java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
		java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
		java.lang.Thread.run(Thread.java:745)
	TG main THREAD EvaluatorMessageDispatcher:Node-4-1490140314671:pool-14:thread-001 :: WAITING,
Alive, NOT interrupted, NOT daemon
		sun.misc.Unsafe.park(Native Method)
		java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
		java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
		java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
		java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
		java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
		java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
		java.lang.Thread.run(Thread.java:745)
	TG main THREAD EvaluatorMessageDispatcher:Node-4-1490140314902:pool-18:thread-001 :: WAITING,
Alive, NOT interrupted, NOT daemon
		sun.misc.Unsafe.park(Native Method)
		java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
		java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
		java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
		java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
		java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
		java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
		java.lang.Thread.run(Thread.java:745)
	TG main THREAD EvaluatorMessageDispatcher:Node-4-1490140315159:pool-22:thread-001 :: WAITING,
Alive, NOT interrupted, NOT daemon
		sun.misc.Unsafe.park(Native Method)
		java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
		java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
		java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
		java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
		java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
		java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
		java.lang.Thread.run(Thread.java:745)
	TG main THREAD globalEventExecutor-1-2 :: TIMED_WAITING, Alive, NOT interrupted, NOT daemon
		sun.misc.Unsafe.park(Native Method)
		java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
		java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
		java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
		io.netty.util.concurrent.GlobalEventExecutor.takeTask(GlobalEventExecutor.java:90)
		io.netty.util.concurrent.GlobalEventExecutor$TaskRunner.run(GlobalEventExecutor.java:334)
		io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
		java.lang.Thread.run(Thread.java:745)
	TG main THREAD main :: RUNNABLE, Alive, NOT interrupted, NOT daemon
		java.lang.Thread.dumpThreads(Native Method)
		java.lang.Thread.getAllStackTraces(Thread.java:1607)
		org.apache.reef.util.ThreadLogger.getFormattedThreadList(ThreadLogger.java:79)
		org.apache.reef.util.ThreadLogger.logThreads(ThreadLogger.java:62)
		org.apache.reef.util.ThreadLogger.logThreads(ThreadLogger.java:45)
		org.apache.reef.runtime.common.REEFLauncher.main(REEFLauncher.java:186)
	TG main THREAD org.apache.reef.wake.remote.impl.RemoteSenderStage:pool-06:thread-001 :: WAITING,
Alive, NOT interrupted, NOT daemon
		java.lang.Object.wait(Native Method)
		java.lang.Object.wait(Object.java:502)
		io.netty.util.concurrent.DefaultPromise.awaitUninterruptibly(DefaultPromise.java:287)
		io.netty.channel.DefaultChannelPromise.awaitUninterruptibly(DefaultChannelPromise.java:135)
		io.netty.channel.DefaultChannelPromise.awaitUninterruptibly(DefaultChannelPromise.java:28)
		io.netty.util.concurrent.DefaultPromise.syncUninterruptibly(DefaultPromise.java:226)
		io.netty.channel.DefaultChannelPromise.syncUninterruptibly(DefaultChannelPromise.java:123)
		io.netty.channel.DefaultChannelPromise.syncUninterruptibly(DefaultChannelPromise.java:28)
		org.apache.reef.wake.remote.transport.netty.NettyMessagingTransport.open(NettyMessagingTransport.java:312)
		org.apache.reef.wake.remote.impl.ConnectCallable.call(RemoteSenderEventHandler.java:139)
		org.apache.reef.wake.remote.impl.ConnectCallable.call(RemoteSenderEventHandler.java:125)
		java.util.concurrent.FutureTask.run(FutureTask.java:266)
		java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
		java.util.concurrent.FutureTask.run(FutureTask.java:266)
		java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
		java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
		java.lang.Thread.run(Thread.java:745)
	TG main THREAD org.apache.reef.wake.remote.impl.RemoteSenderStage:pool-06:thread-002 :: WAITING,
Alive, NOT interrupted, NOT daemon
		java.lang.Object.wait(Native Method)
		java.lang.Object.wait(Object.java:502)
		org.apache.reef.wake.remote.transport.netty.NettyMessagingTransport.open(NettyMessagingTransport.java:294)
		org.apache.reef.wake.remote.impl.ConnectCallable.call(RemoteSenderEventHandler.java:139)
		org.apache.reef.wake.remote.impl.ConnectCallable.call(RemoteSenderEventHandler.java:125)
		java.util.concurrent.FutureTask.run(FutureTask.java:266)
		java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
		java.util.concurrent.FutureTask.run(FutureTask.java:266)
		java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
		java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
		java.lang.Thread.run(Thread.java:745)
	TG main THREAD org.apache.reef.wake.remote.impl.RemoteSenderStage:pool-06:thread-003 :: WAITING,
Alive, NOT interrupted, NOT daemon
		java.lang.Object.wait(Native Method)
		java.lang.Object.wait(Object.java:502)
		org.apache.reef.wake.remote.transport.netty.NettyMessagingTransport.open(NettyMessagingTransport.java:294)
		org.apache.reef.wake.remote.impl.ConnectCallable.call(RemoteSenderEventHandler.java:139)
		org.apache.reef.wake.remote.impl.ConnectCallable.call(RemoteSenderEventHandler.java:125)
		java.util.concurrent.FutureTask.run(FutureTask.java:266)
		java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
		java.util.concurrent.FutureTask.run(FutureTask.java:266)
		java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
		java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
		java.lang.Thread.run(Thread.java:745)
{code}

> Intermittent failures of CloseEvaluatorTest in local runtime
> ------------------------------------------------------------
>
>                 Key: REEF-1753
>                 URL: https://issues.apache.org/jira/browse/REEF-1753
>             Project: REEF
>          Issue Type: Bug
>          Components: REEF Driver, Wake
>            Reporter: Sergiy Matusevych
>            Assignee: Tae-Geon Um
>              Labels: shutdown
>
> {{mvn test}} reports:
> {code:none}
> Failed tests:
>   CloseEvaluatorTest.testCloseEvaluator:61 Job state after execution: FORCE_CLOSED
> {code}
> and in the driver logs I see the following:
> {code:none|title=faulty driver.stderr}
> Mar 21, 2017 2:06:51 PM org.apache.reef.runtime.common.driver.DriverStatusManager onComplete
> INFO: Clean shutdown of the Driver.
> Mar 21, 2017 2:06:52 PM org.apache.reef.wake.remote.transport.netty.NettyMessagingTransport
open
> WARNING: Connection refused. Retry 1 of 3
> Mar 21, 2017 2:06:52 PM org.apache.reef.wake.remote.transport.netty.NettyMessagingTransport
open
> WARNING: Connection refused. Retry 1 of 3
> Mar 21, 2017 2:07:01 PM org.apache.reef.wake.remote.impl.RemoteSenderStage close
> WARNING: Executor did not terminate in 10,000 ms.
> Mar 21, 2017 2:07:01 PM org.apache.reef.wake.remote.transport.netty.NettyMessagingTransport
open
> WARNING: Thread Thread[org.apache.reef.wake.remote.impl.RemoteSenderStage:pool-06:thread-003,5,main]
interrupted while sleeping
> Mar 21, 2017 2:07:01 PM org.apache.reef.wake.remote.transport.netty.NettyMessagingTransport
open
> WARNING: Thread Thread[org.apache.reef.wake.remote.impl.RemoteSenderStage:pool-06:thread-001,5,main]
interrupted while sleeping
> Mar 21, 2017 2:07:01 PM org.apache.reef.wake.remote.impl.RemoteSenderStage close
> WARNING: Executor dropped 0 tasks.
> Mar 21, 2017 2:07:04 PM org.apache.reef.runtime.common.driver.DriverRuntimeStopHandler
onNext
> INFO: Driver shutdown complete
> {code}
> Instead of
> {code:none|title=normal driver.stderr}
> Mar 21, 2017 3:18:46 PM org.apache.reef.runtime.common.driver.DriverStatusManager onComplete
> INFO: Clean shutdown of the Driver.
> Mar 21, 2017 3:18:48 PM org.apache.reef.runtime.common.driver.DriverRuntimeStopHandler
onNext
> INFO: Driver shutdown complete
> {code}
> *P.S.* Unfortunately, I am unable to reproduce the error with more granular logging level.
If someone can do that, please post your Driver's log here



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message