hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Varun Saxena (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
Date Thu, 02 Jul 2015 00:26:05 GMT

     [ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Varun Saxena updated YARN-3878:
-------------------------------
    Description: 
The sequence of events is as under :
# RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This
leads to an Interrupted Exception being thrown.
# As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}},
we will check if all events have been drained and wait for event queue to drain(as RM State
Store dispatcher is configured for queue to drain on stop). 
# This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher
event queue to drain till JVM exits.

*Initial exception while posting RM State store event to queue*
{noformat}
2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452))
- Service: Dispatcher entered state STOPPED
2015-06-27 20:08:35,923 WARN  [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247))
- AsyncDispatcher thread interrupted
java.lang.InterruptedException
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
	at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
	at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619)
	at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838)
{noformat}

*JStack of AsyncDispatcher hanging on stop*
{noformat}
"AsyncDispatcher event handler" prio=10 tid=0x00007fb980222800 nid=0x4b1e waiting on condition
[0x00007fb9654e9000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x0000000700b79250> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
        at java.lang.Thread.run(Thread.java:744)

"main" prio=10 tid=0x00007fb98000a800 nid=0x49c3 in Object.wait() [0x00007fb989851000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x0000000700b79430> (a java.lang.Object)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.serviceStop(AsyncDispatcher.java:156)
	- locked <0x0000000700b79430> (a java.lang.Object)
	at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
	- locked <0x0000000700b79420> (a java.lang.Object)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStop(RMStateStore.java:515)
	at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
	- locked <0x0000000700b79630> (a java.lang.Object)
	at org.apache.hadoop.service.AbstractService.close(AbstractService.java:250)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStop(ResourceManager.java:599)
{noformat}

We keep on getting below logs
{noformat}
2015-06-27 20:08:35,926 INFO  [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(140))
- AsyncDispatcher is draining to stop, igonring any new events.
2015-06-27 20:08:36,926 INFO  [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(144))
- Waiting for AsyncDispatcher to drain. Thread state is :WAITING
2015-06-27 20:08:37,927 INFO  [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(144))
- Waiting for AsyncDispatcher to drain. Thread state is :WAITING
2015-06-27 20:08:38,927 INFO  [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(144))
- Waiting for AsyncDispatcher to drain. Thread state is :WAITING
2015-06-27 20:08:39,928 INFO  [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(144))
- Waiting for AsyncDispatcher to drain. Thread state is :WAITING
2015-06-27 20:08:40,929 INFO  [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(144))
- Waiting for AsyncDispatcher to drain. Thread state is :WAITING
2015-06-27 20:08:41,929 INFO  [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(144))
- Waiting for AsyncDispatcher to drain. Thread state is :WAITING
2015-06-27 20:08:42,930 INFO  [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(144))
- Waiting for AsyncDispatcher to drain. Thread state is :WAITING
2015-06-27 20:08:43,930 INFO  [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(144))
- Waiting for AsyncDispatcher to drain. Thread state is :WAITING
2015-06-27 20:08:44,931 INFO  [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(144))
- Waiting for AsyncDispatcher to drain. Thread state is :WAITING
2015-06-27 20:08:45,931 INFO  [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(144))
- Waiting for AsyncDispatcher to drain. Thread state is :WAITING
2015-06-27 20:08:46,932 INFO  [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(144))
- Waiting for AsyncDispatcher to drain. Thread state is :WAITING
{noformat}

  was:
The sequence of events is as under :
# RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This
leads to an Interrupted Exception being thrown.
# As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}},
we will check if all events have been drained and wait for event queue to drain(as RM State
Store dispatcher is configured for queue to drain on stop). 
# This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher
event queue to drain till JVM exits.

*Initial exception while posting RM State store event to queue*
{noformat}
2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452))
- Service: Dispatcher entered state STOPPED
2015-06-27 20:08:35,923 WARN  [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247))
- AsyncDispatcher thread interrupted
java.lang.InterruptedException
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
	at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
	at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619)
	at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838)
{noformat}

*JStack of AsyncDispatcher hanging on stop*
{noformat}
"AsyncDispatcher event handler" prio=10 tid=0x00007fb980222800 nid=0x4b1e waiting on condition
[0x00007fb9654e9000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x0000000700b79250> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
        at java.lang.Thread.run(Thread.java:744)

"main" prio=10 tid=0x00007fb98000a800 nid=0x49c3 in Object.wait() [0x00007fb989851000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x0000000700b79430> (a java.lang.Object)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.serviceStop(AsyncDispatcher.java:156)
	- locked <0x0000000700b79430> (a java.lang.Object)
	at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
	- locked <0x0000000700b79420> (a java.lang.Object)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStop(RMStateStore.java:515)
	at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
	- locked <0x0000000700b79630> (a java.lang.Object)
	at org.apache.hadoop.service.AbstractService.close(AbstractService.java:250)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStop(ResourceManager.java:599)
{noformat}




> AsyncDispatcher can hang while stopping if it is configured for draining events on stop
> ---------------------------------------------------------------------------------------
>
>                 Key: YARN-3878
>                 URL: https://issues.apache.org/jira/browse/YARN-3878
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.7.0
>            Reporter: Varun Saxena
>            Assignee: Varun Saxena
>            Priority: Critical
>
> The sequence of events is as under :
> # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher.
This leads to an Interrupted Exception being thrown.
> # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}},
we will check if all events have been drained and wait for event queue to drain(as RM State
Store dispatcher is configured for queue to drain on stop). 
> # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly
for dispatcher event queue to drain till JVM exits.
> *Initial exception while posting RM State store event to queue*
> {noformat}
> 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452))
- Service: Dispatcher entered state STOPPED
> 2015-06-27 20:08:35,923 WARN  [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247))
- AsyncDispatcher thread interrupted
> java.lang.InterruptedException
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
> 	at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
> 	at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
> 	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652)
> 	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173)
> 	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109)
> 	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650)
> 	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> 	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786)
> 	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108)
> 	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838)
> {noformat}
> *JStack of AsyncDispatcher hanging on stop*
> {noformat}
> "AsyncDispatcher event handler" prio=10 tid=0x00007fb980222800 nid=0x4b1e waiting on
condition [0x00007fb9654e9000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x0000000700b79250> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>         at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
>         at java.lang.Thread.run(Thread.java:744)
> "main" prio=10 tid=0x00007fb98000a800 nid=0x49c3 in Object.wait() [0x00007fb989851000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
> 	at java.lang.Object.wait(Native Method)
> 	- waiting on <0x0000000700b79430> (a java.lang.Object)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.serviceStop(AsyncDispatcher.java:156)
> 	- locked <0x0000000700b79430> (a java.lang.Object)
> 	at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
> 	- locked <0x0000000700b79420> (a java.lang.Object)
> 	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStop(RMStateStore.java:515)
> 	at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
> 	- locked <0x0000000700b79630> (a java.lang.Object)
> 	at org.apache.hadoop.service.AbstractService.close(AbstractService.java:250)
> 	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStop(ResourceManager.java:599)
> {noformat}
> We keep on getting below logs
> {noformat}
> 2015-06-27 20:08:35,926 INFO  [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(140))
- AsyncDispatcher is draining to stop, igonring any new events.
> 2015-06-27 20:08:36,926 INFO  [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(144))
- Waiting for AsyncDispatcher to drain. Thread state is :WAITING
> 2015-06-27 20:08:37,927 INFO  [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(144))
- Waiting for AsyncDispatcher to drain. Thread state is :WAITING
> 2015-06-27 20:08:38,927 INFO  [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(144))
- Waiting for AsyncDispatcher to drain. Thread state is :WAITING
> 2015-06-27 20:08:39,928 INFO  [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(144))
- Waiting for AsyncDispatcher to drain. Thread state is :WAITING
> 2015-06-27 20:08:40,929 INFO  [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(144))
- Waiting for AsyncDispatcher to drain. Thread state is :WAITING
> 2015-06-27 20:08:41,929 INFO  [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(144))
- Waiting for AsyncDispatcher to drain. Thread state is :WAITING
> 2015-06-27 20:08:42,930 INFO  [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(144))
- Waiting for AsyncDispatcher to drain. Thread state is :WAITING
> 2015-06-27 20:08:43,930 INFO  [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(144))
- Waiting for AsyncDispatcher to drain. Thread state is :WAITING
> 2015-06-27 20:08:44,931 INFO  [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(144))
- Waiting for AsyncDispatcher to drain. Thread state is :WAITING
> 2015-06-27 20:08:45,931 INFO  [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(144))
- Waiting for AsyncDispatcher to drain. Thread state is :WAITING
> 2015-06-27 20:08:46,932 INFO  [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(144))
- Waiting for AsyncDispatcher to drain. Thread state is :WAITING
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message